Enterprise E-Commerce Architecture Blueprint (ECS/EKS)
Complete Architecture Overview
This Enterprise E-Commerce Architecture Blueprint describes a cloud-native, scalable, and secure design for a global online retail platform. It is built on AWS, with every microservice deployed as a Docker container on **ECS/EKS** clusters. The Auth, Catalog, Order, Payment, Inventory, and Analytics microservices each run on ECS or EKS with auto-scaling and multi-AZ deployment. Client applications (web, iOS, Android) reach the platform through CloudFront with WAF protection and through API Gateway. An event-driven core built on EventBridge, SQS, and Step Functions orchestrates workflows such as order fulfillment, inventory updates, and analytics processing. Data is held in Aurora Global Database, DynamoDB Global Tables, ElastiCache for caching, and OpenSearch for search, which gives low-latency reads in each Region with data replicated across Regions. Security relies on Cognito for JWT/OAuth 2.0 authentication, KMS for encryption, QLDB for immutable audit records, and PCI-compliant infrastructure. The architecture is detailed in eight diagrams covering the Client, API Gateway, Auth, Catalog, Order, Payment, Inventory, and Analytics layers.
Architecture Principles
1. Layered Isolation
- Presentation: CloudFront, Web Apps
- Edge: WAF, Route53
- API: API Gateway + Auth Service on ECS/EKS
- Services: ECS/EKS microservices
- Data: Multi-model persistence
2. Event-Driven Core
- EventBridge for state changes
- SQS for async processing
- Step Functions for workflows
- Dead-letter queues for resilience
3. Global Data
- Aurora Global Database
- DynamoDB Global Tables
- ElastiCache Multi-AZ
- S3 Cross-Region Replication
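The dead-letter queue behavior listed under the Event-Driven Core can be sketched in memory. This is a minimal illustration rather than SQS itself: the function name `processWithDlq`, the message shape, and the sample handler are assumptions. In SQS the corresponding setting is the `maxReceiveCount` value of a queue's redrive policy.

```javascript
// In-memory sketch of a queue with a dead-letter queue (DLQ).
// A message that fails `maxReceiveCount` times is moved to the DLQ
// instead of being retried forever.
function processWithDlq(messages, handler, maxReceiveCount = 3) {
  const queue = messages.map((body) => ({ body, receiveCount: 0 }));
  const processed = [];
  const deadLetters = [];

  while (queue.length > 0) {
    const message = queue.shift();
    message.receiveCount += 1;
    try {
      handler(message.body);
      processed.push(message.body);
    } catch (err) {
      if (message.receiveCount >= maxReceiveCount) {
        deadLetters.push({ body: message.body, error: err.message });
      } else {
        queue.push(message); // make the message available again for a retry
      }
    }
  }
  return { processed, deadLetters };
}

// Hypothetical handler: rejects orders that have no id.
const result = processWithDlq(
  [{ orderId: 'o-1' }, { orderId: null }],
  (order) => {
    if (!order.orderId) throw new Error('missing orderId');
  }
);
console.log(result);
```

Keeping poisoned messages in a separate queue lets healthy traffic continue while the failures are inspected and replayed later.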
Key Performance Metrics
| Component | Target SLA | Target Latency | Target Throughput |
|---|---|---|---|
| API Gateway | 99.95% | <50ms | 10,000 RPS |
| Catalog Service | 99.99% | <100ms (cache) | 5,000 RPS |
| Order Processing | 99.9% | <500ms (sync) | 1,000 TPS |
| Payment Service | 99.95% | <1s (3D Secure) | 500 TPS |
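To make the SLA targets concrete, the short sketch below converts an availability percentage into a downtime budget. The 30-day period and the helper name `downtimeMinutes` are assumptions chosen for illustration. For example, 99.95% over 30 days allows roughly 21.6 minutes of downtime.

```javascript
// Convert an availability target (percent) into allowed downtime in minutes.
function downtimeMinutes(slaPercent, periodDays = 30) {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - slaPercent / 100);
}

// SLA targets taken from the table of key performance metrics.
const targets = {
  'API Gateway': 99.95,
  'Catalog Service': 99.99,
  'Order Processing': 99.9,
  'Payment Service': 99.95,
};

for (const [name, sla] of Object.entries(targets)) {
  console.log(`${name}: ${downtimeMinutes(sla).toFixed(2)} min per 30 days`);
}
```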
1. Client Architecture
This diagram illustrates the client-side architecture, in which the Web App (React), iOS (SwiftUI), and Android (Kotlin) applications reach the platform through CloudFront and API Gateway. CloudFront serves as the global CDN: it caches static assets stored in S3 at edge locations and uses Lambda@Edge for request normalization. Web apps rely on CloudFront for low-latency delivery, while mobile apps connect directly to API Gateway, which routes requests to the ECS/EKS-based microservices (Auth, Catalog, Order). The client layer connects to the API Gateway Architecture (Diagram 2) for secure interactions. Web apps keep JWTs in HttpOnly cookies, and mobile apps use certificate pinning to protect their connections. Requests initiated in this layer pass through the Auth Service (Diagram 3) for authentication.
Key Features
Performance Optimization
- Static assets via CloudFront (TTL 1yr)
- Dynamic content with stale-while-revalidate
- Code splitting with Webpack
- Progressive Web App capabilities
Security
- CSP headers for XSS protection
- JWT in HttpOnly cookies
- Certificate pinning (mobile)
- Obfuscated API keys
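// Sample React API Client with Retry
import axios from 'axios';

const MAX_RETRIES = 3;       // stop retrying after this many attempts
const RETRY_DELAY_MS = 1000; // fixed delay between attempts

const apiClient = axios.create({
  baseURL: process.env.API_URL,
  timeout: 5000,
  headers: {
    'Content-Type': 'application/json'
  }
});

// Retry 5xx responses, but only up to MAX_RETRIES times per request
apiClient.interceptors.response.use(null, (error) => {
  const config = error.config;
  const status = error.response && error.response.status;
  if (config && status >= 500 && (config.retryCount || 0) < MAX_RETRIES) {
    config.retryCount = (config.retryCount || 0) + 1;
    return new Promise((resolve) => {
      setTimeout(() => resolve(apiClient(config)), RETRY_DELAY_MS);
    });
  }
  return Promise.reject(error);
});

export const getProduct = (id) =>
  apiClient.get(`/products/${id}`, {
    headers: {
      'Cache-Control': 'max-age=60, stale-while-revalidate=3600'
    }
  });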
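A fixed retry delay can cause many clients to retry at the same moment after an outage. One common alternative is exponential backoff with jitter, sketched below without any HTTP library. The names `backoffDelay` and `withRetry`, and the default values, are assumptions for illustration and are not part of the sample client.

```javascript
// Exponential backoff with "full jitter": the delay is a random value
// between 0 and min(capMs, baseMs * 2^attempt).
function backoffDelay(attempt, baseMs = 200, capMs = 5000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry an async operation a bounded number of times, sleeping between attempts.
async function withRetry(operation, { retries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= retries) throw err; // retry budget exhausted
      await sleep(backoffDelay(attempt));
    }
  }
}

// Usage: a hypothetical operation that fails twice, then succeeds.
let calls = 0;
withRetry(
  async () => {
    calls += 1;
    if (calls < 3) throw new Error('503');
    return 'ok';
  },
  { sleep: async () => {} } // skip real waiting in this demo
).then((value) => console.log(value, calls));
```

Retries of this kind are safest for idempotent requests such as `GET /products/{id}`; retrying an order submission would need an idempotency key or similar safeguard.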
2. API Gateway Architecture
This diagram details the API Gateway layer, the secure entry point for client requests from the Client Architecture (Diagram 1). It routes requests to ECS/EKS-based microservices like Auth, Catalog, Order, and Payment. The Auth Service (Diagram 3), deployed on ECS/EKS, validates JWTs issued by Cognito User Pool using KMS for key management, ensuring secure authentication. WAF protects against SQL injection, XSS, and bad bots. The gateway connects to the Catalog Service (Diagram 4), Order Processing (Diagram 5), Payment Processing (Diagram 6), Inventory (Diagram 7), and Analytics (Diagram 8), with endpoint-specific throttling and authentication (e.g., MFA for `/orders`).
Gateway Configuration
| Endpoint | Throttling | Cache TTL | Auth |
|---|---|---|---|
| /products | 1000 RPS | 60s | Optional |
| /cart | 500 RPS | 0s | Required |
| /orders | 200 RPS | 0s | Required + MFA |
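API Gateway throttles requests with a token bucket algorithm. The sketch below shows the idea in memory, using the per-endpoint rates from the table. The class name `TokenBucket`, the explicit clock argument, and the burst sizes are assumptions made for illustration.

```javascript
// Token bucket: `ratePerSec` tokens are added per second, up to `burst`.
// Each request consumes one token; an empty bucket means the request
// should be throttled (HTTP 429).
class TokenBucket {
  constructor(ratePerSec, burst, nowMs = 0) {
    this.ratePerSec = ratePerSec;
    this.burst = burst;
    this.tokens = burst;
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs) {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Rates from the gateway configuration table; burst sizes are assumed.
const limits = {
  '/products': new TokenBucket(1000, 1000),
  '/cart': new TokenBucket(500, 500),
  '/orders': new TokenBucket(200, 200),
};

console.log(limits['/orders'].tryConsume(0)); // first request is admitted
```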
# Terraform for API Gateway
resource "aws_api_gateway_rest_api" "ecommerce" {
  name        = "ecommerce-api"
  description = "Main API Gateway for E-Commerce Platform"
  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

# TOKEN authorizers must point at a Lambda function, so an ECS service cannot
# be referenced directly. aws_lambda_function.auth_authorizer is an assumed
# resource (defined elsewhere) that validates the JWT or forwards it to the
# Auth Service.
resource "aws_api_gateway_authorizer" "jwt" {
  name            = "jwt-authorizer"
  rest_api_id     = aws_api_gateway_rest_api.ecommerce.id
  authorizer_uri  = aws_lambda_function.auth_authorizer.invoke_arn
  type            = "TOKEN"
  identity_source = "method.request.header.Authorization"
}

resource "aws_wafv2_web_acl_association" "api" {
  resource_arn = aws_api_gateway_stage.prod.arn
  web_acl_arn  = aws_wafv2_web_acl.api.arn
}
3. Auth Service Deep Dive
This diagram details the Auth Service, deployed on ECS/EKS, which handles JWT/OAuth 2.0 authentication for the platform. Invoked by the API Gateway (Diagram 2), it validates the signatures of JWTs issued by the Cognito User Pool, with KMS providing cryptographic key management. The service uses IAM Roles for secure access to AWS resources and logs authentication events to CloudWatch for monitoring. It supports OAuth 2.0 flows (e.g., Authorization Code Grant) and enforces MFA for sensitive endpoints like `/orders`. The Auth Service supports all other services (Diagrams 4–8) by supplying validated tokens, so that access is authenticated across the architecture.
Authentication Features
Security Controls
- OAuth 2.0 with Cognito
- JWT signature validation
- MFA for sensitive endpoints
- KMS-managed keys
Observability
- CloudWatch for auth logs
- Metrics for failed attempts
- Tracing with X-Ray
- Alerting via SNS
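// Sample Node.js Auth Service Handler (API Gateway TOKEN authorizer)
const jwt = require('jsonwebtoken');
const jwksClient = require('jwks-rsa'); // fetches and caches the pool's public keys

// Cognito publishes each user pool's signing keys at a well-known JWKS URL
const issuer = `https://cognito-idp.${process.env.AWS_REGION}.amazonaws.com/${process.env.COGNITO_POOL_ID}`;
const keyClient = jwksClient({ jwksUri: `${issuer}/.well-known/jwks.json`, cache: true });

exports.handler = async (event) => {
  try {
    // TOKEN authorizers receive the Authorization header value in event.authorizationToken
    const token = (event.authorizationToken || '').replace(/^Bearer\s+/i, '');
    const unverified = jwt.decode(token, { complete: true });
    if (!unverified) throw new Error('Malformed token');

    // Pick the public key whose key ID (kid) matches the token header
    const signingKey = await keyClient.getSigningKey(unverified.header.kid);
    // Verifying the RS256 signature against that key is the cryptographic check;
    // the decoded payload has no "signature" field, so no KMS decrypt call is made here
    const decoded = jwt.verify(token, signingKey.getPublicKey(), {
      algorithms: ['RS256'],
      issuer
    });

    // Log to CloudWatch
    console.log(`Auth success for user: ${decoded.sub}`);
    return {
      principalId: decoded.sub,
      policyDocument: generatePolicy('Allow', event.methodArn),
      context: { scope: decoded.scope }
    };
  } catch (error) {
    console.error(`Auth failed: ${error.message}`);
    return {
      principalId: 'unauthorized',
      policyDocument: generatePolicy('Deny', event.methodArn)
    };
  }
};

// Build the IAM policy document that API Gateway expects from an authorizer
function generatePolicy(effect, resource) {
  return {
    Version: '2012-10-17',
    Statement: [{
      Action: 'execute-api:Invoke',
      Effect: effect,
      Resource: resource
    }]
  };
}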
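Signature validation is only part of token handling; the claims also need checking. The sketch below shows one way to check issuer, token type, expiry, and scope once a signature has been verified. The helper names, the example issuer URL, and the `orders/write` scope are assumptions; `token_use` is a claim that Cognito includes in its tokens.

```javascript
// Decode a JWT payload WITHOUT verifying the signature. Only use this on
// tokens whose signature has already been verified elsewhere.
function decodePayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Malformed JWT');
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Return null when the claims are acceptable, otherwise a rejection reason.
function checkClaims(claims, { issuer, requiredScope, nowSec = Math.floor(Date.now() / 1000) }) {
  if (claims.iss !== issuer) return 'wrong issuer';
  if (claims.token_use !== 'access') return 'not an access token';
  if (typeof claims.exp !== 'number' || claims.exp <= nowSec) return 'token expired';
  const scopes = (claims.scope || '').split(' ');
  if (requiredScope && !scopes.includes(requiredScope)) return 'missing scope';
  return null;
}

// Usage with a hand-built, unsigned example token (hypothetical values).
const exampleIssuer = 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_EXAMPLE';
const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64url');
const exampleToken = [
  encode({ alg: 'RS256', kid: 'example' }),
  encode({ iss: exampleIssuer, token_use: 'access', exp: 2000, scope: 'orders/read orders/write' }),
  'signature-placeholder',
].join('.');

const claims = decodePayload(exampleToken);
console.log(checkClaims(claims, { issuer: exampleIssuer, requiredScope: 'orders/write', nowSec: 1000 }));
```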
4. Catalog Service Deep Dive
This diagram focuses on the Catalog Service, a microservice deployed on ECS/EKS, handling product listings and search. It receives authenticated requests from the API Gateway (Diagram 2) via the Auth Service (Diagram 3) and interacts with Aurora PostgreSQL for relational data, ElastiCache (Redis) for caching, and OpenSearch for full-text search. The service supports Admin Console for CRUD operations and uses Change Data Capture (CDC) via EventBridge to trigger cache invalidation and search index updates. It connects to the Inventory Service (Diagram 7) for stock updates and the Order Processing System (Diagram 5) for order-related events, scaling containers based on demand.
Data Model
-- PostgreSQL Schema
CREATE TABLE products (
  id UUID PRIMARY KEY,
  sku VARCHAR(32) UNIQUE,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  price DECIMAL(10,2) CHECK (price >= 0),
  inventory_count INTEGER DEFAULT 0,
  categories JSONB,
  attributes JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ
);
-- DynamoDB Schema (for high-traffic reads)
{
  "TableName": "ProductCache",
  "AttributeDefinitions": [
    { "AttributeName": "pk", "AttributeType": "S" },
    { "AttributeName": "sk", "AttributeType": "S" },
    { "AttributeName": "category", "AttributeType": "S" },
    { "AttributeName": "price", "AttributeType": "N" }
  ],
  "KeySchema": [
    { "AttributeName": "pk", "KeyType": "HASH" }, // "PROD#123"
    { "AttributeName": "sk", "KeyType": "RANGE" } // "METADATA"
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "CategoryIndex",
      "KeySchema": [
        { "AttributeName": "category", "KeyType": "HASH" },
        { "AttributeName": "price", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "INCLUDE", "NonKeyAttributes": ["name", "image"] }
    }
  ]
}
Cache Strategy
Read-Through Cache
- Cache hit: 5ms response
- Cache miss: 50ms (DB + warm cache)
- TTL: 5 minutes (product data)
- TTL: 1 hour (category listings)
Invalidation Triggers
- Price changes
- Inventory updates
- Product description edits
- Scheduled midnight flush
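The read-through behavior and TTLs listed in this section can be sketched with an in-memory map. This is an illustration under simplifying assumptions: the class name `ReadThroughCache` is hypothetical, the loader is synchronous to keep the example short (a real loader would query Aurora asynchronously), and Redis is replaced by a `Map`.

```javascript
// Read-through cache: on a miss, load from the backing store and cache the
// result until its TTL expires or it is invalidated.
class ReadThroughCache {
  constructor(loader, ttlMs, now = Date.now) {
    this.loader = loader;     // (key) => value, e.g. a database lookup
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, useful for testing
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > this.now()) return entry.value; // cache hit
    const value = this.loader(key);                                 // cache miss
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  invalidate(key) {
    this.entries.delete(key); // e.g. after a price change or description edit
  }
}

// Usage with a fake loader and a controllable clock.
let clock = 0;
let dbReads = 0;
const products = new ReadThroughCache(
  (id) => { dbReads += 1; return { id, price: 10 }; },
  5 * 60 * 1000, // 5 minute TTL for product data
  () => clock
);
products.get('p-1');     // miss: loads from the "database"
products.get('p-1');     // hit: served from the cache
clock += 5 * 60 * 1000;  // the TTL elapses
products.get('p-1');     // miss again
console.log(dbReads);
```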
5. Order Processing System
This diagram illustrates the Order Processing System, with the Order Service deployed on ECS/EKS, handling order creation and fulfillment. It receives authenticated requests from the API Gateway (Diagram 2) via the Auth Service (Diagram 3), writes to the DynamoDB Orders table using ACID transactions, and emits `order.placed` events via EventBridge. Step Functions orchestrates a saga pattern, coordinating the ECS/EKS-based Inventory, Payment, and Logistics services. Failures trigger a compensation flow in the Order Service to roll back changes. The system connects to the Catalog Service (Diagram 4) for product data, the Inventory Service (Diagram 7) for stock reservation, and Payment Processing (Diagram 6) for transactions.
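As an illustration of the event emitted when an order is created, the sketch below builds an entry in the shape accepted by the EventBridge `PutEvents` API (`Source`, `DetailType`, `Detail`, `EventBusName`, `Time`). The source name `ecommerce.orders`, the bus name, and the order fields are assumptions, not values defined by this blueprint.

```javascript
// Build a PutEvents entry for an `order.placed` event.
// `Detail` must be a JSON string, so the order data is serialized here.
function buildOrderPlacedEntry(order, eventBusName = 'ecommerce-bus') {
  return {
    Source: 'ecommerce.orders',   // hypothetical source name
    DetailType: 'order.placed',
    EventBusName: eventBusName,   // hypothetical bus name
    Time: new Date(order.placedAt),
    Detail: JSON.stringify({
      orderId: order.id,
      customerId: order.customerId,
      items: order.items,
      total: order.total,
    }),
  };
}

// An event pattern that a rule could use to match these events.
const orderPlacedPattern = {
  source: ['ecommerce.orders'],
  'detail-type': ['order.placed'],
};

const entry = buildOrderPlacedEntry({
  id: 'o-1',
  customerId: 'c-9',
  items: [{ sku: 'SKU-1', quantity: 2 }],
  total: 39.98,
  placedAt: '2024-03-05T10:00:00Z',
});
console.log(entry, orderPlacedPattern);
```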
Saga Pattern Implementation
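// Order Saga Compensation Handler
// Assumes the saga runs Inventory -> Payment -> Shipping, so each failure
// undoes only the steps that completed before it.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
const sns = new AWS.SNS();
// inventoryService and paymentService are assumed client modules for the
// Inventory and Payment services; the paths are placeholders.
const inventoryService = require('./clients/inventory');
const paymentService = require('./clients/payment');

exports.handleCompensation = async (event) => {
  const { orderId, failureStep } = event;
  // 1. Update Order Status (.promise() is required for the call to run with await)
  await dynamodb.update({
    TableName: 'Orders',
    Key: { id: orderId },
    UpdateExpression: 'SET #status = :status',
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: { ':status': 'FAILED' }
  }).promise();
  // 2. Execute Compensation Based on Failure Point
  switch (failureStep) {
    case 'INVENTORY_UNAVAILABLE':
      // Nothing was reserved or charged yet, so there is nothing to undo
      break;
    case 'PAYMENT_FAILED':
      await inventoryService.releaseStock(orderId);
      break;
    case 'SHIPPING_FAILED':
      await paymentService.refund(orderId);
      await inventoryService.releaseStock(orderId);
      break;
    default:
      throw new Error(`Unknown failure step: ${failureStep}`);
  }
  // 3. Notify User
  await sns.publish({
    TopicArn: process.env.NOTIFICATIONS_TOPIC,
    Message: JSON.stringify({
      type: 'ORDER_FAILED',
      orderId,
      reason: failureStep
    })
  }).promise();
};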
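The general saga rule, undoing completed steps in reverse order when a later step fails, can be sketched without any AWS dependency. The step names and the synchronous actions below are assumptions chosen to keep the example short; in the architecture described here, Step Functions performs this orchestration.

```javascript
// Run saga steps in order; if one fails, compensate the completed steps
// in reverse order and report which step failed.
function runSaga(steps, context) {
  const completed = [];
  for (const step of steps) {
    try {
      step.action(context);
      completed.push(step);
    } catch (err) {
      for (const done of completed.reverse()) {
        done.compensate(context);
      }
      return { status: 'FAILED', failedStep: step.name, reason: err.message };
    }
  }
  return { status: 'COMPLETED' };
}

// Hypothetical steps that record what happened in a log.
const log = [];
const steps = [
  { name: 'RESERVE_STOCK', action: () => log.push('reserve'), compensate: () => log.push('release') },
  { name: 'CHARGE_PAYMENT', action: () => log.push('charge'), compensate: () => log.push('refund') },
  {
    name: 'CREATE_SHIPMENT',
    action: () => { throw new Error('carrier unavailable'); },
    compensate: () => {},
  },
];

const outcome = runSaga(steps, { orderId: 'o-1' });
console.log(outcome, log);
```

Compensating in reverse order means the payment is refunded before the stock is released, mirroring the order in which the steps originally ran.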
Order State Transitions
6. Payment Processing Architecture
This diagram details the Payment Processing Architecture, with the Payment Service deployed on ECS/EKS in a PCI-Compliant VPC. Invoked by the Order Processing System (Diagram 5) via EventBridge, it uses a Payment Vault (HSM) for card tokenization and integrates with external Payment Processor APIs. KMS handles encryption, and QLDB maintains an immutable audit ledger. Async events from processors are processed via SQS, with ECS tasks scaling to handle load. The service connects to the Auth Service (Diagram 3) for token validation and the Inventory Service (Diagram 7) for compensation flows, ensuring secure and compliant transactions.
Security Controls
Data Protection
- PCI-DSS Level 1 Certified
- HSM for card tokenization
- Field-level encryption
- No PAN storage in logs
Fraud Prevention
- 3D Secure 2.0
- Velocity checks
- IP geolocation
- Machine learning scoring
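The "No PAN storage in logs" control can be supported by redacting card numbers before a line is written. The sketch below is one possible approach, not a complete PCI control: the function names are assumptions, and it only catches unformatted digit runs (numbers written with spaces or dashes would need extra handling).

```javascript
// Luhn checksum, used to tell likely card numbers from other digit runs.
function passesLuhn(digits) {
  let sum = 0;
  for (let i = 0; i < digits.length; i += 1) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Replace 13-19 digit runs that pass the Luhn check with a masked form
// that keeps only the last four digits.
function redactPans(line) {
  return line.replace(/\b\d{13,19}\b/g, (match) =>
    passesLuhn(match) ? `[PAN ****${match.slice(-4)}]` : match
  );
}

// 4111111111111111 is a widely used test card number, not a real account.
console.log(redactPans('charge failed card=4111111111111111 order=1234567890123456'));
```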
# Terraform for PCI-Compliant Resources
resource "aws_vpc" "pci" {
cidr_block = "10.1.0.0/16"
enable_dns_hostnames = true
tags = {
Name = "PCI-VPC"
Compliance = "PCI-DSS"
}
}
resource "aws_cloudhsm_v2_cluster" "payment_hsm" {
hsm_type = "hsm1.medium"
subnet_ids = aws_subnet.pci[*].id
tags = {
Purpose = "Card Data Tokenization"
}
}
resource "aws_kms_key" "payment" {
description = "Payment Processing Key"
deletion_window_in_days = 30
enable_key_rotation = true
policy = data.aws_iam_policy_document.pci_kms.json
tags = {
Compliance = "PCI-DSS"
}
}
7. Inventory Service Deep Dive
This diagram details the Inventory Service, deployed on ECS/EKS, managing stock levels and availability. It receives events from the Order Processing System (Diagram 5) via EventBridge for stock reservation and release during saga workflows. The service interacts with DynamoDB Inventory for scalable stock data and ElastiCache for caching frequently accessed inventory counts. It connects to the Catalog Service (Diagram 4) for product stock updates and the Payment Service (Diagram 6) for compensation flows (e.g., releasing stock on payment failure). The service emits events to EventBridge for analytics and notifications, scaling containers based on order volume.
Data Model
-- DynamoDB Schema for Inventory
{
  "TableName": "Inventory",
  "KeySchema": [
    { "AttributeName": "productId", "KeyType": "HASH" },
    { "AttributeName": "warehouseId", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "productId", "AttributeType": "S" },
    { "AttributeName": "warehouseId", "AttributeType": "S" },
    { "AttributeName": "stockLevel", "AttributeType": "N" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "StockLevelIndex",
      "KeySchema": [
        { "AttributeName": "warehouseId", "KeyType": "HASH" },
        { "AttributeName": "stockLevel", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "ALL" }
    }
  ],
  "BillingMode": "PAY_PER_REQUEST"
}
Inventory Features
Stock Management
- Atomic stock updates
- Multi-warehouse support
- Low-stock alerts
- Restocking workflows
Performance
- Cache hit: 2ms
- Cache miss: 20ms
- TTL: 10 minutes
- Event-driven sync
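Atomic stock updates in DynamoDB are typically expressed as a conditional update, so that a reservation fails rather than driving stock negative. The sketch below builds the parameters for such a call against the Inventory table defined in this section. The function name is hypothetical, and the parameters are meant for a DocumentClient-style `update` call, which is not executed here.

```javascript
// Build `update` parameters that reserve stock atomically.
// If stockLevel is lower than the requested quantity, DynamoDB rejects the
// write with a ConditionalCheckFailedException instead of going negative.
function buildReserveStockParams(productId, warehouseId, quantity) {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new RangeError('quantity must be a positive integer');
  }
  return {
    TableName: 'Inventory',
    Key: { productId, warehouseId },
    UpdateExpression: 'SET stockLevel = stockLevel - :qty',
    ConditionExpression: 'stockLevel >= :qty',
    ExpressionAttributeValues: { ':qty': quantity },
    ReturnValues: 'UPDATED_NEW',
  };
}

console.log(buildReserveStockParams('PROD#123', 'WH-1', 2));
```

Because the check and the decrement happen in a single write, two concurrent reservations cannot both succeed when only one unit remains.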
8. Analytics Service Deep Dive
This diagram details the Analytics Service, deployed on ECS/EKS, processing platform events for business insights. It consumes events from EventBridge generated by services like Catalog (Diagram 4), Order (Diagram 5), Payment (Diagram 6), and Inventory (Diagram 7), storing data in Redshift for analytics and S3 for raw event storage. The service uses Athena for ad-hoc queries and QuickSight for dashboards. It scales ECS tasks based on event volume and connects to all services via EventBridge, enabling real-time and batch analytics for sales, inventory, and user behavior.
Analytics Features
Data Processing
- Real-time event ingestion
- Batch ETL pipelines
- Data partitioning
- Schema evolution
Visualization
- QuickSight dashboards
- Custom SQL queries
- User behavior tracking
- Sales forecasting
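Data partitioning for the raw event store can be illustrated with a key builder that writes events under Hive-style `key=value` prefixes, a layout that Athena can use to prune partitions. The prefix names, the event shape, and the function name below are assumptions for illustration.

```javascript
// Build a Hive-style partitioned S3 key (type=/year=/month=/day=) for a raw event.
function rawEventKey(event) {
  const ts = new Date(event.time);
  if (Number.isNaN(ts.getTime())) throw new Error('invalid event time');
  const pad = (n) => String(n).padStart(2, '0');
  return [
    'events',
    `type=${event.detailType}`,
    `year=${ts.getUTCFullYear()}`,
    `month=${pad(ts.getUTCMonth() + 1)}`,
    `day=${pad(ts.getUTCDate())}`,
    `${event.id}.json`,
  ].join('/');
}

console.log(rawEventKey({ id: 'e1', detailType: 'order.placed', time: '2024-03-05T10:00:00Z' }));
```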
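# Terraform for Analytics Resources
resource "aws_redshift_cluster" "analytics" {
  cluster_identifier     = "ecommerce-analytics"
  database_name          = "analytics"
  master_username        = "admin"
  master_password        = var.redshift_master_password # assumed variable; supply it from a secret store
  node_type              = "dc2.large"
  number_of_nodes        = 2
  publicly_accessible    = false
  vpc_security_group_ids = [aws_security_group.redshift_sg.id]
}

resource "aws_s3_bucket" "analytics_raw" {
  bucket = "ecommerce-analytics-raw"
}

# AWS provider v4+ configures versioning through a separate resource;
# new buckets are private by default, so no ACL is set.
resource "aws_s3_bucket_versioning" "analytics_raw" {
  bucket = aws_s3_bucket.analytics_raw.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_athena_workgroup" "analytics" {
  name = "ecommerce-analytics"
  configuration {
    result_configuration {
      output_location = "s3://${aws_s3_bucket.analytics_raw.bucket}/athena-results/"
    }
  }
}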
9. ECS/EKS Deployment Configuration
This section provides a sample Terraform configuration for deploying the Catalog Service on an ECS cluster, illustrating how microservices are containerized. The configuration includes an ECS cluster, task definition, service, and ALB for routing. Similar configurations apply to other services (Auth, Order, Payment, Inventory, Analytics) on ECS or EKS, with EKS used for advanced orchestration (e.g., Payment Service in PCI-compliant VPC). Services are deployed with auto-scaling policies based on CPU/memory metrics and integrated with CloudWatch for monitoring.
# Terraform for ECS Deployment of Catalog Service
provider "aws" {
region = "us-west-2"
}
resource "aws_ecs_cluster" "ecommerce_cluster" {
name = "ecommerce-cluster"
tags = {
Environment = "production"
}
}
resource "aws_ecs_task_definition" "catalog_task" {
family = "catalog-service"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "256"
memory = "512"
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
task_role_arn = aws_iam_role.ecs_task_role.arn
container_definitions = jsonencode([
{
name = "catalog-container"
image = "123456789012.dkr.ecr.us-west-2.amazonaws.com/catalog-service:latest"
essential = true
portMappings = [
{
containerPort = 8080
hostPort = 8080
protocol = "tcp"
}
]
environment = [
{ name = "DB_HOST", value = aws_rds_cluster.catalog_db.endpoint },
{ name = "REDIS_HOST", value = aws_elasticache_cluster.catalog_cache.cache_nodes[0].address }
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = "/ecs/catalog-service"
"awslogs-region" = "us-west-2"
"awslogs-stream-prefix" = "catalog"
}
}
}
])
}
resource "aws_ecs_service" "catalog_service" {
name = "catalog-service"
cluster = aws_ecs_cluster.ecommerce_cluster.id
task_definition = aws_ecs_task_definition.catalog_task.arn
desired_count = 2
launch_type = "FARGATE"
network_configuration {
subnets = ["subnet-12345678", "subnet-87654321"]
security_groups = [aws_security_group.ecs_sg.id]
assign_public_ip = false
}
load_balancer {
target_group_arn = aws_lb_target_group.catalog_tg.arn
container_name = "catalog-container"
container_port = 8080
}
}
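resource "aws_lb" "catalog_alb" {
name = "catalog-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb_sg.id]
subnets = ["subnet-12345678", "subnet-87654321"]
}
# Listener that attaches the target group to the ALB; without it the ECS
# service cannot register tasks with the target group
resource "aws_lb_listener" "catalog_http" {
load_balancer_arn = aws_lb.catalog_alb.arn
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.catalog_tg.arn
}
}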
resource "aws_lb_target_group" "catalog_tg" {
name = "catalog-tg"
port = 8080
protocol = "HTTP"
vpc_id = "vpc-1234567890abcdef0"
target_type = "ip"
health_check {
path = "/health"
healthy_threshold = 2
unhealthy_threshold = 2
timeout = 5
interval = 30
}
}
resource "aws_security_group" "ecs_sg" {
name = "ecs-sg"
vpc_id = "vpc-1234567890abcdef0"
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb_sg.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "alb_sg" {
name = "alb-sg"
vpc_id = "vpc-1234567890abcdef0"
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_iam_role" "ecs_task_execution_role" {
name = "ecs-task-execution-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution_policy" {
role = aws_iam_role.ecs_task_execution_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
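For services that run on EKS, CPU-based auto-scaling usually relies on the Kubernetes Horizontal Pod Autoscaler, whose documented rule is `desired = ceil(current * currentMetric / targetMetric)`. The sketch below applies that rule with clamping. The function name and the `min`/`max` bounds are assumptions (the minimum of 2 mirrors the `desired_count` used for the Catalog Service); ECS target tracking behaves similarly in spirit but is configured through Application Auto Scaling instead.

```javascript
// Replica calculation in the style of the Kubernetes Horizontal Pod Autoscaler:
// desired = ceil(current * currentMetric / targetMetric), clamped to [min, max].
function desiredReplicas(current, currentMetric, targetMetric, { min = 2, max = 20 } = {}) {
  const raw = Math.ceil(current * (currentMetric / targetMetric));
  return Math.min(max, Math.max(min, raw));
}

// Two tasks running at 90% CPU against a 60% target.
console.log(desiredReplicas(2, 90, 60));
```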
