Enterprise E-Commerce Architecture Blueprint (ECS/EKS)
Complete Architecture Overview
This Enterprise E-Commerce Architecture Blueprint is a cloud-native, scalable, and secure design for a global online retail platform. It is built on AWS, with every microservice deployed as a Docker container on **ECS/EKS** clusters for robust orchestration. The microservices (Auth, Catalog, Order, Payment, Inventory, and Analytics) each run on ECS or EKS with auto-scaling and multi-AZ deployments. Client applications (web, iOS, Android) reach the platform through CloudFront with WAF protection and through API Gateway for secure endpoints. An event-driven core built on EventBridge, SQS, and Step Functions orchestrates workflows such as order fulfillment, inventory updates, and analytics processing. Data is managed with Aurora Global Database, DynamoDB Global Tables, ElastiCache for caching, and OpenSearch for search. Together they provide low-latency reads and cross-region replication. That replication is asynchronous by default, so secondary regions can briefly lag the primary. Security is enforced with Cognito for JWT/OAuth 2.0 authentication, KMS for encryption, QLDB for immutable audit trails, and PCI-compliant infrastructure. The architecture is detailed across eight diagrams covering the Client, API Gateway, Auth, Catalog, Order, Payment, Inventory, and Analytics layers, providing a comprehensive blueprint for enterprise-scale e-commerce.
Architecture Principles
1. Layered Isolation
- Presentation: CloudFront, Web Apps
- Edge: WAF, Route 53
- API: Gateway + ECS/EKS Auth
- Services: ECS/EKS microservices
- Data: Multi-model persistence
2. Event-Driven Core
- EventBridge for state changes
- SQS for async processing
- Step Functions for workflows
- Dead-letter queues for resilience
3. Global Data
- Aurora Global Database
- DynamoDB Global Tables
- ElastiCache Multi-AZ
- S3 Cross-Region Replication
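The dead-letter-queue principle maps to an SQS redrive policy. Once a message's receive count exceeds `maxReceiveCount`, SQS moves the message to the dead-letter queue. Below is a minimal sketch that builds the queue attribute in the shape SQS expects, where the policy is a JSON-encoded string. It assumes a hypothetical DLQ ARN and a default threshold of 5.

```javascript
// Build SQS queue attributes that route repeatedly failing ("poison") messages to a DLQ.
// The DLQ ARN used below and the default threshold are illustrative assumptions.
function buildRedriveAttributes(deadLetterQueueArn, maxReceiveCount = 5) {
  if (!Number.isInteger(maxReceiveCount) || maxReceiveCount < 1) {
    throw new RangeError('maxReceiveCount must be a positive integer');
  }
  return {
    // SQS expects RedrivePolicy as a JSON-encoded string, not a nested object
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: deadLetterQueueArn,
      maxReceiveCount: String(maxReceiveCount),
    }),
  };
}

const attrs = buildRedriveAttributes('arn:aws:sqs:us-west-2:123456789012:orders-dlq');
// Usage with AWS SDK v3 (not executed here):
//   await sqs.send(new SetQueueAttributesCommand({ QueueUrl: ordersQueueUrl, Attributes: attrs }));
```

Keeping the threshold small surfaces poison messages quickly. Consumers must still be idempotent, because SQS standard queues deliver at least once.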
Key Performance Metrics
Component | Availability SLA | Latency target | Throughput |
---|---|---|---|
API Gateway | 99.95% | <50ms | 10,000 RPS |
Catalog Service | 99.99% | <100ms (cache hit) | 5,000 RPS |
Order Processing | 99.9% | 500ms (synchronous path) | 1,000 TPS |
Payment Service | 99.95% | 1s (with 3-D Secure) | 500 TPS |
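Each availability target implies a monthly downtime budget. A request that depends on several components in series is only as available as the product of their availabilities. The small sketch below makes both ideas concrete. It assumes a 30-day month and statistically independent failures, and both are simplifications.

```javascript
// Convert availability targets into downtime budgets and combine serial dependencies.
// Assumes a 30-day month and independent component failures.
const MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60; // 43,200

function monthlyDowntimeMinutes(availabilityPercent) {
  return MINUTES_PER_30_DAY_MONTH * (1 - availabilityPercent / 100);
}

// A request that must pass through every component succeeds only if all of them are up.
function serialAvailability(...availabilityPercents) {
  return availabilityPercents.reduce((acc, p) => acc * (p / 100), 1) * 100;
}

// Targets from the Key Performance Metrics table
const budgets = {
  apiGateway: monthlyDowntimeMinutes(99.95),     // about 21.6 minutes
  catalog: monthlyDowntimeMinutes(99.99),        // about 4.3 minutes
  orderProcessing: monthlyDowntimeMinutes(99.9), // about 43.2 minutes
};

// A request depending on both API Gateway and Order Processing: about 99.85% end to end
const orderPath = serialAvailability(99.95, 99.9);
```

The end-to-end figure falls below every individual target. For this reason, upstream components such as the gateway usually need a higher SLA than the services behind them.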
1. Client Architecture
This diagram illustrates the client-side architecture. The Web App (React), iOS (SwiftUI), and Android (Kotlin) applications interact with the platform via CloudFront and API Gateway. CloudFront serves as a global CDN: it caches static assets from an S3 origin and uses Lambda@Edge for request normalization. Web apps rely on CloudFront for low-latency delivery. Mobile apps connect directly to API Gateway, which routes requests to the ECS/EKS-based microservices (Auth, Catalog, Order). The client layer connects to the API Gateway Architecture (Diagram 2) for secure interactions. Web apps keep JWTs in HttpOnly cookies, and mobile apps protect the channel with certificate pinning. Requests initiated in this layer are authenticated through the Auth Service (Diagram 3).
Key Features
Performance Optimization
- Static assets via CloudFront (TTL 1yr)
- Dynamic content with stale-while-revalidate
- Code splitting with Webpack
- Progressive Web App capabilities
Security
- CSP headers for XSS protection
- JWT in HttpOnly cookies
- Certificate pinning (mobile)
- Obfuscated API keys
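```javascript
// Sample React API Client with bounded retry on 5xx responses
import axios from 'axios';

const MAX_RETRIES = 3;

const apiClient = axios.create({
  baseURL: process.env.API_URL, // injected at build time (e.g., by Webpack)
  timeout: 5000,
  headers: { 'Content-Type': 'application/json' },
});

apiClient.interceptors.response.use(undefined, (error) => {
  const config = error.config;
  const status = error.response?.status;
  // Retry only server errors on idempotent GET requests, with exponential backoff
  if (config && status >= 500 && config.method === 'get') {
    config.retryCount = (config.retryCount || 0) + 1;
    if (config.retryCount <= MAX_RETRIES) {
      const delayMs = 1000 * 2 ** (config.retryCount - 1); // 1s, 2s, 4s
      return new Promise((resolve) => setTimeout(resolve, delayMs)).then(() => apiClient(config));
    }
  }
  return Promise.reject(error);
});

// max-age / stale-while-revalidate are response directives: the origin or CloudFront
// sets them on the response, so the client does not send Cache-Control itself.
export const getProduct = (id) => apiClient.get(`/products/${encodeURIComponent(id)}`);
```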
2. API Gateway Architecture
This diagram details the API Gateway layer, the secure entry point for client requests from the Client Architecture (Diagram 1). It routes requests to the ECS/EKS-based Auth, Catalog, Order, and Payment microservices. The Auth Service (Diagram 3), deployed on ECS/EKS, validates JWTs issued by the Cognito User Pool by checking their signatures against the pool's public signing keys (JWKS). KMS manages the platform's own encryption keys. WAF protects against SQL injection, XSS, and bad bots. The gateway connects to the Catalog Service (Diagram 4), Order Processing (Diagram 5), Payment Processing (Diagram 6), Inventory (Diagram 7), and Analytics (Diagram 8). Each endpoint has its own throttling and authentication rules (e.g., MFA for `/orders`).
Gateway Configuration
Endpoint | Throttling | Cache TTL | Auth |
---|---|---|---|
/products | 1000 RPS | 60s | Optional |
/cart | 500 RPS | 0s | Required |
/orders | 200 RPS | 0s | Required + MFA |
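API Gateway documents its throttling as a token-bucket algorithm: a steady-state rate plus a burst capacity. The sketch below mirrors the per-endpoint rates in the table. The burst sizes (2× the rate) are assumptions, and in production the gateway enforces these limits itself. Time is passed in explicitly so the behavior is deterministic.

```javascript
// Token-bucket limiter illustrating per-endpoint throttling (rate + burst).
class TokenBucket {
  constructor(ratePerSec, burst) {
    this.ratePerSec = ratePerSec; // steady-state refill rate (tokens per second)
    this.capacity = burst;        // maximum tokens, i.e. the burst size
    this.tokens = burst;          // start full
    this.lastRefillMs = 0;
  }

  // Returns true if a request at time nowMs is allowed, false if it should get HTTP 429.
  tryAcquire(nowMs) {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Rates from the Gateway Configuration table; burst sizes are assumed
const limits = {
  '/products': new TokenBucket(1000, 2000),
  '/cart': new TokenBucket(500, 1000),
  '/orders': new TokenBucket(200, 400),
};
```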
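```hcl
# Terraform for API Gateway
resource "aws_api_gateway_rest_api" "ecommerce" {
  name        = "ecommerce-api"
  description = "Main API Gateway for E-Commerce Platform"

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

# REST API TOKEN authorizers must be Lambda functions (an ECS service has no invoke ARN).
# aws_lambda_function.jwt_authorizer (defined elsewhere) runs the Auth Service's JWT validation.
resource "aws_api_gateway_authorizer" "jwt" {
  name            = "jwt-authorizer"
  rest_api_id     = aws_api_gateway_rest_api.ecommerce.id
  authorizer_uri  = aws_lambda_function.jwt_authorizer.invoke_arn
  type            = "TOKEN"
  identity_source = "method.request.header.Authorization"
}

# Allow API Gateway to invoke the authorizer function
resource "aws_lambda_permission" "apigw_authorizer" {
  statement_id  = "AllowAPIGatewayInvokeAuthorizer"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.jwt_authorizer.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.ecommerce.execution_arn}/*"
}

# Attach the regional WAF web ACL to the production stage
resource "aws_wafv2_web_acl_association" "api" {
  resource_arn = aws_api_gateway_stage.prod.arn
  web_acl_arn  = aws_wafv2_web_acl.api.arn
}
```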
3. Auth Service Deep Dive
This diagram details the Auth Service, deployed on ECS/EKS, which handles JWT/OAuth 2.0 authentication for the platform. The API Gateway (Diagram 2) invokes it to validate JWTs issued by the Cognito User Pool. Validation verifies each token's RS256 signature against the pool's published JSON Web Key Set (JWKS). KMS manages the platform's encryption keys, not token signatures. The service uses IAM Roles for secure access to AWS resources and logs authentication events to CloudWatch for monitoring. It supports OAuth 2.0 flows (e.g., the Authorization Code Grant) and enforces MFA for sensitive endpoints such as `/orders`. The Auth Service supports all other services (Diagrams 4–8) by supplying validated tokens, ensuring secure access across the architecture.
Authentication Features
Security Controls
- OAuth 2.0 with Cognito
- JWT signature validation
- MFA for sensitive endpoints
- KMS-managed keys
Observability
- CloudWatch for auth logs
- Metrics for failed attempts
- Tracing with X-Ray
- Alerting via SNS
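```javascript
// Sample Node.js Auth Service Handler (API Gateway Lambda TOKEN authorizer)
// Requires Node.js 18+ (global fetch) and jsonwebtoken v9+ (KeyObject keys).
const crypto = require('crypto');
const jwt = require('jsonwebtoken');

const ISSUER = `https://cognito-idp.${process.env.AWS_REGION}.amazonaws.com/${process.env.COGNITO_POOL_ID}`;
let jwksCache; // reused across warm invocations

async function getCognitoJWKs() {
  if (!jwksCache) {
    const res = await fetch(`${ISSUER}/.well-known/jwks.json`);
    if (!res.ok) throw new Error(`JWKS fetch failed: ${res.status}`);
    jwksCache = (await res.json()).keys;
  }
  return jwksCache;
}

exports.handler = async (event) => {
  // TOKEN authorizers receive the Authorization header value in event.authorizationToken
  const token = (event.authorizationToken || '').replace(/^Bearer /, '');
  try {
    // Pick the Cognito signing key matching the token's key ID (kid)
    const kid = jwt.decode(token, { complete: true })?.header?.kid;
    const jwk = (await getCognitoJWKs()).find((key) => key.kid === kid);
    if (!jwk) throw new Error('Unknown signing key');
    const publicKey = crypto.createPublicKey({ key: jwk, format: 'jwk' });

    // Cognito signs tokens with its own RS256 keys; KMS plays no part in verification
    const decoded = jwt.verify(token, publicKey, { algorithms: ['RS256'], issuer: ISSUER });
    if (decoded.token_use !== 'access') throw new Error('Expected an access token');

    console.log(`Auth success for user: ${decoded.sub}`); // Lambda logs go to CloudWatch
    return {
      principalId: decoded.sub,
      policyDocument: generatePolicy('Allow', event.methodArn),
      context: { scope: decoded.scope || '' },
    };
  } catch (error) {
    console.error(`Auth failed: ${error.message}`);
    return {
      principalId: 'unauthorized',
      policyDocument: generatePolicy('Deny', event.methodArn),
    };
  }
};

function generatePolicy(effect, resource) {
  return {
    Version: '2012-10-17',
    Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }],
  };
}
```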
4. Catalog Service Deep Dive
This diagram focuses on the Catalog Service, a microservice deployed on ECS/EKS that handles product listings and search. It receives authenticated requests from the API Gateway (Diagram 2) via the Auth Service (Diagram 3). It uses Aurora PostgreSQL for relational data, ElastiCache (Redis) for caching, and OpenSearch for full-text search. An Admin Console provides CRUD operations. Change Data Capture (CDC) events published to EventBridge trigger cache invalidation and search-index updates. The service connects to the Inventory Service (Diagram 7) for stock updates and to the Order Processing System (Diagram 5) for order-related events, scaling its containers with demand.
Data Model
```sql
-- PostgreSQL Schema
CREATE TABLE products (
  id              UUID PRIMARY KEY,
  sku             VARCHAR(32) UNIQUE,
  name            VARCHAR(255) NOT NULL,
  description     TEXT,
  price           DECIMAL(10,2) CHECK (price >= 0),
  inventory_count INTEGER DEFAULT 0,
  categories      JSONB,
  attributes      JSONB,
  created_at      TIMESTAMPTZ DEFAULT NOW(),
  updated_at      TIMESTAMPTZ
);
```

DynamoDB schema (for high-traffic reads). Items use keys such as `pk = "PROD#123"` and `sk = "METADATA"`:

```json
{
  "TableName": "ProductCache",
  "KeySchema": [
    { "AttributeName": "pk", "KeyType": "HASH" },
    { "AttributeName": "sk", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "pk", "AttributeType": "S" },
    { "AttributeName": "sk", "AttributeType": "S" },
    { "AttributeName": "category", "AttributeType": "S" },
    { "AttributeName": "price", "AttributeType": "N" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "CategoryIndex",
      "KeySchema": [
        { "AttributeName": "category", "KeyType": "HASH" },
        { "AttributeName": "price", "KeyType": "RANGE" }
      ],
      "Projection": {
        "ProjectionType": "INCLUDE",
        "NonKeyAttributes": ["name", "image"]
      }
    }
  ],
  "BillingMode": "PAY_PER_REQUEST"
}
```
Cache Strategy
Read-Through Cache
- Cache hit: 5ms response
- Cache miss: 50ms (DB + warm cache)
- TTL: 5 minutes (product data)
- TTL: 1 hour (category listings)
Invalidation Triggers
- Price changes
- Inventory updates
- Product description edits
- Scheduled midnight flush
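The read-through strategy and its invalidation triggers can be sketched as a small cache class. A production version would use ElastiCache (Redis) with asynchronous database calls. This in-memory version keeps the control flow visible. Key names such as `product:<id>` are assumptions, and the clock is injectable for testing.

```javascript
// In-memory read-through cache with per-entry TTLs and explicit invalidation.
const TTL_MS = {
  product: 5 * 60 * 1000,   // 5 minutes (product data)
  category: 60 * 60 * 1000, // 1 hour (category listings)
};

class ReadThroughCache {
  constructor(clock = Date.now) {
    this.clock = clock;
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  // Return the cached value, or call loader() on a miss/expiry and cache its result.
  get(key, ttlMs, loader) {
    const now = this.clock();
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) return hit.value;
    const value = loader(); // e.g., read from Aurora PostgreSQL
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return value;
  }

  // Invalidation triggers: price changes, inventory updates, description edits
  invalidate(key) {
    this.entries.delete(key);
  }

  // Scheduled flush (e.g., at midnight)
  flush() {
    this.entries.clear();
  }
}
```

Explicit invalidation keeps prices fresh between writes. The TTL bounds staleness if an invalidation event is ever lost.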
5. Order Processing System
This diagram illustrates the Order Processing System. Its Order Service, deployed on ECS/EKS, handles order creation and fulfillment. It receives authenticated requests from the API Gateway (Diagram 2) via the Auth Service (Diagram 3). It writes to the DynamoDB Orders table using ACID transactions and emits `order.placed` events via EventBridge. Step Functions orchestrates a saga pattern that coordinates the ECS/EKS-based Inventory, Payment, and Logistics services. When a step fails, a compensation flow in the Order Service rolls back the steps that already completed. The system connects to the Catalog Service (Diagram 4) for product data, the Inventory Service (Diagram 7) for stock reservation, and Payment Processing (Diagram 6) for transactions.
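Emitting `order.placed` amounts to building one EventBridge PutEvents entry. The sketch below assumes a hypothetical bus name (`ecommerce-bus`), source (`ecommerce.orders`), and detail fields. EventBridge requires `Detail` to be a JSON string. The sketch also shows a matching rule pattern that a subscriber such as the Inventory Service could use.

```javascript
// Build a PutEvents entry for an order.placed event.
// Bus name, source, and detail fields are illustrative assumptions.
function buildOrderPlacedEntry(order) {
  return {
    EventBusName: 'ecommerce-bus',
    Source: 'ecommerce.orders',
    DetailType: 'order.placed',
    // EventBridge expects Detail as a JSON-encoded string
    Detail: JSON.stringify({
      orderId: order.id,
      customerId: order.customerId,
      items: order.items.map(({ sku, quantity }) => ({ sku, quantity })),
      totalCents: order.totalCents, // integer cents avoid floating-point rounding
    }),
  };
}

// Rule pattern a subscriber could use to receive these events
const orderPlacedPattern = {
  source: ['ecommerce.orders'],
  'detail-type': ['order.placed'],
};

// Usage with AWS SDK v3 (not executed here):
//   await eventBridge.send(new PutEventsCommand({ Entries: [buildOrderPlacedEntry(order)] }));
```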
Saga Pattern Implementation
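```javascript
// Order Saga Compensation Handler (AWS SDK for JavaScript v3)
// Saga step order: reserve inventory -> charge payment -> ship.
// Compensation undoes only the steps that completed, in reverse order.
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, UpdateCommand } = require('@aws-sdk/lib-dynamodb');
const { SNSClient, PublishCommand } = require('@aws-sdk/client-sns');
// Hypothetical internal clients for the Inventory and Payment services
const { inventoryService, paymentService } = require('./service-clients');

const dynamodb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const sns = new SNSClient({});

exports.handleCompensation = async (event) => {
  const { orderId, failureStep } = event;

  // 1. Update Order Status ("status" is a DynamoDB reserved word, hence #status)
  await dynamodb.send(new UpdateCommand({
    TableName: 'Orders',
    Key: { id: orderId },
    UpdateExpression: 'SET #status = :status',
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: { ':status': 'FAILED' },
  }));

  // 2. Execute Compensation Based on Failure Point
  switch (failureStep) {
    case 'INVENTORY_UNAVAILABLE':
      // First step failed: nothing was reserved or charged, so nothing to undo
      break;
    case 'PAYMENT_FAILED':
      await inventoryService.releaseStock(orderId);
      break;
    case 'SHIPPING_FAILED':
      await paymentService.refund(orderId);
      await inventoryService.releaseStock(orderId);
      break;
    default:
      console.warn(`No compensation defined for step: ${failureStep}`);
  }

  // 3. Notify User
  await sns.send(new PublishCommand({
    TopicArn: process.env.NOTIFICATIONS_TOPIC,
    Message: JSON.stringify({ type: 'ORDER_FAILED', orderId, reason: failureStep }),
  }));
};
```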
Order State Transitions
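Below is a minimal sketch of the order lifecycle as a transition table. It assumes the saga runs inventory, then payment, then shipping. The state names are hypothetical; only `FAILED` appears in the compensation handler above. Each transition is also guarded with a DynamoDB condition expression, so two concurrent saga steps cannot overwrite each other's status.

```javascript
// Hypothetical order lifecycle; state names are illustrative assumptions.
const TRANSITIONS = {
  PENDING: ['INVENTORY_RESERVED', 'FAILED'],
  INVENTORY_RESERVED: ['PAID', 'FAILED'],
  PAID: ['SHIPPED', 'FAILED'],
  SHIPPED: ['DELIVERED'],
  DELIVERED: [],
  FAILED: [],
};

// Validate a transition, throwing on unknown states or illegal moves.
function transition(current, next) {
  const allowed = TRANSITIONS[current];
  if (!allowed) throw new Error(`Unknown state: ${current}`);
  if (!allowed.includes(next)) throw new Error(`Illegal transition ${current} -> ${next}`);
  return next;
}

// UpdateItem parameters that apply only if the stored status is still `current`.
function buildStatusUpdate(orderId, current, next) {
  transition(current, next); // reject illegal moves before touching the table
  return {
    TableName: 'Orders',
    Key: { id: orderId },
    UpdateExpression: 'SET #status = :next',
    ConditionExpression: '#status = :current',
    ExpressionAttributeNames: { '#status': 'status' },
    ExpressionAttributeValues: { ':next': next, ':current': current },
  };
}
```

If another writer has already moved the order on, DynamoDB rejects the update with a `ConditionalCheckFailedException` instead of silently overwriting the status.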
6. Payment Processing Architecture
This diagram details the Payment Processing Architecture. Its Payment Service runs on ECS/EKS inside a PCI-compliant VPC. The Order Processing System (Diagram 5) invokes it via EventBridge. It uses a Payment Vault (HSM) for card tokenization and integrates with external payment processor APIs. KMS handles encryption, and QLDB maintains an immutable audit ledger. Asynchronous events from processors are consumed from SQS, with ECS tasks scaling to handle the load. The service connects to the Auth Service (Diagram 3) for token validation and to the Inventory Service (Diagram 7) for compensation flows, keeping transactions secure and compliant.
Security Controls
Data Protection
- PCI-DSS Level 1 Certified
- HSM for card tokenization
- Field-level encryption
- No PAN storage in logs
Fraud Prevention
- 3D Secure 2.0
- Velocity checks
- IP geolocation
- Machine learning scoring
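A redaction step that runs before log lines reach CloudWatch can back the "No PAN storage in logs" control. The minimal sketch below assumes card numbers appear as 13–19 digit runs, optionally grouped by single spaces or hyphens. It uses the Luhn checksum to avoid masking unrelated numbers such as order IDs. It is a defense-in-depth filter, not a substitute for never logging card data.

```javascript
// Luhn checksum: double every second digit from the right, subtract 9 when over 9.
function passesLuhn(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// 13-19 digit runs, optionally separated by single spaces or hyphens (assumed formats).
const PAN_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Mask every Luhn-valid candidate, keeping only the last four digits.
function redactPans(message) {
  return message.replace(PAN_CANDIDATE, (match) => {
    const digits = match.replace(/[ -]/g, '');
    if (!passesLuhn(digits)) return match; // not a card number: leave untouched
    return '*'.repeat(digits.length - 4) + digits.slice(-4);
  });
}
```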
```hcl
# Terraform for PCI-Compliant Resources
resource "aws_vpc" "pci" {
  cidr_block           = "10.1.0.0/16"
  enable_dns_hostnames = true
  tags = {
    Name       = "PCI-VPC"
    Compliance = "PCI-DSS"
  }
}

# The cluster holds no HSMs by itself; add them with aws_cloudhsm_v2_hsm resources
resource "aws_cloudhsm_v2_cluster" "payment_hsm" {
  hsm_type   = "hsm1.medium"
  subnet_ids = aws_subnet.pci[*].id
  tags = {
    Purpose = "Card Data Tokenization"
  }
}

resource "aws_kms_key" "payment" {
  description             = "Payment Processing Key"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  policy                  = data.aws_iam_policy_document.pci_kms.json
  tags = {
    Compliance = "PCI-DSS"
  }
}
```
7. Inventory Service Deep Dive
This diagram details the Inventory Service, deployed on ECS/EKS, which manages stock levels and availability. It receives events from the Order Processing System (Diagram 5) via EventBridge to reserve and release stock during saga workflows. The service stores stock data in a DynamoDB Inventory table for scalability and uses ElastiCache to cache frequently accessed inventory counts. It connects to the Catalog Service (Diagram 4) for product stock updates and to the Payment Service (Diagram 6) for compensation flows (e.g., releasing stock on payment failure). The service emits events to EventBridge for analytics and notifications, and scales its containers with order volume.
Data Model
DynamoDB schema for the Inventory table:

```json
{
  "TableName": "Inventory",
  "KeySchema": [
    { "AttributeName": "productId", "KeyType": "HASH" },
    { "AttributeName": "warehouseId", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "productId", "AttributeType": "S" },
    { "AttributeName": "warehouseId", "AttributeType": "S" },
    { "AttributeName": "stockLevel", "AttributeType": "N" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "StockLevelIndex",
      "KeySchema": [
        { "AttributeName": "warehouseId", "KeyType": "HASH" },
        { "AttributeName": "stockLevel", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "ALL" }
    }
  ],
  "BillingMode": "PAY_PER_REQUEST"
}
```
Inventory Features
Stock Management
- Atomic stock updates
- Multi-warehouse support
- Low-stock alerts
- Restocking workflows
Performance
- Cache hit: 2ms
- Cache miss: 20ms
- TTL: 10 minutes
- Event-driven sync
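DynamoDB conditional updates are a common way to implement "atomic stock updates" on the Inventory table. The decrement and the availability check happen in one request, so concurrent orders cannot oversell. The sketch below builds the UpdateItem parameters using the `productId`, `warehouseId`, and `stockLevel` attributes from the schema. The helper names are hypothetical.

```javascript
// UpdateItem parameters that reserve stock only if enough is available.
// If stockLevel < quantity (or the item is missing), DynamoDB rejects the write
// with ConditionalCheckFailedException instead of driving stock negative.
function buildReserveStockParams(productId, warehouseId, quantity) {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new RangeError('quantity must be a positive integer');
  }
  return {
    TableName: 'Inventory',
    Key: { productId, warehouseId },
    UpdateExpression: 'SET stockLevel = stockLevel - :qty',
    ConditionExpression: 'stockLevel >= :qty',
    ExpressionAttributeValues: { ':qty': quantity },
    ReturnValues: 'UPDATED_NEW',
  };
}

// Compensation (release on payment failure) is the inverse increment; no condition needed.
function buildReleaseStockParams(productId, warehouseId, quantity) {
  return {
    TableName: 'Inventory',
    Key: { productId, warehouseId },
    UpdateExpression: 'SET stockLevel = stockLevel + :qty',
    ExpressionAttributeValues: { ':qty': quantity },
    ReturnValues: 'UPDATED_NEW',
  };
}

// Usage with AWS SDK v3 (not executed here):
//   await docClient.send(new UpdateCommand(buildReserveStockParams('PROD#123', 'WH-1', 2)));
```

The release is not idempotent as written. If the saga may retry it, record the reservation (for example, per order) so a repeated release does not add stock twice.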
8. Analytics Service Deep Dive
This diagram details the Analytics Service, deployed on ECS/EKS, which processes platform events for business insights. It consumes EventBridge events generated by the Catalog (Diagram 4), Order (Diagram 5), Payment (Diagram 6), and Inventory (Diagram 7) services. It loads data into Redshift for analytics and keeps raw events in S3. Athena supports ad-hoc queries, and QuickSight provides dashboards. The service scales its ECS tasks with event volume and receives events from all services through EventBridge, enabling real-time and batch analytics for sales, inventory, and user behavior.
Analytics Features
Data Processing
- Real-time event ingestion
- Batch ETL pipelines
- Data partitioning
- Schema evolution
Visualization
- QuickSight dashboards
- Custom SQL queries
- User behavior tracking
- Sales forecasting
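```hcl
# Terraform for Analytics Resources
resource "aws_redshift_cluster" "analytics" {
  cluster_identifier     = "ecommerce-analytics"
  database_name          = "analytics"
  master_username        = "admin"
  manage_master_password = true # admin password stored in Secrets Manager
  node_type              = "dc2.large"
  number_of_nodes        = 2
  publicly_accessible    = false
  vpc_security_group_ids = [aws_security_group.redshift_sg.id]
}

# New buckets are private by default (ACLs disabled, public access blocked),
# so no acl argument is needed; versioning is a separate resource
resource "aws_s3_bucket" "analytics_raw" {
  bucket = "ecommerce-analytics-raw"
}

resource "aws_s3_bucket_versioning" "analytics_raw" {
  bucket = aws_s3_bucket.analytics_raw.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_athena_workgroup" "analytics" {
  name = "ecommerce-analytics"
  configuration {
    result_configuration {
      output_location = "s3://${aws_s3_bucket.analytics_raw.bucket}/athena-results/"
    }
  }
}
```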
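The "Data partitioning" item usually means storing raw events in S3 under Hive-style `key=value` prefixes. Athena can then skip irrelevant partitions when queries filter on those columns. The sketch below assumes a hypothetical `events/` prefix and hourly partitions derived from the EventBridge event's `time` field in UTC.

```javascript
// Build an S3 object key for a raw EventBridge event using Hive-style partitions.
// Prefix and hourly granularity are illustrative assumptions.
function rawEventKey(event) {
  const t = new Date(event.time); // EventBridge events carry an ISO-8601 "time" field
  const pad = (n) => String(n).padStart(2, '0');
  return [
    'events',
    `source=${event.source}`,
    `year=${t.getUTCFullYear()}`,
    `month=${pad(t.getUTCMonth() + 1)}`,
    `day=${pad(t.getUTCDate())}`,
    `hour=${pad(t.getUTCHours())}`,
    `${event.id}.json`,
  ].join('/');
}
```

Athena can register these partitions with `MSCK REPAIR TABLE` or compute them with partition projection. A query filtered on `year`, `month`, and `day` then reads only the matching prefixes.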
9. ECS/EKS Deployment Configuration
This section provides a sample Terraform configuration for deploying the Catalog Service on an ECS cluster, illustrating how microservices are containerized. The configuration includes an ECS cluster, a task definition, a service, and an Application Load Balancer (ALB) for routing. Similar configurations apply to the other services (Auth, Order, Payment, Inventory, Analytics) on ECS or EKS. EKS is used where more advanced orchestration is needed (e.g., the Payment Service in the PCI-compliant VPC). Services are deployed with auto-scaling policies based on CPU/memory metrics and integrated with CloudWatch for monitoring.
```hcl
# Terraform for ECS Deployment of Catalog Service
# Assumes aws_rds_cluster.catalog_db, aws_elasticache_cluster.catalog_cache and
# aws_iam_role.ecs_task_role are defined elsewhere; subnet and VPC IDs are placeholders.
provider "aws" {
  region = "us-west-2"
}

resource "aws_ecs_cluster" "ecommerce_cluster" {
  name = "ecommerce-cluster"
  tags = {
    Environment = "production"
  }
}

# The awslogs driver requires this log group to exist before tasks start
resource "aws_cloudwatch_log_group" "catalog" {
  name              = "/ecs/catalog-service"
  retention_in_days = 30
}

resource "aws_ecs_task_definition" "catalog_task" {
  family                   = "catalog-service"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([
    {
      name      = "catalog-container"
      image     = "123456789012.dkr.ecr.us-west-2.amazonaws.com/catalog-service:latest"
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
          protocol      = "tcp"
        }
      ]
      environment = [
        { name = "DB_HOST", value = aws_rds_cluster.catalog_db.endpoint },
        { name = "REDIS_HOST", value = aws_elasticache_cluster.catalog_cache.cache_nodes[0].address }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.catalog.name
          "awslogs-region"        = "us-west-2"
          "awslogs-stream-prefix" = "catalog"
        }
      }
    }
  ])
}

resource "aws_ecs_service" "catalog_service" {
  name            = "catalog-service"
  cluster         = aws_ecs_cluster.ecommerce_cluster.id
  task_definition = aws_ecs_task_definition.catalog_task.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = ["subnet-12345678", "subnet-87654321"]
    security_groups  = [aws_security_group.ecs_sg.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.catalog_tg.arn
    container_name   = "catalog-container"
    container_port   = 8080
  }

  # The target group must be attached to the ALB (via the listener) before the service starts
  depends_on = [aws_lb_listener.catalog_http]
}

resource "aws_lb" "catalog_alb" {
  name               = "catalog-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = ["subnet-12345678", "subnet-87654321"]
}

resource "aws_lb_target_group" "catalog_tg" {
  name        = "catalog-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = "vpc-1234567890abcdef0"
  target_type = "ip" # required for awsvpc/Fargate tasks

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
    interval            = 30
  }
}

# Listener that forwards ALB traffic to the Catalog target group
resource "aws_lb_listener" "catalog_http" {
  load_balancer_arn = aws_lb.catalog_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.catalog_tg.arn
  }
}

resource "aws_security_group" "ecs_sg" {
  name   = "ecs-sg"
  vpc_id = "vpc-1234567890abcdef0"

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "alb_sg" {
  name   = "alb-sg"
  vpc_id = "vpc-1234567890abcdef0"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_iam_role" "ecs_task_execution_role" {
  name = "ecs-task-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "sts:AssumeRole"
        Effect    = "Allow"
        Principal = { Service = "ecs-tasks.amazonaws.com" }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```
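For services on EKS, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It skips changes when the ratio is within a default tolerance of 0.1. Below is a sketch of that rule with min/max clamping. It deliberately ignores the HPA's stabilization windows and pod-readiness handling, and the replica bounds in the tests are assumptions.

```javascript
// Desired replica count per the Kubernetes HPA scaling rule, clamped to [min, max].
// Omits stabilization windows and readiness handling for brevity.
function desiredReplicas({ current, currentMetric, targetMetric, min, max, tolerance = 0.1 }) {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(ratio - 1) <= tolerance) return current; // close enough: no change
  const desired = Math.ceil(current * ratio);
  return Math.min(max, Math.max(min, desired));
}
```

With 2 replicas at 90% CPU against a 60% target, the ratio is 1.5 and the rule asks for 3 replicas. With 4 replicas at 62%, the ratio stays within tolerance and nothing changes.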