AWS Kubernetes Orchestration with EKS
Introduction to EKS Orchestration
Amazon Elastic Kubernetes Service (EKS) is AWS's managed Kubernetes platform. It orchestrates containerized applications through a managed control plane (API Server, Scheduler, Controller Manager) and worker nodes (EC2 or Fargate). Workloads are managed via Deployments for stateless apps and StatefulSets for stateful services, deployed as Pods running application Containers. Traffic flows through Services and ALB Ingress controllers, secured by IAM roles and VPC networking. Persistent storage options include EBS volumes and EFS filesystems, while observability is provided through CloudWatch metrics and Prometheus monitoring. The platform integrates with Secrets Manager for credentials and supports both EC2 and serverless Fargate compute, offering a managed solution for microservices, batch processing, and machine learning workloads.
Use Cases
EKS supports a variety of cloud-native workloads:
- Microservices: Deploy decoupled services with Deployments and Ingress for scalable APIs.
- Machine Learning: Orchestrate ML pipelines with Kubeflow on EKS for training and inference.
- Stateful Applications: Run databases like PostgreSQL or MongoDB using StatefulSets with EBS.
- Batch Processing: Manage data processing jobs with Kubernetes Jobs and CronJobs.
- Hybrid Deployments: Use EKS Anywhere for consistent Kubernetes across on-premises and AWS.
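As a minimal sketch of the batch-processing use case, the manifest below schedules a recurring job with a Kubernetes CronJob. The name nightly-report, the image URI, the schedule, and the resource figures are hypothetical placeholders chosen for illustration.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical name
  namespace: default
spec:
  schedule: "0 2 * * *"           # daily at 02:00 in the controller's time zone
  concurrencyPolicy: Forbid       # skip a run if the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 2             # retry a failed run at most twice
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              # placeholder image URI; replace with your own repository
              image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/report-job:latest
              resources:
                requests:
                  cpu: "250m"
                  memory: "256Mi"
```

A one-off run uses the same pod template under a plain Job (kind: Job, apiVersion: batch/v1) without the schedule field.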
EKS Architecture Diagram
The diagram illustrates an Amazon EKS cluster architecture with the following components:
Core Components
- Control Plane: Fully managed by AWS; includes the API Server, Scheduler, and Controller Manager
- Worker Nodes: Run in the customer VPC as either EC2 instances or Fargate pods
- Deployments: Manage stateless applications via ReplicaSets
- StatefulSets: Manage stateful applications with persistent storage
- Pods: Smallest deployable units, running one or more Containers
Networking
- Services: Internal load balancing (ClusterIP) and external exposure (NodePort/LoadBalancer)
- ALB Ingress: AWS Application Load Balancer for HTTP traffic routing
- VPC: Provides network isolation and security groups
AWS Integrations
- IAM: Provides fine-grained access control via IRSA (IAM Roles for Service Accounts)
- EBS/EFS: Block storage (EBS) and shared filesystem (EFS) for persistent volumes
- CloudWatch: Centralized monitoring and logging
- Secrets Manager: Secure credential storage and rotation
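One common way to consume Secrets Manager from pods is the Secrets Store CSI Driver with the AWS provider. The sketch below assumes both are already installed in the cluster and that the pod's service account (my-app-sa here) has an IRSA role allowing secretsmanager:GetSecretValue. The secret name my-app/db-credentials, the object names, and the image are hypothetical.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: my-app-secrets                        # hypothetical name
  namespace: default
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "my-app/db-credentials"   # hypothetical Secrets Manager secret
        objectType: "secretsmanager"
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app-secrets-demo                   # hypothetical name
  namespace: default
spec:
  serviceAccountName: my-app-sa               # assumed to carry the IRSA role annotation
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable   # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets             # each secret appears as a file here
          readOnly: true
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: my-app-secrets
```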
Key Features
- Multi-tenant architecture with namespace isolation
- Both EC2 and Fargate compute options
- Integration with AWS security services (IAM, Secrets Manager)
- End-to-end observability (CloudWatch, Prometheus)
- High availability through multiple availability zones
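Namespace isolation for multiple tenants can be sketched with a per-team Namespace, a Pod Security Standards label, and a ResourceQuota. The team-a name and all quota values below are illustrative assumptions, not recommendations.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                     # hypothetical tenant namespace
  labels:
    # Reject pods that violate the "baseline" Pod Security Standard
    pod-security.kubernetes.io/enforce: baseline
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"          # total CPU requests allowed in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"                 # maximum number of pods
```

With a quota on CPU and memory in place, pods in the namespace must declare requests and limits (or inherit them from a LimitRange), otherwise they are rejected at admission.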
Key EKS Components
EKS leverages Kubernetes components with AWS-specific integrations:
- Pods: Smallest deployable units hosting containers with shared storage and network.
- Deployments: Manage stateless pod replicas with rolling updates; replica counts can be adjusted automatically by a Horizontal Pod Autoscaler.
- StatefulSets: Manage stateful pods with stable identities and ordered scaling for databases.
- Services: Provide stable endpoints (e.g., ClusterIP, LoadBalancer) for pods, integrated with AWS ELB.
- Ingress (ALB): Routes external traffic via AWS Application Load Balancer with path-based routing and TLS.
- EKS Control Plane: Managed API Server, Scheduler, Controller Manager, and etcd for cluster orchestration.
- Worker Nodes: EC2 instances that run pods, organized into node groups backed by Auto Scaling Groups (Fargate is the serverless alternative).
- IAM Integration: IAM Roles for Service Accounts (IRSA) secure pod access to AWS services.
- VPC Networking: AWS VPC CNI plugin assigns pod IPs from VPC subnets for seamless networking.
- Storage: EBS and EFS provide persistent volumes via CSI drivers and StorageClasses.
- Observability: CloudWatch and Prometheus collect cluster and application metrics and logs; X-Ray adds distributed traces.
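To show how several of these components fit together, here is a minimal sketch of a Deployment exposed through a ClusterIP Service and an ALB-backed Ingress. It assumes the AWS Load Balancer Controller is installed in the cluster; the names, the /api path, and the image URI are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          # placeholder image URI; replace with your own repository
          image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-app:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  namespace: default
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip    # route directly to pod IPs
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: my-app-service
                port:
                  number: 80
```

The ip target type registers pod IPs with the load balancer, which works because the VPC CNI plugin gives pods routable VPC addresses.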
Benefits of EKS Orchestration
EKS delivers significant advantages for cloud-native applications:
- Managed Control Plane: AWS runs the Kubernetes control plane across multiple Availability Zones and handles its patching and version upgrades.
- Dynamic Scaling: Horizontal Pod Autoscaler and Cluster Autoscaler adjust pods and nodes based on demand.
- Security: IRSA, RBAC, Network Policies, and VPC isolation secure workloads.
- AWS Integration: Native support for S3, RDS, SQS, and other services via SDKs and IRSA.
- Self-Healing: Automatically restarts, reschedules, or replaces failed pods/nodes.
- Observability: Comprehensive monitoring with CloudWatch, Prometheus, and X-Ray.
- Portability: Kubernetes standards and EKS Anywhere enable multi-cloud and hybrid deployments.
- Developer Productivity: Managed infrastructure reduces operational burden for developers.
Implementation Considerations
Deploying EKS effectively requires addressing key considerations:
- Cluster Sizing: Configure node groups and Cluster Autoscaler for workload variability.
- Security Hardening: Use IRSA, RBAC, Network Policies, Pod Security Standards, and KMS encryption.
- Networking Design: Plan VPC subnets, CNI settings, and security groups for pod communication.
- Resource Optimization: Set pod requests/limits, use spot instances, and apply Savings Plans.
- Observability Setup: Deploy Prometheus/Grafana, CloudWatch, and X-Ray for metrics, logs, and traces.
- CI/CD Integration: Use ArgoCD, GitHub Actions, or CodePipeline for automated deployments.
- Storage Management: Configure StorageClasses for EBS/EFS and manage persistent volume claims.
- Cost Monitoring: Track usage with AWS Cost Explorer and optimize node types/sizes.
- Resilience Testing: Use chaos engineering (e.g., Chaos Mesh) to validate failover mechanisms.
- Compliance Requirements: Enable CloudTrail, encrypt data, and align with standards like SOC 2 or HIPAA.
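For the storage consideration, shared storage on EFS can be sketched as a StorageClass with dynamic provisioning plus a ReadWriteMany claim. This assumes the EFS CSI driver is installed and an EFS file system already exists; the file system ID and the names efs-sc and shared-data are placeholders.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                          # hypothetical name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap              # one EFS access point per volume
  fileSystemId: fs-0123456789abcdef0    # placeholder; use your file system ID
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data                     # hypothetical name
  namespace: default
spec:
  accessModes: ["ReadWriteMany"]        # EFS can be mounted by many pods at once
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi                      # required by the API; EFS itself is elastic
```

EBS volumes, by contrast, are ReadWriteOnce and bound to a single Availability Zone, which is why they pair with StatefulSets rather than shared mounts.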
Troubleshooting Common EKS Issues
Common EKS issues and their resolutions:
- Pod Pending: Check node capacity, pod resource requests, or CNI IP exhaustion; scale nodes or adjust VPC CNI settings.
- Access Denied: Verify IRSA or RBAC configurations; ensure IAM roles have correct trust policies.
- Network Errors: Inspect security groups, Network Policies, or VPC routing; validate CNI plugin.
- High Latency: Analyze X-Ray traces or Prometheus metrics; optimize pod resources or scale nodes.
- Storage Failures: Confirm EBS CSI driver installation and StorageClass definitions; check volume attachments.
Example Configuration: StatefulSet for MongoDB
Below is a Kubernetes StatefulSet, Service, and StorageClass for a MongoDB deployment on EKS.
# StorageClass: encrypted gp3 volumes provisioned by the EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Headless Service: gives each pod a stable DNS name
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
  namespace: default
spec:
  selector:
    app: mongodb
  ports:
    - protocol: TCP
      port: 27017
      targetPort: 27017
  clusterIP: None
---
# StatefulSet: three mongod pods, each with its own 10Gi volume.
# Replica set initiation between the members is not configured here.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: default
spec:
  serviceName: mongodb-service
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:5.0
          ports:
            - containerPort: 27017
          resources:
            requests:
              cpu: "200m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          volumeMounts:
            - name: mongo-data
              mountPath: /data/db
          livenessProbe:
            exec:
              command: ["mongo", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["mongo", "--eval", "db.adminCommand('ping')"]
            initialDelaySeconds: 5
            periodSeconds: 5
      # The mongodb-sa ServiceAccount must already exist in this namespace
      serviceAccountName: mongodb-sa
  volumeClaimTemplates:
    - metadata:
        name: mongo-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc
        resources:
          requests:
            storage: 10Gi
Example Configuration: Horizontal Pod Autoscaler
Below is a Kubernetes HorizontalPodAutoscaler configuration to scale a Deployment based on CPU usage.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
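If replica counts fluctuate too quickly, the autoscaling/v2 API also accepts an optional behavior section. The fragment below is a sketch meant to be merged under the spec of an HPA like the one above; the window and percentage are illustrative assumptions rather than tuned values.

```yaml
# Fragment: merge under the HorizontalPodAutoscaler's existing spec
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # act on the highest recommendation from the last 10 minutes
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of current replicas...
          periodSeconds: 60             # ...within any 60-second period
```

CPU utilization targets are computed against container CPU requests, so the target Deployment needs requests set and a metrics source such as metrics-server running in the cluster.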
Example Configuration: IAM Role for Service Account (IRSA)
Below is a Terraform configuration to create an IAM role for an EKS service account.
provider "aws" {
  region = "us-west-2"
}

data "aws_eks_cluster" "eks_cluster" {
  name = "my-eks-cluster"
}

data "aws_caller_identity" "current" {}

locals {
  # OIDC issuer without the URL scheme, the form used in IAM ARNs and condition keys
  oidc_issuer = replace(data.aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer, "https://", "")
}

# Role assumable only by the my-app-sa service account in the default namespace.
# An IAM OIDC provider for the cluster must already exist in the account.
resource "aws_iam_role" "my_app_sa_role" {
  name = "my-app-sa-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.oidc_issuer}"
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${local.oidc_issuer}:sub" = "system:serviceaccount:default:my-app-sa"
          }
        }
      }
    ]
  })
}

# Permissions granted to pods that use the service account
resource "aws_iam_role_policy" "my_app_sa_policy" {
  name = "my-app-sa-policy"
  role = aws_iam_role.my_app_sa_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = "arn:aws:s3:::my-app-bucket/*"
      }
    ]
  })
}

# Service account annotated with the role ARN; requires a configured kubernetes provider
resource "kubernetes_service_account" "my_app_sa" {
  provider = kubernetes

  metadata {
    name      = "my-app-sa"
    namespace = "default"
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.my_app_sa_role.arn
    }
  }
}
Example Configuration: Cluster Autoscaler
Below is a Kubernetes configuration to deploy the Cluster Autoscaler on EKS.
# ServiceAccount bound to an IAM role through IRSA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ClusterAutoscalerRole
---
# The ClusterRole and bindings the autoscaler needs are not shown here
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          # k8s.gcr.io is frozen; images are now served from registry.k8s.io.
          # Pick the tag whose minor version matches the cluster's Kubernetes version.
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0
          resources:
            requests:
              cpu: "100m"
              memory: "300Mi"
            limits:
              cpu: "500m"
              memory: "600Mi"
          command:
            - ./cluster-autoscaler
            - --v=4
            - --cloud-provider=aws
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
          env:
            # The region is read from this variable
            - name: AWS_REGION
              value: us-west-2
Example Configuration: CI/CD with GitHub Actions
Below is a GitHub Actions workflow to deploy a Kubernetes application to EKS.
name: Deploy to EKS
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name my-eks-cluster --region us-west-2
      - name: Deploy to EKS
        run: kubectl apply -f k8s/deployment.yaml
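Long-lived access keys stored as repository secrets can be avoided by letting the workflow assume an IAM role through GitHub's OIDC provider. The variant below is a sketch under several assumptions: the role GitHubActionsEksDeploy is hypothetical, it must trust the GitHub OIDC provider in IAM, and it must be granted access to the cluster (for example via the aws-auth ConfigMap or an EKS access entry).

```yaml
name: Deploy to EKS (OIDC)
on:
  push:
    branches:
      - main
permissions:
  id-token: write    # lets the job request an OIDC token
  contents: read     # needed by actions/checkout
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v2
        with:
          # hypothetical role ARN; replace with a role that trusts GitHub's OIDC provider
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsEksDeploy
          aws-region: us-west-2
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name my-eks-cluster --region us-west-2
      - name: Deploy to EKS
        run: kubectl apply -f k8s/deployment.yaml
```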
Example Configuration: Network Policy
Below is a Kubernetes NetworkPolicy to restrict pod traffic.
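# Enforcement requires a network policy engine in the cluster
# (for example, VPC CNI network policy support or Calico).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-app-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
    - Egress
  # Only frontend pods may reach my-app on port 8080
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  # my-app may only reach mongodb on port 27017.
  # DNS (port 53) is not allowed by this rule, so name lookups from my-app pods will fail
  # unless another policy permits them.
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: mongodb
      ports:
        - protocol: TCP
          port: 27017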
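Policies like this are usually layered on top of a namespace-wide default. As a sketch, the two manifests below deny all traffic for every pod in the default namespace and then re-allow DNS lookups to CoreDNS. The policy names are hypothetical, and the selectors assume CoreDNS runs in kube-system with the k8s-app: kube-dns label, as it does on a standard EKS cluster.

```yaml
# Deny all ingress and egress for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all            # hypothetical name
  namespace: default
spec:
  podSelector: {}                   # empty selector matches all pods
  policyTypes:
    - Ingress
    - Egress
---
# Re-allow DNS so pods can still resolve Service names
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress            # hypothetical name
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive: a pod's allowed traffic is the union of every policy that selects it, so narrower allow rules can be added per application without touching the default.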
Example Configuration: CloudWatch and Prometheus Monitoring
Below is a Helm values file to deploy Prometheus for EKS monitoring, integrated with CloudWatch.
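prometheus:
  serviceMonitor:
    enabled: true
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    additionalScrapeConfigs:
      # Scrape any pod annotated with prometheus.io/scrape: "true"
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
      # CloudWatch does not expose a Prometheus scrape endpoint, and Prometheus does not
      # expand environment variables in its config, so this job targets an exporter instead.
      # Assumption: a CloudWatch exporter Service named cloudwatch-exporter runs in the
      # monitoring namespace on port 9106 and gets its AWS credentials through IRSA.
      - job_name: cloudwatch
        static_configs:
          - targets: ['cloudwatch-exporter.monitoring.svc:9106']
        metrics_path: /metrics
grafana:
  enabled: true
  # Placeholder only; set a strong password or reference an existing Secret
  adminPassword: admin
  service:
    type: LoadBalancer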