Nowsath for AWS Community Builders

Security Considerations for Multi-Cluster Cloud Architecture (HA EKS with Databases)

Running a highly available multi-cluster EKS architecture brings powerful benefits: near-zero downtime, disaster recovery, and global scalability. But it also multiplies your security challenges.

Securing a single EKS cluster is already complex. Add multiple clusters across regions, databases with sensitive data, and cross-cluster communication, and the attack surface grows significantly. One misconfigured security group or exposed secret can compromise your entire infrastructure.

This guide covers essential security considerations for multi-cluster architectures: network isolation, encryption, IAM management, secrets handling, and incident response. We'll focus on practical measures that protect your infrastructure without sacrificing performance or availability.

Let's build a secure, highly available system.

1. Network Security & Isolation

VPC Architecture

  • Separate VPCs per cluster or use shared VPC with isolated subnets
  • Private subnets for EKS nodes and databases (no direct internet access)
  • Public subnets only for load balancers and NAT gateways
  • Implement VPC peering or AWS Transit Gateway for inter-cluster communication
  • Use separate VPCs per environment (dev, staging, production)

Network Segmentation

  • Production VPC-1 (Region A)

    • Public Subnets: ALB only
    • Private Subnets: EKS Nodes
    • Database Subnets: RDS/Aurora (isolated)
  • Production VPC-2 (Region B)

    • Public Subnets: ALB only
    • Private Subnets: EKS Nodes
    • Database Subnets: RDS/Aurora (isolated)

Security Groups

  • Principle of least privilege - only allow necessary ports
  • Database security groups: Only allow traffic from EKS node security groups
  • EKS control plane: Restrict API access to specific CIDR ranges
  • No 0.0.0.0/0 rules except for outbound NAT traffic
  • Document and regularly audit security group rules
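The database rule in the list above can be captured as infrastructure code. The CloudFormation fragment below is a minimal sketch, assuming PostgreSQL on port 5432; the parameter names `VpcId` and `NodeSecurityGroupId` are placeholders chosen for illustration, not values from this guide.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcId:                      # assumed: the VPC that hosts the database subnets
    Type: AWS::EC2::VPC::Id
  NodeSecurityGroupId:        # assumed: security group attached to the EKS nodes
    Type: AWS::EC2::SecurityGroup::Id
Resources:
  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: PostgreSQL access from EKS worker nodes only
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        # The source is a security group, not a CIDR range, so only
        # instances carrying the node security group can connect
        - IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          SourceSecurityGroupId: !Ref NodeSecurityGroupId
```

Note that a security group created without explicit egress rules still allows all outbound traffic, so outbound restrictions would need their own rules.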

Network Policies

```yaml
# Default deny: blocks all ingress and egress for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow backend pods to reach the database namespace on the PostgreSQL port.
# The target namespace must carry the label name=database.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432
```
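A default-deny egress policy also blocks DNS lookups, so pods can no longer resolve service names. One possible companion policy is sketched below. It assumes the cluster DNS runs in `kube-system` with the label `k8s-app: kube-dns` (the usual CoreDNS setup on EKS) and that a network policy engine is enabled in the cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress        # illustrative name
spec:
  podSelector: {}               # applies to every pod in this namespace
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same list item are ANDed:
        # only DNS pods inside kube-system match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label of the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```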

2. Identity & Access Management (IAM)

Cluster Access Control

  • AWS IAM authentication for cluster access via EKS access entries (preferred) or the legacy aws-auth ConfigMap
  • Never use permanent credentials in pods or containers
  • Implement IAM Roles for Service Accounts (IRSA) for pod-level permissions
  • Use AWS SSO/IAM Identity Center for human access
  • Separate IAM roles for different teams/applications
  • Enable MFA for all human users

IRSA (IAM Roles for Service Accounts)

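```yaml
# Service account annotated with the IAM role its pods should assume
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/app-role
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                  # placeholder name
spec:
  selector:
    matchLabels:
      app: app               # placeholder label, must match the template labels
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          image: myapp:1.0.0   # placeholder tag; pin a version instead of :latest
```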

Kubernetes RBAC

  • Role-Based Access Control (RBAC) for fine-grained permissions
  • Namespace isolation - separate namespaces per team/application
  • Principle of least privilege - minimal permissions needed
  • ClusterRoles for cluster-wide resources (use sparingly)
  • Roles for namespace-scoped resources
  • Regular RBAC audits
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: app-prod
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "deployments", "services"]
    verbs: ["get", "list", "watch"]  # Read-only access, no delete/update
```
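A Role grants nothing until it is bound to a subject. The RoleBinding below is a minimal sketch that attaches the `developer` Role to a group; the group name `developers` is an assumption and would have to match whatever group your IAM-to-Kubernetes mapping assigns.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding     # illustrative name
  namespace: app-prod         # must be the namespace of the Role
subjects:
  - kind: Group
    name: developers          # assumed group name from the cluster's identity mapping
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```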

Database Access

  • IAM Database Authentication for RDS/Aurora (where possible)
  • Avoid hardcoded credentials - use Secrets Manager or Parameter Store
  • Rotate credentials regularly (automated rotation)
  • Separate database users per application/service
  • Read-only replicas for non-critical workloads

3. Secrets Management

Never Store Secrets in Code or ConfigMaps
❌ Bad: Secrets in environment variables or ConfigMaps
✅ Good: External secrets management

AWS Secrets Manager / Parameter Store

  • Use External Secrets Operator or Secrets Store CSI Driver
  • Automatic rotation enabled
  • Encryption at rest with KMS
  • Audit access via CloudTrail
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: db-secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/db/password
```
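The ExternalSecret refers to a store named `aws-secrets-manager`, which has to exist in the same namespace. A possible definition is sketched here, assuming the External Secrets Operator authenticates through an IRSA-enabled service account; the region `us-east-1` and the service account name `external-secrets-sa` are placeholders.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                 # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # assumed service account with an IRSA role
```

The IAM role behind that service account should only be allowed to read the specific secrets the namespace needs.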

Alternative: HashiCorp Vault

  • Centralized secrets management across clusters
  • Dynamic secrets generation
  • Lease-based credentials
  • Fine-grained access policies

4. Encryption

Data at Rest

  • Envelope encryption of Kubernetes Secrets stored in etcd using AWS KMS
  • EBS volumes encrypted (gp3 with KMS)
  • RDS/Aurora encryption enabled with KMS
  • S3 encryption (SSE-S3 or SSE-KMS)
  • Use customer-managed KMS keys for compliance requirements
  • Separate KMS keys per environment/cluster
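For persistent volumes, encryption can be made the default through a StorageClass. This is a minimal sketch, assuming the AWS EBS CSI driver is installed; the class name is illustrative, and the commented `kmsKeyId` line shows where a customer-managed key would go.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted                    # illustrative name
provisioner: ebs.csi.aws.com             # requires the EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
  # kmsKeyId: <ARN of a customer-managed KMS key>   # omit to use the AWS-managed key
volumeBindingMode: WaitForFirstConsumer  # create the volume in the pod's AZ
reclaimPolicy: Delete
```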

Data in Transit

  • TLS/SSL everywhere:

    • ALB → Pods (via Ingress with TLS)
    • Pod → Pod (service mesh or mTLS)
    • Application → Database (SSL/TLS enforced)
  • Certificate management with AWS Certificate Manager or cert-manager

  • mTLS with service mesh (Istio, Linkerd, AWS App Mesh)

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-2-2017-01
spec:
  tls:
    - hosts:
        - app.example.com
```

5. Pod Security

Pod Security Standards

  • Enforce restricted pod security standards
  • No privileged containers unless absolutely necessary
  • Read-only root filesystem where possible
  • Non-root users for containers
  • Drop all capabilities and add only required ones
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp:1.0.0   # placeholder tag; pin a version instead of :latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
```
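The restricted standard can be enforced per namespace with the built-in Pod Security Admission labels. The sketch below reuses the `app-prod` namespace from the RBAC example; which namespaces to label is an assumption about your layout.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-prod
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # Also record violations in audit logs and show warnings to clients
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

Starting with only `audit` and `warn` and adding `enforce` later is a common way to find non-compliant workloads before they get rejected.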

Image Security

  • Scan images for vulnerabilities (Amazon ECR scanning, Trivy, Snyk)
  • Use minimal base images (distroless, Alpine)
  • Pin image versions - never use :latest
  • Sign and verify images (Sigstore/Cosign, Notary)
  • Private container registry (Amazon ECR with VPC endpoints)
  • Image pull secrets for private registries

Admission Controllers

  • OPA/Gatekeeper or Kyverno for policy enforcement
  • Enforce security policies:

    • No privileged pods
    • Required resource limits
    • Approved registries only
    • Required security contexts
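```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed"
        pattern:
          spec:
            # =( ) anchors check a field only when it is present, so pods that
            # omit securityContext (unprivileged by default) still pass
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            containers:
              - =(securityContext):
                  =(privileged): "false"
```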

6. Multi-Cluster Security

Cluster Isolation

  • Separate clusters for different security zones (DMZ, internal, data)
  • Separate clusters per environment (never share prod and non-prod)
  • Separate AWS accounts per environment (AWS Organizations)
  • Service Control Policies (SCPs) to restrict actions at account level

Cross-Cluster Communication

  • Service mesh for secure cross-cluster communication (Istio multi-cluster)
  • VPC peering or Transit Gateway with strict security groups
  • mTLS for service-to-service authentication
  • API Gateway or Internal Load Balancer as entry points
  • Zero-trust networking - verify every request

DNS Security

  • Private Route53 hosted zones for internal services
  • DNSSEC where applicable
  • Avoid relying on DNS-based service discovery alone across clusters (a DNS name does not prove a service's identity; pair it with mTLS)

7. Database Security

RDS/Aurora Security

  • Multi-AZ deployment for availability
  • Private subnets only - no public access
  • Encryption at rest (KMS) and in transit (SSL/TLS enforced)
  • Automated backups with encryption
  • Point-in-time recovery enabled
  • Enhanced monitoring enabled
  • Performance Insights with encryption
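Several of the settings above map directly to properties of an RDS instance. The CloudFormation fragment below is a sketch under stated assumptions: the engine, instance class, storage size, retention period, and all parameter names are illustrative, and an Aurora cluster would use `AWS::RDS::DBCluster` instead.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  DbSubnetGroupName:          # assumed: subnet group made of isolated database subnets
    Type: String
  DatabaseSecurityGroupId:    # assumed: security group that admits only EKS nodes
    Type: AWS::EC2::SecurityGroup::Id
  DatabaseKmsKeyArn:          # assumed: customer-managed KMS key
    Type: String
Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    Properties:
      Engine: postgres
      DBInstanceClass: db.m6g.large       # illustrative size
      AllocatedStorage: "100"
      MultiAZ: true                       # standby in a second AZ
      PubliclyAccessible: false           # private subnets only
      DBSubnetGroupName: !Ref DbSubnetGroupName
      VPCSecurityGroups:
        - !Ref DatabaseSecurityGroupId
      StorageEncrypted: true              # encryption at rest, also covers backups
      KmsKeyId: !Ref DatabaseKmsKeyArn
      BackupRetentionPeriod: 14           # automated backups enable point-in-time recovery
      EnableIAMDatabaseAuthentication: true
      EnablePerformanceInsights: true
      DeletionProtection: true
      MasterUsername: dbadmin             # placeholder
      ManageMasterUserPassword: true      # password generated and stored in Secrets Manager
```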

Connection Security

  • RDS Proxy for connection pooling and IAM authentication
  • SSL/TLS enforcement on database side
  • Certificate validation on client side
  • No hardcoded connection strings
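```yaml
# Database connection with SSL. verify-full validates the server certificate and
# hostname (sslmode=require only encrypts), so the client must trust the RDS CA bundle.
DATABASE_URL: "postgresql://user@host:5432/db?sslmode=verify-full"
```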

Database Access Control

  • Separate database users per service
  • Minimal privileges (SELECT only for read-only services)
  • No superuser access from applications
  • Parameter groups to enforce security settings
  • Audit logging enabled (PostgreSQL pgaudit, MySQL audit log)
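Server-side TLS enforcement and audit logging are both set through a parameter group. This is a minimal sketch for RDS for PostgreSQL; the family `postgres16` and the chosen `pgaudit.log` classes are assumptions to adapt to your engine version and audit needs.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  SecureDbParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: Enforce TLS and enable pgaudit
      Family: postgres16                  # assumed engine family
      Parameters:
        rds.force_ssl: "1"                # reject connections that do not use TLS
        shared_preload_libraries: "pg_stat_statements,pgaudit"
        pgaudit.log: "ddl,role"           # log schema changes and privilege changes
        log_connections: "1"
```

`shared_preload_libraries` is a static parameter, so the instance has to be rebooted before pgaudit takes effect.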

Database Activity Monitoring

  • AWS Database Activity Streams for real-time monitoring
  • Alert on suspicious queries or access patterns
  • Log all DDL and privilege changes

8. Logging & Monitoring

Comprehensive Logging

  • EKS Control Plane logs to CloudWatch (API, audit, authenticator)
  • Application logs via FluentBit/Fluentd to centralized location
  • Database logs (query logs, error logs, slow query logs)
  • VPC Flow Logs for network traffic analysis
  • CloudTrail for all API calls
  • Immutable logs - prevent tampering

Security Monitoring

  • Amazon GuardDuty - threat detection
  • AWS Security Hub - centralized security findings
  • Amazon Detective - security investigation
  • Falco - runtime security monitoring in Kubernetes
  • Prometheus + Grafana for metrics and alerting

Audit Logging

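```yaml
# Example Kubernetes audit policy. On EKS the control plane is managed by AWS, so
# this policy cannot be applied there; EKS audit logs are enabled through the
# cluster's control plane logging settings. Shown for self-managed clusters.
apiVersion: v1
kind: ConfigMap
metadata:
  name: audit-policy
data:
  policy.yaml: |
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      - level: Metadata
        omitStages:
          - RequestReceived
```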
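On EKS itself, control plane logs are switched on in the cluster configuration. One way to do that is an eksctl config file, sketched here; the cluster name, region, and retention period are placeholders.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-cluster-a        # placeholder cluster name
  region: us-east-1           # placeholder region
cloudWatch:
  clusterLogging:
    # Log types sent to CloudWatch Logs
    enableTypes: ["api", "audit", "authenticator"]
    logRetentionInDays: 90    # assumed retention period
```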

Critical Alerts

  • Failed authentication attempts
  • Privileged container creation
  • Security group changes
  • Database connection failures
  • Unusual API calls
  • Resource exhaustion

9. Compliance & Governance

Compliance Frameworks

  • AWS Config - track configuration changes
  • AWS Audit Manager - compliance reporting
  • CIS Kubernetes Benchmark - security hardening
  • PCI-DSS, HIPAA, SOC 2 compliance where required
  • Regular penetration testing

Policy as Code

  • AWS Organizations with SCPs
  • CloudFormation/Terraform for infrastructure
  • GitOps for cluster configuration (ArgoCD/FluxCD)
  • OPA/Kyverno for admission control
  • Version control and peer review for all changes

Tagging Strategy

  • Mandatory tags: Environment, Owner, Project, CostCenter
  • Enforce tagging via AWS Config rules
  • Use tags for resource-level IAM policies
  • Cost allocation and chargeback

10. Disaster Recovery & Backup

Backup Strategy

  • Automated RDS snapshots (daily, 7-30 day retention)
  • Cross-region snapshot copies for DR
  • EBS snapshots for persistent volumes
  • Cluster resource and volume backups with Velero (etcd itself is managed by AWS on EKS and cannot be backed up directly)
  • GitOps - cluster configuration in Git
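Cluster backups with Velero can be scheduled declaratively. The following is a minimal sketch, assuming Velero is installed in the `velero` namespace with a backup storage location already configured; the schedule, namespace list, and retention are illustrative.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-backup      # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"       # every day at 02:00 (cron syntax)
  template:
    includedNamespaces:
      - app-prod
    snapshotVolumes: true     # also snapshot persistent volumes
    ttl: 720h0m0s             # keep each backup for 30 days
```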

Disaster Recovery

  • Multi-region setup for critical applications
  • RTO/RPO requirements documented
  • Failover procedures tested regularly
  • Regular DR drills (quarterly minimum)
  • Automated failover where possible (Route53 health checks)

11. Supply Chain Security

Container Supply Chain

  • Verify base images from trusted sources
  • SBOM (Software Bill of Materials) for dependencies
  • Vulnerability scanning in CI/CD pipeline
  • Sign images (Cosign/Notary)
  • Admission controller to verify signatures
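Signature verification at admission time could look like the Kyverno policy below. It is a sketch that assumes images are signed with a Cosign key pair; the registry pattern uses the same `ACCOUNT` style placeholder as earlier, and the public key body is deliberately left as a placeholder.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures   # illustrative name
spec:
  validationFailureAction: Enforce
  webhookTimeoutSeconds: 30       # signature checks need registry round trips
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "ACCOUNT.dkr.ecr.REGION.amazonaws.com/*"   # placeholder registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <your Cosign public key>
                      -----END PUBLIC KEY-----
```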

Dependency Management

  • Dependabot or Renovate for automated updates
  • Regular security patching
  • Monitor CVEs for used software
  • Minimal dependencies principle

12. Incident Response

Preparation

  • Incident response plan documented
  • Runbooks for common scenarios
  • On-call rotation defined
  • Communication channels established
  • Post-mortem process defined

Detection & Response

  • Automated alerting for security events
  • Isolate compromised pods/nodes immediately
  • Forensics capability (preserve logs and state)
  • Contact AWS Support for suspected breaches
  • Notify stakeholders per incident severity

13. API Gateway & Service Mesh

API Security

  • AWS API Gateway or Kong/Envoy for API management
  • Rate limiting to prevent abuse
  • API keys or OAuth2 for authentication
  • WAF (Web Application Firewall) rules
  • DDoS protection via AWS Shield

Service Mesh Benefits

  • mTLS for all service-to-service communication
  • Zero-trust networking model
  • Fine-grained authorization policies
  • Observability and traffic monitoring
  • Circuit breaking and fault injection
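With Istio as the mesh, the first and third points above might be expressed as follows. This is a sketch, assuming Istio's default `cluster.local` trust domain and the root namespace `istio-system`; the `frontend-sa` service account and the `app: backend` label are hypothetical names.

```yaml
# Require mTLS for all workloads in the mesh
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system     # root namespace makes the policy mesh-wide
spec:
  mtls:
    mode: STRICT
---
# Only the frontend service account may call the backend, and only with GET or POST
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: backend-allow-frontend   # illustrative name
  namespace: app-prod
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/app-prod/sa/frontend-sa"]  # hypothetical identity
      to:
        - operation:
            methods: ["GET", "POST"]
```

Once an ALLOW policy selects a workload, requests that match no rule are denied, so the backend rejects every other caller.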

14. Update & Patch Management

EKS Updates

  • Regular cluster updates (each Kubernetes version gets about 14 months of standard support on EKS)
  • Test in non-prod first
  • Blue-green cluster upgrades for zero downtime
  • Node group rolling updates

Security Patching

  • Node updates through EKS managed node groups (EKS automates draining and replacing nodes once an update is started)
  • Bottlerocket OS for minimal attack surface
  • Container image updates (rebuild regularly)
  • Database patching during maintenance windows

Security Checklist

  • Private subnets for EKS nodes and databases
  • Security groups with least privilege
  • Network policies enforced
  • IRSA configured for all pods needing AWS access
  • No hardcoded credentials anywhere
  • Secrets Manager/Parameter Store with rotation
  • All encryption at rest enabled (etcd, EBS, RDS, S3)
  • TLS/SSL enforced everywhere
  • Pod security standards enforced (restricted)
  • Image scanning in CI/CD
  • Admission controllers (OPA/Kyverno) configured
  • GuardDuty and Security Hub enabled
  • Comprehensive logging to CloudWatch
  • CloudTrail enabled in all regions
  • VPC Flow Logs enabled
  • Regular backups with cross-region copies
  • Multi-factor authentication enforced
  • RBAC properly configured
  • Regular security audits scheduled
  • Incident response plan documented

Summary
Security in multi-cluster architectures requires a defense-in-depth approach:

  • Network isolation at every layer
  • Zero-trust model - verify everything
  • Encryption everywhere (at rest and in transit)
  • Least privilege access for humans and workloads
  • Continuous monitoring and alerting
  • Regular audits and compliance checks
  • Automation for consistency and reliability

Security is not a one-time setup but an ongoing process requiring continuous improvement and vigilance.
