As you get in shape for the daily challenges of running production AWS solutions, these two (confusingly?) interesting terms may pop up in your team discussions, so let's dive a bit into both topics.
High Availability
High Availability can be defined as the percentage of uptime over which a system maintains operational performance, often aligned to a service's SLA. AWS publishes SLAs for many of its services, implementing its own level of resilience and management behind each one to maintain that level of availability. Here are a few SLA examples:
- S3 Standard: 99.9%
- EC2: 99.95%
- RDS: 99.95%
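To make these percentages concrete, here is a quick sketch (in Python, with illustrative numbers) of how an availability figure translates into allowed downtime per year, and how redundant components raise the combined availability — the core idea behind the multi-AZ designs below:

```python
# Translate an availability percentage into downtime per year, and show
# how redundancy raises combined availability. Illustrative math only;
# real AWS SLAs define their own measurement windows and exclusions.

HOURS_PER_YEAR = 24 * 365

def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a service may be down at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

def parallel_availability(availability: float, copies: int) -> float:
    """Availability of `copies` independent redundant components:
    the system is down only if ALL copies are down at once."""
    return 1 - (1 - availability) ** copies

# A single 99.9% service may be down roughly 8.76 hours per year.
print(round(downtime_hours_per_year(0.999), 2))

# Two independent 99.9% instances behind a load balancer: ~99.9999%.
print(round(parallel_availability(0.999, 2), 6))
```

The second calculation assumes failures are independent, which is exactly why the instances are placed in different Availability Zones.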
High Availability - Example Design
- 1: High Availability through the presence of 2 Availability Zones in a single Region
- 2: High Availability through multiple EC2 instances, which guarantee a minimum number of available nodes to handle the necessary traffic load.
- 3: High Availability achieved through the use of a Load Balancer.
Let's implement this solution through an AWS CloudFormation template!
Note: Check your AWS Free Tier availability to avoid unexpected charges.
About CloudFormation:
CloudFormation is a way of defining your AWS Infrastructure as Code. All the necessary resources and their dependencies can be defined as code in a CloudFormation Template (JSON or YAML file), which is then launched as a stack. Some definitions to keep in mind:
Resources : Defines the required AWS resources. The only mandatory section.
Parameters : Dynamic inputs to your template, which you can customize for your specific needs or use cases.
Mappings : Static variables, defined as key:value pairs.
Outputs : Output values that can be referenced by another stack through import.
Conditions : Circumstances under which a specific resource is, or is not, created.
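To see how these sections fit together, here is a minimal, hypothetical template skeleton exercising all five of them (the resource and export names are illustrative, not part of the templates below):

```yaml
AWSTemplateFormatVersion: '2010-09-09'

Parameters:               # dynamic inputs, supplied at stack launch
  EnvType:
    Type: String
    Default: dev
    AllowedValues: [dev, prod]

Mappings:                 # static key:value lookups
  RegionAMI:
    us-east-1:
      AMI: ami-0233c2d874b811deb

Conditions:               # controls whether a resource is created
  IsProd: !Equals [!Ref EnvType, prod]

Resources:                # the only mandatory section
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionAMI, !Ref 'AWS::Region', AMI]
      InstanceType: t2.micro

Outputs:                  # values other stacks can import
  InstanceId:
    Condition: IsProd
    Value: !Ref MyInstance
    Export:
      Name: MyInstanceId
```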
Without further ado, the CloudFormation template below will provision an ELB in front of two EC2 instances:
```yaml
---
Parameters:
  SecurityGroupDescription:
    Description: Security Group Description
    Type: String
  KeyName:
    Description: Key Pair for EC2
    Type: 'AWS::EC2::KeyPair::KeyName'

Resources:
  EC2Instance1:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "<h1>Hello from Region us-east-1a</h1>" > /var/www/html/index.html

  EC2Instance2:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1b
      ImageId: ami-0233c2d874b811deb
      InstanceType: t2.micro
      SecurityGroups:
        - !Ref EC2SecurityGroup
      KeyName: !Ref KeyName
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "<h1>Hello from Region us-east-1b</h1>" > /var/www/html/index.html

  # Security group for the load balancer
  ELBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: ELB Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  # Security group for the instances; HTTP only from the ELB
  EC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Ref SecurityGroupDescription
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId: !GetAtt ELBSecurityGroup.GroupId
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0

  # Load Balancer for EC2
  LoadBalancerforEC2:
    Type: AWS::ElasticLoadBalancing::LoadBalancer
    Properties:
      AvailabilityZones: [us-east-1a, us-east-1b]
      Instances:
        - !Ref EC2Instance1
        - !Ref EC2Instance2
      Listeners:
        - LoadBalancerPort: '80'
          InstancePort: '80'
          Protocol: HTTP
      HealthCheck:
        Target: HTTP:80/
        HealthyThreshold: '3'
        UnhealthyThreshold: '5'
        Interval: '30'
        Timeout: '5'
      SecurityGroups:
        - !GetAtt ELBSecurityGroup.GroupId
```
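Assuming the template is saved as `ha-elb.yaml` and you have an existing EC2 key pair (hypothetically named `my-key` here), the stack can be validated and launched with the AWS CLI:

```shell
# Validate the template syntax first (creates no resources)
aws cloudformation validate-template --template-body file://ha-elb.yaml

# Launch the stack; parameter values are placeholders for your own
aws cloudformation create-stack \
  --stack-name ha-elb-demo \
  --template-body file://ha-elb.yaml \
  --parameters ParameterKey=KeyName,ParameterValue=my-key \
               ParameterKey=SecurityGroupDescription,ParameterValue="EC2 SG" \
  --region us-east-1

# Block until every resource has been created
aws cloudformation wait stack-create-complete --stack-name ha-elb-demo
```

Remember to delete the stack afterwards (`aws cloudformation delete-stack --stack-name ha-elb-demo`) to stay within the Free Tier.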
Fault Tolerance
Fault Tolerance has the sole goal of expanding on High Availability to offer the greatest level of protection, aiming for a zero-downtime solution. This approach certainly implies additional cost, with the upside of a higher uptime percentage and no interruption should one, or even many, components fail at different levels.
Here we can see the following:
1: Region-level redundancy is achieved through the AWS Route 53 DNS service.
2: Availability Zone-level redundancy is achieved by the ELB, as in the HA approach.
3: EC2 compute-node redundancy is achieved either through multiple EC2 instances or an Auto Scaling Group (ASG).
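As a sketch of point 1, a Route 53 failover record pair could look like the following CloudFormation fragment. The hosted zone, health check resource, and DNS names are illustrative placeholders, not part of the templates in this article:

```yaml
# Hypothetical Route 53 failover routing: traffic goes to the primary
# region's load balancer and shifts to the secondary when the attached
# health check reports the primary as unhealthy.
PrimaryRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.             # placeholder zone
    Name: app.example.com.
    Type: CNAME
    TTL: '60'
    SetIdentifier: primary
    Failover: PRIMARY
    HealthCheckId: !Ref PrimaryHealthCheck   # assumed health check resource
    ResourceRecords:
      - primary-elb.us-east-1.elb.amazonaws.com

SecondaryRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.
    Name: app.example.com.
    Type: CNAME
    TTL: '60'
    SetIdentifier: secondary
    Failover: SECONDARY
    ResourceRecords:
      - secondary-elb.us-west-2.elb.amazonaws.com
```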
What about Microservices?
Certainly the definitions above apply to long-established web applications, but what about microservices architectures? What additional layers of HA or FT can we add there?
To give you an example, AWS EKS runs and scales the Kubernetes control plane across multiple Availability Zones to guarantee HA. Detecting and replacing unhealthy control plane instances is among the key features AWS provides to maintain HA of the control plane during its operation. Along with this resiliency layer, we can use the existing ones we discussed before.
As we did before, let's have a look at a sample CloudFormation template we can use to deploy the EKS control plane, including IAM roles, the network architecture, and a redundant control plane for the EKS cluster:
```yaml
AWSTemplateFormatVersion: '2010-09-09'

Parameters:
  EKSIAMRoleName:
    Type: String
    Description: The name of the IAM role for the EKS service to assume.
  EKSClusterName:
    Type: String
    Description: The desired name of your AWS EKS Cluster.
  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.
  PublicSubnet01Block:
    Type: String
    Default: 192.168.0.0/18
    Description: CidrBlock for public subnet 01 within the VPC
  PublicSubnet02Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for public subnet 02 within the VPC
  PrivateSubnet01Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for private subnet 01 within the VPC
  PrivateSubnet02Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for private subnet 02 within the VPC

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcBlock
          - PublicSubnet01Block
          - PublicSubnet02Block
          - PrivateSubnet01Block
          - PrivateSubnet02Block

Resources:
  EKSIAMRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - eks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      RoleName: !Ref EKSIAMRoleName
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-VPC'

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Subnets
        - Key: Network
          Value: Public

  PrivateRouteTable01:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ1
        - Key: Network
          Value: Private01

  PrivateRouteTable02:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Private Subnet AZ2
        - Key: Network
          Value: Private02

  PublicRoute:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PrivateRoute01:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway01
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable01
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway01

  PrivateRoute02:
    DependsOn:
      - VPCGatewayAttachment
      - NatGateway02
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable02
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway02

  NatGateway01:
    DependsOn:
      - NatGatewayEIP1
      - PublicSubnet01
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP1.AllocationId'
      SubnetId: !Ref PublicSubnet01
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ1'

  NatGateway02:
    DependsOn:
      - NatGatewayEIP2
      - PublicSubnet02
      - VPCGatewayAttachment
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt 'NatGatewayEIP2.AllocationId'
      SubnetId: !Ref PublicSubnet02
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}-NatGatewayAZ2'

  NatGatewayEIP1:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  NatGatewayEIP2:
    DependsOn:
      - VPCGatewayAttachment
    Type: 'AWS::EC2::EIP'
    Properties:
      Domain: vpc

  PublicSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 01
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet01"

  PublicSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PublicSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PublicSubnet02"

  PrivateSubnet01:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '0'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet01Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet01"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1

  PrivateSubnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Private Subnet 02
    Properties:
      AvailabilityZone:
        Fn::Select:
          - '1'
          - Fn::GetAZs:
              Ref: AWS::Region
      CidrBlock:
        Ref: PrivateSubnet02Block
      VpcId:
        Ref: VPC
      Tags:
        - Key: Name
          Value: !Sub "${AWS::StackName}-PrivateSubnet02"
        - Key: "kubernetes.io/role/internal-elb"
          Value: 1

  PublicSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet01
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet02
      RouteTableId: !Ref PublicRouteTable

  PrivateSubnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet01
      RouteTableId: !Ref PrivateRouteTable01

  PrivateSubnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet02
      RouteTableId: !Ref PrivateRouteTable02

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

  EKSCluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Ref EKSClusterName
      RoleArn:
        "Fn::GetAtt": ["EKSIAMRole", "Arn"]
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref PublicSubnet01
          - !Ref PublicSubnet02
          - !Ref PrivateSubnet01
          - !Ref PrivateSubnet02
    DependsOn: [EKSIAMRole, PublicSubnet01, PublicSubnet02, PrivateSubnet01, PrivateSubnet02, ControlPlaneSecurityGroup]

Outputs:
  SubnetIds:
    Description: Subnets IDs in the VPC
    Value: !Join [ ",", [ !Ref PublicSubnet01, !Ref PublicSubnet02, !Ref PrivateSubnet01, !Ref PrivateSubnet02 ] ]
  SecurityGroups:
    Description: Security group for the cluster control plane communication with worker nodes
    Value: !Join [ ",", [ !Ref ControlPlaneSecurityGroup ] ]
  VpcId:
    Description: The VPC Id
    Value: !Ref VPC
```
Final Thoughts
We can conclude that Fault-Tolerant systems are intrinsically Highly Available solutions with zero downtime, but, as we saw in this article, a Highly Available solution is not necessarily Fault Tolerant. Microservices grant us an extra layer of resiliency, though one that also involves certain risks and complexity. It's down to us as Solutions Architects to define which architecture we want to achieve based on business needs and budget constraints.