DEV Community

Yasuhiro Matsuda for AWS Community Builders


Story about migrating Keycloak using Fargate

This article is about migrating Keycloak to Fargate, but it also describes how to scale out Fargate according to load and according to the time of day, so it can also serve as a reference for those who want to control scale-out on Fargate.

What is Keycloak

Keycloak is open source identity and access management software that provides single sign-on as well as authentication and authorization control for API access.

What is Fargate

Fargate is a managed AWS service for running containers. AWS also offers ECS on EC2, which runs containers on EC2 instances, and Amazon EKS, a managed Kubernetes service, but AWS Fargate is recommended when you simply want to run containers without maintaining the underlying container infrastructure.

About this story

This article covers migrating Keycloak services built on ECS on EC2 to Fargate, and upgrading Keycloak from v6 to v19.


The story of migrating Keycloak's services that were built on ECS on EC2 to Fargate

Migrating from ECS on EC2 to Fargate has three advantages, and we carried out this migration before the version upgrade described below.

  • No need to maintain container infrastructure
  • Minimizes the cost of scaling out
  • Faster startup time during scale-out, making it easier to follow spikes

The story of migrating Keycloak from v6 to v19

Keycloak automatically migrates its database at startup, so in principle you only need to start the new version. However, some incompatibilities arise from DB constraints, so each time an error occurs you need to investigate it and resolve the inconsistent data.

For example, the error below is caused by duplicate group names. The query SELECT REALM_ID, NAME, COUNT(*) FROM KEYCLOAK_GROUP WHERE PARENT_GROUP IS NULL GROUP BY REALM_ID, NAME HAVING COUNT(*) > 1; detects them.

    ERROR [org.keycloak.connections.jpa.updater.liquibase.conn.DefaultLiquibaseConnectionProvider] (ServerService Thread Pool -- 67) Change Set META-INF/jpa-changelog-9.0.1.xml::9.0.1-KEYCLOAK-12579-add-not-null-constraint::keycloak failed. Error: Duplicate entry 'school- -ks' for key 'SIBLING_NAMES' [Failed SQL: UPDATE authdbdev.KEYCLOAK_GROUP SET PARENT_GROUP = ' ' WHERE PARENT_GROUP IS NULL]
    FATAL [org.keycloak.services] (ServerService Thread Pool -- 67) java.lang.RuntimeException: Failed to update database

Note also that the environment variables to set have changed because of the move from WildFly to Quarkus.

    WildFly        Quarkus
    DB_DATABASE    KC_DB_URL_DATABASE
    DB_HOST        KC_DB_URL_HOST
    DB_PASSWORD    KC_DB_PASSWORD
    DB_USER        KC_DB_USERNAME

If you configured a multi-node cluster with Infinispan in standalone-ha.xml on WildFly, the following environment variables must be set on Quarkus (v17 and later).

KC_CACHE="ispn"
KC_CACHE_CONFIG_FILE="cache-ispn-jdbc-ping.xml"

cache-ispn-jdbc-ping.xml is written as shown below (this example assumes MySQL on RDS). The owners attribute sets the number of nodes on which each cache entry is kept.

If you scale out from a baseline of at least two nodes to maintain availability, you must decide this number while considering how many nodes may be removed simultaneously when scaling in. (You cannot control which nodes are removed when scaling in, so you need a way to avoid losing the cache when all the nodes that hold it are deleted at once.)
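One way to keep this constraint visible is to state it next to the scaling configuration. The following is only a sketch: it assumes Terraform 1.5 or later (for check blocks), the local names are hypothetical, and the values simply mirror owners="3" in the cache configuration and the scale-in step of -1 used in the scaling policy later in this article. It does not cover tasks that disappear for other reasons, such as FARGATE_SPOT interruptions.

```hcl
locals {
  # Hypothetical names; keep these in sync with the real settings by hand.
  keycloak_cache_owners  = 3 # owners="3" in cache-ispn-jdbc-ping.xml
  keycloak_scale_in_step = 1 # magnitude of scaling_adjustment in the scale-in policy
}

# A cache entry survives scale-in as long as at least one of its owners
# remains, so the number of tasks removed at once should stay below owners.
check "scale_in_keeps_a_cache_copy" {
  assert {
    condition     = local.keycloak_scale_in_step < local.keycloak_cache_owners
    error_message = "Scale-in removes as many tasks at once as there are cache owners, so cached sessions could be lost."
  }
}
```

A failing check is reported as a warning during plan and apply rather than blocking the run, so it works as a reminder, not as a guard.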

The max-count values of the realms and users caches also affect performance. When the number of cached entries exceeds max-count, requests go to the DB, so it is better to raise max-count as far as memory allows.

However, when running in distributed mode rather than replicated mode, the cache is rebalanced when scaling in, which can result in out-of-memory errors, so you need to test thoroughly with a load testing tool such as Distributed Load Testing on AWS. For details of the parameters, see Configuring Infinispan caches and urn:infinispan:config:11.0.

    <?xml version="1.0" encoding="UTF-8"?>
    <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
            xmlns="urn:infinispan:config:11.0">

        <jgroups>
            <stack name="jdbc-ping-tcp" extends="tcp">
                <JDBC_PING connection_driver="com.mysql.cj.jdbc.Driver"
                           connection_username="${env.KC_DB_USERNAME}"
                           connection_password="${env.KC_DB_PASSWORD}"
                           connection_url="${env.KC_DB_URL}"
                           initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data VARBINARY(255), constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
                           info_writer_sleep_time="500"
                           remove_all_data_on_view_change="true"
                           stack.combine="REPLACE"
                           stack.position="MPING" />
            </stack>
        </jgroups>

        <cache-container name="keycloak">
            <transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
            <local-cache name="realms">
                <encoding>
                    <key media-type="application/x-java-object"/>
                    <value media-type="application/x-java-object"/>
                </encoding>
                <memory max-count="10000"/>
            </local-cache>
            <local-cache name="users">
                <encoding>
                    <key media-type="application/x-java-object"/>
                    <value media-type="application/x-java-object"/>
                </encoding>
                <memory max-count="10000"/>
            </local-cache>
            <distributed-cache name="sessions" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <distributed-cache name="authenticationSessions" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <distributed-cache name="offlineSessions" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <distributed-cache name="clientSessions" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <distributed-cache name="offlineClientSessions" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <distributed-cache name="loginFailures" owners="3">
                <expiration lifespan="-1"/>
            </distributed-cache>
            <local-cache name="authorization">
                <encoding>
                    <key media-type="application/x-java-object"/>
                    <value media-type="application/x-java-object"/>
                </encoding>
                <memory max-count="10000"/>
            </local-cache>
            <replicated-cache name="work">
                <expiration lifespan="-1"/>
            </replicated-cache>
            <local-cache name="keys">
                <encoding>
                    <key media-type="application/x-java-object"/>
                    <value media-type="application/x-java-object"/>
                </encoding>
                <expiration max-idle="3600000"/>
                <memory max-count="1000"/>
            </local-cache>
            <distributed-cache name="actionTokens" owners="3">
                <encoding>
                    <key media-type="application/x-java-object"/>
                    <value media-type="application/x-java-object"/>
                </encoding>
                <expiration max-idle="-1" lifespan="-1" interval="300000"/>
                <memory max-count="-1"/>
            </distributed-cache>
        </cache-container>
    </infinispan>

The Dockerfile is as follows:

    FROM quay.io/keycloak/keycloak:19.0.3

    COPY conf/keycloak.conf /opt/keycloak/conf/keycloak.conf
    COPY conf/cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

    RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml

    WORKDIR /opt/keycloak
    ENTRYPOINT [ "/opt/keycloak/bin/kc.sh" ]

The ECS definition in Terraform is as follows. Note that the parts written as _xxxx_ are placeholders for values passed in as variables.

    resource "aws_ecs_cluster" "keycloak" {
      name = "clustername"

      setting {
        name  = "containerInsights"
        value = "enabled"
      }
    }

    resource "aws_ecs_service" "keycloak" {
      cluster                            = aws_ecs_cluster.keycloak.id
      deployment_maximum_percent         = 200
      deployment_minimum_healthy_percent = 100
      desired_count                      = _keycloak_desired_count_min_
      enable_ecs_managed_tags            = false
      enable_execute_command             = true
      health_check_grace_period_seconds  = 180
      name                               = _servicename_
      platform_version                   = "LATEST"
      propagate_tags                     = "TASK_DEFINITION"
      scheduling_strategy                = "REPLICA"
      task_definition                    = aws_ecs_task_definition.keycloak.arn

      capacity_provider_strategy {
        capacity_provider = "FARGATE"
        base              = 2
        weight            = 1 // From the third task onward, 25% of tasks start on FARGATE
      }

      capacity_provider_strategy {
        capacity_provider = "FARGATE_SPOT"
        base              = 0
        weight            = 3 // From the third task onward, 75% of tasks start on FARGATE_SPOT
      }

      deployment_circuit_breaker {
        enable   = false
        rollback = false
      }

      deployment_controller {
        type = "ECS"
      }

      load_balancer {
        container_name   = "keycloak"
        container_port   = aws_alb_target_group.keycloak.port
        target_group_arn = aws_alb_target_group.keycloak.arn
      }

      network_configuration {
        assign_public_ip = true
        security_groups = [
          aws_security_group.keycloak.id
        ]
        subnets = _cluster_subnets_
      }

      timeouts {}

      // Auto Scaling changes desired_count, so Terraform must not reset it
      lifecycle {
        ignore_changes = [desired_count]
      }
    }

    resource "aws_ecs_task_definition" "keycloak" {
      container_definitions = jsonencode(
        [
          {
            cpu               = 0
            command           = ["start --optimized"]
            disableNetworking = false
            portMappings = [
              {
                containerPort = aws_alb_target_group.keycloak.port
                hostPort      = aws_alb_target_group.keycloak.port
                protocol      = "tcp"
              }
            ]
            environment = [
              {
                name  = "KC_DB_URL_DATABASE"
                value = _KC_DB_URL_DATABASE_
              },
              {
                name  = "KC_DB_URL_HOST"
                value = _KC_DB_URL_HOST_
              },
              {
                name  = "KC_DB_URL"
                value = _KC_DB_URL_
              },
              {
                name  = "KC_DB_PASSWORD"
                value = _KC_DB_PASSWORD_
              },
              {
                name  = "KC_DB_USERNAME"
                value = _KC_DB_USERNAME_
              },
              {
                name  = "JAVA_OPTS"
                value = _JAVA_OPTS_
              },
              {
                name  = "KC_CACHE"
                value = "ispn"
              },
              {
                name  = "KC_HOSTNAME"
                value = _keycloak_fqdn_
              },
              {
                name  = "KC_HOSTNAME_STRICT_BACKCHANNEL"
                value = "true"
              },
              {
                name  = "KC_CACHE_CONFIG_FILE"
                value = "cache-ispn-jdbc-ping.xml"
              },
            ]
            essential = true
            healthCheck = {
              command = [
                "CMD-SHELL",
                "curl -f http://localhost:${_keycloak_port_}/auth/ || exit 1",
              ]
              interval = 30
              retries  = 3
              timeout  = 5
            }
            image       = _ecr_repo_url_
            stopTimeout = 120
            logConfiguration = {
              logDriver = "awslogs"
              options = {
                awslogs-group         = aws_cloudwatch_log_group.keycloak.name
                awslogs-region        = "ap-northeast-1"
                awslogs-stream-prefix = "ecs"
              }
            }
            mountPoints = []
            name        = "keycloak"
            volumesFrom = []
          },
        ]
      )
      cpu                = _keycloak_cpu_
      task_role_arn      = aws_iam_role.ecs_task_role.arn
      execution_role_arn = aws_iam_role.execution_role.arn
      family             = _service_name_
      memory             = _keycloak_memory_
      network_mode       = "awsvpc"
      requires_compatibilities = [
        "FARGATE",
      ]
    }

    resource "aws_alb_target_group" "keycloak" {
      deregistration_delay          = "115"
      load_balancing_algorithm_type = "round_robin"
      name                          = _clustername_
      port                          = _keycloak_port_
      protocol                      = "HTTP"
      protocol_version              = "HTTP1"
      slow_start                    = 0
      target_type                   = "ip"
      vpc_id                        = _cluster_vpc_id_

      health_check {
        ...
      }

      stickiness {
        cookie_duration = 86400
        enabled         = false
        type            = "lb_cookie"
      }
    }

    resource "aws_iam_role" "execution_role" {
      name                = "ecs-execution-role"
      managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"]
      assume_role_policy = jsonencode({
        "Version" : "2008-10-17",
        "Statement" : [
          {
            "Sid" : "",
            "Effect" : "Allow",
            "Principal" : {
              "Service" : "ecs-tasks.amazonaws.com"
            },
            "Action" : "sts:AssumeRole"
          }
        ]
      })
    }

    resource "aws_iam_role" "ecs_task_role" {
      name = "ecs-task-role"
      assume_role_policy = jsonencode({
        "Version" : "2012-10-17",
        "Statement" : [
          {
            "Sid" : "",
            "Effect" : "Allow",
            "Principal" : {
              "Service" : "ecs-tasks.amazonaws.com"
            },
            "Action" : "sts:AssumeRole"
          }
        ]
      })

      // Permissions required for ECS Exec (enable_execute_command = true)
      inline_policy {
        name = "SessionManagerRoleForECS"
        policy = jsonencode({
          "Version" : "2012-10-17",
          "Statement" : [
            {
              "Effect" : "Allow",
              "Action" : [
                "ssmmessages:CreateControlChannel",
                "ssmmessages:CreateDataChannel",
                "ssmmessages:OpenControlChannel",
                "ssmmessages:OpenDataChannel"
              ],
              "Resource" : "*"
            }
          ]
        })
      }
    }

    resource "aws_cloudwatch_log_group" "keycloak" {
      name              = "/ecs/${_keycloak_service_name_}"
      retention_in_days = 180
    }
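To make the base and weight settings of the two capacity provider strategies concrete, here is a small worked example written as Terraform locals. The local names and the desired count of 10 are illustrative assumptions, chosen so that the split comes out even; how ECS rounds when the split is not even is not covered here.

```hcl
# Worked example of the base/weight split used by the service above.
# All names and the desired count of 10 are illustrative assumptions.
locals {
  example_desired_count = 10
  fargate_base          = 2 # base of the FARGATE strategy
  fargate_weight        = 1 # weight of the FARGATE strategy
  spot_weight           = 3 # weight of the FARGATE_SPOT strategy

  # The first `base` tasks always run on FARGATE; the rest are shared by weight.
  tasks_beyond_base = max(local.example_desired_count - local.fargate_base, 0) # 10 - 2 = 8

  # 2 + 8 * 1 / (1 + 3) = 4 tasks on FARGATE
  approx_fargate_tasks = local.fargate_base + local.tasks_beyond_base * local.fargate_weight / (local.fargate_weight + local.spot_weight)

  # 8 * 3 / (1 + 3) = 6 tasks on FARGATE_SPOT
  approx_spot_tasks = local.tasks_beyond_base * local.spot_weight / (local.fargate_weight + local.spot_weight)
}
```

With these settings the two baseline tasks never run on Spot capacity, so a Spot interruption cannot take the service below its base of two tasks.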

Load-based scaling policies can be defined as follows.

    resource "aws_appautoscaling_target" "keycloak" {
      service_namespace  = "ecs"
      resource_id        = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
      scalable_dimension = "ecs:service:DesiredCount"
      min_capacity       = _keycloak_desired_count_min_
      max_capacity       = _keycloak_desired_count_max_

      // Scheduled actions change these values, so Terraform must not reset them
      lifecycle {
        ignore_changes = [min_capacity, max_capacity]
      }
    }

    resource "aws_appautoscaling_policy" "keycloak_scale_out" {
      name               = "keycloak_scale_out"
      policy_type        = "StepScaling"
      service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
      resource_id        = aws_appautoscaling_target.keycloak.id
      scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension

      step_scaling_policy_configuration {
        adjustment_type         = "ChangeInCapacity"
        cooldown                = 30
        metric_aggregation_type = "Maximum"

        step_adjustment {
          metric_interval_lower_bound = 0
          metric_interval_upper_bound = local.KeycloakCpuHightThreshold
          scaling_adjustment          = _keycloak_desired_count_scaleout_policy_
        }

        step_adjustment {
          metric_interval_lower_bound = local.KeycloakCpuHightThreshold
          scaling_adjustment          = _keycloak_desired_count_scaleout_policy_ * 2
        }
      }
    }

    resource "aws_appautoscaling_policy" "keycloak_scale_in" {
      name               = "keycloak_scale_in"
      policy_type        = "StepScaling"
      service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
      resource_id        = aws_appautoscaling_target.keycloak.id
      scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension

      step_scaling_policy_configuration {
        adjustment_type         = "ChangeInCapacity"
        cooldown                = 60
        metric_aggregation_type = "Average"

        step_adjustment {
          metric_interval_upper_bound = 0
          scaling_adjustment          = -1
        }
      }
    }
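Step scaling policies only take effect when a CloudWatch alarm invokes them, and the alarm definitions are not shown in this article. The following is a minimal sketch of what they could look like, assuming the CPU utilization of the ECS service is the trigger. The alarm names, periods, and thresholds are hypothetical and need tuning against real load. Keep in mind that the metric_interval bounds in a step adjustment are offsets from the alarm threshold, not absolute metric values.

```hcl
# Hypothetical alarm that invokes the scale-out policy defined above.
resource "aws_cloudwatch_metric_alarm" "keycloak_cpu_high" {
  alarm_name          = "keycloak-cpu-high" # assumed name
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Maximum" # matches metric_aggregation_type of the scale-out policy
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 60 # assumed value, in percent

  dimensions = {
    ClusterName = aws_ecs_cluster.keycloak.name
    ServiceName = aws_ecs_service.keycloak.name
  }

  alarm_actions = [aws_appautoscaling_policy.keycloak_scale_out.arn]
}

# Hypothetical alarm that invokes the scale-in policy defined above.
resource "aws_cloudwatch_metric_alarm" "keycloak_cpu_low" {
  alarm_name          = "keycloak-cpu-low" # assumed name
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average" # matches metric_aggregation_type of the scale-in policy
  period              = 60
  evaluation_periods  = 5 # wait longer before removing tasks than before adding them
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = 30 # assumed value, in percent

  dimensions = {
    ClusterName = aws_ecs_cluster.keycloak.name
    ServiceName = aws_ecs_service.keycloak.name
  }

  alarm_actions = [aws_appautoscaling_policy.keycloak_scale_in.arn]
}
```

Using a longer evaluation window for scale-in than for scale-out is a common way to react quickly to spikes while giving the cache time to settle before nodes are removed.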

To scale out in advance according to the time of day, define scheduled actions as follows.

    resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_start" {
      name               = "keycloak_time_scaling_start"
      service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
      resource_id        = aws_appautoscaling_target.keycloak.id
      scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
      schedule           = _keycloak_desired_count_time_scaling_start_

      // Raise both limits by the scaling factor at the start of the busy period
      scalable_target_action {
        min_capacity = _keycloak_desired_count_min_ * _keycloak_desired_count_time_scaling_scale_
        max_capacity = _keycloak_desired_count_max_ * _keycloak_desired_count_time_scaling_scale_
      }
    }

    resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_stop" {
      name               = "keycloak_time_scaling_stop"
      service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
      resource_id        = aws_appautoscaling_target.keycloak.id
      scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
      schedule           = _keycloak_desired_count_time_scaling_stop_

      // Return to the normal limits at the end of the busy period
      scalable_target_action {
        min_capacity = _keycloak_desired_count_min_
        max_capacity = _keycloak_desired_count_max_
      }

      depends_on = [aws_appautoscaling_scheduled_action.keycloak_time_scaling_start]
    }
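The schedule placeholders above take Application Auto Scaling schedule expressions. As an illustration only, the sketch below fills them with hypothetical values: weekday mornings at 07:30 Japan time, a scaling factor of 2, and baseline limits of 2 and 10 tasks. It also uses the optional timezone argument so that the cron expression can be written in local time; without it the expression is evaluated in UTC.

```hcl
locals {
  # Hypothetical values standing in for the _xxxx_ placeholders.
  example_time_scaling_start = "cron(30 7 ? * MON-FRI *)" # minute hour day-of-month month day-of-week year
  example_time_scaling_scale = 2
  example_desired_count_min  = 2
  example_desired_count_max  = 10
}

resource "aws_appautoscaling_scheduled_action" "keycloak_time_scaling_start_example" {
  name               = "keycloak_time_scaling_start_example" # assumed name
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  schedule           = local.example_time_scaling_start
  timezone           = "Asia/Tokyo" # assumption; defaults to UTC when omitted

  scalable_target_action {
    min_capacity = local.example_desired_count_min * local.example_time_scaling_scale # 2 * 2 = 4
    max_capacity = local.example_desired_count_max * local.example_time_scaling_scale # 10 * 2 = 20
  }
}
```

Raising min_capacity ahead of the expected peak starts the extra tasks before the load arrives, while the load-based policies continue to work within the new limits.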

That covers what you can control when running Keycloak on Fargate.
