Clariza Look

Migrating to Self-Hosted 3scale API Management on ROSA Kubernetes

I was tasked with migrating from a Red Hat-hosted 3scale portal to a self-hosted version in ROSA (Red Hat OpenShift Service on AWS). This presented quite a challenge, as my knowledge of Kubernetes was mostly theoretical, based on studying for the Kubernetes and Cloud Native Associate (KCNA) certification exam.

The goal was to recreate a self-hosted version of 3scale using an operator in ROSA, but what I thought would be a straightforward deployment turned into a valuable learning experience.

What is Red Hat-Managed/hosted 3scale?

When using Red Hat-hosted 3scale (also known as "SaaS" or managed 3scale), all infrastructure complexities are abstracted away. Red Hat handles the deployment, maintenance, updates, and scaling of the platform.

As a user, you simply access a provided portal URL and focus on managing your APIs rather than worrying about the underlying infrastructure. Your daily tasks revolve around the actual API management activities like adding backends, configuring products, creating applications, setting up authentication, and managing rate limits.

It's a convenient option that requires minimal operational overhead, allowing your team to focus on API strategy rather than platform management.

What the 3scale API Management portal looks like

What is self-hosted 3scale?

In contrast, self-hosted 3scale brings both flexibility and responsibility. You gain complete control over your deployment configuration, integration with internal systems, customization options, and data locality.

Since the infrastructure runs on Kubernetes (in my case, ROSA - Red Hat OpenShift Service on AWS), you have access to all the native Kubernetes capabilities for scaling, monitoring, and management.

However, this freedom comes with the need to manage the entire application lifecycle within the Kubernetes ecosystem: installation via operators or templates, configuration through custom resources, scaling via horizontal pod autoscalers, implementing backup strategies, and handling upgrades.

You're responsible for ensuring high availability with proper pod distribution, performance tuning through resource allocation, and troubleshooting any issues that arise in both the 3scale application components and the underlying Kubernetes resources.
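
For context, the operator-based install we were originally aiming for is driven by a single custom resource. Below is a minimal sketch of an APIManager resource, assuming the 3scale operator is already installed in the target namespace - the name and wildcard domain are placeholders.

oc create -f - <<EOF
apiVersion: apps.3scale.net/v1alpha1
kind: APIManager
metadata:
  name: apimanager-sample
  namespace: [YOUR-NAMESPACE]
spec:
  # Wildcard domain of the cluster's default ingress (placeholder)
  wildcardDomain: apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
EOF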

The Migration

Migrating from managed to self-hosted represented a significant shift in responsibilities, and I was about to discover just how much Red Hat had been handling behind the scenes.

This blog post documents a real-world troubleshooting journey and the significant challenges we encountered and overcame along the way:

  1. Missing Routes for Admin Access

  2. DNS resolution issues preventing access to Red Hat's container registry

  3. Architecture mismatch between my ARM-based MacBook and the x86_64 Docker container images required for deployment

  4. PVC Access Mode Issues

  5. Resource Constraints

  6. Missing Service for App Components

By sharing this experience, I hope to help others who might encounter similar issues during their deployment process, especially those who are transitioning from theoretical Kubernetes knowledge to practical application.

The Initial Deployment Attempt

We started by creating a dedicated namespace for our 3scale deployment:

oc create namespace 3scale-backup 

After switching to this namespace (oc project 3scale-backup), we downloaded the 3scale API Management Platform template:

curl -o amp.yml https://raw.githubusercontent.com/3scale/3scale-amp-openshift-templates/master/amp/amp.yml 

Then we tried to deploy 3scale using this template:

oc new-app --file=amp.yml \
  --param WILDCARD_DOMAIN=apps.[domain of your openshift].openshiftapps.com \
  --param ADMIN_PASSWORD=password123

The template processing appeared successful, creating numerous resources:

  • Imagestreams
  • Deployment configs
  • Services
  • Routes
  • Persistent volume claims
  • Secrets
oc get all -n [your namespace] 

3Scale API Management Rosa deployment

However, when checking the status of the pods, we noticed that many deployments were either not starting at all, erroring out with CrashLoopBackOff, or stuck in initialization phases:

oc get pods 

Get Pods

While some components like Redis and database pods were running fine, critical components like backend-listener, backend-worker, and backend-cron were not deploying at all.

backend-worker issues

The system components were also failing during initialization.

Challenge 1: Missing Routes for Admin Access

Our first challenge was that the admin portal URL, https://3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com, was showing "Application is not available".

The reason was simple - the template had not created the necessary routes for our self-hosted 3scale services. We manually created them:

oc create route edge system-admin --service=system-provider --hostname=3scale-admin.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-developer --service=system-developer --hostname=3scale.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com
oc create route edge system-master --service=system-master --hostname=master.apps.[YOUR-ROSA-DOMAIN].openshiftapps.com

However, after creating the routes, the admin portal still wasn't accessible. Digging into the logs with oc logs system-app-1-hook-pre, we discovered a more fundamental issue.

Challenge 2: DNS Resolution Issues

The pre-deployment hook was failing with a specific error:

ThreeScale::Core::APIClient::ConnectionError: connection refused: backend-listener:80 

Further investigation revealed that the backend components weren't deployed at all. When checking the deployment configs:

oc get dc/backend-listener
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
backend-listener   0          1         0         config,image(amp-backend:2.12)

We saw that backend-listener, backend-worker, and backend-cron had REVISION 0 and CURRENT 0, indicating they hadn't been deployed.

oc get dc/backend-listener

The root cause was found in the imagestream:

oc describe imagestream amp-backend 

This showed an error:

error: Import failed (InternalError): Internal error occurred: registry.redhat.com/3scale-amp2/backend-rhel8:3scale2.12: Get "https://registry.redhat.com/v2/": dial tcp: lookup registry.redhat.com on 100.10.0.11:23: no such host 

Our OpenShift cluster couldn't resolve the hostname registry.redhat.com due to DNS issues. This was confirmed by attempting to run:

nslookup registry.redhat.com 

Which returned "No answer" from the DNS server.
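
To double-check that this was a cluster-side problem and not just my laptop, name resolution can also be tested from a pod that is already running in the cluster (backend-redis was one of the pods that came up fine). This is a rough sketch - it assumes getent is available inside that image:

# Test DNS resolution from inside the cluster using an already-running pod
oc rsh dc/backend-redis getent hosts registry.redhat.io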

In other words, the cluster could not pull the required images from the Red Hat registry into the pods.

Our workaround was to pull the images locally with Docker and then push them into the cluster's internal OpenShift image registry for our namespace.

Challenge 3: Architecture Mismatch

While working to address the DNS issues, we discovered another challenge: we were trying to pull Red Hat's container images on an ARM64-based machine (an Apple Silicon Mac), but the images were only available for the x86_64 architecture.

When attempting to pull the images directly:

docker login [RedHat Credentials]
docker pull registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12

We received:

no matching manifest for linux/arm64/v8 in the manifest list entries 

The Solution Process

We implemented a multi-step solution to overcome these challenges:

Step 1: Authentication with Red Hat Registry

First, we logged in to the Red Hat Container Registry:

docker login registry.redhat.io 

Step 2: Architecture-aware Image Pulling

Because I was running Docker Desktop on an Apple Silicon (ARM64) Mac, Docker pulled images for the host architecture by default, which does not match the x86_64 images Red Hat publishes for 3scale.

To overcome the architecture mismatch, we explicitly specified the platform when pulling:

docker pull --platform linux/amd64 registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 

This successfully pulled the x86_64 image; Docker Desktop can still run it locally through Rosetta 2 emulation on macOS, and it matches the architecture of the ROSA worker nodes.

Redhat Docker Image Pull
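
Before pushing the image anywhere, it is worth confirming that what landed on disk really is an x86_64 image; the inspect command below simply prints the OS and architecture Docker recorded for the local copy.

docker image inspect --format '{{.Os}}/{{.Architecture}}' \
  registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12
# Expected output: linux/amd64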

Step 3: Exposing the OpenShift Registry

To make our OpenShift registry accessible:

oc patch configs.imageregistry.operator.openshift.io/cluster --patch '{"spec":{"defaultRoute":true}}' --type=merge 

Step 4: Pushing Images to Internal Registry

We pushed the pulled images to our OpenShift internal registry:

# Get credentials
TOKEN=$(oc whoami -t)
REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')

# Login to registry
docker login -u kubeadmin -p $TOKEN $REGISTRY

# Tag and push
docker tag registry.redhat.io/3scale-amp2/backend-rhel8:3scale2.12 $REGISTRY/[namespace]/amp-backend:2.12
docker push $REGISTRY/[namespace]/amp-backend:2.12
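
We repeated the same pull/tag/push cycle for every other failing imagestream, not just the backend. A small loop like the sketch below saves some typing - but note that the repository names and tags here are illustrative; the authoritative list is whatever your amp.yml template references, so check each imagestream with oc describe imagestream first.

# Illustrative only - confirm the exact repositories and tags from your amp.yml template
for IMAGE in backend-rhel8 system-rhel8 apicast-gateway-rhel8 zync-rhel8; do
  docker pull --platform linux/amd64 registry.redhat.io/3scale-amp2/${IMAGE}:3scale2.12
  docker tag registry.redhat.io/3scale-amp2/${IMAGE}:3scale2.12 $REGISTRY/[namespace]/${IMAGE}:3scale2.12
  docker push $REGISTRY/[namespace]/${IMAGE}:3scale2.12
done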

Step 5: Updating ImageStreams

We updated the imagestream to point to our locally pushed image:

oc tag $REGISTRY/[namespace]/amp-backend:2.12 amp-backend:2.12 --source=docker 

This automatically triggered the deployment due to the ImageChange trigger on the deployment config.
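
If the ImageChange trigger does not fire on its own, the imagestream and the rollout can be checked and nudged manually; these are the commands we kept coming back to.

# Confirm the imagestream tag now resolves to the internal registry image
oc describe imagestream amp-backend

# If the deployment config is still at REVISION 0, start a rollout manually and watch it
oc rollout latest dc/backend-listener
oc rollout status dc/backend-listener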

Results

After implementing these steps for the backend-listener component, the deployment began successfully (at least for this resource!).

Challenge 4: PVC Access Mode Issues

We then went back to the self-hosted 3scale admin portal and found that it still wasn't working.

We checked the pods and found that some of them were having issues.

apicast-production-1-deploy Issue

The error logs showed that several deployments were failing because their pods took too long to become available (timeout errors):

  • apicast-production-1-deploy: "pods took longer than 1800 seconds to become available"
  • system-sidekiq-1-deploy: "pods took longer than 1200 seconds to become available"
  • system-sphinx-1-deploy: "pods took longer than 1200 seconds to become available"

This typically happens when pods are stuck in a pending or initializing state for too long.
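
Before digging into individual pods, a quick look at the most recent events in the namespace usually points at the culprit (scheduling and volume errors all surface there):

# Show the newest events last - scheduling, image pull and mount failures show up here
oc get events --sort-by=.lastTimestamp | tail -n 20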

So we checked the logs of the problematic pods and looked at the PVCs as well.

# Check logs for apicast-production deployment
oc logs apicast-production-1-deploy

# Check logs for system-sidekiq deployment
oc logs system-sidekiq-1-deploy

# Check logs for system-sphinx deployment
oc logs system-sphinx-1-deploy

# Check events for the pending pod
oc describe pod system-app-1-hook-pre

We discovered a storage issue where the system-storage PVC was failing to provision:

oc get pvc
NAME                    STATUS    VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound     pvc-ss987s5b-026a-4srg-au97-549d8958933a    1Gi        RWO            gp3            53m
mysql-storage           Bound     pvc-72s43210-s033-4c8w-ar53-043bf3kk1496    1Gi        RWO            gp3            53m
system-redis-storage    Bound     pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Pending

The error was related to access modes:

failed to provision volume with StorageClass "gp3": rpc error: code = InvalidArgument desc = Volume capabilities MULTI_NODE_MULTI_WRITER not supported. Only AccessModes[ReadWriteOnce] supported. 
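
The mismatch is easy to confirm: the template asks for a ReadWriteMany volume for system-storage, while gp3 (EBS) volumes only support ReadWriteOnce. A one-liner shows what the stuck PVC is requesting:

# Show the access mode the pending PVC is requesting
oc get pvc system-storage -o jsonpath='{.spec.accessModes}{"\n"}'
# Example output: ["ReadWriteMany"]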

We fixed it by creating a new PVC with the correct access mode:

# First, delete pods using this PVC
oc delete pod system-app-1-hook-pre

# Back up the current PVC definition
oc get pvc system-storage -o yaml > system-storage-pvc.yaml

# Delete the stuck PVC
oc delete pvc system-storage

# Create a new PVC with the correct settings
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: system-storage
  namespace: [Namespace]
  labels:
    app: 3scale-api-management
    threescale_component: system
    threescale_component_element: app
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: gp3
EOF

After fixing the PVC issue, restart the deployments:

oc rollout retry dc/system-app
oc rollout retry dc/apicast-production
oc rollout retry dc/backend-listener
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

The PVC issue got fixed, and the system-storage PVC is now correctly bound to a volume.

oc get pvc
NAME                    STATUS    VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
backend-redis-storage   Bound     pvc-ss987s5b-026a-4srg-au97-549d8958933a    1Gi        RWO            gp3            53m
mysql-storage           Bound     pvc-72s43210-s033-4c8w-ar53-043bf3kk1496    1Gi        RWO            gp3            53m
system-redis-storage    Bound     pvc-s3r1111d2-4d57-41dd-9066-c488eda666d4   1Gi        RWO            gp3            53m
system-storage          Bound     pvc-2286196a-8885-490s-11c1-654320bd8a5a6   1Gi        RWO            gp3            116s

Challenge 5: Resource Constraints

Even after resolving the PVC issue, pods were still stuck in Pending state due to insufficient resources:

oc describe pod system-app-2-mr25z 
... Warning FailedScheduling 2m34s default-scheduler 0/9 nodes are available: 2 Insufficient cpu, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 6 node(s) had volume node affinity conflict. 
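
Before patching anything it helps to see how much headroom the worker nodes actually have; a couple of read-only commands are enough (the grep below is just one way to slice the output):

# Current CPU/memory usage per node (requires cluster metrics to be available)
oc adm top nodes

# Requests and limits already allocated on each worker node
oc describe nodes -l node-role.kubernetes.io/worker= | grep -A 8 "Allocated resources"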

We reduced the resource requirements to make the pods fit on the available nodes:

oc patch dc/system-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-master","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-provider","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}},{"name":"system-developer","resources":{"requests":{"cpu":"25m","memory":"400Mi"}}}]}}}}'
oc patch dc/apicast-production -p '{"spec":{"template":{"spec":{"containers":[{"name":"apicast-production","resources":{"requests":{"cpu":"25m","memory":"128Mi"}}}]}}}}'
oc patch dc/system-sidekiq -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sidekiq","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'
oc patch dc/system-sphinx -p '{"spec":{"template":{"spec":{"containers":[{"name":"system-sphinx","resources":{"requests":{"cpu":"25m","memory":"250Mi"}}}]}}}}'

After applying these patches, we restarted the failed components:

# Retry system-sidekiq and system-sphinx deployments
oc rollout retry dc/system-sidekiq
oc rollout retry dc/system-sphinx

That got fixed too!!

➜ oc get services
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
apicast-production   ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
apicast-staging      ClusterIP   xxx.xx.xxx.xxx   <none>        8080/TCP,8090/TCP   83m
backend-listener     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
backend-redis        ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-developer     ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-master        ClusterIP   xxx.xx.xxx.xxx   <none>        3000/TCP            83m
system-memcache      ClusterIP   xxx.xx.xxx.xxx   <none>        11211/TCP           83m
system-mysql         ClusterIP   xxx.xx.xx.xxx    <none>        3306/TCP            83m
system-provider      ClusterIP   xxx.xx.x.xxx     <none>        3000/TCP            83m
system-redis         ClusterIP   xxx.xx.xxx.xxx   <none>        6379/TCP            83m
system-sphinx        ClusterIP   xxx.xx.xxx.xxx   <none>        9306/TCP            83m
zync                 ClusterIP   xxx.xx.xxx.xx    <none>        8080/TCP            83m
zync-database        ClusterIP   xxx.xx.xx.xxx    <none>        5432/TCP            83m
➜ oc get routes
NAME                         HOST/PORT                                                             PATH         SERVICES             PORT      TERMINATION     WILDCARD
backend                      backend-3scale.apps.[YOUR-DOMAIN].openshiftapps.com                                backend-listener     http      edge/Allow      None
system-admin                 3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                                  system-app           3000      edge/Allow      None
system-developer             3scale.apps.[YOUR-DOMAIN].openshiftapps.com                           /developer   system-app           3001      edge/Allow      None
system-master                master.apps.[YOUR-DOMAIN].openshiftapps.com                                        system-app           3002      edge/Allow      None
system-provider              3scale.apps.[YOUR-DOMAIN].openshiftapps.com                                        system-app           3000      edge/Allow      None
zync-3scale-api-hhhjs        api-3scale-apicast-production.apps.[YOUR-DOMAIN].openshiftapps.com                 apicast-production   gateway   edge/Redirect   None
zync-3scale-api-phh9n        api-3scale-apicast-staging.apps.[YOUR-DOMAIN].p1.openshiftapps.com                 apicast-staging      gateway   edge/Redirect   None
zync-3scale-master-nhhht     HostAlreadyClaimed                                                                 system-master        http      edge/Redirect   None
zync-3scale-provider-q9hh9   HostAlreadyClaimed                                                                 system-developer     http      edge/Redirect   None
zync-3scale-provider-shh6z   HostAlreadyClaimed                                                                 system-provider      http      edge/Redirect   None

Since the containers are starting up, it should be a matter of minutes before we can access the admin portal.

We were just watching the pod status, and when system-app showed 3/3 ready, we tried accessing the admin portal at:

https://3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com

But it was still unavailable.

Challenge 6: Missing Service for App Components

Even after all pods were running, the admin portal was not accessible. The issue was that we created routes pointing to a service named "system-app" which didn't exist:

oc get routes
NAME               HOST/PORT                                            PATH         SERVICES     PORT   TERMINATION   WILDCARD
system-admin       3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com                 system-app   3000   edge/Allow    None
system-developer   3scale.apps.[YOUR-DOMAIN].openshiftapps.com          /developer   system-app   3001   edge/Allow    None
system-master      master.apps.[YOUR-DOMAIN].openshiftapps.com                       system-app   3002   edge/Allow    None
system-provider    3scale.apps.[YOUR-DOMAIN].openshiftapps.com                       system-app   3000   edge/Allow    None
oc describe service system-app
Error from server (NotFound): services "system-app" not found
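
Before recreating the service by hand, we checked which labels the system-app pods actually carry, so the selector below would match them. The deploymentConfig label is what template-based deploys normally set, but it's worth confirming on your own pods:

# Confirm the labels on the running system-app pods
oc get pods -l deploymentConfig=system-app --show-labels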

We fixed this by creating the missing service:

oc create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: system-app
  namespace: [NAMESPACE]
  labels:
    app: 3scale-api-management
spec:
  ports:
    - name: provider
      port: 3000
      protocol: TCP
      targetPort: 3000
    - name: developer
      port: 3001
      protocol: TCP
      targetPort: 3001
    - name: master
      port: 3002
      protocol: TCP
      targetPort: 3002
  selector:
    deploymentConfig: system-app
  type: ClusterIP
EOF
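
With the service in place, a quick endpoints check confirms it actually picked up the running pods; an empty ENDPOINTS column would mean the selector still doesn't match.

oc get endpoints system-app
# The ENDPOINTS column should list the system-app pod IP on ports 3000, 3001 and 3002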

Final Result

After working through all these challenges, we finally had a fully operational 3scale deployment:

oc get pods
NAME                         READY   STATUS    RESTARTS      AGE
apicast-production-4-7hh00   1/1     Running   0             2m12s
apicast-staging-1-6hh00      1/1     Running   0             83m
backend-cron-2-6955a         1/1     Running   0             23m
backend-listener-1-5hh00     1/1     Running   0             26m
backend-redis-1-mhh005       1/1     Running   0             57m
backend-worker-2-lr8gb       1/1     Running   0             23m
system-app-3-7ln8g           3/3     Running   0             85s
system-memcache-1-xddig      1/1     Running   0             80m
system-mysql-1-ee4wt         1/1     Running   0             80m
system-redis-1-45hh0         1/1     Running   0             80m
zync-1-l7ghy                 1/1     Running   0             80m
zync-database-1-dt3l9        1/1     Running   0             80m
zync-que-1-wwri9             1/1     Running   2 (80m ago)   80m

With all components running, finally, we were able to access the 3scale admin portal and begin configuring our APIs.

3Scale Api Management On Rosa Intro

Verify Deployment

# Check all pods are running
oc get pods

# Expected output should show all pods in Running or Completed state:
# - system-app-X-XXXXX (3/3 Running)
# - apicast-production-X-XXXXX (1/1 Running)
# - apicast-staging-X-XXXXX (1/1 Running)
# - system-sidekiq-X-XXXXX (1/1 Running)
# - system-sphinx-X-XXXXX (1/1 Running)
# - backend-* pods (1/1 Running)
# - system-mysql-X-XXXXX (1/1 Running)
# - system-redis-X-XXXXX (1/1 Running)
# - zync-* pods (1/1 Running)
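
Beyond pod status, the admin route itself can be probed as a final smoke test; once system-app is serving traffic we'd expect a 200 or a 302 redirect to the login page.

# Probe the admin portal route (expect HTTP 200 or a 302 redirect to the login page)
curl -sk -o /dev/null -w "%{http_code}\n" https://3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com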

Key Lessons Learned

  1. DNS Resolution is Critical: Ensure your OpenShift cluster can resolve external registry hostnames before attempting deployments that rely on them.

  2. Architecture Awareness: When working with enterprise container images on ARM-based development machines, be explicit about architecture requirements using the --platform flag.

  3. Manual Image Mirroring: In restricted environments, manually pulling and pushing images to an internal registry is a viable workaround.

  4. ImageStream Mechanics: Understanding how OpenShift's ImageStreams work is essential for troubleshooting deployment issues.

  5. Network Policies: In enterprise environments, network policies may restrict access to external registries, requiring coordination with network administrators.

Common Issues and Solutions

  • Pod stuck in Pending: Usually resource constraints - reduce resource requests
  • PVC mounting issues: Check storage class and access modes
  • Routes not working: Ensure services exist and selectors match pods
  • Application not available: Create missing system-app service
  • Database connection issues: Check that system-mysql pod is running and accessible

Final Verification Checklist

  • All pods are in Running state (except completed deployment/hook pods)
  • Routes are accessible and don't show "Application not available"
  • Can log into Admin Portal successfully
  • Can access Developer Portal
  • Master Portal is accessible (if needed)
  • Default passwords have been changed

Once we can successfully access the 3scale portals, we can begin migrating the existing 3scale components from the old environment or start adding new components; a rough toolbox sketch for that migration follows the list below.

A. Backend APIs

  • API definitions and configurations
  • Authentication settings
  • Rate limiting rules

B. Products (API Products)

  • Product configurations
  • Application plans
  • Pricing rules
  • Methods and metrics

C. Applications

  • Application keys and secrets
  • Application plans assignments
  • Usage statistics (if needed)

D. Accounts and Users

  • Developer accounts
  • Admin users
  • Access permissions

E. Policies

  • Custom policies
  • Policy chains
  • Configuration settings

F. Developer Portal

  • Custom pages and templates
  • Documentation
  • CMS content
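
The usual tool for moving these pieces between the Red Hat-hosted tenant and the new self-hosted one is the 3scale toolbox CLI. The sketch below only registers the two tenants as remotes - the access tokens and URLs are placeholders, and the copy/import subcommands and their flags vary by toolbox version, so treat this as a starting point and check 3scale help for your version.

# Register both tenants as named remotes (access tokens and URLs are placeholders)
3scale remote add saas https://[ACCESS_TOKEN]@[TENANT]-admin.3scale.net
3scale remote add selfhosted https://[ACCESS_TOKEN]@3scale-admin.apps.[YOUR-DOMAIN].openshiftapps.com

# Services, plans, ActiveDocs, etc. can then be copied between the remotes,
# e.g. with the "3scale copy service" subcommand (see: 3scale help copy)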

Conclusion

Deploying complex solutions like 3scale API Management in restricted network environments or across architecture boundaries presents unique challenges. By understanding the underlying issues and implementing a systematic approach to manually mirror images, we were able to overcome these obstacles.

While this process requires more manual effort than a standard deployment, it demonstrates the flexibility of OpenShift's container management capabilities and provides a path forward for deployments in environments with similar restrictions.
