Posted on Jan 9, 2023 • Edited on Feb 4, 2023

HarperDB on EKS

Hi there 👋, let's see how to deploy HarperDB on EKS and then test it with an API call using curl. You can get the Kubernetes manifests that we make in this post from this link.

I hope you are already familiar with Kubernetes objects such as Deployment, LoadBalancer Service, Secret and PersistentVolumeClaim.

Ensure you have the required IAM permissions, have installed the aws, eksctl and kubectl CLI tools, and have set up the config and credentials.

For me the config is as follows.

$ cat ~/.aws/config
[default]
region=us-east-1

Cluster

We can now create an EKS cluster with eksctl. You may see this video for cluster creation from the CLI.

$ eksctl create cluster --name eks-cluster --zones=us-east-1a,us-east-1b

This took around 20 minutes for me. Once it's done, we can update the kubeconfig.

$ aws eks update-kubeconfig --name eks-cluster

Docker hub

We can visit the Docker Hub page of harperdb to get an idea of the ports, environment variables, volume path, etc.

They have given an example docker command as below.

docker run -d \
  -v /host/directory:/opt/harperdb/hdb \
  -e HDB_ADMIN_USERNAME=HDB_ADMIN \
  -e HDB_ADMIN_PASSWORD=password \
  -p 9925:9925 \
  harperdb/harperdb

This tells us the volume mount path in the container is /opt/harperdb/hdb, there are 2 environment variables for username and password, and the container port is 9925. Finally the image is harperdb/harperdb.

We now have enough info to start writing our Kubernetes manifests.

Kubernetes manifests

I am going to create a directory named harperdb, where I will keep all the manifests.

$ mkdir harperdb
$ cd harperdb

Let's begin with the environment variables. We can put both the username and the password in a Secret object.

$ cat <<EOF > secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: harperdb
  namespace: harperdb
stringData:
  HDB_ADMIN_USERNAME: admin
  HDB_ADMIN_PASSWORD: password12345
...
EOF

We can now go with a persistent volume claim, that can dynamically create an EBS volume of size 5Gi in AWS.

$ cat <<EOF > pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: harperdb
  namespace: harperdb
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
...
EOF

Then comes the deployment manifest, where we can define the container image, refer to the secret for the env vars, and pvc for the volume. Note that the volume mount path matches with that in the docker command.

$ cat <<EOF > deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harperdb
  namespace: harperdb
spec:
  selector:
    matchLabels:
      app: harperdb
  template:
    metadata:
      labels:
        app: harperdb
    spec:
      containers:
      - name: harperdb
        image: harperdb/harperdb
        envFrom:
        - secretRef:
            name: harperdb
        volumeMounts:
        - name: data
          mountPath: /opt/harperdb/hdb
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: harperdb
...
EOF

Finally, we have to expose the deployment with a service. We know from the docker command that the container port is 9925.

$ cat <<EOF > svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: harperdb
  namespace: harperdb
spec:
  selector:
    app: harperdb
  type: LoadBalancer
  ports:
  - name: http
    port: 8080
    targetPort: 9925
...
EOF

Note that we have used 8080 as the service port.
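To make the port mapping concrete, here is a small sketch of how the addresses fit together: clients talk to the service on 8080, and the service forwards the traffic to the container on 9925. It assumes the default cluster DNS domain cluster.local, and the variable names are only for illustration.

```shell
SERVICE=harperdb
NAMESPACE=harperdb
SERVICE_PORT=8080   # "port" in svc.yaml, the port clients connect to
TARGET_PORT=9925    # "targetPort" in svc.yaml, the port the container listens on

# DNS name of the service inside the cluster
# (assumption: the cluster uses the default domain cluster.local)
IN_CLUSTER_URL="http://${SERVICE}.${NAMESPACE}.svc.cluster.local:${SERVICE_PORT}"

echo "clients inside the cluster use: ${IN_CLUSTER_URL}"
echo "the service forwards to container port: ${TARGET_PORT}"
```

Clients outside the cluster use the load balancer hostname instead, with the same service port 8080.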

Workloads

Create a namespace named harperdb, where we can create our objects.

$ kubectl create ns harperdb
namespace/harperdb created

We can now create the objects from the 4 manifests.

$ ls
deploy.yaml  pvc.yaml  secret.yaml  svc.yaml
$ kubectl create -f .
deployment.apps/harperdb created
persistentvolumeclaim/harperdb created
secret/harperdb created
service/harperdb created

Fix PVC

At this point the PVC will most likely be stuck in Pending status.

$ kubectl get pvc -n harperdb
NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
harperdb   Pending                                      gp2            7m3s

Please follow this link to add the IAM role in AWS and the EBS CSI objects on the cluster. This should fix the PVC issue.
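For reference, one common way to do this with eksctl is sketched below. This is an assumption about what the fix involves, not necessarily the exact steps from the linked post. The role name and the account ID are placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands, 0 = run them
CLUSTER="eks-cluster"
ROLE_NAME="AmazonEKS_EBS_CSI_DriverRole"   # assumed role name, pick your own
ACCOUNT_ID="${ACCOUNT_ID:-111122223333}"   # placeholder, replace with your AWS account ID

# Print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Associate an IAM OIDC provider with the cluster
run eksctl utils associate-iam-oidc-provider --cluster "$CLUSTER" --approve

# 2. Create the IAM role for the CSI controller's service account
run eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster "$CLUSTER" \
  --role-name "$ROLE_NAME" \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

# 3. Install the EBS CSI driver as an EKS add-on, using that role
run eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster "$CLUSTER" \
  --service-account-role-arn "arn:aws:iam::${ACCOUNT_ID}:role/${ROLE_NAME}" \
  --force
```

Review the printed commands first, then run the script again with DRY_RUN=0 and your own ACCOUNT_ID to apply them.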

Once done, the PVC should be bound to a persistent volume (PV).

$ kubectl get pvc -n harperdb
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
harperdb   Bound    pvc-7c83e38c-b00a-4194-8c67-ba5c9c1118e7   5Gi        RWO            gp2            9s

And the pv should be mapped to an EBS volume.

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
pvc-7c83e38c-b00a-4194-8c67-ba5c9c1118e7   5Gi        RWO            Delete           Bound    harperdb/harperdb   gp2                     64s
$ kubectl describe pv pvc-7c83e38c-b00a-4194-8c67-ba5c9c1118e7 | grep VolumeID
    VolumeID:          vol-0bbca736346f02aa1

Note that a persistent volume is a cluster level object and not bound to a namespace. We can check the volume details from the aws cli.

$ aws ec2 describe-volumes --volume-ids vol-0bbca736346f02aa1 --query "Volumes[0].Size"
5
$ aws ec2 describe-volumes --volume-ids vol-0bbca736346f02aa1 --query "Volumes[0].Tags"
[
    {
        "Key": "ebs.csi.aws.com/cluster",
        "Value": "true"
    },
    {
        "Key": "CSIVolumeName",
        "Value": "pvc-7c83e38c-b00a-4194-8c67-ba5c9c1118e7"
    },
    {
        "Key": "kubernetes.io/created-for/pv/name",
        "Value": "pvc-7c83e38c-b00a-4194-8c67-ba5c9c1118e7"
    },
    {
        "Key": "kubernetes.io/created-for/pvc/namespace",
        "Value": "harperdb"
    },
    {
        "Key": "kubernetes.io/created-for/pvc/name",
        "Value": "harperdb"
    }
]

Volume permission fix

So the pvc seems good. Let's check our application status.

$ kubectl get po -n harperdb
NAME                        READY   STATUS             RESTARTS      AGE
harperdb-79694c8b75-6ckn7   0/1     CrashLoopBackOff   4 (80s ago)   3m25s

The application was crashing, even though the volume was getting mounted and the env vars were fine too. To narrow it down, I commented out volumeMounts and volumes and updated the deployment.

$ grep '#' deploy.yaml
#volumeMounts:
#- name: data
#mountPath: /opt/harperdb/hdb
#volumes:
#- name: data
#persistentVolumeClaim:
#claimName: harperdb
$ kubectl apply -f deploy.yaml

The pod was now running, so I checked the permissions of the directory where we need to mount the volume, and then the user and group IDs of the running user.

$ kubectl exec -it deploy/harperdb -n harperdb -- bash
ubuntu@harperdb-858cc7967d-5jcqm:~$ ls -l /opt/harperdb
total 0
drwxr-xr-x 11 ubuntu ubuntu 155 Jan 9 06:59 hdb
ubuntu@harperdb-858cc7967d-5jcqm:~$ id
uid=1000(ubuntu) gid=1000(ubuntu) groups=1000(ubuntu)
ubuntu@harperdb-858cc7967d-5jcqm:~$ exit

So the group ID of the running user is 1000, and we can set this as the group owner of the volume directory with the fsGroup option. If we don't specify it, the mountPath is owned by root (user) and root (group) by default, and the running user ubuntu has no permission to create new files there. This video has information about fsGroup.

We have to change the deployment as follows, adding a securityContext with the fsGroup.

$ cat deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: harperdb
  namespace: harperdb
spec:
  selector:
    matchLabels:
      app: harperdb
  template:
    metadata:
      labels:
        app: harperdb
    spec:
      securityContext:
        fsGroup: 1000
      containers:
      - name: harperdb
        image: harperdb/harperdb
        envFrom:
        - secretRef:
            name: harperdb
        volumeMounts:
        - name: data
          mountPath: /opt/harperdb/hdb
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: harperdb
...

Alternatively, we could set mountPath to just /opt/harperdb, in which case we wouldn't have to set the securityContext. But I thought this was a good use case for getting to know fsGroup.

Update the deployment.

$ kubectl apply -f deploy.yaml

Check the workloads.

$ kubectl get all -n harperdb
NAME                           READY   STATUS    RESTARTS   AGE
pod/harperdb-cc4f49dfc-m7d5p   1/1     Running   0          55s

NAME               TYPE           CLUSTER-IP     EXTERNAL-IP                                                              PORT(S)          AGE
service/harperdb   LoadBalancer   10.100.54.78   a0ba701c9c5a4463bb636551c79b4158-169592876.us-east-1.elb.amazonaws.com   8080:31819/TCP   55s

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/harperdb   1/1     1            1           57s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/harperdb-cc4f49dfc   1         1         1       57s

API call

Send a request with curl to test schema creation. The endpoint is the value in the EXTERNAL-IP column of the service, followed by the service port 8080. You may check this video to know how to obtain the curl command for HarperDB.

$ HDB_API_ENDPOINT=http://a0ba701c9c5a4463bb636551c79b4158-169592876.us-east-1.elb.amazonaws.com:8080
$ curl --location --request POST ${HDB_API_ENDPOINT} \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Basic YWRtaW46cGFzc3dvcmQxMjM0NQ==' \
  --data-raw '{ "operation": "create_schema", "schema": "qa" }'
{"message":"schema 'qa' successfully created"}

All good, it's working...
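In case the Authorization header looks opaque: the value after Basic is just the base64 encoding of username:password, using the credentials from secret.yaml. Here is a minimal sketch of how it can be derived; the variable names are only for illustration.

```shell
HDB_USER=admin
HDB_PASS=password12345

# printf is used instead of echo so that no trailing newline gets encoded
BASIC_TOKEN=$(printf '%s:%s' "$HDB_USER" "$HDB_PASS" | base64)
AUTH_HEADER="Authorization: Basic ${BASIC_TOKEN}"

echo "$AUTH_HEADER"
```

curl can also do this encoding itself if you pass --user admin:password12345 instead of the Authorization header.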

Persistence

Test persistence by deleting the pod.

$ kubectl delete po -n harperdb -l app=harperdb
pod "harperdb-cc4f49dfc-m7d5p" deleted

This should launch a new pod.

$ kubectl get po -n harperdb
NAME                       READY   STATUS    RESTARTS   AGE
harperdb-cc4f49dfc-c6vnc   1/1     Running   0          57s

We can try sending the same API call again.

$ curl --location --request POST ${HDB_API_ENDPOINT} \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Basic YWRtaW46cGFzc3dvcmQxMjM0NQ==' \
  --data-raw '{ "operation": "create_schema", "schema": "qa" }'
{"error":"Schema 'qa' already exists"}

It doesn't create a new schema, because the existing schema was restored from the attached volume. Hence, the data is persistent.

Clean up

Let's do the clean up...

Delete all the objects that were created via manifests.

$ kubectl delete -f .
deployment.apps "harperdb" deleted
persistentvolumeclaim "harperdb" deleted
secret "harperdb" deleted
service "harperdb" deleted

Then delete the namespace.

$ kubectl delete ns harperdb
namespace "harperdb" deleted

Delete the folder.

$ cd ..
$ rm -rf harperdb

Finally delete the cluster.

$ eksctl delete cluster --name eks-cluster

That's it for the post. Thank you for reading!
