
Arseny Zinchenko

Posted on • Originally published at rtfm.co.ua on

Nexus: running in Kubernetes, and setting up a PyPI caching repository

We run GitHub Runners in Kubernetes to build and deploy our Backend API, see GitHub Actions: running the Actions Runner Controller in Kubernetes.

But over time, we noticed that there was too much traffic on the NAT Gateway — see VictoriaLogs: a Grafana dashboard for AWS VPC Flow Logs — migrating from Grafana Loki.

The issue: Traffic on the AWS NAT Gateway

When we started the investigation, we found an interesting detail:

Here, 40.8 gigabytes of data passed through the NAT Gateway in an hour, 40.7 GB of which was Ingress.

Out of these 40 GB, there are three Remote IPs at the top, each of which sent us almost 10 GB of traffic (the table on the bottom left of the screenshot above).

The top Remote IPs are:

```
Remote IP        Value    Percent
---------------------------------
20.60.6.4        10.6 GB  28%
20.150.90.164    9.79 GB  26%
20.60.6.100      8.30 GB  22%
185.199.111.133  2.06 GB  5%
185.199.108.133  1.89 GB  5%
185.199.110.133  1.78 GB  5%
185.199.109.133  1.40 GB  4%
140.82.114.4     805 MB   2%
146.75.28.223    705 MB   2%
54.84.248.61     267 MB   1%
```

And at the top of the Kubernetes traffic, we have four Kubernetes Pod IPs:


```
Source IP        Pod IP           Value    Percent
--------------------------------------------------
20.60.6.4     => 10.0.43.98       1.54 GB  14%
20.60.6.100   => 10.0.43.98       1.49 GB  14%
20.60.6.100   => 10.0.42.194      1.09 GB  10%
20.150.90.164 => 10.0.44.162      1.08 GB  10%
20.60.6.4     => 10.0.44.208      1.03 GB  9%
```

All of these IPs belong to GitHub Runner Pods, and the “kraken” in their names just means these are the runners for builds and deploys of our kraken project, the Backend:

The next step is even more interesting: if you check the 20.60.6.4 IP, you will see a hostname:

*.blob.core.windows.net???

What? I was very surprised, because we build a Python app, and there are no libraries from Microsoft there. But then I had an idea: since we use pip and Docker caching in GitHub Actions for the Backend API builds, this is most likely GitHub’s cache storage, and it’s from there that we pull these caches into Kubernetes (it is, see the Communication requirements for GitHub-hosted runners and GitHub).

A similar check for 185.199.111.133 and 140.82.114.4 shows us *.github.io, and 54.84.248.61 is athena.us-east-1.amazonaws.com.
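For reference, one way to run this kind of check yourself is a reverse DNS lookup plus whois; a quick sketch, assuming dig and whois are installed on the work machine:

```bash
# reverse DNS lookup for the IP
$ dig -x 20.60.6.4 +short

# and whois shows who owns the network
$ whois 20.60.6.4 | grep -iE 'orgname|netname'
```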

So, what we decided to do was to run a local cache in Kubernetes with Sonatype Nexus and use it as a proxy for PyPI.org and for Docker Hub images.

We’ll talk about Docker caching next time, but for now, we will:

  • test Nexus locally with Docker on a work machine
  • run Nexus in Kubernetes from a Helm-chart
  • configure and test the PyPI cache for builds
  • and see the results

Nexus: testing locally with Docker

Run Nexus:

```bash
$ docker run -ti --rm --name nexus -p 8081:8081 sonatype/nexus3
```

Wait a few minutes: Nexus is Java-based, so it takes a while to start.
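To know when it is actually ready, you can tail the container logs, or poll the status REST endpoint, which returns HTTP 200 once Nexus is up; a small sketch:

```bash
# watch the startup logs
$ docker logs -f nexus

# or poll the status endpoint; it answers with HTTP 200 once Nexus is ready to serve requests
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8081/service/rest/v1/status
```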

Get the admin password:

```bash
$ docker exec -ti nexus cat /nexus-data/admin.password
6221ad20-0196-4771-b1c7-43df355c2245
```

In a browser, go to http://localhost:8081 and log in:

If you haven’t done this in the Setup wizard, then go to Security > Anonymous access, and allow connections without authentication:
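The same setting can also be toggled without the UI via the Security Management REST API; a sketch, assuming the admin credentials obtained above:

```bash
# enable anonymous access via the REST API
$ curl -u admin:6221ad20-0196-4771-b1c7-43df355c2245 \
    -X PUT http://localhost:8081/service/rest/v1/security/anonymous \
    -H 'Content-Type: application/json' \
    -d '{"enabled": true, "userId": "anonymous", "realmName": "NexusAuthorizingRealm"}'
```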

Adding a pypi (proxy) repository

Go to Settings > Repositories, click Create repository:

Select the pypi (proxy) type:

Create a repository:

  • Name: pypi-proxy
  • Remote storage: https://pypi.org
  • Blob store: default

At the bottom, click Create repository.
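The same repository can also be created through the REST API instead of clicking in the UI; a sketch, with a field set that mirrors what we will later pass to the Helm chart:

```bash
# create a pypi (proxy) repository via the REST API
$ curl -u admin:6221ad20-0196-4771-b1c7-43df355c2245 \
    -X POST http://localhost:8081/service/rest/v1/repositories/pypi/proxy \
    -H 'Content-Type: application/json' \
    -d '{
      "name": "pypi-proxy",
      "online": true,
      "storage": {"blobStoreName": "default", "strictContentTypeValidation": false},
      "proxy": {"remoteUrl": "https://pypi.org", "contentMaxAge": 1440, "metadataMaxAge": 1440},
      "negativeCache": {"enabled": true, "timeToLive": 1440},
      "httpClient": {"blocked": false, "autoBlock": true}
    }'
```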

Let’s check what data we have now in the default Blob storage - go to the Nexus container:

```bash
$ docker exec -ti nexus bash
bash-4.4$
```

And look at the /nexus-data/blobs/default/content/ directory - now it's empty:

```bash
bash-4.4$ ls -l /nexus-data/blobs/default/content/
total 8
drwxr-xr-x 3 nexus nexus 4096 Nov 27 11:02 directpath
drwxr-xr-x 2 nexus nexus 4096 Nov 27 11:02 tmp
```

Testing the Nexus PyPI cache

Now let’s check if our proxy cache is working.

Find the IP of the Nexus container:

```bash
$ docker inspect nexus | jq '.[].NetworkSettings.IPAddress'
"172.17.0.2"
```

Run another container with Python:

```bash
$ docker run -ti --rm python bash
root@addeba5d307c:/#
```

And execute pip install --index-url http://172.17.0.2:8081/repository/pypi-proxy/simple setuptools --trusted-host 172.17.0.2

```bash
root@addeba5d307c:/# time pip install --index-url http://172.17.0.2:8081/repository/pypi-proxy/simple setuptools --trusted-host 172.17.0.2
Looking in indexes: http://172.17.0.2:8081/repository/pypi-proxy/simple
Collecting setuptools
  Downloading http://172.17.0.2:8081/repository/pypi-proxy/packages/setuptools/75.6.0/setuptools-75.6.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 81.7 MB/s eta 0:00:00
Installing collected packages: setuptools
Successfully installed setuptools-75.6.0
...
real    0m2.595s
...
```

We can see that the download went through the proxy, and the install took 2.59 seconds.

Let’s see what’s in the default Blob store in Nexus now:

```bash
bash-4.4$ ls -l /nexus-data/blobs/default/content/
total 20
drwxr-xr-x 3 nexus nexus 4096 Nov 27 11:02 directpath
drwxr-xr-x 2 nexus nexus 4096 Nov 27 11:21 tmp
drwxr-xr-x 3 nexus nexus 4096 Nov 27 11:21 vol-05
drwxr-xr-x 3 nexus nexus 4096 Nov 27 11:21 vol-19
drwxr-xr-x 3 nexus nexus 4096 Nov 27 11:21 vol-33
```

We have some data there now, okay.
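If you're curious what exactly was stored: in the file blob store, each blob is a .bytes file with a .properties file next to it that records the original asset path, so you can grep for the package name; a sketch:

```bash
# find the blobs whose metadata mentions setuptools
bash-4.4$ find /nexus-data/blobs/default/content/vol-* -name '*.properties' -exec grep -l setuptools {} +
```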

Let’s test with pip again: first, uninstall the installed package:

```bash
root@addeba5d307c:/# pip uninstall setuptools
```

And install it again, but now add the --no-cache-dir to avoid using the local cache in the container:

```bash
root@5dc925fe254f:/# time pip install --no-cache-dir --index-url http://172.17.0.2:8081/repository/pypi-proxy/simple setuptools --trusted-host 172.17.0.2
Looking in indexes: http://172.17.0.2:8081/repository/pypi-proxy/simple
Collecting setuptools
  Downloading http://172.17.0.2:8081/repository/pypi-proxy/packages/setuptools/75.6.0/setuptools-75.6.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 942.9 MB/s eta 0:00:00
Installing collected packages: setuptools
Successfully installed setuptools-75.6.0
...
real    0m1.589s
```

Now it took 1.58 seconds instead of 2.59.

Good, looks like everything works.

Let’s run Nexus on Kubernetes.

Running Nexus in Kubernetes

There is a chart called stevehipwell/nexus3.

You can write the manifests yourself, or you can try this chart.

What might be interesting to us from the chart’s values:

  • config.anonymous.enabled: Nexus will be accessible only inside Kubernetes via its ClusterIP, so while it is a PoC used purely for the PyPI cache, we can run it without authentication
  • config.blobStores: you can leave it as is for now, but later you can attach a dedicated EBS volume or AWS Elastic File System; see also persistence.enabled
  • config.job.tolerations and nodeSelector: if you need to run it on a separate node, see Kubernetes: Pods and WorkerNodes to control the placement of pods on nodes
  • config.repos: create repositories directly through values
  • ingress.enabled: not our case, but it is possible
  • metrics.enabled: later we can look at the monitoring
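To see all of these keys with their defaults, you can dump the chart's values once the Helm repository is added (next step):

```bash
# print the chart's default values.yaml
$ helm show values stevehipwell/nexus3 | less
```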

First, let’s set it up with the default parameters, then we’ll add our own values.

Add a repository:

```bash
$ helm repo add stevehipwell https://stevehipwell.github.io/helm-charts/
"stevehipwell" has been added to your repositories
```

Create a separate namespace ops-nexus-ns:

```bash
$ kk create ns ops-nexus-ns
namespace/ops-nexus-ns created
```

Install the chart:

```bash
$ helm -n ops-nexus-ns upgrade --install nexus3 stevehipwell/nexus3
```

It took about 5 minutes to launch, and I was thinking about dropping the chart and writing it myself, but eventually, it started. Well, Java — what can we do?
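Instead of just waiting and refreshing, you can block until the StatefulSet reports ready; a sketch:

```bash
# wait for the nexus3 StatefulSet to become ready
$ kk -n ops-nexus-ns rollout status statefulset/nexus3 --timeout=10m
```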

Let’s check what we have here:

```bash
$ kk -n ops-nexus-ns get all
NAME           READY   STATUS    RESTARTS   AGE
pod/nexus3-0   4/4     Running   0          6m5s

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/nexus3      ClusterIP   172.20.160.147   <none>        8081/TCP   6m5s
service/nexus3-hl   ClusterIP   None             <none>        8081/TCP   6m5s

NAME                      READY   AGE
statefulset.apps/nexus3   1/1     6m6s
```

Adding an admin user password

Create a Kubernetes Secret with a password:

```bash
$ kk -n ops-nexus-ns create secret generic nexus-root-pass --from-literal=password=p@ssw0rd
secret/nexus-root-pass created
```

Write a nexus-values.yaml file, in which we set the name of the Kubernetes Secret and the key with the password, and enable Anonymous Access:

```yaml
rootPassword:
  secret: nexus-root-pass
  key: password

config:
  enabled: true
  anonymous:
    enabled: true
```

Adding a repository to Nexus via Helm chart values

I had to do a bit of trial and error here, but it worked.

Also, the chart’s values.yaml says: “Repository configuration; based on the REST API (API reference docs require an existing Nexus installation and can be found at **Administration** under _System_ → _API_) but with format & type defined in the object.”

Let’s see the Nexus API specification — which fields are passed in the API request:

What about the format?

We can look at the Format and Type fields in some existing repository:
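The same fields can be seen from the API itself: listing the existing repositories returns their name, format, and type (a sketch against the local Docker instance from the first part, where anonymous access is enabled):

```bash
# list repositories and show only name, format, and type
$ curl -s http://localhost:8081/service/rest/v1/repositories | jq '.[] | {name, format, type}'
```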

Describe the repository and other necessary parameters — for me, it looks like this:

```yaml
rootPassword:
  secret: nexus-root-pass
  key: password

persistence:
  enabled: true
  storageClass: gp2-retain

resources:
  requests:
    cpu: 1000m
    memory: 1500Mi

config:
  enabled: true
  anonymous:
    enabled: true
  repos:
    - name: pip-cache
      format: pypi
      type: proxy
      online: true
      negativeCache:
        enabled: true
        timeToLive: 1440
      proxy:
        remoteUrl: https://pypi.org
        metadataMaxAge: 1440
        contentMaxAge: 1440
      httpClient:
        blocked: false
        autoBlock: true
        connection:
          retries: 0
          useTrustStore: false
      storage:
        blobStoreName: default
        strictContentTypeValidation: false
```

It’s a pretty simple setup, and I’ll do some tuning later if necessary. But it’s already working.

Let’s deploy it:

```bash
$ helm -n ops-nexus-ns upgrade --install nexus3 stevehipwell/nexus3 -f nexus-values.yaml
```

In case of errors like “Could not create repository”:

```bash
$ kk -n ops-nexus-ns logs -f nexus3-config-9-2cssf
Configuring Nexus3...
Configuring anonymous access...
Anonymous access configured.
Configuring blob stores...
Configuring scripts...
Script 'cleanup' updated.
Script 'task' updated.
Configuring cleanup policies...
Configuring repositories...
ERROR: Could not create repository 'pip-cache'.
```

Check the logs — Nexus wants almost all of the fields to be passed; in this case, the contentMaxAge and a few other fields were missing:

```
nexus3-0:nexus3 2024-11-27 12:34:16,818+0000 WARN [qtp554755438-84] admin org.sonatype.nexus.siesta.internal.resteasy.ResteasyViolationExceptionMapper - (ID af473d22-3eca-49ea-adb9-c7985add27e7) Response: [400] '[ValidationErrorXO{id='PARAMETER strictContentTypeValidation', message='must not be null'}, ValidationErrorXO{id='PARAMETER negativeCache', message='must not be null'}, ValidationErrorXO{id='PARAMETER metadataMaxAge', message='must not be null'}, ValidationErrorXO{id='PARAMETER contentMaxAge', message='must not be null'}, ValidationErrorXO{id='PARAMETER httpClient', message='must not be null'}]'; mapped from: ...
```

During deployment, when we set the config.enabled=true parameter, the chart launches another Kubernetes Pod, which actually performs the Nexus configuration.

Let’s check the access and the repository — open a local port:

```bash
$ kk -n ops-nexus-ns port-forward pod/nexus3-0 8082:8081
Forwarding from 127.0.0.1:8082 -> 8081
Forwarding from [::1]:8082 -> 8081
```

And go to http://localhost:8082/#admin/repository/repositories:

The Nexus needs a lot of resources, especially memory, because again, it’s Java:

Therefore, it makes sense to set requests in values.

Also, you can set JVM params:

```yaml
...
  # Environment:
  # INSTALL4J_ADD_VM_PARAMS: -Djava.util.prefs.userRoot=${NEXUS_DATA}/javaprefs -Xms1024m -Xmx1024m -XX:MaxDirectMemorySize=2048m
...
```

Testing Nexus in Kubernetes

Launch a Pod with Python:

```bash
$ kk run pod --rm -i --tty --image python bash
If you don't see a command prompt, try pressing enter.
root@pod:/#
```

Find a Kubernetes Service for Nexus:

```bash
$ kk -n ops-nexus-ns get svc
NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
nexus3      ClusterIP   172.20.160.147   <none>        8081/TCP   78m
nexus3-hl   ClusterIP   None             <none>        8081/TCP   78m
```

Run pip install again:

```bash
root@pod:/# time pip install --index-url http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple setuptools --trusted-host nexus3.ops-nexus-ns.svc
Looking in indexes: http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple
Collecting setuptools
  Downloading http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/packages/setuptools/75.6.0/setuptools-75.6.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 86.3 MB/s eta 0:00:00
Installing collected packages: setuptools
Successfully installed setuptools-75.6.0
...
real    0m3.958s
```

It installed setuptools-75.6.0 in 3.95 seconds.

Let’s check it at http://localhost:8082/#browse/browse:pip-cache:
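Besides the UI, the cached packages can be listed through the API over the same port-forward; a sketch:

```bash
# list component names cached in the pip-cache repository
$ curl -s 'http://localhost:8082/service/rest/v1/components?repository=pip-cache' | jq '.items[].name'
```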

Remove setuptools from our Python Pod:

```bash
root@pod:/# pip uninstall setuptools
```

And install it again, this time also with --no-cache-dir:

```bash
root@pod:/# time pip install --no-cache-dir --index-url http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple setuptools --trusted-host nexus3.ops-nexus-ns.svc
Looking in indexes: http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple
Collecting setuptools
  Downloading http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/packages/setuptools/75.6.0/setuptools-75.6.0-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 875.9 MB/s eta 0:00:00
Installing collected packages: setuptools
Successfully installed setuptools-75.6.0
...
real    0m2.364s
```

Now it took 2.36 seconds instead of 3.95.

The only thing left to do is to update GitHub Workflows — disable all caches there, and add the use of Nexus.

GitHub and the results for the AWS NAT Gateway traffic

I won’t go into detail about the GitHub Actions Workflow, because it’s different for everyone, but in short, I’ve disabled the pip caching:

... - name: "Setup: Python 3.10" uses: actions/setup-python@v5 with: python-version: "3.10" # cache: 'pip' check-latest: "false" # cache-dependency-path: "**/*requirements.txt" ... 
Enter fullscreen mode Exit fullscreen mode

This alone saves about 540 megabytes of cache archive downloads on each Job run.

Next, we have a step that executes the pip install by calling make:

... - name: "Setup: Dev Dependencies" id: setup_dev_dependencies #run: make dev-python-requirements run: make dev-python-requirements-nexus shell: bash ... 
Enter fullscreen mode Exit fullscreen mode

And in the Makefile, I created a new target so that I could quickly revert to the old configuration:

```makefile
...
dev-python-requirements:
	python3 -m pip install --no-compile -r dev-requirements.txt

dev-python-requirements-nexus:
	python3 -m pip install --index-url http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple --no-compile -r dev-requirements.txt --trusted-host nexus3.ops-nexus-ns.svc
...
```
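As an alternative to passing --index-url in every call, pip also reads the standard PIP_INDEX_URL and PIP_TRUSTED_HOST environment variables, so the same switch could be done at the workflow or runner level without a separate Makefile target; a sketch:

```bash
# point pip at the Nexus proxy via environment variables
export PIP_INDEX_URL=http://nexus3.ops-nexus-ns.svc:8081/repository/pip-cache/simple
export PIP_TRUSTED_HOST=nexus3.ops-nexus-ns.svc

# the install command itself stays unchanged
python3 -m pip install --no-compile -r dev-requirements.txt
```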

Then, in the Workflow, disable any caches like actions/cache:

```yaml
...
    # - name: "Setup: Get cached api-generator images"
    #   id: api-generator-cache
    #   uses: actions/cache@v4
    #   with:
    #     path: ~/_work/api-generator-cache
    #     key: api-generator-cache
...
```

Let’s compare the results.

The build with the old configuration, without Nexus and with GitHub caches; here is the traffic of the GitHub Runner Kubernetes Pod that ran this build:

3.55 gigabytes of traffic; the build and deployment took 4 minutes and 11 seconds.

And the same GitHub Actions Job, but with the changes merged — using Nexus, and without GitHub caching.

We can see in the logs that the packages are indeed taken from Nexus:

And traffic used:

329 megabytes; the build and deployment took 4 minutes and 20 seconds.

And that’s it for now.

What will be done next is to see how Nexus can be monitored, what metrics it has and which ones can be used for alerts, and then to add a Docker cache as well, because we often run into Docker Hub limits: “429 Too Many Requests — Server message: toomanyrequests: You have reached your pull rate limit. You can increase the limit by authenticating and upgrading”.

Originally published at RTFM: Linux, DevOps, and system administration.

