Posted on May 7, 2024

Container size analysis: TensorFlow 2.8 base image vs Deep Learning

TL;DR: building our DeepCell container from a base TensorFlow image makes it 50% faster to load and about 60% smaller than building it from the Deep Learning container.

	Deep Learning image	Base TF image	Reduction
Uncompressed	19.5 GB	7.2 GB	63%
Compressed	8.4 GB	3.2 GB	62%
Batch job load time	6 min	3 min	50%

This post covers how we rebuilt our container on the smaller base image, and why the Deep Learning container is so big to begin with. The long and short of it is that you pay a steep price to have so many development tools available, and you typically don't need those for production tasks.

Optimizing our container

Our DeepCell journey began on Vertex AI. Google provides pre-built TensorFlow images as part of their Deep Learning Container Images.

These containers purport to let you:

Quickly prototype with a portable and consistent environment for developing, testing, and deploying your AI applications with Deep Learning Containers. These Docker images use popular frameworks and are performance optimized, compatibility tested, and ready to deploy.

Cool beans. Our DeepCell version uses TF2.8 so we picked this image from Google's list: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37

It runs Python 3.7, which fortunately is still supported by DeepCell. (I've had mixed experiences with Python version support across bioinformatics tools.)

Our initial container build was simple:

 FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37
 ADD https://api.github.com/repos/dchaley/deepcell-imaging/git/refs/heads/main version.json
 RUN git clone https://github.com/dchaley/deepcell-imaging.git
 WORKDIR "/deepcell-imaging"
 RUN pip install --user --upgrade --quiet -r requirements.txt
 ENTRYPOINT ["python", "benchmarking/deepcell-e2e/benchmark.py"]

Our requirements file is pretty simple. We verified in the build logs that it didn't reinstall TensorFlow; note that the packages to install do not include TF:

 Requirement already satisfied: tensorflow~=2.8.0 in /opt/conda/lib/python3.7/site-packages (from deepcell==0.12.9->-r requirements.txt (line 1)) (2.8.4)
 ...
 Installing collected packages: tensorflow-addons, snakeviz, smart_open, qtpy, opencv-python-headless, lxml, jupyter-core, iniconfig, imagecodecs, cython, pytest, google-api-core, deepcell-toolbox, qtconsole, jupyter-console, deepcell-tracking, google-cloud-notebooks, google-cloud-bigquery, spektral, google-cloud-aiplatform, jupyter, deepcell

This resulted in a whopping ~20 GB container 😩

The compressed artifact size was ~8.5 GB: this is the amount of data that must be transmitted before unpacking.

The impact of all this? A six-minute start time for Google Batch jobs, measured from the start of the container download …

 2024-04-30 14:56:20.896 PDT gce: Pulling from deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking

… until executing the container:

 2024-04-30 15:02:23.233 PDT Executing runnable container:
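The gap between those two log lines can be checked directly. A small sketch, using only the two timestamps quoted above (the variable names are just for illustration):

```python
from datetime import datetime

# Timestamps copied from the two log lines above. Both are PDT, so no
# time-zone handling is needed to take the difference.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
pull_started = datetime.strptime("2024-04-30 14:56:20.896", LOG_FORMAT)
run_started = datetime.strptime("2024-04-30 15:02:23.233", LOG_FORMAT)

elapsed = run_started - pull_started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} min {seconds:.1f} s")
```

That comes out to just over six minutes between starting the pull and executing the container.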

I wasn't thrilled with a six-minute minimum feedback cycle 😤 We tried image streaming to reduce startup time but alas, the container was so large it couldn't run without provisioning additional boot disk space.

We figured we must be able to build a container from a slimmer TensorFlow base image. We knew the DeepCell team had done some work scaling DeepCell using Kubernetes on GKE. Their Dockerfile confirmed it: just use TF's image.

We switched our base to TF's, grabbed the apt maintenance work they did, and updated our Dockerfile [diff].

The result: 7.2 GB uncompressed and 3.2 GB compressed, and ~3 min from starting to fetch the container to beginning to execute it.

	Deep Learning image	Base TF image	Reduction
Uncompressed	19.5 GB	7.2 GB	63%
Compressed	8.4 GB	3.2 GB	62%
Batch job load time	6 min	3 min	50%
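The Reduction column is just the relative drop from the Deep Learning figure to the base TF figure. A quick sketch to reproduce it from the numbers in the table (the helper name `reduction_pct` is made up for this example):

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100

# (Deep Learning image, base TF image), as listed in the table above.
rows = {
    "Uncompressed (GB)": (19.5, 7.2),
    "Compressed (GB)": (8.4, 3.2),
    "Batch job load time (min)": (6, 3),
}

for name, (before, after) in rows.items():
    print(f"{name}: {reduction_pct(before, after):.0f}%")
```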

That's better 😎 But I couldn't help but wonder … why?

Container size analysis

Let's dive deep into what's in the containers. They are too large to open in Cloud Shell 🫠 so we'll do it the old-fashioned way, on a local machine.

Let's use ncdu to explore the file system.
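If ncdu isn't an option, a similar per-directory breakdown can be approximated with a few lines of Python. This is only a rough sketch and not how ncdu works internally: the function names are invented for illustration, and it sums apparent file sizes rather than disk usage, so the numbers won't match ncdu exactly.

```python
import os

def tree_size(path: str) -> int:
    """Total apparent size in bytes of files under `path` (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

def largest_children(path: str, top: int = 5) -> list:
    """Return the `top` largest direct children of `path` as (name, bytes) pairs."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]

if __name__ == "__main__":
    for name, size in largest_children("/usr"):
        print(f"{size / 2**20:10.1f} MiB  {name}")
```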

Deep Learning

This container was built from the Deep Learning base. Let's boot it up & install ncdu.

 $ docker run -it --entrypoint bash us-central1-docker.pkg.dev/deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking@sha256:8cc9b89e5869a4d468d64810b2ae47e242cc106519b2b8d7c4a9daa07856bdde
 root@55a486270459:/deepcell-imaging# apt update && apt install ncdu

Begin scanning the root directory:

 root@55a486270459:/deepcell-imaging# ncdu /

It scans pretty quickly. Here's the summary:

So far this just tells us we have a lot in usr and opt (common places to install libraries). Let's start with usr.

   6.6 GiB [ 53.9%] /lib
   4.9 GiB [ 39.6%] /local
 363.3 MiB [  2.9%] /share
 276.8 MiB [  2.2%] /bin
 144.1 MiB [  1.1%] /src

It's a bit odd to have so much in both lib and local, but let's see. lib is mostly cuDNN (the CUDA Deep Neural Network library) and TensorRT (libnvinfer):

 --- /usr/lib -------------------------
                    /..
   5.5 GiB [ 83.2%] /x86_64-linux-gnu
 938.5 MiB [ 13.8%] /google-cloud-sdk

 --- /usr/lib/x86_64-linux-gnu ----------------------------
                    /..
   1.4 GiB [ 24.5%] libcudnn_static.a
 956.8 MiB [ 16.9%] libnvinfer_builder_resource.so.8.6.1
 839.4 MiB [ 14.8%] libcudnn_cnn_infer_static.a
 675.1 MiB [ 11.9%] libcudnn_cnn_infer.so.8.2.0
 271.8 MiB [  4.8%] libcudnn_ops_infer.so.8.2.0
 227.3 MiB [  4.0%] libcudnn_cnn_train_static.a
 225.5 MiB [  4.0%] libnvinfer.so.8.6.1

Static libraries (the .a files) are only needed when compiling and linking programs against them. We aren't doing that. Maybe we need the dynamic libraries for inference, I'm not sure. But the static libraries here are over 2.5 GB…
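To double-check that figure, the .a entries in the listing can be added up. A throwaway sketch: the parsing regex and the helper name `static_lib_bytes` are my own, and it only sees the files that appear in the excerpt above, not the whole directory.

```python
import re

# The /usr/lib/x86_64-linux-gnu entries from the ncdu listing above.
LISTING = """\
1.4 GiB [ 24.5%] libcudnn_static.a
956.8 MiB [ 16.9%] libnvinfer_builder_resource.so.8.6.1
839.4 MiB [ 14.8%] libcudnn_cnn_infer_static.a
675.1 MiB [ 11.9%] libcudnn_cnn_infer.so.8.2.0
271.8 MiB [ 4.8%] libcudnn_ops_infer.so.8.2.0
227.3 MiB [ 4.0%] libcudnn_cnn_train_static.a
225.5 MiB [ 4.0%] libnvinfer.so.8.6.1
"""

UNIT_BYTES = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
LINE = re.compile(r"([\d.]+) (KiB|MiB|GiB) \[\s*[\d.]+%\] (\S+)")

def static_lib_bytes(listing: str) -> float:
    """Sum the sizes of entries whose name ends in .a (static libraries)."""
    total = 0.0
    for match in LINE.finditer(listing):
        value, unit, name = match.groups()
        if name.endswith(".a"):
            total += float(value) * UNIT_BYTES[unit]
    return total

total = static_lib_bytes(LISTING)
print(f"{total / 2**30:.2f} GiB = {total / 1e9:.2f} GB")
```

The three static libraries listed add up to roughly 2.4 GiB, which is about 2.6 GB in decimal units.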

Surprising also to see a gig in the cloud sdk… it looks like the sdk ships its own Python distro and some other stuff.

 --- /usr/lib/google-cloud-sdk --
                    /..
 382.3 MiB [ 40.7%] /lib
 296.7 MiB [ 31.6%] /platform
 169.5 MiB [ 18.1%] /bin

As for /usr/local:

 --- /usr/local ----------------
                    /..
   3.4 GiB [ 70.0%] /cuda-11.3
 850.0 MiB [ 17.0%] /share
 603.9 MiB [ 12.1%] /cuda-12.2

Well… do we actually need 2 versions of CUDA? (Why is 12.2 so much smaller?) About half of the 11.3 version is static libraries again.

So far we're at ~4 GB of CUDA-related static libraries (which we don't need).

How about that /usr/local/share directory…

 --- /usr/local/share/.cache --
                    /..
 850.0 MiB [100.0%] /yarn

Nearly a gig of yarn package caches 😑 That brings us to ~5 GB of stuff we don't need.

Alright, bouncing back to /opt (the other big directory, with 6 GB):

 --- /opt --------------------
                    /..
   4.8 GiB [ 79.1%] /conda
   1.3 GiB [ 20.9%] /nvidia

Conda is a Python distribution; let's first check out what's in nvidia:

 --- /opt/nvidia --------------------
                    /..
   1.3 GiB [100.0%] /nsight-compute

 --- /opt/nvidia/nsight-compute -----
                    /..
 651.5 MiB [ 50.0%] /2021.1.1
 651.3 MiB [ 50.0%] /2021.1.0

So we have half a gig on an old version. What is nsight anyhow?

NVIDIA Nsight™ Systems is a system-wide performance analysis tool

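Well, we don't need that. (Strictly speaking, the nsight-compute directory is Nsight Compute, NVIDIA's CUDA kernel profiler, rather than Nsight Systems, but it's a profiling tool either way.) So we're at ~6 GB of stuff we don't need. Let's go back to /opt/conda (~5 GB); as expected, most of the stuff is in packages & libraries: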

 --- /opt/conda -----------
                    /..
   4.5 GiB [ 94.0%] /pkgs
   3.2 GiB [ 67.0%] /lib

Most of the 4.5 GB of pkgs is in something called dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 which in turn is ~3 GB of libraries.

 --- /opt/conda/pkgs/dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 -----
                    /..
   2.9 GiB [ 81.3%] /lib
 623.1 MiB [ 17.3%] /share

The libraries are Python 3.7 site-packages: mostly TensorFlow (1 GB), plus a bunch of small Python libraries. We presumably need this stuff!

 --- /opt/conda/pkgs/dlenv-tf-2-8...e_0/lib/python3.7/site-packages ---
                    /..
   1.1 GiB [ 39.5%] /tensorflow
 282.5 MiB [  9.7%] /ray
 116.9 MiB [  4.0%] /pyarrow
  98.2 MiB [  3.4%] /llvmlite
  84.3 MiB [  2.9%] /scipy
  83.9 MiB [  2.9%] /sklearn
  78.8 MiB [  2.7%] /plotly
  69.5 MiB [  2.4%] /tensorflow_io
  58.6 MiB [  2.0%] /clang
  50.3 MiB [  1.7%] /apache_beam
  46.6 MiB [  1.6%] /google

How about share?

 --- /opt/conda/pkgs/dlenv-tf-2-8...0.20230926-py37hab20f5e_0/share ---
                    /..
 621.3 MiB [ 99.7%] /jupyter

 --- /opt/conda/pkgs/dlenv-tf-2-8...f5e_0/share/jupyter/lab/staging ---
                    /..
 480.7 MiB [ 88.5%] /node_modules
  57.1 MiB [ 10.5%] /build

Half a gig for Jupyter's JS dependencies & build files. So, ~6.5 GB of unused stuff.

How about the lib sibling to pkgs (3.2 GB)? Almost all of it is … another Python distribution?

 --- /opt/conda/lib/python3.7 ------
                    /..
   2.9 GiB [ 98.5%] /site-packages

 --- /opt/conda/lib/python3.7/site-packages ---
                    /..
   1.1 GiB [ 38.3%] /tensorflow
 282.5 MiB [  9.4%] /ray
 117.0 MiB [  3.9%] /pyarrow
  98.2 MiB [  3.3%] /llvmlite
  84.3 MiB [  2.8%] /scipy
  83.9 MiB [  2.8%] /sklearn
  78.8 MiB [  2.6%] /plotly
  69.5 MiB [  2.3%] /tensorflow_io
  58.6 MiB [  1.9%] /clang
  50.8 MiB [  1.7%] /google
  50.3 MiB [  1.7%] /apache_beam

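These appear to be the same packages as the dlenv-etc folder… ~3 GB of duplication, bringing our unused total to ~9.5 GB. (One caveat: pkgs and lib add up to 7.7 GiB while /opt/conda as a whole is only 4.8 GiB, so the two trees may share files through hard links, in which case this ~3 GB is listed twice but stored once.)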
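Identical listings don't necessarily mean the bytes are stored twice: conda can hard-link package files from pkgs into the environment, and then both paths point at the same inode. I didn't verify which case applies here. A minimal sketch of how one could check, with helper names made up for this example:

```python
import os

def shares_storage(path_a: str, path_b: str) -> bool:
    """True if both paths are hard links to the same inode on the same device."""
    a, b = os.lstat(path_a), os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def unique_bytes(paths) -> int:
    """Sum file sizes, counting each underlying inode only once."""
    seen = set()
    total = 0
    for path in paths:
        info = os.lstat(path)
        key = (info.st_dev, info.st_ino)
        if key not in seen:
            seen.add(key)
            total += info.st_size
    return total
```

Running `shares_storage` on a file under /opt/conda/lib/python3.7/site-packages and its counterpart under the dlenv package directory would show whether the ~3 GB is real duplication or the same files seen through two paths.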

Since that's nearly all of our ~12 GB difference I stopped here.

Container size analysis: TensorFlow base

Let's do a quick scan of the container built off the base TensorFlow image.

Let's open up the container. Ooh, fancy...

 root@2317ea736b48:/deepcell-imaging# apt update && apt install ncdu
 root@2317ea736b48:/deepcell-imaging# ncdu /

This time most of the contents are in usr and root:

 --- / --------------------
   5.5 GiB [ 79.7%] /usr
   1.3 GiB [ 19.0%] /root

Most of root is Python 3.8 site-packages, made up of a lot of small libraries:

 --- /root/.local/lib/python3.8 ----
                    /..
   1.0 GiB [100.0%] /site-packages

 --- /root/.local/lib/python3.8/site-packages ----
                    /..
  85.5 MiB [  8.5%] /scipy
  83.6 MiB [  8.3%] /google
  74.5 MiB [  7.4%] /imagecodecs
  72.3 MiB [  7.2%] /cv2
  62.1 MiB [  6.1%] /opencv_python_headless.libs
  61.9 MiB [  6.1%] /pandas
  45.7 MiB [  4.5%] /sklearn

whereas /usr looks like this:

 --- /usr ------------------
                    /..
   3.1 GiB [ 57.2%] /local
   2.2 GiB [ 40.1%] /lib

Almost all of lib is cuDNN and TensorRT (libnvinfer):

 --- /usr/lib/x86_64-linux-gnu -------------------
                    /..
 757.3 MiB [ 36.9%] libcudnn_cnn_infer.so.8.1.0
 442.8 MiB [ 21.6%] libnvinfer.so.7.2.2
 267.4 MiB [ 13.0%] libcudnn_ops_infer.so.8.1.0

whereas local is split across more CUDA + Python files:

 --- /usr/local ----------------
                    /..
   1.7 GiB [ 55.4%] /cuda-11.2
   1.4 GiB [ 43.7%] /lib

 --- /usr/local/cuda-11.2/targets/x86_64-linux/lib ---
                    /..
 382.7 MiB [ 25.4%] libcusolver.so.11.1.0.152
 219.6 MiB [ 14.6%] libcusparse.so.11.4.1.1152
 186.6 MiB [ 12.4%] libcusolverMg.so.11.1.0.152
 181.3 MiB [ 12.0%] libcufft.so.10.4.1.152
 176.7 MiB [ 11.7%] libcublasLt.so.11.4.1.1043

 --- /usr/local/lib/python3.8/dist-packages ---
                    /..
   1.1 GiB [ 84.0%] /tensorflow

It looks like the two locations hold different things: /usr/lib has cuDNN and TensorRT, whereas /usr/local/cuda-11.2 has the CUDA toolkit's own libraries (cuSOLVER, cuSPARSE, cuFFT, cuBLAS).

Conclusions

The Deep Learning container seems better suited for:

compiling tools from source
training, not just predicting
using notebooks for iterative development
overall development tasks

The TensorFlow base image seems better suited for:

running the specific thing you want to run once you've figured out how to run it.

Future work?

Google has optimized container images for Vertex AI. We'd use: us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-gpu.2-8:latest

I get the sense from the docs these only work on Vertex AI & need you to train the model on Vertex AI as well:

The optimization occurs when Vertex AI uploads a model, before it runs.

At some point it may be worth comparing the cost of predicting via Vertex AI online models with the cost of predicting with an open-source container on Batch. But if the container is that large again because of training code, we may lose whatever benefits we gained…