TL;DR: building our DeepCell container on a base TensorFlow image makes it 50% faster to load and about 60% smaller than building on the Deep Learning container.
| | Deep Learning image | Base TF image | Reduction |
|---|---|---|---|
| Uncompressed | 19.5 GB | 7.2 GB | 63% |
| Compressed | 8.4 GB | 3.2 GB | 62% |
| Batch job load time | 6 min | 3 min | 50% |
This post covers how we rebuilt our container on the smaller base image, and why the Deep Learning container is so big to begin with. The long and short of it is that you pay a steep price to have so many development tools available, and you typically don't need those for production tasks.
Optimizing our container
Our DeepCell journey began on Vertex AI. Google provides pre-built TensorFlow images as part of their Deep Learning Container Images.
These containers purport to let you:
Quickly prototype with a portable and consistent environment for developing, testing, and deploying your AI applications with Deep Learning Containers. These Docker images use popular frameworks and are performance optimized, compatibility tested, and ready to deploy.
Cool beans. Our DeepCell version uses TF2.8 so we picked this image from Google's list: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37
It runs Python 3.7, which fortunately is still supported by DeepCell. (I've had mixed experiences with Python version support across bioinformatics tools.)
Our initial container build was simple:
    FROM us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-8.py37
    # Fetched on every build; when the main ref changes, the cached clone below is invalidated.
    ADD https://api.github.com/repos/dchaley/deepcell-imaging/git/refs/heads/main version.json
    RUN git clone https://github.com/dchaley/deepcell-imaging.git
    WORKDIR "/deepcell-imaging"
    RUN pip install --user --upgrade --quiet -r requirements.txt
    ENTRYPOINT ["python", "benchmarking/deepcell-e2e/benchmark.py"]
Our requirements file is pretty simple. We verified in the build logs that it didn't reinstall TensorFlow; note that the packages to install do not include TF:
    Requirement already satisfied: tensorflow~=2.8.0 in /opt/conda/lib/python3.7/site-packages (from deepcell==0.12.9->-r requirements.txt (line 1)) (2.8.4)
    ...
    Installing collected packages: tensorflow-addons, snakeviz, smart_open, qtpy, opencv-python-headless, lxml, jupyter-core, iniconfig, imagecodecs, cython, pytest, google-api-core, deepcell-toolbox, qtconsole, jupyter-console, deepcell-tracking, google-cloud-notebooks, google-cloud-bigquery, spektral, google-cloud-aiplatform, jupyter, deepcell
This resulted in a whopping ~20 GB container 😩
The compressed artifact size was ~8.5 GB: this is the amount of data that must be transmitted before unpacking.
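For reference, the compressed size is roughly the sum of the layer sizes recorded in the image manifest. Here's a minimal Python sketch, assuming a single-platform Docker v2 / OCI-style manifest that has already been loaded as JSON (for example from the output of `docker manifest inspect`). The function name, digests and sizes below are made-up placeholders, not our real image.

```python
import json


def compressed_size_bytes(manifest: dict) -> int:
    """Sum the compressed layer sizes recorded in an image manifest."""
    return sum(layer["size"] for layer in manifest.get("layers", []))


# Hypothetical manifest: placeholder digests and sizes, not real data.
example_manifest = json.loads("""
{
  "schemaVersion": 2,
  "layers": [
    {"digest": "sha256:aaa", "size": 1500000000},
    {"digest": "sha256:bbb", "size": 1200000000},
    {"digest": "sha256:ccc", "size": 500000000}
  ]
}
""")

total = compressed_size_bytes(example_manifest)
print(f"{total / 1e9:.1f} GB compressed")
```

A multi-platform manifest list has no top-level layers, so this sketch would report zero for it; you'd need to pick one platform's manifest first.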
The impact of all this? A six-minute start time for Google Batch jobs, measured from the start of the container download …
2024-04-30 14:56:20.896 PDT gce: Pulling from deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking
… until executing the container:
2024-04-30 15:02:23.233 PDT Executing runnable container:
I wasn't thrilled with a six-minute minimum feedback cycle 😤 We tried image streaming to reduce startup time but alas, the container was so large it couldn't run without provisioning additional boot disk space.
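The six-minute figure is just the gap between those two log timestamps. A small Python sketch of the arithmetic, assuming both lines are in the same time zone (so the PDT suffix is dropped before parsing); the helper name is ours, not part of any Batch tooling:

```python
from datetime import datetime

# Timestamp layout used in the two log lines, without the time zone suffix.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def load_time_seconds(pull_started: str, container_started: str) -> float:
    """Seconds between starting the image pull and executing the container."""
    start = datetime.strptime(pull_started, FMT)
    end = datetime.strptime(container_started, FMT)
    return (end - start).total_seconds()


elapsed = load_time_seconds("2024-04-30 14:56:20.896", "2024-04-30 15:02:23.233")
print(f"{elapsed:.1f} s (~{elapsed / 60:.1f} min)")
```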
We figured we must be able to build a container from a slimmer TensorFlow base image. We knew the DeepCell team had done some work scaling DeepCell using Kubernetes on GKE. Their Dockerfile confirmed it: just use TF's image.
We switched our base to TF's, grabbed the apt maintenance work they did, and updated our Dockerfile [diff].
The result: 7.2 GB uncompressed and 3.2 GB compressed, and ~3 min from starting to fetch the container to beginning to execute it.
| | Deep Learning image | Base TF image | Reduction |
|---|---|---|---|
| Uncompressed | 19.5 GB | 7.2 GB | 63% |
| Compressed | 8.4 GB | 3.2 GB | 62% |
| Batch job load time | 6 min | 3 min | 50% |
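The Reduction column is plain before/after arithmetic. A quick Python sketch, using the rounded numbers from the table (the function name is just for illustration):

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# (before, after) pairs copied from the table above.
rows = {
    "Uncompressed (GB)": (19.5, 7.2),
    "Compressed (GB)": (8.4, 3.2),
    "Batch job load time (min)": (6, 3),
}

for name, (before, after) in rows.items():
    print(f"{name}: {reduction_pct(before, after):.0f}%")
```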
That's better 😎 But I couldn't help but wonder … why?
Container size analysis
Let's take a deep dive into what's in the containers. The containers are too large to open in Cloud Shell 🫠 so we'll do it the old-fashioned way, locally.
Let's use ncdu to explore the file system.
Deep Learning
This container was built from the Deep Learning base. Let's boot it up & install ncdu.
    $ docker run -it --entrypoint bash us-central1-docker.pkg.dev/deepcell-on-batch/deepcell-benchmarking-us-central1/benchmarking@sha256:8cc9b89e5869a4d468d64810b2ae47e242cc106519b2b8d7c4a9daa07856bdde
    root@55a486270459:/deepcell-imaging# apt update && apt install ncdu
Begin scanning the root directory:
root@55a486270459:/deepcell-imaging# ncdu /
It scans pretty quickly. Here's the summary:
So far this just tells us we have a lot in usr and opt (common places to install libraries). Let's start with usr.
      6.6 GiB [ 53.9%] /lib
      4.9 GiB [ 39.6%] /local
    363.3 MiB [  2.9%] /share
    276.8 MiB [  2.2%] /bin
    144.1 MiB [  1.1%] /src
A bit odd to have stuff in both lib and local, but let's see. lib is mostly the CUDA Deep Neural Network library (cuDNN):
    --- /usr/lib -------------------------
                       /..
      5.5 GiB [ 83.2%] /x86_64-linux-gnu
    938.5 MiB [ 13.8%] /google-cloud-sdk

    --- /usr/lib/x86_64-linux-gnu ----------------------------
                       /..
      1.4 GiB [ 24.5%] libcudnn_static.a
    956.8 MiB [ 16.9%] libnvinfer_builder_resource.so.8.6.1
    839.4 MiB [ 14.8%] libcudnn_cnn_infer_static.a
    675.1 MiB [ 11.9%] libcudnn_cnn_infer.so.8.2.0
    271.8 MiB [  4.8%] libcudnn_ops_infer.so.8.2.0
    227.3 MiB [  4.0%] libcudnn_cnn_train_static.a
    225.5 MiB [  4.0%] libnvinfer.so.8.6.1
Static libraries (the .a files) are only needed to build software from source, and we aren't doing that. Maybe we need the dynamic libraries for inference; I'm not sure. But the static libraries here are over 2.5 GB…
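To sanity-check that number, here's a rough Python sketch that adds up the .a entries from the /usr/lib/x86_64-linux-gnu listing. It only sees the files visible in the listing, and the regex is our own guess at the line layout, not an official ncdu format.

```python
import re

# Conversion factors to MiB for the units that appear in the listing.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Matches lines such as "839.4 MiB [ 14.8%] libcudnn_cnn_infer_static.a".
LINE_RE = re.compile(r"^\s*([\d.]+)\s+(KiB|MiB|GiB)\s+\[\s*[\d.]+%\]\s+(\S+)")


def static_lib_mib(listing: str) -> float:
    """Total size in MiB of the entries whose name ends in .a."""
    total = 0.0
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3).endswith(".a"):
            total += float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return total


listing = """
  1.4 GiB [ 24.5%] libcudnn_static.a
956.8 MiB [ 16.9%] libnvinfer_builder_resource.so.8.6.1
839.4 MiB [ 14.8%] libcudnn_cnn_infer_static.a
675.1 MiB [ 11.9%] libcudnn_cnn_infer.so.8.2.0
271.8 MiB [  4.8%] libcudnn_ops_infer.so.8.2.0
227.3 MiB [  4.0%] libcudnn_cnn_train_static.a
225.5 MiB [  4.0%] libnvinfer.so.8.6.1
"""

total_mib = static_lib_mib(listing)
print(f"{total_mib:.1f} MiB = {total_mib * 1024**2 / 1e9:.2f} GB")
```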
It's also surprising to see a gig in the Cloud SDK… it looks like the SDK ships its own Python distro and some other stuff.
    --- /usr/lib/google-cloud-sdk --
                       /..
    382.3 MiB [ 40.7%] /lib
    296.7 MiB [ 31.6%] /platform
    169.5 MiB [ 18.1%] /bin
As for /usr/local:
    --- /usr/local ----------------
                       /..
      3.4 GiB [ 70.0%] /cuda-11.3
    850.0 MiB [ 17.0%] /share
    603.9 MiB [ 12.1%] /cuda-12.2
Well… do we actually need 2 versions of CUDA? (Why is 12.2 so much smaller?) About half of the 11.3 version is static libraries again.
So far we're at ~4 GB of CUDA-related static libraries (which we don't need).
How about that /usr/local/share directory…
    --- /usr/local/share/.cache --
                       /..
    850.0 MiB [100.0%] /yarn
A gig of yarn package caches 😑 That brings us to ~5 GB of stuff we don't need.
Alright, bouncing back to /opt (the other big directory, with 6 GB):
    --- /opt --------------------
                     /..
    4.8 GiB [ 79.1%] /conda
    1.3 GiB [ 20.9%] /nvidia
Conda is a Python distribution; let's check out what's in nvidia:
    --- /opt/nvidia --------------------
                       /..
      1.3 GiB [100.0%] /nsight-compute

    --- /opt/nvidia/nsight-compute -----
                       /..
    651.5 MiB [ 50.0%] /2021.1.1
    651.3 MiB [ 50.0%] /2021.1.0
So we have half a gig on an old version. What is nsight anyhow?
NVIDIA Nsight™ Systems is a system-wide performance analysis tool
Well, we don't need that … so we're at ~6 GB of stuff we don't need. Let's go back to /opt/conda (~5 GB); as expected, most of the stuff is in packages & libraries:
    --- /opt/conda -----------
                     /..
    4.5 GiB [ 94.0%] /pkgs
    3.2 GiB [ 67.0%] /lib
Most of the 4.5 GB of pkgs is in something called dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 which in turn is ~3 GB of libraries.
    --- /opt/conda/pkgs/dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0 -----
                       /..
      2.9 GiB [ 81.3%] /lib
    623.1 MiB [ 17.3%] /share
The libraries are Python 3.7 site-packages: mostly TensorFlow (1 GB), plus a bunch of smaller Python libraries. We presumably need this stuff!
    --- /opt/conda/pkgs/dlenv-tf-2-8...e_0/lib/python3.7/site-packages ---
                       /..
      1.1 GiB [ 39.5%] /tensorflow
    282.5 MiB [  9.7%] /ray
    116.9 MiB [  4.0%] /pyarrow
     98.2 MiB [  3.4%] /llvmlite
     84.3 MiB [  2.9%] /scipy
     83.9 MiB [  2.9%] /sklearn
     78.8 MiB [  2.7%] /plotly
     69.5 MiB [  2.4%] /tensorflow_io
     58.6 MiB [  2.0%] /clang
     50.3 MiB [  1.7%] /apache_beam
     46.6 MiB [  1.6%] /google
How about share?
    --- /opt/conda/pkgs/dlenv-tf-2-8...0.20230926-py37hab20f5e_0/share ---
                       /..
    621.3 MiB [ 99.7%] /jupyter

    --- /opt/conda/pkgs/dlenv-tf-2-8...f5e_0/share/jupyter/lab/staging ---
                       /..
    480.7 MiB [ 88.5%] /node_modules
     57.1 MiB [ 10.5%] /build
Half a gig for Jupyter's JS dependencies & build files. So, ~6.5 GB of unused stuff.
How about the lib sibling to pkgs (3.2 GB)? Almost all of it is … another Python distribution?
    --- /opt/conda/lib/python3.7 ------
                       /..
      2.9 GiB [ 98.5%] /site-packages

    --- /opt/conda/lib/python3.7/site-packages ---
                       /..
      1.1 GiB [ 38.3%] /tensorflow
    282.5 MiB [  9.4%] /ray
    117.0 MiB [  3.9%] /pyarrow
     98.2 MiB [  3.3%] /llvmlite
     84.3 MiB [  2.8%] /scipy
     83.9 MiB [  2.8%] /sklearn
     78.8 MiB [  2.6%] /plotly
     69.5 MiB [  2.3%] /tensorflow_io
     58.6 MiB [  1.9%] /clang
     50.8 MiB [  1.7%] /google
     50.3 MiB [  1.7%] /apache_beam
These appear to be the same packages as the dlenv-etc folder… ~3 GB of duplication, bringing our unused total to ~9.5 GB.
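One caveat before counting this as real duplication: in the /opt/conda listing above, pkgs (4.5 GiB) and lib (3.2 GiB) add up to 7.7 GiB, more than the 4.8 GiB reported for /opt/conda as a whole. That would be consistent with files being shared between the two trees through hard links, which conda can use when it links packages from pkgs into an environment. This isn't verified here. A minimal Python sketch of how one could check, run inside the container (the function name and result keys are made up for illustration):

```python
import os


def classify_duplicates(tree_a: str, tree_b: str) -> dict:
    """Compare files that exist at the same relative path in both trees.

    Returns byte counts: 'shared' for paths that resolve to the same
    underlying file (e.g. hard links, stored once on disk) and 'separate'
    for paths backed by distinct files (stored twice).
    """
    result = {"shared": 0, "separate": 0}
    for root, _dirs, files in os.walk(tree_a):
        for name in files:
            path_a = os.path.join(root, name)
            path_b = os.path.join(tree_b, os.path.relpath(path_a, tree_a))
            if not os.path.isfile(path_b):
                continue  # only present in tree_a, so not a duplicate
            key = "shared" if os.path.samefile(path_a, path_b) else "separate"
            result[key] += os.path.getsize(path_b)
    return result


if __name__ == "__main__":
    # Paths as they appear in the ncdu output above.
    print(classify_duplicates(
        "/opt/conda/pkgs/dlenv-tf-2-8-gpu-1.0.20230926-py37hab20f5e_0/lib/python3.7/site-packages",
        "/opt/conda/lib/python3.7/site-packages",
    ))
```

If most bytes land in 'shared', the ~3 GB is counted twice by the listing rather than stored twice, and the unused total would be closer to ~6.5 GB.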
Since that's nearly all of our ~12 GB difference I stopped here.
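Here's the running tally as a tiny Python sketch. The per-item numbers are the rounded increments from the walkthrough above (~4, then ~5, ~6, ~6.5 and ~9.5 GB), so treat them as ballpark figures rather than measurements.

```python
# Rounded increments from the walkthrough above, in GB.
unused_gb = {
    "CUDA static libraries": 4.0,
    "yarn cache": 1.0,
    "Nsight Compute": 1.0,
    "Jupyter JS dependencies": 0.5,
    "duplicated site-packages": 3.0,
}

total_unused = sum(unused_gb.values())
size_gap = 19.5 - 7.2  # uncompressed Deep Learning image minus base TF image

print(f"~{total_unused:.1f} GB identified of a {size_gap:.1f} GB gap "
      f"({total_unused / size_gap:.0%})")
```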
Container size analysis: TensorFlow base
Let's do a quick scan of the container built off the base TensorFlow image.
Let's open up the container. Ooh, fancy...
    root@2317ea736b48:/deepcell-imaging# apt update && apt install ncdu
    root@2317ea736b48:/deepcell-imaging# ncdu /
This time most of the contents are in usr and root:
    --- / --------------------
    5.5 GiB [ 79.7%] /usr
    1.3 GiB [ 19.0%] /root
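A side note on units: ncdu reports binary units (GiB), while the table at the top is in GB. Assuming the table's GB are decimal gigabytes (an assumption on our part), a small Python sketch of the conversion shows the two directories listed above land in the same ballpark as the table's figure for this image:

```python
def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes (2**30 bytes) to decimal gigabytes (10**9 bytes)."""
    return gib * 2**30 / 1e9


listed_gib = 5.5 + 1.3  # /usr + /root from the listing above
print(f"{listed_gib:.1f} GiB = {gib_to_gb(listed_gib):.1f} GB")
```

The match is only approximate: ncdu measures files inside a running container, which isn't quite the same thing as the image size Docker reports.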
Most of root is Python 3.8 site-packages, made up of lots of small libraries:
    --- /root/.local/lib/python3.8 ----
                       /..
      1.0 GiB [100.0%] /site-packages

    --- /root/.local/lib/python3.8/site-packages ----
                       /..
     85.5 MiB [  8.5%] /scipy
     83.6 MiB [  8.3%] /google
     74.5 MiB [  7.4%] /imagecodecs
     72.3 MiB [  7.2%] /cv2
     62.1 MiB [  6.1%] /opencv_python_headless.libs
     61.9 MiB [  6.1%] /pandas
     45.7 MiB [  4.5%] /sklearn
whereas /usr looks like this:
    --- /usr ------------------
                     /..
    3.1 GiB [ 57.2%] /local
    2.2 GiB [ 40.1%] /lib
Almost all of lib is CUDA DNN:
    --- /usr/lib/x86_64-linux-gnu -------------------
                       /..
    757.3 MiB [ 36.9%] libcudnn_cnn_infer.so.8.1.0
    442.8 MiB [ 21.6%] libnvinfer.so.7.2.2
    267.4 MiB [ 13.0%] libcudnn_ops_infer.so.8.1.0
whereas local is split across more CUDA + Python files:
    --- /usr/local ----------------
                       /..
      1.7 GiB [ 55.4%] /cuda-11.2
      1.4 GiB [ 43.7%] /lib

    --- /usr/local/cuda-11.2/targets/x86_64-linux/lib ---
                       /..
    382.7 MiB [ 25.4%] libcusolver.so.11.1.0.152
    219.6 MiB [ 14.6%] libcusparse.so.11.4.1.1152
    186.6 MiB [ 12.4%] libcusolverMg.so.11.1.0.152
    181.3 MiB [ 12.0%] libcufft.so.10.4.1.152
    176.7 MiB [ 11.7%] libcublasLt.so.11.4.1.1043

    --- /usr/local/lib/python3.8/dist-packages ---
                       /..
      1.1 GiB [ 84.0%] /tensorflow
It looks like the CUDA DNN files in /usr/lib are different from the CUDA files in /usr/local.
Conclusions
The Deep Learning container seems better suited for:
- compiling tools from source
- training, not just predicting
- using notebooks for iterative development
- overall development tasks
The TensorFlow base image seems better suited for:
- running the specific thing you want to run once you've figured out how to run it.
Future work?
Google has optimized container images for Vertex AI. We'd use: us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-gpu.2-8:latest
I get the sense from the docs these only work on Vertex AI & need you to train the model on Vertex AI as well:
The optimization occurs when Vertex AI uploads a model, before it runs.
At some point it may be worth investigating the cost of predicting via Vertex AI online models vs. predicting with an open-source container on Batch. But if the container is so large again because of training code, we may lose whatever benefits we gained…