Goals:
- Run notebook files in an AWS Lambda function.
- Allow them to install their own dependencies.
In this instance, I've used the Serverless Framework, but the problems solved here likely apply to other frameworks as well. After trying a number of approaches, the following seemed to work within the constraints of Lambda's read-only file system (only /tmp is writable):
- Create a dedicated workspace in /tmp.
- Copy the notebook into the workspace, along with a script that creates and activates a virtual environment and then executes the notebook.
- Fork off to the script and allow it to run to completion.
Starting with the serverless.yml file, note that "IPYTHONDIR" must be set to somewhere in /tmp, since Lambda functions run on a read-only file system:
service: nb-exec

frameworkVersion: '3'

provider:
  name: aws

functions:
  hello:
    handler: handler.hello
    environment:
      IPYTHONDIR: /tmp/ipythondir

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    fileName: requirements.txt
    dockerizePip: true

package:
  patterns:
    - "!.venv/**"
    - "!node_modules/**"
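As a belt-and-braces measure, the handler could also fall back to a /tmp location itself if the variable isn't set, for example when the function is invoked outside this deployment. This is a minimal sketch and not part of the setup above. The `ensure_ipythondir` name and its default path are assumptions. Child processes inherit `os.environ`, so setting the variable in the handler also covers the `bash` script it launches.

```python
import os


def ensure_ipythondir(default: str = "/tmp/ipythondir") -> str:
    """Hypothetical helper: make sure IPYTHONDIR points at a writable directory.

    Keeps any value already set (e.g. by serverless.yml) and otherwise falls
    back to `default`. Subprocesses started afterwards inherit os.environ.
    """
    path = os.environ.setdefault("IPYTHONDIR", default)
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"IPYTHONDIR is not writable: {path}")
    return path
```

Calling `ensure_ipythondir()` at the top of `hello` would be enough.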
Our requirements.txt file, which provides the packages used to execute the notebook files:
nbconvert==7.9.2
ipython==8.16.1
ipykernel==6.25.2
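If a notebook fails with an import error, it can help to confirm that these pins actually made it into the deployment package. Below is a rough sketch using the standard library's `importlib.metadata`. It assumes `requirements.txt` is shipped alongside the handler and only holds simple exact pins like the ones above. `unmet_requirements` is a hypothetical name.

```python
import re
from importlib import metadata


def unmet_requirements(lines):
    """Return (name, wanted, found) for every exact pin that isn't satisfied.

    Only simple `name==version` or `name===version` lines are checked;
    ranges, environment markers and extras are skipped.
    """
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*===?\s*(\S+)", line)
        if not match:
            continue
        name, wanted = match.groups()
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # not installed at all
        if found != wanted:
            problems.append((name, wanted, found))
    return problems
```

In the handler, `with open("requirements.txt") as f: print(unmet_requirements(f))` would log any mismatch before the notebook runs.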
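Next, inside our handler (handler.py):
import os
import shutil
import subprocess
import uuid


def hello(event, context):
    # Give each invocation its own workspace under /tmp, the only writable path in Lambda.
    unique_id = str(uuid.uuid4())
    workspace_path = os.path.join(os.path.abspath(os.sep), "tmp", f"workspace_{unique_id}")
    os.makedirs(workspace_path, exist_ok=True)

    # Copy the script that builds the virtual environment and runs the notebook.
    shutil.copy("execute.sh", workspace_path)

    # Copy the notebook into its own subdirectory of the workspace.
    notebook_dir_path = os.path.join(workspace_path, "notebook")
    os.makedirs(notebook_dir_path, exist_ok=True)
    shutil.copy("example.ipynb", notebook_dir_path)

    # Run the script from inside the workspace and wait for it to finish;
    # check=True makes a failed notebook fail the invocation.
    execute_script_path = os.path.join(workspace_path, "execute.sh")
    subprocess.run(["bash", execute_script_path], cwd=workspace_path, check=True)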
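One thing this handler doesn't do is clean up. /tmp is kept between warm invocations that reuse the same execution environment, and it has limited space, so workspaces can pile up. Below is a sketch of the same copy-and-run steps that removes the workspace afterwards and returns the script's output. The function name, its parameters and the return shape are assumptions, not part of the original handler.

```python
import os
import shutil
import subprocess
import uuid


def run_notebook_in_workspace(script, notebook, tmp_root="/tmp"):
    """Hypothetical variant of `hello`: copy, run, then delete the workspace."""
    workspace = os.path.join(tmp_root, f"workspace_{uuid.uuid4()}")
    notebook_dir = os.path.join(workspace, "notebook")
    os.makedirs(notebook_dir)  # also creates the workspace itself
    try:
        shutil.copy(script, workspace)
        shutil.copy(notebook, notebook_dir)
        result = subprocess.run(
            ["bash", os.path.join(workspace, os.path.basename(script))],
            cwd=workspace,
            capture_output=True,
            text=True,
        )
        # Keep only the tail of the output so the Lambda response stays small.
        return {
            "returncode": result.returncode,
            "stdout": result.stdout[-4000:],
            "stderr": result.stderr[-4000:],
        }
    finally:
        # Runs on success and on failure, so /tmp doesn't fill up over time.
        shutil.rmtree(workspace, ignore_errors=True)
```

`hello` could then be reduced to `return run_notebook_in_workspace("execute.sh", "example.ipynb")`.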
And finally, the execute.sh file:
# Make sure dependencies can be picked up from the deployment directory, as well as the
# built-in AWS runtime dependencies.
export PYTHONPATH=$LAMBDA_TASK_ROOT:$LAMBDA_RUNTIME_DIR

# Create a virtual environment that inherits these dependencies.
python3 -m venv .venv --system-site-packages
source .venv/bin/activate

# Execute the notebook with nbconvert.
python3 -m nbconvert --to notebook --execute ./notebook/example.ipynb
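If you'd rather not ship a shell script, the same setup can be expressed in Python. This sketch mirrors execute.sh:

- `venv.create(..., system_site_packages=True, with_pip=True)` stands in for `python3 -m venv --system-site-packages`.
- The environment dictionary approximates `export PYTHONPATH=...` plus `source .venv/bin/activate`.

It is an alternative, not what the setup above uses, and all the function names are hypothetical.

```python
import os
import subprocess
import venv


def venv_env(venv_dir, base_env=None):
    """Approximate the environment execute.sh runs nbconvert in."""
    env = dict(os.environ if base_env is None else base_env)
    # Same as: export PYTHONPATH=$LAMBDA_TASK_ROOT:$LAMBDA_RUNTIME_DIR
    env["PYTHONPATH"] = ":".join(
        [env.get("LAMBDA_TASK_ROOT", ""), env.get("LAMBDA_RUNTIME_DIR", "")]
    )
    # Roughly what `source .venv/bin/activate` changes.
    env["VIRTUAL_ENV"] = venv_dir
    env["PATH"] = os.pathsep.join([os.path.join(venv_dir, "bin"), env.get("PATH", "")])
    env.pop("PYTHONHOME", None)
    return env


def nbconvert_command(venv_dir, notebook):
    # Call the venv's interpreter directly instead of relying on PATH lookup.
    python = os.path.join(venv_dir, "bin", "python3")
    return [python, "-m", "nbconvert", "--to", "notebook", "--execute", notebook]


def run_like_execute_sh(workspace, notebook="./notebook/example.ipynb"):
    """Create .venv in the workspace and execute the notebook inside it."""
    venv_dir = os.path.join(workspace, ".venv")
    venv.create(venv_dir, system_site_packages=True, with_pip=True)
    return subprocess.run(
        nbconvert_command(venv_dir, notebook),
        env=venv_env(venv_dir),
        cwd=workspace,
        check=True,
    )
```

The handler would then call `run_like_execute_sh(workspace_path)` instead of running `bash` on the script.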
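One additional unsolved problem is the following error when installing dependencies from within a cell:
!pip install pandas
Error: out of pty devices
This is presumably because IPython runs `!` shell commands through a pseudo-terminal, and the Lambda sandbox doesn't provide pty devices.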
Replacing it with a plain subprocess call, however, seems to work fine:
import subprocess
subprocess.run(["pip", "install", "pandas"])
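Inside a notebook cell, a slightly sturdier version of that workaround calls pip through `sys.executable`. That way packages are installed for the interpreter actually running the kernel, rather than for whichever `pip` comes first on PATH. This is a sketch: the helper names are hypothetical, and `--no-cache-dir` is an assumption meant to stop pip from trying to write a cache outside /tmp.

```python
import subprocess
import sys


def pip_install_command(*packages):
    # sys.executable is the kernel's own interpreter (here, the one in .venv).
    return [sys.executable, "-m", "pip", "install", "--no-cache-dir", *packages]


def pip_install(*packages):
    """Install packages without IPython's `!` shell escape.

    check=True turns a failed install into an exception in the cell
    instead of a silently ignored return code.
    """
    return subprocess.run(pip_install_command(*packages), check=True)
```

A cell would then run `pip_install("pandas")` before `import pandas`.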
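Note that running untrusted code in a Lambda function is not secure. Warm invocations of the same function reuse the execution environment, including /tmp, so one invocation's notebook may be able to see files left behind by another. The code also runs with the function's IAM role, giving it access to any AWS resources that role permits.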
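One partial mitigation for the AWS-resources half of that problem is to strip the function's credentials from the environment before starting the notebook. Lambda hands the execution role's credentials to code through environment variables, so a child process launched with a filtered `env` won't inherit them. This is only a sketch, and it is not a sandbox:

- The variable list is an assumption.
- It does nothing about the shared /tmp or network access.
- A process running as the same user may still be able to read its parent's environment through /proc.

```python
import os

# Assumed list of variables that can carry the function's AWS credentials.
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI",
    "AWS_CONTAINER_AUTHORIZATION_TOKEN",
)


def env_without_credentials(base_env=None):
    """Copy of the environment with the listed credential variables removed.

    Everything else (LAMBDA_TASK_ROOT, IPYTHONDIR, PATH, ...) is kept so
    execute.sh still works. Defense in depth only, not isolation.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in CREDENTIAL_VARS:
        env.pop(name, None)
    return env
```

The handler would pass it along as `subprocess.run(["bash", execute_script_path], cwd=workspace_path, env=env_without_credentials())`.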