In December 2020, AWS started supporting container images as a packaging format for Lambda functions, a real breakthrough that lets us deploy far more complex projects with the same pay-only-for-what-you-use pricing and serverless architecture.
Web scraping workloads benefit directly from this upgrade because Selenium becomes much easier to install.
Let's code!
The Dockerfile below is based on the official Lambda container image for Python 3.8 (building this image from scratch is really painful).
    # Dockerfile
    FROM public.ecr.aws/lambda/python:3.8

    RUN yum install -y \
        Xvfb \
        wget \
        unzip

    # Install google-chrome-stable
    RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm && \
        yum localinstall -y google-chrome-stable_current_x86_64.rpm

    # Install chromedriver (its version must be compatible with the installed Chrome)
    RUN wget https://chromedriver.storage.googleapis.com/2.40/chromedriver_linux64.zip && \
        unzip chromedriver_linux64.zip && \
        chmod 775 chromedriver

    # Install selenium
    RUN pip3 install -U pip selenium

    # Copy lambda's main script
    COPY app.py .

    CMD ["app.lambda_handler"]

The Python script below configures Selenium with headless Chrome. Note the chromedriver path in the driver definition: it comes from the working directory of the base image.
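    # app.py
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--remote-debugging-port=9222")
    chrome_options.add_argument("--no-sandbox")

    # Selenium 4 style: the driver path goes through Service, the options through `options=`.
    # chromedriver was unzipped into /var/task, the working directory of the base image.
    driver = webdriver.Chrome(service=Service("/var/task/chromedriver"), options=chrome_options)


    def lambda_handler(event, context):
        driver.get("http://www.python.org")
        return {
            "statusCode": 200,
            "body": driver.title
        }

Finally, build and run the container image!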
    $ docker build -t scrapper:latest .
    $ docker run -p 9000:8080 scrapper:latest

To test your new containerized web scraping Lambda function, run the following command.
    $ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
    {"statusCode": 200, "body": "Welcome to Python.org"}
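The same invocation can also be scripted from Python, which is handy once the test event grows beyond `{}`. The sketch below uses only the standard library and assumes the container is still listening on `localhost:9000` as in the `docker run` command above. `build_invocation` and `invoke` are hypothetical helper names chosen for this example, not part of any AWS SDK.

```python
import json
import urllib.request

# Local endpoint exposed by the running container (same URL as the curl command).
LOCAL_ENDPOINT = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_invocation(event, endpoint=LOCAL_ENDPOINT):
    """Build a POST request whose body is the JSON-encoded event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, method="POST")


def invoke(event, endpoint=LOCAL_ENDPOINT, timeout=60):
    """Send the event to the running container and decode the JSON reply."""
    request = build_invocation(event, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the container to be running):
#     print(invoke({}))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without starting the container.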
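The handler in app.py always loads the same page. A small variation, sketched here under the assumption that the caller passes the target in a hypothetical `url` field of the event, would let one deployed function scrape different pages. The driver is supplied through a factory so the logic can be exercised without Chrome. `make_handler`, `DEFAULT_URL`, and `FakeDriver` are illustrative names, not part of Selenium or Lambda.

```python
DEFAULT_URL = "http://www.python.org"  # page used by the original handler


def make_handler(driver_factory, default_url=DEFAULT_URL):
    """Return a Lambda-style handler that reads the target URL from the event.

    driver_factory is any zero-argument callable returning an object with a
    get(url) method and a title attribute, for example a function that builds
    the headless Chrome driver.
    """
    driver = None  # created on first invocation, then reused by later calls

    def lambda_handler(event, context):
        nonlocal driver
        if driver is None:
            driver = driver_factory()
        # Fall back to the default page when the event has no "url" field.
        url = (event or {}).get("url", default_url)
        driver.get(url)
        return {"statusCode": 200, "body": driver.title}

    return lambda_handler


class FakeDriver:
    """Stand-in for a Selenium driver so the sketch runs without Chrome."""

    def __init__(self):
        self.title = ""

    def get(self, url):
        self.title = f"Fetched {url}"


handler = make_handler(FakeDriver)
print(handler({"url": "http://example.com"}, None))
```

In the real function, the factory would wrap the `webdriver.Chrome(...)` call from app.py, and the test payload would become something like `-d '{"url": "http://example.com"}'`.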