DEV Community

Christian

Originally published at cri.dev

Ultimate web scraping with browserless, puppeteer and Node.js


Browser automation built for enterprises, loved by developers.

browserless.io is a neat service for hosted puppeteer scraping, but there is also the official Docker image for running it locally.

I was amazed when I found out about it 🤯!

Find the whole source code on GitHub at christian-fei/browserless-example!

Running browserless in docker

A single docker command is enough to start a full puppeteer backend, with configurable concurrency, queueing and timeouts, that your puppeteer scripts can connect to.

You can connect to a browserless backend by passing the option browserWSEndpoint like this:

    async function createBrowser () {
      return puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' })
    }
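If the script starts before the backend is ready to accept connections, the first connection attempt can fail. A minimal sketch of retrying the connection, assuming a fixed delay between attempts is acceptable; `connectWithRetry`, `retries` and `delayMs` are hypothetical names, not part of puppeteer or browserless:

```javascript
// Hypothetical helper: calls `connect` until it resolves or the attempts run out.
async function connectWithRetry (connect, { retries = 5, delayMs = 1000 } = {}) {
  let lastError
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await connect()
    } catch (err) {
      lastError = err
      // wait before the next attempt, but not after the last one
      if (attempt < retries) {
        await new Promise(resolve => setTimeout(resolve, delayMs))
      }
    }
  }
  throw lastError
}

// Usage with puppeteer (needs a running backend):
// const puppeteer = require('puppeteer')
// const browser = await connectWithRetry(
//   () => puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' })
// )
```

Passing the connect call in as a function keeps the retry logic independent of puppeteer, so it can be tried out with a fake function first.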

To start the backend you can use the following command, using the docker image browserless/chrome:

    docker run \
      -e "MAX_CONCURRENT_SESSIONS=15" \
      -e "MAX_QUEUE_LENGTH=0" \
      -e "PREBOOT_CHROME=true" \
      -e "DEFAULT_BLOCK_ADS=true" \
      -e "DEFAULT_IGNORE_HTTPS_ERRORS=true" \
      -e "CONNECTION_TIMEOUT=600000" \
      -p 3000:3000 \
      --rm -it browserless/chrome
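With MAX_CONCURRENT_SESSIONS=15 and MAX_QUEUE_LENGTH=0, my understanding is that the backend serves at most 15 sessions at once and does not queue additional ones, so it can help to cap concurrency on the client as well. A minimal sketch, assuming each task opens its own session; `mapWithConcurrency` is a hypothetical helper, not part of puppeteer or browserless:

```javascript
// Hypothetical helper: runs `task` for every item with at most `limit` tasks in flight.
async function mapWithConcurrency (items, limit, task) {
  const results = new Array(items.length)
  let next = 0
  // each worker pulls the next unprocessed index until none are left
  async function worker () {
    while (next < items.length) {
      const index = next++
      results[index] = await task(items[index], index)
    }
  }
  const workerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Usage with puppeteer (needs a running backend), one session per URL:
// const titles = await mapWithConcurrency(urls, 15, async url => {
//   const browser = await puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' })
//   try {
//     const page = await browser.newPage()
//     await page.goto(url)
//     return await page.title()
//   } finally {
//     await browser.close()
//   }
// })
```

The limit of 15 in the usage comment simply mirrors the value passed to the container; a lower value leaves headroom for other clients.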

Source code

The GitHub repository christian-fei/browserless-example contains a web crawler built with puppeteer! Clone it, install the dependencies, start browserless and run the crawler:

    git clone https://github.com/christian-fei/browserless-example.git
    cd browserless-example
    npm i
    npm run start-browserless
    node crawl-with-api.js https://christianfei.com
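The repository's crawler is not reproduced here. As a rough illustration of what a crawler of this kind does, here is a sketch of a breadth-first crawl loop that stays on the starting site. It is an assumption about the general approach, not the code in crawl-with-api.js; `crawl`, `getLinks` and `maxPages` are hypothetical names, and `getLinks` stands in for whatever fetches the links of a page (for example a puppeteer call):

```javascript
// Hypothetical crawl loop, not the code from the repository.
// `getLinks(url)` is an injected async function returning the hrefs found on `url`.
async function crawl (startUrl, getLinks, { maxPages = 50 } = {}) {
  const origin = new URL(startUrl).origin
  const visited = new Set()
  const queue = [startUrl]
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift()
    if (visited.has(url)) continue
    visited.add(url)
    for (const link of await getLinks(url)) {
      // stay on the starting site and skip pages that were already seen
      if (isSameOrigin(link, origin) && !visited.has(link)) queue.push(link)
    }
  }
  return [...visited]
}

// Empty or malformed hrefs make `new URL` throw, so treat them as foreign links.
function isSameOrigin (link, origin) {
  try {
    return new URL(link).origin === origin
  } catch (err) {
    return false
  }
}
```

Injecting `getLinks` keeps the traversal separate from the browser work, so the loop can be exercised with a small in-memory link graph before pointing it at a real site.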

Puppeteer using browserless docker backend

You simply connect to the Browser WebSocket Endpoint ws://localhost:3000 and you're connected to the browserless backend!

Here is a short example that collects all links (<a> elements) on christianfei.com:

    const puppeteer = require('puppeteer')

    main(process.argv[2])
      .then(() => {
        console.log('finished, exiting')
        process.exit(0)
      })
      .catch(err => {
        // console.error returns undefined, so chaining with && would never reach process.exit
        console.error(err)
        process.exit(1)
      })

    async function main (url = 'https://christianfei.com') {
      const browser = await createBrowser()
      const page = await browser.newPage()
      await page.goto(url)
      console.log('title', await page.title())
      // DOM nodes cannot be serialized out of the page, so return each link's href instead
      const links = await page.evaluate(
        selector => [...document.querySelectorAll(selector)].map(a => a.href),
        'a'
      )
      console.log('links.length', links.length)
      await page.close()
      await browser.disconnect()
    }

    // Connect to the browserless backend instead of launching a local Chrome
    async function createBrowser () {
      return puppeteer.connect({ browserWSEndpoint: 'ws://localhost:3000' })
    }
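A list of links taken straight from a page usually contains duplicates, in-page anchors and non-web links such as mailto:. A minimal sketch of cleaning it up, assuming the links are available as href strings; `normalizeLinks` is a hypothetical helper that relies only on Node's built-in URL class:

```javascript
// Hypothetical post-processing helper for the hrefs collected from a page.
function normalizeLinks (hrefs) {
  const unique = new Set()
  for (const href of hrefs) {
    let url
    try {
      url = new URL(href)
    } catch (err) {
      continue // empty or malformed href
    }
    // keep only web pages, dropping mailto:, tel:, javascript: and similar
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue
    url.hash = '' // "#section" anchors point at the same page
    unique.add(url.href)
  }
  return [...unique]
}
```

The result keeps the order in which links first appeared, which makes the output easier to compare against the page.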
