@promptapi/scraper-pkg is a simple JavaScript wrapper for scraper-api.
- You need to sign up for Prompt API
- You need to subscribe to scraper-api; the test drive is free!
- You need to set the PROMPTAPI_TOKEN environment variable after subscribing.
Then:
$ npm install @promptapi/scraper-pkg
Or install from the GitHub registry:
$ npm install @promptapi/scraper-pkg@0.1.6
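Since the token has to be set as an environment variable, it can help to check for it before making any request. The snippet below is only a sketch: `requireToken` is a hypothetical helper, not part of @promptapi/scraper-pkg, and how the package itself reacts to a missing token is not covered here.

```javascript
// Hypothetical helper (not part of @promptapi/scraper-pkg): fail fast when
// PROMPTAPI_TOKEN is missing or blank instead of waiting for an API error.
function requireToken(env = process.env) {
  const token = (env.PROMPTAPI_TOKEN || '').trim();
  if (token === '') {
    throw new Error('PROMPTAPI_TOKEN is not set');
  }
  return token;
}
```

Calling `requireToken()` once at startup is enough; the `env` argument exists only so the check can be exercised without touching the real environment.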
Basic scrape feature:
const promptapi = require('@promptapi/scraper-pkg')

const params = {}
promptapi.scraper('https://pypi.org/classifiers/', params).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data)    // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.html', result.data) // save result
  }
})
Output:
// result.data
<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="defaultLanguage" content="en">
    <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de, zh_Hans, ru, he">
:
:
:

// result.headers
{
  'Content-Length': '322126',
  ...

// result.url
https://pypi.org/classifiers/

/tmp/data.html saved successfully, written 322126 bytes
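The scraper resolves to an object that carries either `error` or `data`, `headers` and `url`. If you prefer `async`/`await` with exceptions, a small wrapper can turn `result.error` into a thrown error. This is a sketch under assumptions: `scrapeOrThrow` is a hypothetical name, and `scraper` is assumed to be any function with the same shape as `promptapi.scraper(url, params)`.

```javascript
// Hypothetical wrapper: `scraper` is any function shaped like
// promptapi.scraper(url, params), resolving to { data, headers, url }
// on success or { error } on failure.
async function scrapeOrThrow(scraper, url, params = {}) {
  const result = await scraper(url, params);
  if (result.error) {
    // Surface the API error as an exception so callers can use try/catch.
    throw new Error(String(result.error));
  }
  return result;
}

// Possible usage (requires the package and a valid PROMPTAPI_TOKEN):
// const promptapi = require('@promptapi/scraper-pkg')
// const scrape = (url, params) => promptapi.scraper(url, params)
// const result = await scrapeOrThrow(scrape, 'https://pypi.org/classifiers/')
```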
You can add URL parameters for extra operations. Valid parameters are:
- auth_password: for HTTP Realm auth password
- auth_username: for HTTP Realm auth username
- cookie: URL-encoded cookie header
- country: 2-character country code, if you wish to scrape from an IP address of a specific country
- referer: HTTP referer header
- selector: CSS-style selector path such as a.btn div li. If selector is enabled, the returned result will be a collection of data and the saved file will be in .json format.
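A typo in a parameter name is easy to miss, so it can be handy to build the `params` object through a small check against the list above. The helper below is a sketch only: `buildParams` and `VALID_PARAMS` are hypothetical names, and the package may well do its own validation. The cookie value is passed through untouched, on the assumption that the caller has already URL-encoded it.

```javascript
// Parameter names taken from the list above.
const VALID_PARAMS = [
  'auth_password',
  'auth_username',
  'cookie',
  'country',
  'referer',
  'selector',
];

// Hypothetical helper: rejects unknown keys, drops unset values and
// checks that country looks like a 2-character code.
function buildParams(options) {
  const params = {};
  for (const [key, value] of Object.entries(options)) {
    if (!VALID_PARAMS.includes(key)) {
      throw new Error(`Unknown scraper parameter: ${key}`);
    }
    if (value === undefined || value === null) continue;
    params[key] = value;
  }
  if (params.country !== undefined && !/^[A-Za-z]{2}$/.test(params.country)) {
    throw new Error('country must be a 2-character country code');
  }
  return params;
}
```

For example, `buildParams({country: 'EE', selector: 'ul li button[data-clipboard-text]'})` returns an object that can be passed as the second argument to `promptapi.scraper`.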
const promptapi = require('@promptapi/scraper-pkg')

const params = {country: 'EE', selector: 'ul li button[data-clipboard-text]'}
promptapi.scraper('https://pypi.org/classifiers/', params).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data)    // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.json', result.data)
  }
})
Output:
// result.data
[
  '<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 1 - Planning" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
  '<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 2 - Pre-Alpha" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
  '<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 3 - Alpha" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
:
:
:

// result.headers
{
  'Content-Length': '322126',
  ...

// result.url
https://pypi.org/classifiers/

/tmp/data.json saved successfully, written 174182 bytes
If you have the jq tool:
$ cat /tmp/data.json | jq 'length'
736
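The same kind of post-processing can be done in Node. With a selector, `result.data` is a collection of HTML fragments, so pulling out one attribute is a matter of matching each fragment. The sketch below uses a plain regular expression rather than an HTML parser; `clipboardTexts` is a hypothetical helper, and it does not decode HTML entities.

```javascript
// Hypothetical helper: collect the data-clipboard-text attribute value
// from each HTML fragment returned by a selector scrape.
function clipboardTexts(items) {
  const pattern = /data-clipboard-text="([^"]*)"/;
  const texts = [];
  for (const html of items) {
    const match = pattern.exec(html);
    if (match) texts.push(match[1]);
  }
  return texts;
}

// Possible usage, assuming /tmp/data.json holds the saved JSON array:
// const fs = require('fs')
// const items = JSON.parse(fs.readFileSync('/tmp/data.json', 'utf8'))
// console.log(items.length)
// console.log(clipboardTexts(items))
```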
You can also add extra X- headers to your request. Read more about HTTP headers at Mozilla’s website.
const promptapi = require('@promptapi/scraper-pkg')

const params = {}
const headers = {'X-Referer': 'https://www.google.com'}
promptapi.scraper('https://pypi.org/classifiers/', params, headers).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data)    // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.html', result.data) // save result
  }
})
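Only X- headers are mentioned as supported extras, so you may want to filter a larger header object before passing it along. This is a sketch: `onlyXHeaders` is a hypothetical helper, and what the API does with headers that lack the X- prefix is an open assumption here.

```javascript
// Hypothetical helper: keep only headers whose name starts with "X-"
// (case-insensitive) and coerce their values to strings.
function onlyXHeaders(headers) {
  const kept = {};
  for (const [name, value] of Object.entries(headers)) {
    if (/^x-/i.test(name)) {
      kept[name] = String(value);
    }
  }
  return kept;
}
```

For example, `onlyXHeaders({'X-Referer': 'https://www.google.com', 'Accept': 'text/html'})` keeps just the `X-Referer` entry.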
All you need is node and npm ...
This project is licensed under MIT
- Prompt API - Creator, maintainer
All PRs are welcome!
- fork (https://github.com/promptapi/scraper-pkg/fork)
- Create your branch (git checkout -b my-feature)
- commit yours (git commit -am 'Add awesome features...')
- push your branch (git push origin my-feature)
- Then create a new Pull Request!
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.