Cheerio Tree is a powerful utility built on Cheerio, designed for efficient DOM parsing. It enables rapid conversion of HTML data into JSON format. When paired with YAML, it provides an intuitive and streamlined approach to data handling and transformation.
    npm run dev
    # or
    yarn dev
    # or
    pnpm dev

Now, try your first API scraper:
Localhost:
https://www.proxysites.ai/category
Online:
https://www.proxysites.ai/category
For example: data/wordpressCom/tags.yml
Please use camelCase for folder and file naming.
After saving the YAML file, it will be automatically converted to JSON in the development environment
and saved as app/lib/cheerio-tree/wordpressCom-tags.ts.
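The naming convention above can be sketched as a small function. This is only an illustration of the mapping described here, not code from this project; `generatedPathFor` is a hypothetical name, and the sketch assumes the folder and file name are joined with a hyphen exactly as in the example.

```javascript
// Hypothetical helper: maps a YAML config path to the generated module path,
// following the example above (folder + "-" + file name, with a .ts extension).
function generatedPathFor(yamlPath) {
  const parts = yamlPath.split('/');
  const file = parts[parts.length - 1];      // "tags.yml"
  const folder = parts[parts.length - 2];    // "wordpressCom"
  const name = file.replace(/\.ya?ml$/, ''); // "tags"
  return `app/lib/cheerio-tree/${folder}-${name}.ts`;
}

console.log(generatedPathFor('data/wordpressCom/tags.yml'));
// app/lib/cheerio-tree/wordpressCom-tags.ts
```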
Make sure to configure the parsing settings in the predetermined format to avoid issues with file generation.
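The sample configuration for data/wordpressCom/tags.yml chains several regex/replace rules (K, then M, then "strip non-digits") to turn a count string into an integer. As a rough mental model only, the chain might behave like the sketch below. It assumes each rule is applied in order as a global string replacement and that an empty replace means the empty string; `applyRules` is a hypothetical helper, not part of Cheerio Tree.

```javascript
// Assumed rule shape, mirroring the regex/replace pairs in the YAML config.
const regexToK = { regex: 'K', replace: '000' };
const regexToM = { regex: 'M', replace: '000000' };
const regexToI = { regex: '[^\\d]', replace: '' }; // empty replace assumed to mean ''
const toI = [regexToK, regexToM, regexToI];

// Hypothetical helper: apply each rule in order as a global replacement.
function applyRules(text, rules) {
  return rules.reduce(
    (value, rule) => value.replace(new RegExp(rule.regex, 'g'), rule.replace ?? ''),
    text
  );
}

console.log(Number.parseInt(applyRules('12K', toI), 10));      // 12000
console.log(Number.parseInt(applyRules('3M posts', toI), 10)); // 3000000
```

Under these assumptions a decimal input such as 1.1K would come out as 11000, because the decimal point is stripped after the K is expanded, so treat the sketch as a reading aid and check real inputs against the library's actual output.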
    # data/wordpressCom/tags.yml
    regexToI: &regexToI
      regex: '[^\d]'
      replace:
    regexToF: &regexToF
      regex: '[^\d\.]'
      replace:
    regexToK: &regexToK
      regex: 'K'
      replace: "000"
    regexToM: &regexToM
      regex: 'M'
      replace: "000000"

    # string to int
    # eg. 1.1K will be 1100
    toI: &toI
      - <<: *regexToK
      - <<: *regexToM
      - <<: *regexToI

    addHost: &addHost
      regex: '^(.*)$'
      replace: https://wordpress.com$1

    # Main
    # ==================================================
    # ==================================================
    tree:
      # URL to match
      url:
        match: https://wordpress.com/tags
      nodes:
        trending:
          wrapper:
            list: true
            selector: div.trending-tags__container .trending-tags__column
            normal:
              tag:
                selector: a .trending-tags__title
              link:
                selector: a
                attr: href
                after_regular:
                  - <<: *addHost
              count:
                to_i:
                  selector: .trending-tags__count
                  after_regular: *toI

Build the project and commit the output:

    npm run build
    # or
    # pnpm build
    git add dist && git commit -m "build"

Create your tests in the tests directory, then run them:
    pnpm test
    # or
    npm run test

You can deploy this project to Vercel with the following button:
    # Configure your API key
    SECRET_API_KEY=your_api_key
    # You can find one at https://www.proxysites.ai/
    HTTP_PROXY=

You can authenticate API requests in two ways: with a URL parameter or with a request header. Both methods are described in detail here.
Add the token parameter to the request URL and set your API key as the parameter value. For example, if your API endpoint is http://localhost:3000/api/v1/resource and your API key is your_api_key, you can call the API like this:
    curl "http://localhost:3000/api/v1/resource?token=your_api_key"

Add X-Api-Key to the request header and set your API key as the value. You can use the curl command to send a request with a custom header:
    curl -H "X-Api-Key: your_api_key" "http://localhost:3000/api/v1/resource"

Suppose you have an API endpoint https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type. You can authenticate using either of the following two methods:
API Key: expressapikey
    curl "https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type&token=expressapikey"

    curl -H "X-Api-Key: expressapikey" "https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=https://www.proxysites.ai/category/proxy-type"

URL Param Authentication: Add the token parameter to the request URL.
Header Authentication: Add X-Api-Key to the request header.
Choose the appropriate authentication method based on your needs and use case. Generally, using header authentication is more secure as it does not expose the key in the URL.
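To make the two methods concrete, here is a minimal sketch of how a server could accept either one. It is not taken from this project's source: `resolveApiKey` and `isAuthorized` are hypothetical names, the request object is assumed to expose `headers` and `query` the way Express does, and checking the header before the URL parameter is an arbitrary choice.

```javascript
// Hypothetical helper: read the key from the X-Api-Key header first,
// then fall back to the "token" query parameter.
// Node.js lower-cases incoming header names, hence 'x-api-key'.
function resolveApiKey(request) {
  const headers = request.headers || {};
  const query = request.query || {};
  return headers['x-api-key'] || query.token || null;
}

// Compare the supplied key with the expected one
// (for example, the value of process.env.SECRET_API_KEY).
function isAuthorized(request, expectedKey) {
  const supplied = resolveApiKey(request);
  return Boolean(expectedKey) && supplied === expectedKey;
}

console.log(isAuthorized({ headers: { 'x-api-key': 'your_api_key' } }, 'your_api_key')); // true
console.log(isAuthorized({ query: { token: 'your_api_key' } }, 'your_api_key'));         // true
console.log(isAuthorized({ query: { token: 'wrong' } }, 'your_api_key'));                // false
```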
You can call the API in your code as follows:
JavaScript:

    // Percent-encode the target URL so it can be passed as a query parameter.
    function urlEncode(url) {
      return encodeURIComponent(url);
    }

    const encodedUrl = urlEncode('https://www.proxysites.ai/category/proxy-type');
    const apiKey = 'expressapikey';

    // Using URL Param
    fetch(`https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=${encodedUrl}&token=${apiKey}`)
      .then(response => response.json())
      .then(data => console.log(data))
      .catch(error => console.error('Error:', error));

    // Using Header
    fetch(`https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category?url=${encodedUrl}`, {
      headers: {
        'X-Api-Key': apiKey
      }
    })
      .then(response => response.json())
      .then(data => console.log(data))
      .catch(error => console.error('Error:', error));

Python:

    import requests

    # requests percent-encodes values passed via params, so pass the raw URL;
    # encoding it beforehand would double-encode it.
    target_url = 'https://www.proxysites.ai/category/proxy-type'
    api_key = 'expressapikey'

    # Using URL Param
    url_param_response = requests.get(
        'https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category',
        params={'url': target_url, 'token': api_key}
    )
    print("URL Param Response:")
    print(url_param_response.json())

    # Using Header
    headers = {
        'X-Api-Key': api_key
    }
    header_response = requests.get(
        'https://express-scraper-api.vercel.app/api/v1/proxysites.ai/category',
        headers=headers,
        params={'url': target_url}
    )
    print("Header Response:")
    print(header_response.json())