## Installation

Install the package using npm, pnpm, yarn, or bun:

```bash
# Using npm
npm i scrapegraph-js

# Using pnpm
pnpm i scrapegraph-js

# Using yarn
yarn add scrapegraph-js

# Using bun
bun add scrapegraph-js
```

## Features

- **AI-Powered Extraction**: Smart web scraping with artificial intelligence
- **Async by Design**: Fully asynchronous architecture
- **Type Safety**: Built-in TypeScript support with Zod schemas
- **Production Ready**: Automatic retries and detailed logging
- **Developer Friendly**: Comprehensive error handling

## Quick Start

### Basic example

Store your API keys securely in environment variables. Use `.env` files and libraries like `dotenv` to load them into your app.
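For example, a minimal `.env` file might look like this (the `SGAI_APIKEY` variable name is the one read in the Quick Start example):

```bash
# .env — keep this file out of version control
SGAI_APIKEY=your-scrapegraph-api-key
```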
```javascript
import { smartScraper } from "scrapegraph-js";
import "dotenv/config";

// Initialize variables
const apiKey = process.env.SGAI_APIKEY; // set your API key as an environment variable
const websiteUrl = "https://example.com";
const prompt = "What does the company do?";

try {
  const response = await smartScraper(apiKey, websiteUrl, prompt); // call the SmartScraper function
  console.log(response.result);
} catch (error) {
  console.error("Error:", error);
}
```

## Services

### SmartScraper

Extract specific information from any webpage using AI:

```javascript
const response = await smartScraper(
  apiKey,
  "https://example.com",
  "Extract the main content"
);
```

#### Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `apiKey` | string | Yes | The ScrapeGraph API key. |
| `websiteUrl` | string | Yes | The URL of the webpage to scrape. |
| `prompt` | string | Yes | A textual description of what you want to extract. |
| `schema` | object | No | A Zod object that describes the structure and format of the response. |
| `renderHeavyJs` | boolean | No | Enable enhanced JavaScript rendering for JS-heavy websites (React, Vue, Angular, etc.). Default: `false`. |
Define a simple schema using Zod:

```javascript
import { z } from "zod";

const ArticleSchema = z.object({
  title: z.string().describe("The article title"),
  author: z.string().describe("The author's name"),
  publishDate: z.string().describe("Article publication date"),
  content: z.string().describe("Main article content"),
  category: z.string().describe("Article category"),
});

// Wrap the schema in an array to extract multiple articles in one call
const ArticlesArraySchema = z
  .array(ArticleSchema)
  .describe("Array of articles");

const response = await smartScraper(
  apiKey,
  "https://example.com/blog/article",
  "Extract the article information",
  ArticleSchema
);

console.log(`Title: ${response.result.title}`);
console.log(`Author: ${response.result.author}`);
console.log(`Published: ${response.result.publishDate}`);
```
Define a complex schema for nested data structures:

```javascript
import { z } from "zod";

const EmployeeSchema = z.object({
  name: z.string().describe("Employee's full name"),
  position: z.string().describe("Job title"),
  department: z.string().describe("Department name"),
  email: z.string().describe("Email address"),
});

const OfficeSchema = z.object({
  location: z.string().describe("Office location/city"),
  address: z.string().describe("Full address"),
  phone: z.string().describe("Contact number"),
});

const CompanySchema = z.object({
  name: z.string().describe("Company name"),
  description: z.string().describe("Company description"),
  industry: z.string().describe("Industry sector"),
  foundedYear: z.number().describe("Year company was founded"),
  employees: z.array(EmployeeSchema).describe("List of key employees"),
  offices: z.array(OfficeSchema).describe("Company office locations"),
  website: z.string().url().describe("Company website URL"),
});

// Extract comprehensive company information
const response = await smartScraper(
  apiKey,
  "https://example.com/about",
  "Extract detailed company information including employees and offices",
  CompanySchema
);

// Access nested data
console.log(`Company: ${response.result.name}`);

console.log("\nKey Employees:");
response.result.employees.forEach((employee) => {
  console.log(`- ${employee.name} (${employee.position})`);
});

console.log("\nOffice Locations:");
response.result.offices.forEach((office) => {
  console.log(`- ${office.location}: ${office.address}`);
});
```
#### Enhanced JavaScript Rendering Example
For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:

```javascript
import { smartScraper } from "scrapegraph-js";
import { z } from "zod";

const apiKey = "your-api-key";

const ProductSchema = z.object({
  name: z.string().describe("Product name"),
  price: z.string().describe("Product price"),
  description: z.string().describe("Product description"),
  availability: z.string().describe("Product availability status"),
});

try {
  const response = await smartScraper(
    apiKey,
    "https://example-react-store.com/products/123",
    "Extract product details including name, price, description, and availability",
    ProductSchema,
    true // enable renderHeavyJs for JavaScript-heavy sites
  );

  console.log("Product:", response.result.name);
  console.log("Price:", response.result.price);
  console.log("Available:", response.result.availability);
} catch (error) {
  console.error("Error:", error);
}
```

When to use `renderHeavyJs`:

- React, Vue, or Angular applications
- Single Page Applications (SPAs)
- Sites with heavy client-side rendering
- Dynamic content loaded via JavaScript
- Interactive elements that depend on JavaScript execution

### SearchScraper

Search and extract information from multiple web sources using AI:

```javascript
const response = await searchScraper(
  apiKey,
  "Find the best restaurants in San Francisco"
);
```

#### Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `apiKey` | string | Yes | The ScrapeGraph API key. |
| `prompt` | string | Yes | A textual description of what you want to achieve. |
| `numResults` | number | No | Number of websites to search (3-20). Default: `3`. |
| `extractionMode` | boolean | No | `true` = AI extraction mode (10 credits/page), `false` = markdown mode (2 credits/page). Default: `true`. |
| `schema` | object | No | A Zod object that describes the structure and format of the response (AI extraction mode only). |
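The per-page credit costs above compose simply with `numResults`. A small sketch of the arithmetic, assuming costs scale linearly per searched page as the table implies (`estimateSearchCredits` is a hypothetical helper, not part of the SDK):

```javascript
// Estimate searchScraper credit cost from numResults and extraction mode
function estimateSearchCredits(numResults, aiExtraction = true) {
  if (numResults < 3 || numResults > 20) {
    throw new RangeError("numResults must be between 3 and 20");
  }
  const creditsPerPage = aiExtraction ? 10 : 2; // AI extraction vs markdown mode
  return numResults * creditsPerPage;
}

console.log(estimateSearchCredits(3));        // 30 (default AI extraction)
console.log(estimateSearchCredits(5, false)); // 10 (markdown mode)
```

A quick estimate like this makes it easier to choose between extraction modes before spending credits on a large batch.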
Define a simple schema using Zod:

```javascript
import { z } from "zod";

const ArticleSchema = z.object({
  title: z.string().describe("The article title"),
  author: z.string().describe("The author's name"),
  publishDate: z.string().describe("Article publication date"),
  content: z.string().describe("Main article content"),
  category: z.string().describe("Article category"),
});

const response = await searchScraper(
  apiKey,
  "Find news about the latest trends in AI",
  ArticleSchema
);

console.log(`Title: ${response.result.title}`);
console.log(`Author: ${response.result.author}`);
console.log(`Published: ${response.result.publishDate}`);
```
Define a custom schema for structured search results:

```javascript
import { z } from "zod";

const RestaurantSchema = z.object({
  name: z.string().describe("Restaurant name"),
  address: z.string().describe("Restaurant address"),
  rating: z.number().describe("Restaurant rating"),
  website: z.string().url().describe("Restaurant website URL"),
});

// Extract structured restaurant information from multiple sources
const response = await searchScraper(
  apiKey,
  "Find the best restaurants in San Francisco",
  RestaurantSchema
);
```
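Once results match a schema like this, post-processing is ordinary JavaScript. A sketch over a hypothetical result array (the `topRated` helper and the sample data are illustrative, not SDK output):

```javascript
// Pick the n highest-rated restaurants from schema-shaped results
function topRated(restaurants, n = 3) {
  return [...restaurants]
    .sort((a, b) => b.rating - a.rating) // highest rating first
    .slice(0, n)
    .map((r) => r.name);
}

const sample = [
  { name: "A", rating: 4.2 },
  { name: "B", rating: 4.8 },
  { name: "C", rating: 3.9 },
  { name: "D", rating: 4.5 },
];

console.log(topRated(sample, 2)); // ["B", "D"]
```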
Use markdown mode for cost-effective content gathering:

```javascript
import { searchScraper } from "scrapegraph-js";

const apiKey = "your-api-key";

try {
  // Enable markdown mode for cost-effective content gathering
  const response = await searchScraper(
    apiKey,
    "Latest developments in artificial intelligence",
    3, // search 3 websites
    false // enable markdown mode (2 credits per page vs 10 credits)
  );

  // Access the raw markdown content
  const markdownContent = response.markdown_content;
  console.log("Markdown content length:", markdownContent.length);
  console.log("Reference URLs:", response.reference_urls);

  // Process the markdown content
  console.log("Content preview:", markdownContent.substring(0, 500) + "...");
} catch (error) {
  console.error("Error:", error);
}
```

Markdown mode benefits:

- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
- **Full content**: Get complete page content in markdown format
- **Faster**: No AI processing overhead
- **Perfect for**: Content analysis, bulk data collection, building datasets

### Markdownify

Convert any webpage into clean, formatted markdown:

```javascript
const response = await markdownify(apiKey, "https://example.com");
```

#### Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `apiKey` | string | Yes | The ScrapeGraph API key. |
| `websiteUrl` | string | Yes | The URL of the webpage to convert to markdown. |
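Raw markdown, whether from Markdownify or from markdown mode above, can be post-processed with plain string operations. For instance, a sketch that pulls the headings out of a markdown string (`extractHeadings` is a hypothetical helper, not part of the SDK):

```javascript
// Extract ATX-style headings ("#", "##", ...) from a markdown string
function extractHeadings(markdown) {
  return markdown
    .split("\n")
    .filter((line) => /^#{1,6}\s/.test(line))
    .map((line) => line.replace(/^#{1,6}\s+/, "").trim());
}

const sample = "# Example Domain\n\nSome intro text.\n\n## Section One\nBody.\n## Section Two";
console.log(extractHeadings(sample)); // ["Example Domain", "Section One", "Section Two"]
```

This kind of lightweight parsing is often enough for building tables of contents or indexing scraped pages in bulk.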
## API Credits

Check your available API credits:

```javascript
import { getCredits } from "scrapegraph-js";

try {
  const credits = await getCredits(apiKey);
  console.log("Available credits:", credits);
} catch (error) {
  console.error("Error fetching credits:", error);
}
```

## Feedback

Help us improve by submitting feedback programmatically:

```javascript
import { sendFeedback } from "scrapegraph-js";

try {
  await sendFeedback(apiKey, "request-id", 5, "Great results!");
} catch (error) {
  console.error("Error sending feedback:", error);
}
```

## Support
## License

This project is licensed under the MIT License. See the LICENSE file for details.