SmartScraper Service

Overview

SmartScraper is our flagship LLM-powered web scraping service that intelligently extracts structured data from any website. Using advanced large language models, it understands context and content the way a human would, making web data extraction more reliable and efficient than ever.
Try SmartScraper instantly in our interactive playground - no coding required!

Getting Started

Quick Start

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract info about the company"
)

Parameters

Parameter | Type | Required | Description
apiKey | string | Yes | The ScrapeGraph API Key.
websiteUrl | string | Yes | The URL of the webpage that needs to be scraped.
prompt | string | Yes | A textual description of what you want to achieve.
schema | object | No | The Pydantic or Zod object that describes the structure and format of the response.
render_heavy_js | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: false.
mock | boolean | No | Return mock data for testing.
plain_text | boolean | No | Return the result as plain text instead of JSON.
Get your API key from the dashboard
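The mock and plain_text flags are optional toggles for testing and output format. Here is a minimal sketch of how they might be passed, assuming the Python SDK forwards them under the same names as the request body fields documented above:

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Assumption: the Python SDK accepts `mock` and `plain_text` keyword
# arguments matching the request body fields; treat these as illustrative.
response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract info about the company",
    mock=True,        # return mock data for testing, no real extraction
    plain_text=True,  # return the result as plain text instead of JSON
)
print(response)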

Enhanced JavaScript Rendering

For websites that heavily rely on JavaScript frameworks like React, Vue, Angular, or other Single Page Applications (SPAs), enable the render_heavy_js parameter to ensure complete content rendering before extraction.

When to Use render_heavy_js

Enable this parameter for:
  • React/Vue/Angular Applications: Modern web apps built with JavaScript frameworks
  • Single Page Applications (SPAs): Sites that load content dynamically via JavaScript
  • Heavy JavaScript Sites: Websites with complex client-side rendering
  • Dynamic Content: Pages where content appears after JavaScript execution
  • Interactive Elements: Sites with JavaScript-dependent content loading
Here’s how to use render_heavy_js for modern web applications:
Python
from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")
JavaScript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';

const apiKey = 'your-api-key';

const ProductSchema = z.object({
  name: z.string().describe('Product name'),
  price: z.string().describe('Product price'),
  description: z.string().describe('Product description'),
  availability: z.string().describe('Product availability status')
});

try {
  const response = await smartScraper(
    apiKey,
    'https://example-react-store.com/products/123',
    'Extract product details including name, price, description, and availability',
    ProductSchema,
    true // Enable render_heavy_js for JavaScript-heavy sites
  );

  console.log('Product:', response.result.name);
  console.log('Price:', response.result.price);
  console.log('Available:', response.result.availability);

} catch (error) {
  console.error('Error:', error);
}
cURL
curl -X 'POST' \
  'https://api.scrapegraphai.com/v1/smartscraper' \
  -H 'accept: application/json' \
  -H 'SGAI-APIKEY: your-api-key' \
  -H 'Content-Type: application/json' \
  -d '{
    "website_url": "https://example-react-store.com/products/123",
    "user_prompt": "Extract product details including name, price, description, and availability",
    "render_heavy_js": true
  }'

Performance Considerations

  • Enhanced Rendering: render_heavy_js=true provides more thorough JavaScript execution
  • Processing Time: May take slightly longer due to enhanced rendering capabilities
  • Use When Needed: Only enable for sites that actually require it to avoid unnecessary overhead
  • Default Behavior: Standard rendering works for most websites
Pro Tip: If you’re getting incomplete or missing data from a modern web application, try enabling render_heavy_js=true to ensure all JavaScript-rendered content is captured.
{  "request_id": "sg-req-abc123",  "status": "completed",  "website_url": "https://scrapegraphai.com/",  "user_prompt": "Extract info about the company",  "result": {  "company_name": "ScrapeGraphAI",  "description": "ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents...",  "features": [  "Effortless, cost-effective, and AI-powered data extraction",  "Handles proxy rotation and rate limits",  "Supports a wide variety of websites"  ],  "contact_email": "contact@scrapegraphai.com",  "social_links": {  "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai",  "linkedin": "https://www.linkedin.com/company/101881123",  "twitter": "https://x.com/scrapegraphai"  }  },  "error": "" } 
The response includes:
  • request_id: Unique identifier for tracking your request
  • status: Current status of the extraction (“completed”, “running”, “failed”)
  • result: The extracted data in structured JSON format
  • error: Error message (if any occurred during extraction)
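Before consuming the result, it is worth verifying that the extraction actually completed. Here is a minimal sketch based only on the documented response fields (status, result, error):

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

response = client.smartscraper(
    website_url="https://scrapegraphai.com/",
    user_prompt="Extract info about the company",
)

# Check the documented status values before consuming the result
if response.get("status") == "completed":
    print(response["result"])
elif response.get("status") == "failed":
    raise RuntimeError(f"Extraction failed: {response.get('error')}")
else:
    # "running": keep the request_id and retrieve the result later
    print(f"Still running, request_id={response['request_id']}")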
Instead of providing a URL, you can optionally pass your own HTML content:
html_content = """ <html>  <body>  <h1>ScrapeGraphAI</h1>  <div class="description">  <p>AI-powered web scraping for modern applications.</p>  </div>  <div class="features">  <ul>  <li>Smart Extraction</li>  <li>Local Processing</li>  <li>Schema Support</li>  </ul>  </div>  </body> </html> """  response = client.smartscraper(  website_html=html_content, # This will override website_url if both are provided  user_prompt="Extract info about the company" ) 
This is useful when:
  • You already have the HTML content cached
  • You want to process modified HTML
  • You’re working with dynamically generated content
  • You need to process content offline
  • You want to pre-process the HTML before extraction
When both website_url and website_html are provided, website_html takes precedence and will be used for extraction.

Key Features

Universal Compatibility

Works with any website structure, including JavaScript-rendered content

AI Understanding

Contextual understanding of content for accurate extraction

Structured Output

Returns clean, structured data in your preferred format

Schema Support

Define custom output schemas using Pydantic or Zod

Use Cases

Content Aggregation

  • News article extraction
  • Blog post summarization
  • Product information gathering
  • Research data collection

Data Analysis

  • Market research
  • Competitor analysis
  • Price monitoring
  • Trend tracking

AI Training

  • Dataset creation
  • Training data collection
  • Content classification
  • Knowledge base building
Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we’re revolutionizing web data extraction.

Other Functionality

Retrieve a previous request

If you know the request ID of a previous request you made, you can retrieve all of its information.
import { getSmartScraperRequest } from 'scrapegraph-js';

const apiKey = 'your_api_key';
const requestId = 'ID_of_previous_request';

try {
  const requestInfo = await getSmartScraperRequest(apiKey, requestId);
  console.log(requestInfo);
} catch (error) {
  console.error(error);
}

Parameters

Parameter | Type | Required | Description
apiKey | string | Yes | The ScrapeGraph API Key.
requestId | string | Yes | The request ID associated with the output of a previous smartScraper request.

Custom Schema Example

Define exactly what data you want to extract:
from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    content: str = Field(description="Main article content")
    publish_date: str = Field(description="Publication date")

response = client.smartscraper(
    website_url="https://example.com/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

Async Support

For applications requiring asynchronous execution, SmartScraper provides comprehensive async support through the AsyncClient:
import asyncio
from scrapegraph_py import AsyncClient
from pydantic import BaseModel, Field

# Define your schema
class WebpageSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")
    summary: str = Field(description="A brief summary of the webpage")

async def main():
    # Initialize the async client
    async with AsyncClient(api_key="your-api-key") as client:
        # List of URLs to analyze
        urls = [
            "https://scrapegraphai.com/",
            "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
        ]

        # Create scraping tasks for each URL
        tasks = [
            client.smartscraper(
                website_url=url,
                user_prompt="Summarize the main content",
                output_schema=WebpageSchema
            )
            for url in urls
        ]

        # Execute requests concurrently
        responses = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results
        for i, response in enumerate(responses):
            if isinstance(response, Exception):
                print(f"Error for {urls[i]}: {response}")
            else:
                print(f"Result for {urls[i]}: {response['result']}")

# Run the async function
if __name__ == "__main__":
    asyncio.run(main())

Infinite Scroll Support

SmartScraper can handle infinite scroll pages by automatically scrolling to load more content before extraction. This is perfect for social media feeds, e-commerce product listings, and other dynamic content.
from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger
from pydantic import BaseModel
from typing import List

sgai_logger.set_logging(level="INFO")

# Define the output schema
class Company(BaseModel):
    name: str
    category: str
    location: str

class CompaniesResponse(BaseModel):
    companies: List[Company]

# Initialize the client with explicit API key
sgai_client = Client(api_key="sgai-api-key")

try:
    # SmartScraper request with infinite scroll
    response = sgai_client.smartscraper(
        website_url="https://www.ycombinator.com/companies?batch=Spring%202025",
        user_prompt="Extract all company names and their categories from the page",
        output_schema=CompaniesResponse,
        number_of_scrolls=10  # Scroll 10 times to load more companies
    )

    # Print the response
    print(f"Request ID: {response['request_id']}")

    # Parse and print the results in a structured way
    result = CompaniesResponse.model_validate(response['result'])
    print("\nExtracted Companies:")
    print("-" * 80)
    for company in result.companies:
        print(f"Name: {company.name}")
        print(f"Category: {company.category}")
        print(f"Location: {company.location}")
        print("-" * 80)

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    sgai_client.close()

Parameters for Infinite Scroll

Parameter | Type | Required | Description
number_of_scrolls | number | No | Number of times to scroll down to load more content (default: 0)
Infinite scroll is particularly useful for:
  • Social media feeds (Twitter, Instagram, LinkedIn)
  • E-commerce product listings
  • News websites with continuous scrolling
  • Any page that loads content dynamically as you scroll

SmartScraper Endpoint

The SmartScraper endpoint is our core service for extracting structured data from any webpage using advanced language models. It automatically adapts to different website layouts and content types, enabling quick and reliable data extraction.

Key Capabilities

  • Universal Compatibility: Works with any website structure, including JavaScript-rendered content
  • Schema Validation: Supports both Pydantic (Python) and Zod (JavaScript) schemas
  • Concurrent Processing: Efficient handling of multiple URLs through async support
  • Custom Extraction: Flexible user prompts for targeted data extraction

Endpoint Details

POST https://api.scrapegraphai.com/v1/smartscraper 
Required Headers
Header | Description
SGAI-APIKEY | Your API authentication key
Content-Type | application/json
Request Body
Field | Type | Required | Description
website_url | string | Yes* | URL to scrape (*either this or website_html is required)
website_html | string | No | Raw HTML content to process
user_prompt | string | Yes | Instructions for data extraction
output_schema | object | No | Pydantic or Zod schema for response validation
render_heavy_js | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: false.
mock | boolean | No | Return mock data for testing.
Response Format
{  "request_id": "sg-req-abc123",  "status": "completed",  "website_url": "https://example.com",  "result": {  // Structured data based on schema or extraction prompt  },  "error": null } 

Best Practices

  1. Schema Definition:
    • Define schemas to ensure consistent data structure
    • Use descriptive field names and types
    • Include field descriptions for better extraction accuracy
  2. Async Processing:
    • Use async clients for concurrent requests
    • Implement proper error handling
    • Monitor rate limits and implement backoff strategies
  3. Error Handling:
    • Always wrap requests in try-catch blocks
    • Check response status before processing
    • Implement retry logic for failed requests (see the sketch below)
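For example, the retry and backoff advice above can be combined in a few lines. Here is a minimal sketch (scrape_with_retries is a hypothetical wrapper, not part of the SDK):

import time

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

def scrape_with_retries(url: str, prompt: str, max_retries: int = 3):
    """Hypothetical helper: retry failed requests with exponential backoff."""
    delay = 1.0
    for attempt in range(1, max_retries + 1):
        try:
            response = client.smartscraper(website_url=url, user_prompt=prompt)
            if response.get("status") == "completed":
                return response["result"]
            raise RuntimeError(response.get("error") or "extraction did not complete")
        except Exception as exc:
            if attempt == max_retries:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
            delay *= 2  # back off before the next attempt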

Integration Options

Official SDKs

  • Python SDK - Perfect for data science and backend applications
  • JavaScript SDK - Ideal for web applications and Node.js

Best Practices

Optimizing Extraction

  1. Be specific in your prompts
  2. Use schemas for structured data
  3. Handle pagination for multi-page content (see the sketch after this list)
  4. Implement error handling and retries
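For pagination, one straightforward approach is to loop over the page URLs and merge the per-page results. Here is a minimal sketch, assuming a hypothetical listing paginated via a ?page=N query parameter:

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Hypothetical paginated listing: https://example.com/products?page=N
all_results = []
for page in range(1, 4):
    response = client.smartscraper(
        website_url=f"https://example.com/products?page={page}",
        user_prompt="Extract all product names and prices on this page",
    )
    if response.get("status") == "completed":
        all_results.append(response["result"])

print(f"Collected results from {len(all_results)} pages")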

Rate Limiting

  • Implement reasonable delays between requests (see the sketch below)
  • Use async clients for better performance
  • Monitor your API usage
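One way to keep concurrent usage in check is to pair the AsyncClient with a semaphore so only a few requests run at once. Here is a minimal sketch, assuming a hypothetical cap of 5 concurrent requests and example URLs:

import asyncio

from scrapegraph_py import AsyncClient

async def main():
    # Hypothetical cap of 5 in-flight requests; tune to your rate limits
    semaphore = asyncio.Semaphore(5)

    async def scrape_one(client: AsyncClient, url: str):
        # The semaphore blocks here once 5 requests are already running
        async with semaphore:
            return await client.smartscraper(
                website_url=url,
                user_prompt="Summarize the main content",
            )

    urls = [f"https://example.com/page/{i}" for i in range(20)]  # example URLs
    async with AsyncClient(api_key="your-api-key") as client:
        results = await asyncio.gather(
            *(scrape_one(client, url) for url in urls),
            return_exceptions=True,
        )
    print(f"Finished {len(results)} requests")

asyncio.run(main())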

Example Projects

Check out our cookbook for real-world examples:
  • E-commerce product scraping
  • News aggregation
  • Research data collection
  • Content monitoring

API Reference

For detailed API documentation, see the SmartScraper Endpoint section above.

Support & Resources

Ready to Start?

Sign up now and get your API key to begin extracting data with SmartScraper!