## Installation

Install the package using pip:

```bash
pip install scrapegraph-py
```
## Features

- **AI-Powered Extraction**: Advanced web scraping using artificial intelligence
- **Flexible Clients**: Both synchronous and asynchronous support
- **Type Safety**: Structured output with Pydantic schemas
- **Production Ready**: Detailed logging and automatic retries
- **Developer Friendly**: Comprehensive error handling

## Quick Start

Initialize the client with your API key:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")
```
You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters:

```python
client = Client()
```
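For example, in a POSIX shell you could export the key before launching your script (the key value and script name below are placeholders):

```bash
# Placeholder key value; replace with your real API key
export SGAI_API_KEY="your-api-key-here"

# Any script run from this shell can now call Client() with no arguments
python my_scraper.py
```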
## Services

### SmartScraper

Extract specific information from any webpage using AI:

```python
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)
```
#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `website_url` | string | Yes | The URL of the webpage that needs to be scraped. |
| `user_prompt` | string | Yes | A textual description of what you want to achieve. |
| `output_schema` | object | No | The Pydantic object that describes the structure and format of the response. |
| `render_heavy_js` | boolean | No | Enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, etc.). Default: `False`. |
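The schema-based examples below go into detail; for reference, here is a single call that exercises every parameter in the table (the URL and schema are placeholders):

```python
from pydantic import BaseModel, Field

class PageSummary(BaseModel):
    heading: str = Field(description="Main page heading")
    description: str = Field(description="Short page description")

response = client.smartscraper(
    website_url="https://example.com",                      # required
    user_prompt="Extract the heading and description",      # required
    output_schema=PageSummary,                              # optional structured output
    render_heavy_js=False,                                  # optional, defaults to False
)
```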
Define a simple schema for basic data extraction:

```python
from pydantic import BaseModel, Field

class ArticleData(BaseModel):
    title: str = Field(description="The article title")
    author: str = Field(description="The author's name")
    publish_date: str = Field(description="Article publication date")
    content: str = Field(description="Main article content")
    category: str = Field(description="Article category")

response = client.smartscraper(
    website_url="https://example.com/blog/article",
    user_prompt="Extract the article information",
    output_schema=ArticleData
)

print(f"Title: {response.title}")
print(f"Author: {response.author}")
print(f"Published: {response.publish_date}")
```
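The `Field` descriptions are what steer the extraction, so it can be useful to inspect the JSON schema Pydantic derives from your model before sending it. This is standard Pydantic v2 (`model_json_schema()`; on Pydantic v1 the equivalent is `.schema()`), not a scrapegraph-py API:

```python
import json

# Print the JSON schema generated from ArticleData to see exactly
# what structure the extraction will be asked to fill in
print(json.dumps(ArticleData.model_json_schema(), indent=2))
```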
Define a complex schema for nested data structures:

```python
from typing import List
from pydantic import BaseModel, Field

class Employee(BaseModel):
    name: str = Field(description="Employee's full name")
    position: str = Field(description="Job title")
    department: str = Field(description="Department name")
    email: str = Field(description="Email address")

class Office(BaseModel):
    location: str = Field(description="Office location/city")
    address: str = Field(description="Full address")
    phone: str = Field(description="Contact number")

class CompanyData(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Company description")
    industry: str = Field(description="Industry sector")
    founded_year: int = Field(description="Year company was founded")
    employees: List[Employee] = Field(description="List of key employees")
    offices: List[Office] = Field(description="Company office locations")
    website: str = Field(description="Company website URL")

# Extract comprehensive company information
response = client.smartscraper(
    website_url="https://example.com/about",
    user_prompt="Extract detailed company information including employees and offices",
    output_schema=CompanyData
)

# Access nested data
print(f"Company: {response.name}")
print("\nKey Employees:")
for employee in response.employees:
    print(f"- {employee.name} ({employee.position})")

print("\nOffice Locations:")
for office in response.offices:
    print(f"- {office.location}: {office.address}")
```
### Enhanced JavaScript Rendering Example

For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:

```python
from scrapegraph_py import Client
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: str = Field(description="Product price")
    description: str = Field(description="Product description")
    availability: str = Field(description="Product availability status")

client = Client(api_key="your-api-key")

# Enable enhanced JavaScript rendering for a React-based e-commerce site
response = client.smartscraper(
    website_url="https://example-react-store.com/products/123",
    user_prompt="Extract product details including name, price, description, and availability",
    output_schema=ProductInfo,
    render_heavy_js=True  # Enable for React/Vue/Angular sites
)

print(f"Product: {response['result']['name']}")
print(f"Price: {response['result']['price']}")
print(f"Available: {response['result']['availability']}")
```
**When to use `render_heavy_js`:**

- React, Vue, or Angular applications
- Single Page Applications (SPAs)
- Sites with heavy client-side rendering
- Dynamic content loaded via JavaScript
- Interactive elements that depend on JavaScript execution

### SearchScraper

Search and extract information from multiple web sources using AI:

```python
response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?"
)
```
#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `user_prompt` | string | Yes | A textual description of what you want to achieve. |
| `num_results` | number | No | Number of websites to search (3-20). Default: 3. |
| `extraction_mode` | boolean | No | `True` = AI extraction mode (10 credits/page), `False` = markdown mode (2 credits/page). Default: `True`. |
| `output_schema` | object | No | The Pydantic object that describes the structure and format of the response (AI extraction mode only). |
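For example, to widen the search while keeping AI extraction, you can raise `num_results` (a short sketch using only the parameters in the table above):

```python
# Search more sources; num_results must be between 3 and 20
response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?",
    num_results=10,        # default is 3
    extraction_mode=True,  # AI extraction (default); False returns markdown
)
```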
Define a simple schema for structured search results:

```python
from typing import List
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    description: str = Field(description="Product description")
    price: str = Field(description="Product price")
    features: List[str] = Field(description="List of key features")
    availability: str = Field(description="Availability information")

response = client.searchscraper(
    user_prompt="Find information about iPhone 15 Pro",
    output_schema=ProductInfo
)

print(f"Product: {response.name}")
print(f"Price: {response.price}")
print("\nFeatures:")
for feature in response.features:
    print(f"- {feature}")
```
Define a complex schema for comprehensive market research:

```python
from typing import List
from pydantic import BaseModel, Field

class MarketPlayer(BaseModel):
    name: str = Field(description="Company name")
    market_share: str = Field(description="Market share percentage")
    key_products: List[str] = Field(description="Main products in market")
    strengths: List[str] = Field(description="Company's market strengths")

class MarketTrend(BaseModel):
    name: str = Field(description="Trend name")
    description: str = Field(description="Trend description")
    impact: str = Field(description="Expected market impact")
    timeframe: str = Field(description="Trend timeframe")

class MarketAnalysis(BaseModel):
    market_size: str = Field(description="Total market size")
    growth_rate: str = Field(description="Annual growth rate")
    key_players: List[MarketPlayer] = Field(description="Major market players")
    trends: List[MarketTrend] = Field(description="Market trends")
    challenges: List[str] = Field(description="Industry challenges")
    opportunities: List[str] = Field(description="Market opportunities")

# Perform comprehensive market research
response = client.searchscraper(
    user_prompt="Analyze the current AI chip market landscape",
    output_schema=MarketAnalysis
)

# Access structured market data
print(f"Market Size: {response.market_size}")
print(f"Growth Rate: {response.growth_rate}")

print("\nKey Players:")
for player in response.key_players:
    print(f"\n{player.name}")
    print(f"Market Share: {player.market_share}")
    print("Key Products:")
    for product in player.key_products:
        print(f"- {product}")

print("\nMarket Trends:")
for trend in response.trends:
    print(f"\n{trend.name}")
    print(f"Impact: {trend.impact}")
    print(f"Timeframe: {trend.timeframe}")
```
Use markdown mode for cost-effective content gathering:

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Enable markdown mode for cost-effective content gathering
response = client.searchscraper(
    user_prompt="Latest developments in artificial intelligence",
    num_results=3,
    extraction_mode=False  # Markdown mode: 2 credits per page vs 10 credits
)

# Access the raw markdown content
markdown_content = response['markdown_content']
reference_urls = response['reference_urls']

print(f"Markdown content length: {len(markdown_content)} characters")
print(f"Reference URLs: {len(reference_urls)}")

# Process the markdown content
print("Content preview:", markdown_content[:500] + "...")

# Save to file for analysis
with open('ai_research_content.md', 'w', encoding='utf-8') as f:
    f.write(markdown_content)

print("Content saved to ai_research_content.md")
```
**Markdown Mode Benefits:**

- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
- **Full content**: Get complete page content in markdown format
- **Faster**: No AI processing overhead
- **Perfect for**: Content analysis, bulk data collection, building datasets

### Markdownify

Convert any webpage into clean, formatted markdown:

```python
response = client.markdownify(
    website_url="https://example.com"
)
```
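As with the SearchScraper markdown-mode example above, the converted page can be written straight to disk. This sketch assumes the markdownify response exposes the markdown text under a `result` key, as the SmartScraper examples above do for their payloads:

```python
# Assumption: the markdown text is returned under the 'result' key
markdown = response['result']

# Save the converted page for offline analysis
with open('example_page.md', 'w', encoding='utf-8') as f:
    f.write(markdown)

print(f"Saved {len(markdown)} characters of markdown")
```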
## Async Support

All endpoints support asynchronous operations:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def main():
    async with AsyncClient() as client:
        response = await client.smartscraper(
            website_url="https://example.com",
            user_prompt="Extract the main content"
        )
        print(response)

asyncio.run(main())
```
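The main payoff of the async client is fanning out several requests concurrently with `asyncio.gather`. A minimal sketch, using placeholder URLs:

```python
import asyncio
from scrapegraph_py import AsyncClient

async def scrape_all(urls):
    async with AsyncClient() as client:
        # Launch all scrapes concurrently instead of awaiting them one by one
        tasks = [
            client.smartscraper(
                website_url=url,
                user_prompt="Extract the main content"
            )
            for url in urls
        ]
        return await asyncio.gather(*tasks)

results = asyncio.run(scrape_all([
    "https://example.com/page-1",  # placeholder URLs
    "https://example.com/page-2",
]))
for result in results:
    print(result)
```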
## Feedback

Help us improve by submitting feedback programmatically:

```python
client.submit_feedback(
    request_id="your-request-id",
    rating=5,
    feedback_text="Great results!"
)
```
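If the service's responses include a `request_id` field (an assumption here, not confirmed by the snippets above), feedback can be tied to a specific scrape like so:

```python
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading"
)

# Assumption: the response dict carries the request's ID under 'request_id'
client.submit_feedback(
    request_id=response["request_id"],
    rating=4,
    feedback_text="Accurate, but missed the subtitle"
)
```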
## License

This project is licensed under the MIT License. See the LICENSE file for details.