Pagination Configuration

Overview

SmartScraper supports pagination, so a single request can extract data from multiple pages of a website. This is particularly useful for:
  • E-commerce product listings
  • News article collections
  • Job listing aggregations
  • Any content spread across multiple pages

Pagination Parameters

Core Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| total_pages | integer | No | 1 | 1-10 | Number of pages to scrape |
| number_of_scrolls | integer | No | 0 | 0-10 | Number of scrolls per page |
| wait_for | integer | No | 0 | 0-30 | Wait time in seconds between actions |
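
The documented ranges can be checked on the client side before a request is sent. The following is a minimal sketch, assuming the ranges in the table above are inclusive; `validate_pagination_params` and `CORE_PARAM_RANGES` are hypothetical names for illustration and are not part of the SDK.

```python
# Hypothetical client-side check. The ranges are copied from the
# Core Parameters table and are assumed to be inclusive.
CORE_PARAM_RANGES = {
    "total_pages": (1, 10),
    "number_of_scrolls": (0, 10),
    "wait_for": (0, 30),
}


def validate_pagination_params(**params):
    """Return params unchanged, or raise ValueError on an invalid value."""
    for name, value in params.items():
        if name not in CORE_PARAM_RANGES:
            raise ValueError(f"unknown pagination parameter: {name}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        low, high = CORE_PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    return params
```

The returned dictionary can be unpacked into a call, for example `client.smartscraper(..., **validate_pagination_params(total_pages=3, number_of_scrolls=2))`.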

Advanced Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| pagination_delay | integer | No | 2 | Delay between page requests (seconds) |
| scroll_delay | integer | No | 1 | Delay between scrolls (seconds) |
| max_items_per_page | integer | No | 100 | Maximum items to extract per page |
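
These settings put rough bounds on how long a run takes and how much it can return. The sketch below is illustrative only. The page does not say how the delays combine, so it assumes they are applied sequentially: once between consecutive pages and once between consecutive scrolls on each page. `estimate_run` is a hypothetical helper, not an SDK function.

```python
def estimate_run(total_pages=1, number_of_scrolls=0,
                 pagination_delay=2, scroll_delay=1, max_items_per_page=100):
    """Rough lower bound on added delay and upper bound on extracted items.

    Assumption (not confirmed by the documentation): delays are applied
    sequentially, between consecutive pages and between consecutive scrolls.
    """
    page_gaps = max(total_pages - 1, 0)
    scroll_gaps = max(number_of_scrolls - 1, 0) * total_pages
    delay_seconds = page_gaps * pagination_delay + scroll_gaps * scroll_delay
    max_items = total_pages * max_items_per_page
    return {"delay_seconds": delay_seconds, "max_items": max_items}
```

Under these assumptions, a request with `total_pages=5` and `number_of_scrolls=3` adds at least 18 seconds of delay with the default values and returns at most 500 items.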

Basic Usage

Python SDK

    from scrapegraph_py import Client
    from pydantic import BaseModel
    from typing import List


    class Product(BaseModel):
        name: str
        price: str
        rating: str


    class ProductList(BaseModel):
        products: List[Product]


    client = Client(api_key="your-api-key")

    # Basic pagination - scrape 3 pages
    response = client.smartscraper(
        website_url="https://example-store.com/products",
        user_prompt="Extract all product information",
        output_schema=ProductList,
        total_pages=3,
    )

JavaScript SDK

    import { smartScraper } from 'scrapegraph-js';

    const apiKey = 'your-api-key';
    const url = 'https://example-store.com/products';
    const prompt = 'Extract all product information';

    // Basic pagination - scrape 3 pages
    const response = await smartScraper(
      apiKey,
      url,
      prompt,
      null,
      null,
      3 // total_pages
    );

Advanced Pagination Examples

E-commerce Product Scraping

    from scrapegraph_py import Client
    from pydantic import BaseModel, Field
    from typing import List, Optional


    class Product(BaseModel):
        name: str = Field(description="Product name")
        price: str = Field(description="Product price")
        # default=None makes these fields truly optional; in Pydantic v2,
        # Optional[...] without a default is still a required field.
        rating: Optional[str] = Field(default=None, description="Customer rating")
        image_url: Optional[str] = Field(default=None, description="Product image URL")
        availability: Optional[str] = Field(default=None, description="Product availability status")


    class ProductCatalog(BaseModel):
        products: List[Product] = Field(description="List of products")


    client = Client(api_key="your-api-key")

    # Scrape 5 pages with scrolling and delays
    response = client.smartscraper(
        website_url="https://amazon.com/s?k=laptops",
        user_prompt="Extract all laptop products with their details",
        output_schema=ProductCatalog,
        total_pages=5,
        number_of_scrolls=3,
        wait_for=2,
    )

News Article Collection

    # Reuses the imports and the client from the e-commerce example

    class Article(BaseModel):
        title: str = Field(description="Article title")
        summary: str = Field(description="Article summary")
        author: str = Field(description="Author name")
        publish_date: str = Field(description="Publication date")
        url: str = Field(description="Article URL")


    class NewsFeed(BaseModel):
        articles: List[Article] = Field(description="List of articles")


    # Collect articles from multiple news pages
    response = client.smartscraper(
        website_url="https://techcrunch.com/category/artificial-intelligence/",
        user_prompt="Extract all AI-related articles with their details",
        output_schema=NewsFeed,
        total_pages=4,
        number_of_scrolls=2,
    )

Job Listing Aggregation

    # Reuses the imports and the client from the e-commerce example

    class JobListing(BaseModel):
        title: str = Field(description="Job title")
        company: str = Field(description="Company name")
        location: str = Field(description="Job location")
        # default=None so that listings without a salary still validate
        salary: Optional[str] = Field(default=None, description="Salary information")
        requirements: List[str] = Field(description="Job requirements")


    class JobBoard(BaseModel):
        jobs: List[JobListing] = Field(description="List of job listings")


    # Gather job listings from multiple pages
    response = client.smartscraper(
        website_url="https://linkedin.com/jobs/search?keywords=python",
        user_prompt="Extract all Python developer job listings",
        output_schema=JobBoard,
        total_pages=3,
        number_of_scrolls=5,
    )

Pagination Strategies

1. Sequential Pagination

For websites with traditional page-based navigation:

    # Traditional pagination (page=1, page=2, etc.)
    response = client.smartscraper(
        website_url="https://example.com/products?page=1",
        user_prompt="Extract products from this page",
        total_pages=5,
    )

2. Infinite Scroll Pagination

For websites with infinite scroll or “Load More” buttons:

    # Infinite scroll pagination
    response = client.smartscraper(
        website_url="https://example.com/feed",
        user_prompt="Extract all posts from the feed",
        total_pages=1,         # Single page
        number_of_scrolls=10,  # Multiple scrolls to load more content
    )

3. Hybrid Approach

Combine both strategies for complex websites:

    # Hybrid: multiple pages with scrolling on each
    response = client.smartscraper(
        website_url="https://example.com/category/electronics",
        user_prompt="Extract all electronic products",
        total_pages=3,        # 3 category pages
        number_of_scrolls=5,  # 5 scrolls per page
    )
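
The three strategies differ only in how `total_pages` and `number_of_scrolls` are set. As a compact summary, the sketch below maps a strategy name to the keyword arguments used in the examples above. `pagination_kwargs` and the strategy names are hypothetical and chosen for illustration.

```python
def pagination_kwargs(strategy, pages=1, scrolls=0):
    """Map a strategy name to smartscraper keyword arguments."""
    if strategy == "sequential":
        # Page-based navigation: several pages, no scrolling.
        return {"total_pages": pages, "number_of_scrolls": 0}
    if strategy == "infinite_scroll":
        # A single page that loads more content as it is scrolled.
        return {"total_pages": 1, "number_of_scrolls": scrolls}
    if strategy == "hybrid":
        # Several pages, each scrolled to load its full content.
        return {"total_pages": pages, "number_of_scrolls": scrolls}
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, `client.smartscraper(website_url=url, user_prompt=prompt, **pagination_kwargs("hybrid", pages=3, scrolls=5))` reproduces the hybrid call.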

Best Practices

1. Start Small and Scale Up

    # Start with 1-2 pages for testing
    response = client.smartscraper(
        website_url="https://example.com",
        user_prompt="Extract basic information",
        total_pages=1,  # Start small
    )

    # Then scale up based on results
    response = client.smartscraper(
        website_url="https://example.com",
        user_prompt="Extract comprehensive data",
        total_pages=5,  # Scale up
    )

2. Optimize Prompts for Pagination

    # Good: Specific about pagination context
    user_prompt = """
    Extract all product information from this page and any subsequent pages.
    Include: name, price, rating, availability, and image URL.
    Ensure you capture all products across multiple pages.
    """

    # Better: Include pagination instructions
    user_prompt = """
    Extract product information from this e-commerce page.
    For each product, get: name, price, rating, availability, image URL.
    This is page 1 of a multi-page product listing.
    Look for pagination controls and extract data from all visible pages.
    """
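
When the same kind of prompt is reused across several scrapers, it can be assembled from a field list so the wording stays consistent. This is a small sketch; `build_prompt` is a hypothetical helper, and the exact phrasing is only one possibility modeled on the prompts above.

```python
from typing import Sequence


def build_prompt(item: str, fields: Sequence[str], multi_page: bool = True) -> str:
    """Build an extraction prompt that names the item type and its fields."""
    lines = [
        f"Extract {item} information from this page.",
        f"For each {item}, get: {', '.join(fields)}.",
    ]
    if multi_page:
        # Tell the extractor that more pages follow.
        lines.append("This is part of a multi-page listing; capture every item on each page.")
    return "\n".join(lines)
```

For instance, `build_prompt("product", ["name", "price", "rating"])` produces a three-line prompt that can be passed as `user_prompt`.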

3. Handle Rate Limiting

    import time

    # Implement delays between requests
    for page in range(1, 6):
        response = client.smartscraper(
            website_url=f"https://example.com/products?page={page}",
            user_prompt="Extract products",
            total_pages=1,  # One page at a time
            wait_for=3,     # Wait 3 seconds
        )

        # Additional delay between requests
        time.sleep(2)
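
The loop above builds each page URL with an f-string, which only works when the base URL has no query string of its own. A more robust sketch uses the standard library's `urllib.parse` to set the page parameter; `with_page` is a hypothetical helper, and the parameter name `page` is an assumption that depends on the target site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int, param: str = "page") -> str:
    """Return url with the query parameter `param` set to `page`.

    Existing query parameters are kept. Repeated keys collapse to their
    last value, which is acceptable for typical listing URLs.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the loop, `website_url=with_page("https://example.com/products?sort=price", page)` would then replace the f-string.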

4. Error Handling and Retries

    import time

    from scrapegraph_py.exceptions import APIError


    def scrape_with_retry(client, url, prompt, max_retries=3):
        for attempt in range(max_retries):
            try:
                response = client.smartscraper(
                    website_url=url,
                    user_prompt=prompt,
                    total_pages=3,
                )
                return response
            except APIError as e:
                if attempt < max_retries - 1:
                    print(f"Attempt {attempt + 1} failed: {e}")
                    time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
                else:
                    raise  # Out of retries: re-raise with the original traceback
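
The same backoff pattern can be separated from the SDK call so that it is reusable and testable without network access. The sketch below is a generic version; `retry_with_backoff` and its parameters are hypothetical names, and the `sleep` argument exists only so that tests can replace `time.sleep`.

```python
import time


def retry_with_backoff(call, max_retries=3, base_delay=1.0,
                       sleep=time.sleep, retry_on=(Exception,)):
    """Call `call()` until it succeeds or `max_retries` attempts are used.

    Waits base_delay * 2**attempt seconds between attempts.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: propagate the last error
            sleep(base_delay * 2 ** attempt)
```

A scraping call can be wrapped as `retry_with_backoff(lambda: client.smartscraper(website_url=url, user_prompt=prompt, total_pages=3), retry_on=(APIError,))`, so that only API errors trigger a retry.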

Common Use Cases

E-commerce Scraping

    # Amazon product scraping
    response = client.smartscraper(
        website_url="https://amazon.com/s?k=smartphones",
        user_prompt="""
        Extract all smartphone products from this search results page.
        For each product include: name, price, rating, reviews count,
        availability, and prime eligibility.
        """,
        output_schema=ProductCatalog,
        total_pages=5,
        number_of_scrolls=3,
    )

Social Media Monitoring

    # Twitter/X feed scraping
    response = client.smartscraper(
        website_url="https://twitter.com/search?q=AI",
        user_prompt="""
        Extract all tweets from this search results page.
        For each tweet include: author, content, timestamp,
        likes, retweets, and replies count.
        """,
        total_pages=1,
        number_of_scrolls=10,  # Maximum of the documented 0-10 range; social feeds need more scrolls
    )

News Aggregation

    # News website scraping
    response = client.smartscraper(
        website_url="https://reuters.com/technology",
        user_prompt="""
        Extract all technology news articles from this page.
        For each article include: headline, summary, author,
        publication date, and category.
        """,
        output_schema=NewsFeed,
        total_pages=4,
    )

Troubleshooting

Common Issues

Problem: Data is only extracted from the first page.
Solutions:
  • Verify the website supports pagination
  • Check if total_pages parameter is set correctly
  • Ensure the URL includes proper pagination parameters
  • Try increasing number_of_scrolls for infinite scroll sites

Problem: Requests are being rate limited.
Solutions:
  • Reduce total_pages value
  • Increase wait_for and pagination_delay
  • Implement exponential backoff
  • Use async clients for better performance

Problem: Not all expected data is extracted.
Solutions:
  • Increase number_of_scrolls for dynamic content
  • Add wait_for parameter for slow-loading pages
  • Refine your user prompt for better extraction
  • Check if the website requires authentication

Problem: Getting API errors during pagination.
Solutions:
  • Verify your API key is valid
  • Check your API usage limits
  • Ensure the website URL is accessible
  • Review error messages for specific issues

Performance Optimization

Async Processing

    import asyncio

    from scrapegraph_py import AsyncClient


    async def scrape_multiple_sites():
        client = AsyncClient(api_key="your-api-key")

        urls = [
            "https://site1.com/products",
            "https://site2.com/products",
            "https://site3.com/products",
        ]

        tasks = []
        for url in urls:
            task = client.smartscraper(
                website_url=url,
                user_prompt="Extract products",
                total_pages=3,
            )
            tasks.append(task)

        results = await asyncio.gather(*tasks)
        return results
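
`asyncio.gather` starts every request at once, which can trigger rate limiting when the URL list is long. One way to cap the number of requests in flight is an `asyncio.Semaphore`. The sketch below is generic; `gather_limited` is a hypothetical helper, and it takes zero-argument callables that return coroutines so that no request starts before a slot is free.

```python
import asyncio


async def gather_limited(coro_factories, limit=2):
    """Run the coroutines produced by `coro_factories`, at most `limit` at once.

    Results are returned in input order, as with asyncio.gather.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            # The coroutine is only created once a slot is available.
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in coro_factories))
```

With the async client, each factory could be written as `lambda url=url: client.smartscraper(website_url=url, user_prompt="Extract products", total_pages=3)`.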

Batch Processing

    import time


    # Assumes `client` is the Client instance created in the earlier examples
    def process_in_batches(urls, batch_size=3):
        results = []

        for i in range(0, len(urls), batch_size):
            batch = urls[i:i + batch_size]

            # Process batch
            batch_results = []
            for url in batch:
                response = client.smartscraper(
                    website_url=url,
                    user_prompt="Extract data",
                    total_pages=2,
                )
                batch_results.append(response)

            results.extend(batch_results)

            # Delay between batches (skipped after the last one)
            if i + batch_size < len(urls):
                time.sleep(5)

        return results

API Reference

For detailed API documentation, see:

Support & Resources

Need Help?

Contact our support team for assistance with pagination or any other questions!