Python Crawling

Open-source Python projects categorized as Crawling

Top 22 Python Crawling Projects

  1. Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples) | dev.to | 2025-12-17

    User-Agent: Scrapy/2.11.0 (+https://scrapy.org)

  3. Scrapling

    🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

    Project mention: Agentic Coding Slot Machines – Did We Just Summon a Genie Addiction? – Part 1 | news.ycombinator.com | 2025-07-03
  4. crawlee-python

    Crawlee - A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

    Project mention: Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers | news.ycombinator.com | 2025-09-30
  5. ai.robots.txt

    A list of AI agents and robots to block.

    Project mention: Stop crawling my HTML you dickheads – use the API | news.ycombinator.com | 2025-12-14
  6. Grab

    Web Scraping Framework

  7. mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

  8. scrapyrt

    HTTP API for Scrapy spiders

  10. scrapfly-scrapers

    Scalable Python web scraping scripts for 40+ popular domains

    Project mention: A Comprehensive Guide to TikTok API | dev.to | 2025-03-20

    At Scrapfly, we are dedicated to providing developers with all the resources they need to reach their scraping goals. Check out our comprehensive guide on scraping TikTok as well as our example TikTok scraper using Scrapfly's APIs on GitHub.

  11. isp-data-pollution

    ISP Data Pollution to Protect Private Browsing History with Obfuscation

  12. spidermon

    Scrapy extension for monitoring spider execution.

  13. LinkedInDumper

    Python 3 script to dump/scrape/extract company employees from the LinkedIn API

  14. WarcDB

    WarcDB: Web crawl data as SQLite databases.

  15. spidy Web Crawler

    The simple, easy-to-use command-line web crawler.

  16. telegram-crawler

    🕷 Automatically detect changes made to the official Telegram sites, clients and servers.

  17. scrapper

    Web scraper with a simple REST API, running in Docker and using a headless browser and Readability.js for parsing.

  18. estela

    estela, an elastic web scraping cluster 🕸

  19. courlan

    Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

  20. datacrawl

    A simple and easy to use web crawler for Python

  21. qcrawl

    qcrawl - fast async web crawling & scraping framework for Python.

    Project mention: Introducing QCrawl - A Modern Async Web Crawler Framework for Python | dev.to | 2025-12-05

    Hi everyone, I've released an open-source project I've been building: https://github.com/crawlcore/qcrawl

  22. XingDumper

    Python 3 script to dump/scrape/extract company employees from the XING API

  23. sneakpeek

    Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It's the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis (by flulemon)

  24. estela-cli

    estela Command Line Client 🕸

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Crawling related posts

  • Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples)

    1 project | dev.to | 17 Dec 2025
  • Stop crawling my HTML you dickheads – use the API

    2 projects | news.ycombinator.com | 14 Dec 2025
  • Progress Updates on Contribution to Scrapy

    1 project | dev.to | 10 Dec 2025
  • Introducing QCrawl - A Modern Async Web Crawler Framework for Python

    1 project | dev.to | 5 Dec 2025
  • How I Block All 26M of Your Curl Requests

    6 projects | news.ycombinator.com | 2 Oct 2025
  • Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers

    1 project | news.ycombinator.com | 30 Sep 2025
  • How we moved Crawlee for Python out of Beta

    2 projects | dev.to | 30 Sep 2025

Index

What are some of the best open-source Crawling projects in Python? This list will help you:

#   Project             Stars
1   Scrapy              59,259
2   Scrapling            8,322
3   crawlee-python       7,286
4   ai.robots.txt        3,418
5   Grab                 2,442
6   mlscraper            1,369
7   scrapyrt               871
8   scrapfly-scrapers      799
9   isp-data-pollution     614
10  spidermon              551
11  LinkedInDumper         526
12  WarcDB                 406
13  spidy Web Crawler      348
14  telegram-crawler       334
15  scrapper               295
16  estela                 193
17  courlan                154
18  datacrawl               64
19  qcrawl                  42
20  XingDumper              38
21  sneakpeek               37
22  estela-cli               4

