Top 22 Python Crawling Projects

Each entry below lists, in order: rank, number of mentions, GitHub stars, a fourth score (apparently the site's activity rating), and language.

Scrapy

1 193 59,259 9.5 Python

Scrapy, a fast high-level web crawling & scraping framework for Python.

Project mention: Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples) | dev.to | 2025-12-17

User-Agent: Scrapy/2.11.0 (+https://scrapy.org)
Scrapling

2 6 8,322 9.8 Python

🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!

Project mention: Agentic Coding Slot Machines – Did We Just Summon a Genie Addiction? – Part 1 | news.ycombinator.com | 2025-07-03
crawlee-python

3 15 7,286 9.8 Python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

Project mention: Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers | news.ycombinator.com | 2025-09-30
ai.robots.txt

4 15 3,418 9.4 Python

A list of AI agents and robots to block.

Project mention: Stop crawling my HTML you dickheads – use the API | news.ycombinator.com | 2025-12-14
Grab

5 0 2,442 9.2 Python

Web Scraping Framework
mlscraper

6 10 1,369 0.6 Python

🤖 Scrape data from HTML websites automatically by just providing examples
scrapyrt

7 3 871 3.8 Python

HTTP API for Scrapy spiders
scrapfly-scrapers

8 5 799 9.4 Python

Scalable Python web scraping scripts for 40+ popular domains

Project mention: A Comprehensive Guide to TikTok API | dev.to | 2025-03-20

At Scrapfly, we are dedicated to providing developers with all the resources they need to reach their scraping goals. Check out our comprehensive guide on scraping TikTok as well as our example TikTok scraper using Scrapfly's APIs on GitHub.
isp-data-pollution

9 2 614 0.0 Python

ISP Data Pollution to Protect Private Browsing History with Obfuscation
spidermon

10 2 551 5.9 Python

Scrapy Extension for monitoring spiders execution.
LinkedInDumper

11 1 526 6.9 Python

Python 3 script to dump/scrape/extract company employees from LinkedIn API
WarcDB

12 7 406 6.3 Python

WarcDB: Web crawl data as SQLite databases.
spidy Web Crawler

13 0 348 0.0 Python

The simple, easy to use command line web crawler.
telegram-crawler

14 1 334 9.9 Python

🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
scrapper

15 1 295 6.5 Python

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
estela

16 10 193 5.9 Python

estela, an elastic web scraping cluster 🕸
courlan

17 0 154 0.9 Python

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
datacrawl

18 1 64 9.2 Python

A simple and easy to use web crawler for Python
qcrawl

19 2 42 9.1 Python

qcrawl - fast async web crawling & scraping framework for Python.

Project mention: Introducing QCrawl — A Modern Async Web Crawler Framework for Python | dev.to | 2025-12-05

Hi everyone, I’ve released an open-source project I’ve been building: https://github.com/crawlcore/qcrawl
XingDumper

20 1 38 5.0 Python

Python 3 script to dump/scrape/extract company employees from XING API
sneakpeek

21 3 37 7.5 Python

Sneakpeek is a framework that helps you develop scrapers quickly and conveniently. It’s the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis (by flulemon)
estela-cli

22 1 4 0.0 Python

estela Command Line Client 🕸

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Crawling related posts

Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples)

1 project | dev.to | 17 Dec 2025
Stop crawling my HTML you dickheads – use the API

2 projects | news.ycombinator.com | 14 Dec 2025
Progress Updates on Contribution to Scrapy

1 project | dev.to | 10 Dec 2025
Introducing QCrawl — A Modern Async Web Crawler Framework for Python

1 project | dev.to | 5 Dec 2025
How I Block All 26M of Your Curl Requests

6 projects | news.ycombinator.com | 2 Oct 2025
Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers

1 project | news.ycombinator.com | 30 Sep 2025
How we moved Crawlee for Python out of Beta

2 projects | dev.to | 30 Sep 2025

Index

What are some of the best open-source Crawling projects in Python? This list will help you:

#	Project	Stars
1	Scrapy	59,259
2	Scrapling	8,322
3	crawlee-python	7,286
4	ai.robots.txt	3,418
5	Grab	2,442
6	mlscraper	1,369
7	scrapyrt	871
8	scrapfly-scrapers	799
9	isp-data-pollution	614
10	spidermon	551
11	LinkedInDumper	526
12	WarcDB	406
13	spidy Web Crawler	348
14	telegram-crawler	334
15	scrapper	295
16	estela	193
17	courlan	154
18	datacrawl	64
19	qcrawl	42
20	XingDumper	38
21	sneakpeek	37
22	estela-cli	4

Did you know that Python is the 2nd most popular programming language based on number of references?
