Top 22 Python Crawling Projects
- Project mention: Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples) | dev.to | 2025-12-17
User-Agent: Scrapy/2.11.0 (+https://scrapy.org)
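The `User-Agent` line quoted above is the kind of identifying header a crawler sends with each request. As a minimal sketch of how a client attaches such a header, the snippet below uses only Python's standard library rather than Scrapy itself; the bot name, its info URL, and the `build_request` helper are hypothetical, and no request is actually sent.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Return a Request object carrying a custom User-Agent header.

    Nothing is sent over the network; the object is only constructed.
    """
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical bot identity, following the "name/version (+info URL)" pattern.
req = build_request(
    "https://example.com/",
    "example-bot/0.1 (+https://example.com/bot)",
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Including a contact or info URL in the header, as the Scrapy default does, gives site operators a way to identify the crawler.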
-
Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Project mention: Agentic Coding Slot Machines – Did We Just Summon a Genie Addiction? – Part 1 | news.ycombinator.com | 2025-07-03
-
crawlee-python
Crawlee – A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Project mention: Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers | news.ycombinator.com | 2025-09-30
- Project mention: Stop crawling my HTML you dickheads – use the API | news.ycombinator.com | 2025-12-14
-
At Scrapfly, we are dedicated to providing developers with all the resources they need to reach their scraping goals. Check out our comprehensive guide on scraping TikTok as well as our example TikTok scraper using Scrapfly's APIs on GitHub.
-
telegram-crawler
🕷 Automatically detect changes made to the official Telegram sites, clients and servers.
-
scrapper
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
-
courlan
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
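To illustrate what cleaning and deduplicating URLs can involve, here is a minimal sketch using only Python's standard library. It is not courlan's API: the helper names `normalize_url` and `dedupe_urls` and the `TRACKING_PREFIXES` list are assumptions made for this example, and a real tool applies far more filters (spam, content type, language).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query-parameter prefixes treated as tracking noise.
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url):
    """Return a canonical form of an http(s) URL, or None if it is rejected."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # filter out mailto:, ftp:, relative links, and so on
    # Drop tracking parameters but keep the remaining ones in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path or "/"
    # Lowercase the host and discard the fragment, which never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(query), ""))


def dedupe_urls(urls):
    """Normalize each URL and keep the first occurrence of every distinct result."""
    seen = set()
    unique = []
    for url in urls:
        normalized = normalize_url(url)
        if normalized is not None and normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique


print(dedupe_urls([
    "HTTPS://Example.com/a?utm_source=x&id=1#top",
    "https://example.com/a?id=1",
    "mailto:me@example.com",
]))
```

Normalizing before comparing is what lets two superficially different links collapse into one crawl target.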
- Project mention: Introducing QCrawl – A Modern Async Web Crawler Framework for Python | dev.to | 2025-12-05
Hi everyone, I've released an open-source project I've been building: https://github.com/crawlcore/qcrawl
-
sneakpeek
Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It's the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis (by flulemon)
Python Crawling related posts
- Scrapy Middlewares: A Practical Guide for Beginners (With Real-World Examples)
- Stop crawling my HTML you dickheads – use the API
- Progress Updates on Contribution to Scrapy
- Introducing QCrawl – A Modern Async Web Crawler Framework for Python
- How I Block All 26M of Your Curl Requests
- Launching Crawlee for Python v1.0 to simplify building web scrapers and crawlers
- How we moved Crawlee for Python out of Beta
Index
What are some of the best open-source Crawling projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Scrapy | 59,259 |
| 2 | Scrapling | 8,322 |
| 3 | crawlee-python | 7,286 |
| 4 | ai.robots.txt | 3,418 |
| 5 | Grab | 2,442 |
| 6 | mlscraper | 1,369 |
| 7 | scrapyrt | 871 |
| 8 | scrapfly-scrapers | 799 |
| 9 | isp-data-pollution | 614 |
| 10 | spidermon | 551 |
| 11 | LinkedInDumper | 526 |
| 12 | WarcDB | 406 |
| 13 | spidy Web Crawler | 348 |
| 14 | telegram-crawler | 334 |
| 15 | scrapper | 295 |
| 16 | estela | 193 |
| 17 | courlan | 154 |
| 18 | datacrawl | 64 |
| 19 | qcrawl | 42 |
| 20 | XingDumper | 38 |
| 21 | sneakpeek | 37 |
| 22 | estela-cli | 4 |