jonasjacek / robots.txt (84 stars). Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allow) legitimate user-agents. Useful for all websites. Topics: search-engine, whitelist, user-agent, seo, crawling, twitterbot, robots-txt, googlebot, crawlers, web-crawling, bingbot, robots-exclusion-standard, blocking-bots, web-robots, search-engine-optimization, baiduspider. Updated Feb 16, 2025.
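A whitelist-style robots.txt of the kind described above might look like the following. This is a minimal sketch that follows the Robots Exclusion Protocol (RFC 9309). It is not the repository's actual template, and the crawler names are only examples taken from its topic tags. A compliant crawler obeys the group whose User-agent line matches its product token and falls back to the `*` group only when no named group matches. Listed crawlers therefore get full access, and every other robot is disallowed.

```text
# Whitelisted crawlers: full access (example user-agents only)
User-agent: Googlebot
User-agent: Bingbot
User-agent: Twitterbot
Allow: /

# Every other robot: blocked from the whole site
User-agent: *
Disallow: /
```

robots.txt is advisory. It only keeps out robots that choose to honor it.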
din0s / ml-for-bot-detection (13 stars). A Python notebook showcasing machine learning for bot detection, with an emphasis on e-commerce sites. Topics: machine-learning, e-commerce, web-robots, bot-detection. Language: Jupyter Notebook. Updated Jul 8, 2021.
acuciureanu / spidertrap-rs (12 stars). A simple trap for web crawlers. Topics: rust, web-crawler, web-scraping, cybersecurity, intrusion-detection, web-security, anti-bot, data-privacy, ethical-hacking, web-robots, network-security, bot-detection, web-spider, web-monitoring, spider-trap, website-protection, anti-scraping, crawler-trap, bot-trap, server-defense. Language: Rust. Updated Aug 2, 2023.
jimsmart / progszy (1 star). Progszy is a hard-caching HTTP(S) proxy server for web robots. Topics: go, http-proxy, https-proxy, web-robots, caching-proxy. Language: Go. Updated Mar 18, 2025.