🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Updated Jul 7, 2025 - TypeScript
Python scraper based on AI
🛏 An HTML to Markdown converter written in JavaScript
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
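A lightweight, powerful one-click HTML-to-Markdown tool open-sourced by the helloworld developer community. It converts articles from multiple platforms in one click and saves them locally.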
HTML to Markdown converter and crawler.
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.
It's time for your markup to get down! HTML to markdown converter. Breakdance is highly pluggable, flexible and easy to use.
➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are completely removed. Crawl and convert any website into LLM-ready markdown.
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
reader is for your command line what the “readability” view is for modern browsers: A lightweight tool offering better readability of web pages (and EML files!) on the CLI.
📋 Browser extension to copy text as Markdown (with GFM and MathML support)
Slurps webpages and saves them as clean, uncluttered Markdown. Think Pocket, but better.
Firefox add-on to copy selection as Markdown
A CLI tool that converts exported Medium posts (html) to Jekyll/Hugo compatible markdown with front matter.
😼 Dependency-free and lean DOM parser that outputs Markdown
Export Atlassian Confluence pages as markdown files.