Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Sep 12, 2025 - Python
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG
Export Atlassian Confluence pages as markdown files.
Multimodal document parser for high quality data understanding and extraction
URL to Markdown API is a service that converts web content into clean, structured Markdown through a simple HTTP GET request. It's built using FastAPI and the MarkItDown library, offering a straightforward way to convert various content types (web pages, YouTube videos, PDFs, documents) into Markdown that's optimized for Large Language Models.
✅ Convert your browser's exported HTML bookmarks file to Markdown.
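As a rough illustration of what such a conversion involves, here is a minimal sketch (not that project's code) using only Python's standard-library `html.parser`. It assumes the common Netscape bookmark export layout, where each bookmark is an `<A HREF="...">Title</A>` element. Folder headings (`<H3>`) and nesting are ignored, and Markdown special characters in titles are not escaped. The names `BookmarkParser` and `bookmarks_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, href) pairs from <A HREF=...> tags in a bookmark export."""

    def __init__(self):
        super().__init__()
        self.links = []    # collected (title, href) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text chunks seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> arrives as "a"/"href".
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Fall back to the URL itself when the bookmark has no title.
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None


def bookmarks_to_markdown(html_text):
    """Render every bookmark as a Markdown bullet link (hypothetical helper)."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return "\n".join(f"- [{title}]({href})" for title, href in parser.links)


sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
<DT><A HREF="https://example.com" ADD_DATE="0">Example</A>
<DT><A HREF="https://python.org">Python</A>
</DL><p>"""

print(bookmarks_to_markdown(sample))
# - [Example](https://example.com)
# - [Python](https://python.org)
```

A streaming parser fits here because real bookmark exports are loosely formed HTML (unclosed `<DT>` and `<p>` tags), which `HTMLParser` tolerates without complaint.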
Turn a supported list of file types (e.g. .docx) into a Markdown-structured text file. Also optionally defangs indicators and extracts text from images. Built for threat intel use cases.
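"Defanging" rewrites indicators of compromise so they are no longer clickable or auto-linked. The sketch below is not that tool's implementation. It assumes one common convention: the `http` scheme becomes `hxxp`, and every `.` in a URL or IPv4 address becomes `[.]`. The regular expressions are deliberately loose (for example, a version string such as `1.2.3.4` would also be defanged), and the function names are hypothetical.

```python
import re

# Loose patterns: http(s) URLs (not ending in trailing punctuation) and dotted IPv4 addresses.
URL_RE = re.compile(r"\bhttps?://[^\s<>\"']*[^\s<>\"'.,;:!?)]", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def defang_url(url):
    """Swap 'http' for 'hxxp' in the scheme and bracket every dot in the rest."""
    scheme, rest = url.split("://", 1)
    scheme = "hxxp" + scheme[4:]  # "http" -> "hxxp", "https" -> "hxxps"
    return f"{scheme}://{rest.replace('.', '[.]')}"


def defang_text(text):
    """Defang URLs first, then any remaining bare IPv4 addresses."""
    text = URL_RE.sub(lambda m: defang_url(m.group(0)), text)
    # IPs inside URLs were already bracketed above, so this pattern no longer matches them.
    return IPV4_RE.sub(lambda m: m.group(0).replace(".", "[.]"), text)


print(defang_text("Beacon to https://evil.example.com/a.php from 10.0.0.5."))
# Beacon to hxxps://evil[.]example[.]com/a[.]php from 10[.]0[.]0[.]5.
```

URLs are processed before bare IPs so that an address embedded in a URL is bracketed exactly once.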
Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.
Python script to convert Google Keep HTML note exports into Markdown (.md) files suitable for importing into Joplin.
A CLI tool to fetch a webpage's main content and print it as Markdown.
Tooling for extracting content from the old Geotribu website (web scraping, conversion to Markdown...)
Website scraper for text, with conversion to Markdown (.md) and directory structuring.
A simplified online encyclopedia with Markdown-formatted entries. Powered by Django.
Leverage Reader-LM's capabilities using LitServe.
Convert HTML to Discord's Markdown-formatted text.
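To show the general idea, here is a minimal sketch (not that project's code) that maps a few inline HTML tags onto the markers Discord renders: `**bold**`, `*italic*`, `__underline__`, `~~strikethrough~~` and `` `code` ``. The tag-to-marker table and the function name are illustrative assumptions. Links, lists, block quotes and escaping of literal `*` or `_` in the text are left out.

```python
from html.parser import HTMLParser

# Illustrative mapping from inline HTML tags to Discord Markdown markers.
DISCORD_MARKERS = {
    "b": "**", "strong": "**",
    "i": "*", "em": "*",
    "u": "__",
    "s": "~~", "strike": "~~", "del": "~~",
    "code": "`",
}


class DiscordMarkdownConverter(HTMLParser):
    """Emit a marker at each opening and closing formatting tag, keep text as-is."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in DISCORD_MARKERS:
            self.parts.append(DISCORD_MARKERS[tag])
        elif tag == "br":
            self.parts.append("\n")  # <br> and <br/> both become a line break

    def handle_endtag(self, tag):
        if tag in DISCORD_MARKERS:
            self.parts.append(DISCORD_MARKERS[tag])
        elif tag == "p":
            self.parts.append("\n")  # end of paragraph -> newline

    def handle_data(self, data):
        self.parts.append(data)


def html_to_discord(html_text):
    """Convert a small HTML fragment to Discord-style Markdown (hypothetical helper)."""
    converter = DiscordMarkdownConverter()
    converter.feed(html_text)
    converter.close()
    return "".join(converter.parts).strip()


print(html_to_discord("<p>Hello <b>world</b>, <i>this</i> is <code>x = 1</code></p>"))
# Hello **world**, *this* is `x = 1`
```

Because the same marker opens and closes each span, the converter only needs to react to tag boundaries rather than build a full document tree.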
Let's do web scraping from Codewars and bring all the solution code along with their READMEs at once.