Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Updated Nov 13, 2025 - Python
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
A simple resume parser used for extracting information from resumes
Turn webpages into LLM-friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, and image & webpage link extraction easy.
Extract data from HTML tables
Extract colors from an image. Colors are grouped based on visual similarities using the CIE76 formula.
Get lyrics for any song in less than 2 seconds by just passing in the song name (spelled correctly or misspelled) using this awesome Python library.
This program extracts insider trading data from the SEC website and stores it in an Excel file for the specified time frame.
Unofficial Python client for Twitter
Extract audio and other data from the Digitech Trio Plus guitar pedal's SD card
Extract structured data from any unstructured web page
Different Python utility scripts to help automate mundane/repetitive tasks. Useful for performance testers, data scientists, or anyone who wants to automate mundane tasks in Python.
A simple UI tool to batch crop images to prepare datasets from images and videos.
A Python module for reading data from a plot provided as an SVG file.
Extract data from Octopus mdict (*.mdd, *.mdx) files
This is a library for making batch requests to the Google Analytics Core Reporting v3 API and extracting data from a Google Analytics property into Python 3 data structures.
A toolkit for element extraction and visualization for the Waymo Open Dataset
A tool designed to extract numerical data from scanned historical weather documents.
Extract emails and phone numbers from a list of URLs