AWeirdDev/crapdf

🦀 crapdf

Extract text from a PDF file. Uses the lopdf crate. Kind of crappy.

    from crapdf import extract, extract_bytes

    # Extract from file path
    texts: list[str] = extract("file.pdf")

    # Extract from bytes
    with open("file.pdf", "rb") as f:
        content = f.read()

    texts: list[str] = extract_bytes(content)
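A minimal sketch of how the calls above might be applied to a whole folder of PDFs. The helper name `extract_folder` is hypothetical and not part of crapdf; the extractor is passed in as a parameter so the helper only assumes what the example above shows, namely a callable that takes a path and returns `list[str]`. What each list element represents (for instance, one page) is not stated in this README, so the sketch simply joins the items with newlines.

```python
from pathlib import Path
from typing import Callable


def extract_folder(
    folder: str, extractor: Callable[[str], list[str]]
) -> dict[str, str]:
    """Map each PDF file name in `folder` to its extracted text.

    `extractor` is any callable with the same shape as crapdf's `extract`:
    it takes a file path and returns a list of strings.
    """
    results: dict[str, str] = {}
    # Only files ending in .pdf are considered; sorted for a stable order.
    for pdf in sorted(Path(folder).glob("*.pdf")):
        results[pdf.name] = "\n".join(extractor(str(pdf)))
    return results
```

With crapdf installed, this would presumably be called as `extract_folder("docs", extract)` after `from crapdf import extract`, where `"docs"` is a placeholder directory name.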

Performance

Install the dev dependencies from requirements-dev.txt, then run the benchmarks with bench.py.

The overall performance is similar to pypdf.
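For a rough idea of what such a comparison could look like, here is a minimal timing sketch. It is not the repository's bench.py, whose contents are not shown here; the function names `time_call` and `compare` are hypothetical. It assumes both crapdf and pypdf are installed and uses pypdf's `PdfReader` with `page.extract_text()` as the baseline.

```python
import time
from statistics import mean
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the mean wall-clock time in seconds of `fn` over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def compare(path: str, repeats: int = 5) -> dict[str, float]:
    """Time text extraction of one PDF with crapdf and with pypdf."""
    # Imported lazily so time_call stays usable without either package.
    from crapdf import extract
    from pypdf import PdfReader

    def with_pypdf() -> list[str]:
        return [page.extract_text() for page in PdfReader(path).pages]

    return {
        "crapdf": time_call(lambda: extract(path), repeats),
        "pypdf": time_call(with_pypdf, repeats),
    }
```

Calling `compare("file.pdf")` returns the mean seconds per library. Results will depend heavily on the PDF used, so a single file is only an indication.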


AWeirdDev. GitHub Repo
