tabula-sharp

tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.

Supports netstandard2.0, net462, net471, net6.0, net8.0
No Java bindings

NuGet packages are available on the releases page and on www.nuget.org.

Differences with tabula-java

Uses PdfPig, and not PdfBox.
The coordinate system starts from the bottom-left corner of the page (y increases upwards), and not from the top-left corner (y increases downwards).
The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because the original algorithm requires an image management library.
Table results might differ because of the way PdfPig builds the bounding boxes of letters.
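
Because of the coordinate difference listed above, an area written the tabula-java way (top, left, bottom, right, measured down from the top-left corner) has to be mirrored vertically before it can be used here. The following is a minimal sketch of that conversion and not part of the library: the `AreaConverter` class and its `FromTopLeft` method are hypothetical names, and the sketch assumes the page's lower-left corner sits at (0, 0) and that `pageHeight` is given in the same unit as the area (PDF points).

```csharp
using System;

// Hypothetical helper, not part of tabula-sharp: converts an area given in
// top-left-origin coordinates (y grows downwards, as in tabula-java) into
// bottom-left-origin coordinates (y grows upwards, as in tabula-sharp).
public static class AreaConverter
{
    // Assumes the page's lower-left corner is at (0, 0) and that all values
    // share the same unit (PDF points).
    public static (double Left, double Bottom, double Right, double Top) FromTopLeft(
        double top, double left, double bottom, double right, double pageHeight)
    {
        if (bottom < top)
        {
            throw new ArgumentException("In top-left coordinates, bottom must be >= top.");
        }

        // Horizontal positions are unchanged; vertical positions are mirrored.
        double newTop = pageHeight - top;
        double newBottom = pageHeight - bottom;
        return (left, newBottom, right, newTop);
    }
}
```

For a page 792 points high, `AreaConverter.FromTopLeft(100, 50, 300, 400, 792)` gives left 50, bottom 492, right 400 and top 692. These values are in the order that PdfPig's `PdfRectangle(x1, y1, x2, y2)` constructor takes, so the resulting rectangle should be usable with `page.GetArea(...)`.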

Usage

Stream mode - BasicExtractionAlgorithm

```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    PageArea page = ObjectExtractor.Extract(document, 1);

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    IReadOnlyList<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;
}
```
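
The stream-mode snippet above reads `regions[0]` and `tables[0]` directly, which would throw if the detector or the extractor returns nothing. As a hedged sketch, the same calls can be wrapped in a loop so that every candidate zone is processed and empty results are simply skipped. It relies only on the calls shown above; the `StreamModeExample` class and the `ExtractAllCandidates` method are hypothetical names, and the sketch assumes that `Detect` returns an enumerable collection of regions.

```csharp
using System;
using System.Collections.Generic;
using Tabula;             // tabula-sharp namespaces
using Tabula.Detectors;
using Tabula.Extractors;

public static class StreamModeExample
{
    // Hypothetical helper, not part of tabula-sharp: runs stream-mode
    // extraction on every candidate zone instead of only the first one.
    public static List<Table> ExtractAllCandidates(PageArea page)
    {
        if (page == null) throw new ArgumentNullException(nameof(page));

        var detector = new SimpleNurminenDetectionAlgorithm();
        IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
        var result = new List<Table>();

        // Zero iterations when no candidate zone is detected.
        foreach (var region in detector.Detect(page))
        {
            // Extract returns a (possibly empty) list of tables for the zone.
            result.AddRange(ea.Extract(page.GetArea(region.BoundingBox)));
        }

        return result;
    }
}
```

Inside the `using` block, `var tables = StreamModeExample.ExtractAllCandidates(page);` would stand in for the detection and extraction lines. Checking `tables.Count` before indexing avoids an exception on pages without tables.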

Lattice mode - SpreadsheetExtractionAlgorithm

```csharp
using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    PageArea page = ObjectExtractor.Extract(document, 1);

    IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
    IReadOnlyList<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;
}
```
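
Lattice mode relies on ruling lines drawn in the PDF, while stream mode relies on how the text is aligned, so it is not always clear in advance which one suits a given page. One possible approach, sketched below as an assumption rather than a recommendation from the library, is to try lattice mode first and fall back to stream mode when it finds nothing. The `FallbackExample` class and the `ExtractWithFallback` method are hypothetical names, and the sketch assumes that lattice mode returns an empty list when no table is found.

```csharp
using System;
using System.Collections.Generic;
using Tabula;             // tabula-sharp namespaces
using Tabula.Extractors;

public static class FallbackExample
{
    // Hypothetical helper, not part of tabula-sharp: tries lattice mode
    // first and falls back to stream mode when no table is found.
    public static IReadOnlyList<Table> ExtractWithFallback(PageArea page)
    {
        if (page == null) throw new ArgumentNullException(nameof(page));

        IExtractionAlgorithm lattice = new SpreadsheetExtractionAlgorithm();
        IReadOnlyList<Table> tables = lattice.Extract(page);
        if (tables.Count > 0)
        {
            return tables;
        }

        // Assumption: an empty result means the page has no usable ruling lines.
        IExtractionAlgorithm stream = new BasicExtractionAlgorithm();
        return stream.Extract(page);
    }
}
```

The helper takes a `PageArea`, so it can be called inside the `using` block while the document is still open. When the fallback runs on a whole page, narrowing the area with the detector first may give cleaner results.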

BobLd/tabula-sharp
