This repository was archived by the owner on Apr 11, 2025. It is now read-only.
Merged
8 changes: 4 additions & 4 deletions docs/api.rst
@@ -21,10 +21,10 @@ Lower-Level Classes
.. autoclass:: pypdf_table_extraction.parsers.Lattice
:inherited-members:

.. autoclass:: camelot.parsers.Network
.. autoclass:: pypdf_table_extraction.parsers.Network
:inherited-members:

.. autoclass:: camelot.parsers.Hybrid
.. autoclass:: pypdf_table_extraction.parsers.Hybrid
:inherited-members:

Lower-Lower-Level Classes
@@ -41,7 +41,7 @@ Lower-Lower-Level Classes
Plotting
--------

.. autofunction:: camelot.plot
.. autofunction:: pypdf_table_extraction.plot

.. autoclass:: camelot.plotting.PlotMethods
.. autoclass:: pypdf_table_extraction.plotting.PlotMethods
:inherited-members:
22 changes: 11 additions & 11 deletions docs/user/quickstart.rst
@@ -16,7 +16,7 @@ Read the PDF

Reading a PDF to extract tables with pypdf_table_extraction is very simple.

Begin by importing the Camelot module
Begin by importing the pypdf_table_extraction module

.. code-block:: pycon

@@ -34,7 +34,7 @@ Now, let's try to read a PDF. (You can check out the PDF used in this example `h
>>> tables
<TableList n=1>

Now, we have a :class:`TableList <camelot.core.TableList>` object called ``tables``, which is a list of :class:`Table <camelot.core.Table>` objects. We can get everything we need from this object.
Now, we have a :class:`TableList <pypdf_table_extraction.core.TableList>` object called ``tables``, which is a list of :class:`Table <pypdf_table_extraction.core.Table>` objects. We can get everything we need from this object.

We can access each table using its index. From the code snippet above, we can see that the ``tables`` object has only one table, since ``n=1``. Let's access the table using the index ``0`` and take a look at its ``shape``.

@@ -55,7 +55,7 @@ Let's print the parsing report.
'page': 1
}

Woah! The accuracy is top-notch and there is less whitespace, which means the table was most likely extracted correctly. You can access the table as a pandas DataFrame by using the :class:`table <camelot.core.Table>` object's ``df`` property.
Woah! The accuracy is top-notch and there is less whitespace, which means the table was most likely extracted correctly. You can access the table as a pandas DataFrame by using the :class:`table <pypdf_table_extraction.core.Table>` object's ``df`` property.

.. code-block:: pycon

@@ -64,15 +64,15 @@ Woah! The accuracy is top-notch and there is less whitespace, which means the ta
.. csv-table::
:file: ../_static/csv/foo.csv

Looks good! You can now export the table as a CSV file using its :meth:`to_csv() <camelot.core.Table.to_csv>` method. Alternatively, you can use the :meth:`to_json() <camelot.core.Table.to_json>`, :meth:`to_excel() <camelot.core.Table.to_excel>`, :meth:`to_html() <camelot.core.Table.to_html>`, :meth:`to_markdown() <camelot.core.Table.to_markdown>`, or :meth:`to_sqlite() <camelot.core.Table.to_sqlite>` methods to export the table as a JSON, Excel, HTML, or Markdown file or a SQLite database, respectively.
Looks good! You can now export the table as a CSV file using its :meth:`to_csv() <pypdf_table_extraction.core.Table.to_csv>` method. Alternatively, you can use the :meth:`to_json() <pypdf_table_extraction.core.Table.to_json>`, :meth:`to_excel() <pypdf_table_extraction.core.Table.to_excel>`, :meth:`to_html() <pypdf_table_extraction.core.Table.to_html>`, :meth:`to_markdown() <pypdf_table_extraction.core.Table.to_markdown>`, or :meth:`to_sqlite() <pypdf_table_extraction.core.Table.to_sqlite>` methods to export the table as a JSON, Excel, HTML, or Markdown file or a SQLite database, respectively.

.. code-block:: pycon

>>> tables[0].to_csv('foo.csv')

This will export the table as a CSV file at the path specified. In this case, it is ``foo.csv`` in the current directory.

You can also export all tables at once, using the :class:`tables <camelot.core.TableList>` object's :meth:`export() <camelot.core.TableList.export>` method.
You can also export all tables at once, using the :class:`tables <pypdf_table_extraction.core.TableList>` object's :meth:`export() <pypdf_table_extraction.core.TableList.export>` method.

.. code-block:: pycon

@@ -87,7 +87,7 @@ You can also export all tables at once, using the :class:`tables <camelot.core.T

This will export all tables as CSV files at the path specified. Alternatively, you can use ``f='json'``, ``f='excel'``, ``f='html'``, ``f='markdown'`` or ``f='sqlite'``.

.. note:: The :meth:`export() <camelot.core.TableList.export>` method exports files with a ``page-*-table-*`` suffix. In the example above, the single table in the list will be exported to ``foo-page-1-table-1.csv``. If the list contains multiple tables, multiple CSV files will be created. To avoid filling up your path with multiple files, you can use ``compress=True``, which will create a single ZIP file at your path with all the CSV files.
.. note:: The :meth:`export() <pypdf_table_extraction.core.TableList.export>` method exports files with a ``page-*-table-*`` suffix. In the example above, the single table in the list will be exported to ``foo-page-1-table-1.csv``. If the list contains multiple tables, multiple CSV files will be created. To avoid filling up your path with multiple files, you can use ``compress=True``, which will create a single ZIP file at your path with all the CSV files.
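
The naming rule described in the note can be sketched with the standard library alone. The helper name ``export_name`` and its parameters are hypothetical and are not part of the library's API; the sketch only assumes the ``page-*-table-*`` suffix is inserted before the file extension, as the ``foo-page-1-table-1.csv`` example suggests.

```python
import os


def export_name(path: str, page: int, order: int) -> str:
    """Hypothetical helper: build the per-table file name for an export path.

    Assumes the ``page-*-table-*`` suffix goes between the base name and
    the extension, as in the ``foo-page-1-table-1.csv`` example.
    """
    root, ext = os.path.splitext(path)
    return f"{root}-page-{page}-table-{order}{ext}"


# One table found on page 1, exported to 'foo.csv'.
print(export_name("foo.csv", 1, 1))
```

A list with several tables would yield one such name per table, which is why ``compress=True`` is suggested for keeping the output in a single ZIP file.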

.. note:: pypdf_table_extraction handles rotated PDF pages automatically. As an exercise, try to extract the table out of `this PDF`_.

@@ -98,7 +98,7 @@ Specify page numbers

By default, pypdf_table_extraction only uses the first page of the PDF to extract tables. To specify multiple pages, you can use the ``pages`` keyword argument::

>>> camelot.read_pdf('your.pdf', pages='1,2,3')
>>> pypdf_table_extraction.read_pdf('your.pdf', pages='1,2,3')
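
If the page numbers are held in a Python list, the comma-separated string shown above can be built before the call. This is a minimal sketch: ``pages_arg`` is a hypothetical helper, not a library function, and it assumes page numbers start at 1.

```python
def pages_arg(page_numbers):
    """Hypothetical helper: join page numbers into a ``pages`` string.

    Assumes 1-based page numbers, so anything below 1 is rejected.
    """
    if any(n < 1 for n in page_numbers):
        raise ValueError("page numbers are assumed to start at 1")
    return ",".join(str(n) for n in page_numbers)


# Produces the same string as the example above.
print(pages_arg([1, 2, 3]))
```

The result can then be passed as the ``pages`` keyword argument of ``read_pdf``.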

.. tip::
Here's how you can do the same with the :ref:`command-line interface <cli>`.
@@ -116,7 +116,7 @@ pypdf_table_extraction supports extracting tables in parallel using all the ava

.. code-block:: pycon

>>> tables = camelot.read_pdf('foo.pdf', pages='all', parallel=True)
>>> tables = pypdf_table_extraction.read_pdf('foo.pdf', pages='all', parallel=True)
>>> tables
<TableList n=1>

@@ -133,11 +133,11 @@ pypdf_table_extraction supports extracting tables in parallel using all the ava
Reading encrypted PDFs
----------------------

To extract tables from encrypted PDF files you must provide a password when calling :meth:`read_pdf() <camelot.read_pdf>`.
To extract tables from encrypted PDF files you must provide a password when calling :meth:`read_pdf() <pypdf_table_extraction.read_pdf>`.

.. code-block:: pycon

>>> tables = camelot.read_pdf('foo.pdf', password='userpass')
>>> tables = pypdf_table_extraction.read_pdf('foo.pdf', password='userpass')
>>> tables
<TableList n=1>

@@ -150,7 +150,7 @@ To extract tables from encrypted PDF files you must provide a password when call

pypdf_table_extraction supports PDFs with all encryption types supported by `pypdf`_. This might require installing PyCryptodome. An exception is thrown if the PDF cannot be read. This may be due to no password being provided, an incorrect password, or an unsupported encryption algorithm.

Further encryption support may be added in the future. In the meantime, if your PDF files use unsupported encryption algorithms, you are advised to remove the encryption before calling :meth:`read_pdf() <camelot.read_pdf>`. This can be achieved with third-party tools such as `QPDF`_.
Further encryption support may be added in the future. In the meantime, if your PDF files use unsupported encryption algorithms, you are advised to remove the encryption before calling :meth:`read_pdf() <pypdf_table_extraction.read_pdf>`. This can be achieved with third-party tools such as `QPDF`_.

.. code-block:: console
