Add case for 'endobj' in PDF parsing #784

NamelessCoder · 2025-11-01T13:43:36Z

Fixes an issue with parsing some PDF documents which contain this instruction. I have no idea what it means.

Type of pull request

Bug fix (involves code and configuration changes)
New feature (involves code and configuration changes)
Documentation update
Something else

About

I am working on parsing a HUGE library of PDF files and some of those files contain an endobj header element
which isn't handled by the parser, and this results in an exception. The files do parse correctly if this header
element is simply ignored. It doesn't seem to have any practical use.

Checklist for code / configuration changes

See CONTRIBUTING.md for all essential information about contributing.

Fixes an issue with parsing some PDF documents which contain this instruction. I have no idea what it means.

k00ni · 2025-11-03T07:45:26Z

Thank you for your PR.

Is it still work in progress?

If not, there are a few tasks left to solve before I take a closer look. Please read https://github.com/smalot/pdfparser/blob/master/CONTRIBUTING.md for more information.

NamelessCoder · 2025-11-03T14:50:09Z

@k00ni It's no longer WIP.

I assume you mean the part about needing a test? I did look in the test cases but didn't find any current tests that deal with the "skip unknowns" behavior in the parser class, so I wasn't sure if the policy would be to allow it w/o test, or require a test to be written for just that one thing (same as other issueXX tests), or a general test to nail down the "skip unknowns" as a whole.

k00ni

Changing code without tests that cover the changes might lead to unintended side effects. We therefore only allow code changes without tests under rare circumstances.

But lets take a step back and get an overview about the underlying issue:

I assume you receive an Exception (Invalid type) without this code change?
Can you provide a PDF which triggers this problem and is free of charge so we can use it in the test suite?
If you are not able to produce/provide a usable PDF (e.g. because it contains sensitive data) we have to think about a basic test case which triggers your code.

Don't get me wrong, I know that a faulty library can disrupt a project in a significant way but just because a PDF fails to parse doesn't mean the library needs a change. It might be because of the PDF itself, the library or somewhere in between. I hope we can find a way to "harden" the library but also add at least 1 test to cover the change.

Add case for 'endobj' in PDF parsing

f84541e

Fixes an issue with parsing some PDF documents which contain this instruction. I have no idea what it means.

k00ni added tests required fix labels Nov 3, 2025

k00ni reviewed Nov 7, 2025

View reviewed changes

k00ni added the needs more info label Nov 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add case for 'endobj' in PDF parsing #784

Add case for 'endobj' in PDF parsing #784

Uh oh!

NamelessCoder commented Nov 1, 2025

k00ni commented Nov 3, 2025

NamelessCoder commented Nov 3, 2025

k00ni left a comment •

edited

Loading

Labels

2 participants

Add case for 'endobj' in PDF parsing #784

Are you sure you want to change the base?

Add case for 'endobj' in PDF parsing #784

Uh oh!

Conversation

NamelessCoder commented Nov 1, 2025

Type of pull request

About

Checklist for code / configuration changes

k00ni commented Nov 3, 2025

NamelessCoder commented Nov 3, 2025

k00ni left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Labels

2 participants

k00ni left a comment •

edited

Loading