Description
This is a follow up to #733, but could also fix some other issues, like #671.
In #733, I wrote: "I'm not sure if it's appropriate to skip Form and PS types here or not". I now think skipping Form XObjects is the right thing to do. Note that a Form XObject isn't a form in the sense of a form that captures data; it's a form in the sense of a shape. (Section 8.10.1 of https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf#page=217 helped my understanding here.)
I'd seen the consistent off-by-one error mentioned in #671 in the PDFs I'm working with. I have some code that's using the position of text to try to find the text inside a given annotation, and it was frequently finding the text to the left of the given annotation.
Each of my PDFs has a Form early on in each page's data, which I believe is the use case of "a form XObject may serve as the template for an entire page" mentioned in the spec.
Skipping the Form when creating the text array, much like we did for Images in the fix for #733, removes this off-by-one error for me.
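
To illustrate the idea (not the actual patch), here is a minimal Python sketch. Everything in it is hypothetical: `ContentItem`, `build_text_array`, and the `skipped` parameter are stand-ins made up for this example and are not this library's real data structures or API. It assumes the off-by-one comes from a non-text XObject contributing an extra, empty entry to the text array, which shifts every later index by one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


# Hypothetical model of one item in a page's content; not the library's real type.
@dataclass
class ContentItem:
    kind: str                      # "text" or "xobject"
    subtype: Optional[str] = None  # for XObjects: "Image", "Form", or "PS"
    text: str = ""


# XObject subtypes that carry no text of their own in this sketch.
SKIPPED_SUBTYPES: FrozenSet[str] = frozenset({"Image", "Form"})


def build_text_array(
    items: List[ContentItem],
    skipped: FrozenSet[str] = SKIPPED_SUBTYPES,
) -> List[str]:
    """Collect text entries, leaving out XObjects whose subtype is in `skipped`."""
    texts: List[str] = []
    for item in items:
        if item.kind == "xobject" and item.subtype in skipped:
            continue  # a Form or Image XObject adds no entry to the text array
        texts.append(item.text)
    return texts


# A page whose data starts with a Form XObject used as a page template.
page = [
    ContentItem("xobject", "Form"),
    ContentItem("text", text="Hello"),
    ContentItem("text", text="World"),
]

with_skip = build_text_array(page)
without_skip = build_text_array(page, skipped=frozenset())
```

With the Form skipped, `"Hello"` sits at index 0; without the skip, the Form's empty entry takes index 0 and every real text entry is shifted by one, which would match the symptom of finding the text to the left of the annotation.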
MR and examples to follow.