Using insert_text() with a subsetted font

Lukes · December 9, 2025, 9:54pm

Hi, I think the title doesn’t capture my situation all that well.

And apologies in advance for a somewhat long post.

I am writing a little python script that takes as input

a larger PDF with a bunch of songs (say 300) in it[1] and
a list of songs to extract (say 20)

and should produce two smaller PDFs as output: one with the listed songs, chords and all, exactly as they appear in the source, and another with the same songs without the chords (so approximately “every other line” from the source PDF). As you can guess, one is for the band and the other is for the singer.

I have written all the code to detect which blocks are needed to produce the result and where they should go. The output looks correct if I render it in a base14 font, say helv (the test source uses Arial, so all the metrics happen to work out), but I am having trouble generating the output when I try to carry over the original fonts as well.

The emission stage in my script is a loop that, for each page, calls page.insert_font(fontname=…, fontbuffer=…) with the previously extracted fonts (found using get_fonts()), then scans all the characters and calls page.insert_text((char["origin"][0] + offsetx, char["origin"][1] + offsety), char["c"], fontname=…, …) to transfer the characters one at a time and carry over all the positions[2].
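For readers following along, the loop described above might look roughly like the sketch below. It assumes the per-character data comes from page.get_text("rawdict") (the post does not say which extraction call is used). The names iter_chars, emit_chars and font_alias are hypothetical and not part of PyMuPDF.

```python
def iter_chars(rawdict):
    """Yield (char, (x, y), font, size) for every character in a 'rawdict' page dict."""
    for block in rawdict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                for ch in span["chars"]:
                    yield ch["c"], tuple(ch["origin"]), span["font"], span["size"]


def emit_chars(dst_page, src_page, font_alias, offsetx=0.0, offsety=0.0):
    """Re-insert each character of src_page on dst_page, one insert_text() call each.

    font_alias (hypothetical) maps a span's font name to a name the caller has
    already registered on dst_page with insert_font(). Unknown fonts fall back
    to the base14 font "helv".
    """
    for c, (x, y), font, size in iter_chars(src_page.get_text("rawdict")):
        dst_page.insert_text(
            (x + offsetx, y + offsety),
            c,
            fontname=font_alias.get(font, "helv"),
            fontsize=size,
        )
```

One insert_text() call per character is what makes this approach slow. It is also where the subsetted font matters, since every call has to map a Unicode character to a glyph in the registered font.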

The issue is that I get all the right content if I use a base14 font, but if I use the original document’s fonts, I get a whole pile of tofu boxes instead. I suspect the reason behind this is that the original font is subsetted, and the character I’m passing in is not mapped through the subset’s CMAP.
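A rough way to test this suspicion is to ask the extracted font directly which characters it can map. The sketch below uses Document.extract_font() and Font.has_glyph(), which returns 0 when a code point has no glyph. The helper names uncovered_chars and check_embedded_font are hypothetical. It is only a diagnostic under those assumptions, and it may not mirror exactly what insert_text() does internally.

```python
def uncovered_chars(text, has_glyph):
    """Return the distinct non-space characters of text with no glyph.

    has_glyph is any callable taking a code point and returning a falsy
    value when the font cannot map it.
    """
    return sorted({ch for ch in text if not ch.isspace() and not has_glyph(ord(ch))})


def check_embedded_font(doc, xref, text):
    """List characters of text that the font embedded at xref cannot map from Unicode."""
    import pymupdf  # imported lazily so uncovered_chars() stays usable on its own

    _name, _ext, _subtype, buffer = doc.extract_font(xref)
    if not buffer:
        raise ValueError(f"font at xref {xref} has no extractable font file")
    font = pymupdf.Font(fontbuffer=buffer)
    # Font.has_glyph() returns the glyph number, or 0 if the code point is not mapped
    return uncovered_chars(text, font.has_glyph)
```

The xref values would come from the first item of each tuple returned by page.get_fonts(). If most of the characters on the page are reported as uncovered, that is consistent with the subset lacking a usable Unicode mapping.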

It seems that I would need to transfer over the CMAP and pass the glyph indices in the font, instead of the Unicode codepoints, to a method like insert_text(), right? Or register the CMAP with the font and let PyMuPDF deal with the Unicode to glyph-index lookup? I could find no way of achieving either of these, though, and searching online did not turn up anything useful.

Or even, is there a way that I could just copy-paste the spans across from source to dest? Or something along those lines?

One last thought about this last question: I need the relative positioning of the characters to be preserved exactly between the original and the copied pages (the chords “float” above the lyrics, and the two must not shift relative to each other), and I also need to be able to remove (roughly) “every other line” to produce the “lyrics-only” versions for the singer. For this second reason I suspect that an approach based on clipping/cropping and show_pdf_page() might be too coarse (and I’m somewhat concerned that clipping with rectangles would let through small parts of adjacent content that should not be in the output).

Any help much appreciated,

with thanks

Luca

[1] Imagine pages with a few songs like this https://www.pinterest.com/pin/acoustic-guitar-chords-and-lyrics--23784704274174095/ on each page.

[2] Going one char at a time is very, very slow: it takes a second or three per song. I was hoping to improve on this once the rest of the script was working correctly. I don’t love it.

HaraldLieder · December 11, 2025, 12:38am

As you already indicated, doing such a thing on the basis of a font subset in another PDF is close to being a hopeless undertaking.
Instead consider using Page.show_pdf_page. This allows you to display parts of a source page in another PDF inside a given target rectangle of the current page.

The to-be-displayed source page area can be restricted to a “clip” sub-rectangle.
The target rectangle does not need to have the same size or aspect ratio: the method will always center the clip content vertically and horizontally, and scale it up or down until either the target rectangle’s width or its height is fully used.
If desired, you can also give up the aspect ratio, in which case the content will fill the target rectangle completely.
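A minimal sketch of this suggestion follows, assuming the goal is to copy one region at its original size so that nothing is rescaled. The helper names same_size_target and copy_region are hypothetical. The clip is given as an (x0, y0, x1, y1) tuple in source-page coordinates and is assumed to lie inside the source page.

```python
def same_size_target(clip, dest_x, dest_y):
    """Return (x0, y0, x1, y1) with the clip's width and height, top-left at (dest_x, dest_y)."""
    x0, y0, x1, y1 = clip
    return (dest_x, dest_y, dest_x + (x1 - x0), dest_y + (y1 - y0))


def copy_region(dst_page, src_doc, pno, clip, dest_x, dest_y):
    """Show the clip area of page pno of src_doc on dst_page at its original size."""
    import pymupdf  # imported lazily so same_size_target() stays usable on its own

    target = pymupdf.Rect(same_size_target(clip, dest_x, dest_y))
    # keep_proportion defaults to True; with a target of the same size as the
    # clip, the content should be placed without any scaling
    dst_page.show_pdf_page(target, src_doc, pno, clip=pymupdf.Rect(clip))
```

Because the source content is shown as it is rather than re-typeset, the original fonts and the relative positions inside the clip should carry over unchanged. Passing keep_proportion=False to show_pdf_page() corresponds to giving up the aspect ratio.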
