BUG: pymupdf4llm list index out of range in document_layout.py (2)

robvd · December 3, 2025, 9:51am

I stumbled on another list index out of range. When parsing a large file using pymupdf.layout+pymupdf4llm the following traceback is encountered:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 83, in to_markdown
    parsed_doc = parse_document(
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/__init__.py", line 42, in parse_document
    return document_layout.parse_document(
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/document_layout.py", line 908, in parse_document
    utils.clean_tables(page, blocks)
  File "/usr/local/lib/python3.10/site-packages/pymupdf4llm/helpers/utils.py", line 261, in clean_tables
    y_vals = [y_vals0[0]]
IndexError: list index out of range

Versions:

pymupdf4llm: 0.2.5

pymupdf-layout: 1.26.6

The commands used were:

doc = pymupdf.open(pdf_name)
md_chunks = pymupdf4llm.to_markdown(doc)

The size of the PDF file is 142MB so I cannot upload it here.
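When a file is too large to share, it can help to narrow the failure down to individual pages. The following is only a sketch of that idea: `find_failing_pages` is a hypothetical helper, not part of pymupdf4llm, and the commented usage assumes that `to_markdown` accepts a `pages` list of 0-based page numbers when the layout feature is active.

```python
from typing import Callable, List, Tuple


def find_failing_pages(
    page_count: int,
    convert_page: Callable[[int], object],
) -> List[Tuple[int, str]]:
    """Call convert_page(n) for each 0-based page number and collect failures.

    Hypothetical helper for this thread; it only records which pages raise.
    """
    failures: List[Tuple[int, str]] = []
    for page_number in range(page_count):
        try:
            convert_page(page_number)
        except Exception as exc:  # broad on purpose: we only want to locate the page
            failures.append((page_number, f"{type(exc).__name__}: {exc}"))
    return failures


# Possible usage with the calls from the report (assumption, not run here):
#
#   import pymupdf
#   import pymupdf4llm
#
#   doc = pymupdf.open(pdf_name)
#   bad = find_failing_pages(
#       doc.page_count,
#       lambda n: pymupdf4llm.to_markdown(doc, pages=[n]),
#   )
#   print(bad)
```

Converting page by page is slower than a single call, but the resulting page numbers make it possible to share a small extract instead of the full 142MB file.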

P.S. These files are part of the Dutch government's open data and are important to parse. Unfortunately, they vary greatly in quality and size. On the other hand, that makes them great test cases.

HaraldLieder · December 3, 2025, 5:26pm

This problem should have been fixed in pymupdf4llm version 0.2.6.
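To confirm that an environment already has that release, a check along these lines may be useful. It is a minimal sketch using only the standard library; the names `FIXED_IN`, `parse_release`, `has_fix` and `installed_version` are made up for illustration, and the simple parser assumes plain dotted version numbers such as 0.2.5 or 0.2.6.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (0, 2, 6)  # release named in this thread as containing the fix


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '0.2.6' into (0, 2, 6); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: Optional[str]) -> bool:
    """True if the given version string is at least FIXED_IN."""
    return installed is not None and parse_release(installed) >= FIXED_IN


def installed_version(package: str = "pymupdf4llm") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

Calling `has_fix(installed_version())` returns False for 0.2.5 and True for 0.2.6 or later; if it returns False, upgrading with `pip install -U pymupdf4llm` should pull in the newer release.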

Jamie_Lemon · December 3, 2025, 9:47pm

@robvd Could you share the open data link to the PDFs? Hopefully the new PyMuPDF4LLM 0.2.6 resolves your issue; it did resolve the similar issue here: BUG: list index out of range using new layout feature - #10 by Jamie_Lemon

robvd · December 4, 2025, 8:19am

It is indeed working with version 0.2.6.

@Jamie_Lemon I stored this file locally at some point because it caused trouble. Unfortunately, I did not save the original URL. If you want, I can send the file, e.g. using WeTransfer; just DM me your email address.
