DeprecationWarning: invalid escape sequence #482

@stephenfin

Bug description

Calling Page.getText('blocks') on PDFs whose text contains invalid Python escape sequences (e.g. \ ) results in the following warning:

../fitz/fitz.py:5404: DeprecationWarning: invalid escape sequence '\ '
  return _fitz.TextPage_extractBLOCKS(self, lines)

This is a warning now but may or may not be an error in Python 3.10.

To Reproduce (mandatory)

  1. Create the following test script and save as test.py:

     import sys
     import fitz

     pdf = fitz.open(sys.argv[1])
     for page in pdf.pages():
         page.getText('blocks')

  2. Save the attached file locally

  3. Run the script against the file with deprecation warnings enabled:

     PYTHONWARNINGS=d python3 test.py test_aafigure.pdf 

Expected behavior (optional)

The strings should be treated as raw strings internally (e.g. r'\ ') or have their backslashes escaped.
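
To illustrate the difference I mean, here is a minimal pure-Python sketch. It is not PyMuPDF code: the helper name `escape_warnings` is made up for this example, and it only shows how the Python compiler treats the three spellings of a backslash followed by a space. The actual warning comes from inside the C extension, so the real fix may look different.

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the messages of any warnings raised.

    Hypothetical helper for illustration only; not part of PyMuPDF.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that are ignored by default.
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [str(w.message) for w in caught]


# Source text is: s = '\ '   (invalid escape sequence, triggers the warning)
plain = escape_warnings("s = '\\ '")

# Source text is: s = r'\ '  (raw string, backslash is taken literally)
raw = escape_warnings("s = r'\\ '")

# Source text is: s = '\\ '  (backslash escaped explicitly)
escaped = escape_warnings("s = '\\\\ '")

print("plain  :", plain)
print("raw    :", raw)
print("escaped:", escaped)
```

Only the first spelling produces a warning. Depending on the Python version it is reported as a DeprecationWarning or a SyntaxWarning; the raw and escaped forms compile cleanly and yield the same two-character string.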

Screenshots (optional)

N/A

Your configuration (mandatory)

  • Fedora 31 (64 bit)
  • Python 3.7.6
  • PyMuPDF 1.16.16, wheel
3.7.6 (default, Jan 30 2020, 09:44:41) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] linux 
PyMuPDF 1.16.16: Python bindings for the MuPDF 1.16.0 library. Version date: 2020-03-29 09:44:30. Built for Python 3.7 on linux (64-bit). 

Additional context (optional)

I did try to fix this myself, but I haven't worked with SWIG (or Python bindings to a C lib) before and got lost. Sorry 😞
