Skip to content

Python 3.11.0b3 chockes on regex sub from rjsmin, time grows seemingly exponentially #94675

@hroncok

Description

@hroncok

Bug report

When building chromium in Fedora 37 with Python 3.11.0b3, the build hangs seemingly forever.

While isolating the hang, we've noticed it happens in rjsmin, at https://github.com/ndparker/rjsmin/blob/1.2.0/rjsmin.py#L361

I've further reduced the javascript input as much as I could but I kept the regex pattern intact.

Here's the reporucer:

import re, sys # pattern from rjsmin # Copyright 2011 - 2021 André Malo or his licensors, as applicable. # Apache License Version 2.0 # https://github.com/ndparker/rjsmin/blob/1.2.0/rjsmin.py#L196 pattern = '(?<=[(,=:\\[!&|?{};\\r\\n+*-])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?' # (originally reported as) pattern = '([^\\047"\\140/\\000-\\040]+)|((?:(?:\\047[^\\047\\\\\\r\\n]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^\\047\\\\\\r\\n]*)*\\047)|(?:"[^"\\\\\\r\\n]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^"\\\\\\r\\n]*)*")|(?:\\140[^\\140\\\\]*(?:\\\\(?:[^\\r\\n]|\\r?\\n|\\r)[^\\140\\\\]*)*\\140))[^\\047"\\140/\\000-\\040]*)|(?<=[)])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))(?=(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*\\.(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*[a-z])|(?<=[(,=:\\[!&|?{};\\r\\n+*-])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?|(?<=[\\000-#%-,./:-@\\[-^\\140{-~-]return)(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:((?:(?://[^\\r\\n]*)?[\\r\\n]))(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)*((?:/(?![\\r\\n/*])[^/\\\\\\[\\r\\n]*(?:(?:\\\\[^\\r\\n]|(?:\\[[^\\\\\\]\\r\\n]*(?:\\\\[^\\r\\n][^\\\\\\]\\r\\n]*)*\\]))[^/\\\\\\[\\r\\n]*)*/[a-z]*))((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040&)+,.:;=?\\]|}-]))?|(?<=[^\\000-!#%&(*,./:-@\\[\\\\^{|~])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*(?:((?:(?://[^\\r\\n]*)?[\\r\\n]))(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+(?=[^\\000-\\040"#%-\\047)*,./:-@\\\\-^\\140|-~])|(?<=[^\\000-#%-,./:-@\\[-^\\140{-~-])((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=[^\\000-#%-,./:-@\\[-^\\140{-~-])|(?<=\\+)((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=\\+)|(?<=-)((?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/)))+(?=-)|(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))+|(?:(?:(?://[^\\r\\n]*)?[\\r\\n])(?:[\\000-\\011\\013\\014\\016-\\040]|(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/))*)+' input_data = """a(function() {{  {} }}); """.format('/' * int(sys.argv[1])) print(re.compile(pattern).sub('', input_data))

The original input reduced from Chromium's JavaScript file looks like:

a(function() { ////////////////////////////////////////////////////////////////////////// }); 

And here how long it takes with Python 3.10.5 and 3.11.0b3 for a reduced number of slashes.

$ time python3.10 reproducer.py 30 real 0m0,033s $ time python3.11 reproducer.py 30 real 0m0,082s $ time python3.10 reproducer.py 40 real 0m0,034s $ time python3.11 reproducer.py 40 real 0m6,902s $ time python3.10 reproducer.py 45 real 0m0,032s $ time python3.11 reproducer.py 45 real 1m25,004s $ time python3.10 reproducer.py 50 real 0m0,031s $ time python3.11 reproducer.py 50 ??? killed after 10+ minutes 

For 74 slashes, I guess it would run forever.

Your environment

  • CPython versions tested on: 3.11.0b3
  • Operating system and architecture: Fedora Linux 35 or 37, x86_64

Metadata

Metadata

Assignees

No one assigned

    Labels

    testsTests in the Lib/test dirtopic-regextype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions