Skip to content

Optimize (?!) in regular expressions #106566

Closed
@serhiy-storchaka

Description

@serhiy-storchaka

Some regular expression engines support (*FAIL) as a pattern which fails to match anything. (?!) is an idiomatic way to write this in engines which do not support (*FAIL).

It works pretty well, but it can be optimized. Instead of compiling it as ASSERT_NOT opcode

>>> re.compile(r'12(?!)|3', re.DEBUG) BRANCH LITERAL 49 LITERAL 50 ASSERT_NOT 1 OR LITERAL 51 0. INFO 9 0b100 1 2 (to 10) in 5. LITERAL 0x31 ('1') 7. LITERAL 0x33 ('3') 9. FAILURE 10: BRANCH 11 (to 22) 12. LITERAL 0x31 ('1') 14. LITERAL 0x32 ('2') 16. ASSERT_NOT 3 0 (to 20) 19. SUCCESS 20: JUMP 7 (to 28) 22: branch 5 (to 27) 23. LITERAL 0x33 ('3') 25. JUMP 2 (to 28) 27: FAILURE 28: SUCCESS re.compile('12(?!)|3', re.DEBUG) 

it can be compiled as FAILURE opcode.

>>> re.compile(r'12(?!)|3', re.DEBUG) BRANCH LITERAL 49 LITERAL 50 FAILURE OR LITERAL 51 0. INFO 9 0b100 1 2 (to 10) in 5. LITERAL 0x31 ('1') 7. LITERAL 0x33 ('3') 9. FAILURE 10: BRANCH 8 (to 19) 12. LITERAL 0x31 ('1') 14. LITERAL 0x32 ('2') 16. FAILURE 17. JUMP 7 (to 25) 19: branch 5 (to 24) 20. LITERAL 0x33 ('3') 22. JUMP 2 (to 25) 24: FAILURE 25: SUCCESS re.compile('12(?!)|3', re.DEBUG) 

Unfortunately I do not know good examples of using (*FAIL) in regular expressions (without using (*SKIP)) to include them in the documentation. Perhaps other patterns of using (*FAIL) could be optimized future, but I do not know what to optimize.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions