broadening of reverse suffix optimization has led to incorrect matches #1110

@BurntSushi

Specifically, this program succeeds in regex 1.9.x but fails in regex 1.10.1:

fn main() -> anyhow::Result<()> {
    let re = regex::Regex::new(r"(\\N\{[^}]+})|([{}])").unwrap();
    let hay = r#"hiya \N{snowman} bye"#;
    let matches = re.find_iter(hay).map(|m| m.range()).collect::<Vec<_>>();
    assert_eq!(matches, vec![5..16]);
    Ok(())
}

Its output with 1.10.1:

$ cargo run -q
thread 'main' panicked at main.rs:7:5:
assertion `left == right` failed
  left: [7..8, 15..16]
 right: [5..16]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I believe the issue here was my change to broaden the reverse suffix optimization to use one of many possible literals. But this turns out to be not quite correct, since the rules that govern prefixes don't apply to suffixes. In this case, the literal optimization extracts { and } as suffixes. It looks for a { first and finds a match at that position via the second alternate in the regex. But this winds up missing the earlier match that starts before it via the first alternate, since the { isn't a suffix of the first alternate.
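To make the offsets in the failing assertion concrete, here is a small std-only sketch that locates the spans by hand for this particular haystack. It does not use the regex crate; `escape_span` and `brace_spans` are hypothetical helper names for illustration, and `escape_span` only considers the first `\N{` in the haystack, which is enough here.

```rust
use std::ops::Range;

/// Hand-rolled stand-in for the first alternate, `\\N\{[^}]+}`.
/// Simplification: only the first `\N{` in the haystack is considered.
fn escape_span(hay: &str) -> Option<Range<usize>> {
    let start = hay.find(r"\N{")?;
    let body = start + 3; // skip past `\N{`
    let len = hay[body..].find('}')?;
    if len == 0 {
        return None; // `[^}]+` needs at least one character
    }
    Some(start..body + len + 1)
}

/// Hand-rolled stand-in for the second alternate, `[{}]`, applied on its own.
fn brace_spans(hay: &str) -> Vec<Range<usize>> {
    hay.match_indices(|c: char| c == '{' || c == '}')
        .map(|(i, _)| i..i + 1)
        .collect()
}

fn main() {
    let hay = r"hiya \N{snowman} bye";
    // The span the assertion expects, from the first alternate.
    println!("{:?}", escape_span(hay));
    // The spans 1.10.1 reported, which only the second alternate can produce.
    println!("{:?}", brace_spans(hay));
}
```

The { at offset 7 sits inside the span 5..16, so a search that begins at the first { or } has already skipped past the start of the match it should have reported.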

This is why we should, at least at present, only use this optimization when there is a non-empty longest common suffix. In that case, and only that case, we know that it is a suffix of every possible path through the regex.
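The condition described above can be sketched as a check on the extracted suffix literals. This is a minimal std-only illustration, not the regex crate's actual code: `longest_common_suffix` and `reverse_suffix_applicable` are hypothetical names, and the literal sets passed in are assumed to be whatever suffix extraction produced.

```rust
/// Returns the longest suffix shared by every literal, or "" if there is none.
fn longest_common_suffix<'a>(literals: &[&'a str]) -> &'a str {
    let (first, rest) = match literals.split_first() {
        Some((&first, rest)) => (first, rest),
        None => return "",
    };
    // Shrink the shared length as each further literal is compared, bytewise
    // from the end.
    let mut len = first.len();
    for lit in rest {
        let common = first
            .bytes()
            .rev()
            .zip(lit.bytes().rev())
            .take_while(|(a, b)| a == b)
            .count();
        len = len.min(common);
    }
    // A bytewise match can end in the middle of a UTF-8 sequence, so move the
    // start forward to the next character boundary.
    let mut start = first.len() - len;
    while !first.is_char_boundary(start) {
        start += 1;
    }
    &first[start..]
}

/// Hypothetical gate: only use the reverse suffix strategy when every path
/// through the regex is known to end with the same non-empty literal.
fn reverse_suffix_applicable(literals: &[&str]) -> bool {
    !longest_common_suffix(literals).is_empty()
}

fn main() {
    // The suffixes extracted for the regex in this issue share nothing.
    println!("{}", reverse_suffix_applicable(&["{", "}"]));
    // A hypothetical set where every literal ends in `}`.
    println!("{}", reverse_suffix_applicable(&["foo}", "}"]));
}
```

With { and } as the extracted suffixes, the common suffix is empty, so a check like this would skip the optimization for the regex above.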

Thank you to @charliermarsh for finding this! See: astral-sh/ruff#7980
