Skip to content

Conversation

@Licheam
Copy link
Contributor

@Licheam Licheam commented Jul 21, 2023

Replace the canonicalize function the in-place way with constant memory.
Replace the negate function the in-place way with constant memory.
Simplify the difference function without changing the algorithm.
Implement symmetric_difference function like others.
Implement push function like others.
Force the symmetric_difference function in Interval return ranges in order.

Licheam added 2 commits July 21, 2023 20:32
Replace the old canonicalize part with the in-place way with constant memory.
Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w00t! This is awesome. Nice work. :-)

@Licheam Licheam changed the title syntax: get canonicalization down in-place with constant memory. syntax: optimize some functions in IntervalSet. Jul 22, 2023
@Licheam Licheam marked this pull request as draft July 22, 2023 09:54
@Licheam Licheam marked this pull request as ready for review July 22, 2023 10:20
@BurntSushi
Copy link
Member

I see you're working on more stuff here. If possible, please avoid merge commits. Instead, please rebase and force push.

@Licheam
Copy link
Contributor Author

Licheam commented Jul 22, 2023

I see you're working on more stuff here. If possible, please avoid merge commits. Instead, please rebase and force push.

Sorry about the inconvenience I brought here, and thank you for your professional advices. I am still learning to contribute codes in public repo. The work on this PR is down from my perspective and sorry again for my inexperienced in git.

@BurntSushi
Copy link
Member

No need to apologize! Everyone deals with git and commit history a little differently. If you look at the commit log for this repo, it's a straight linear line with (usually) decent messages. Basically, if you can just avoid merges then it should be easier for me to massage your PR.

@Licheam
Copy link
Contributor Author

Licheam commented Jul 23, 2023

I see. Thx for the explanation. I'll pay attention to this point in future potential PRs.

@Licheam Licheam force-pushed the interval_canoicalization_opt branch from 5c97a7b to c4b718f Compare July 23, 2023 04:21
@Licheam Licheam force-pushed the interval_canoicalization_opt branch from c4b718f to a71e3c8 Compare July 23, 2023 04:27
@Licheam Licheam marked this pull request as draft July 23, 2023 06:06
@Licheam Licheam marked this pull request as ready for review July 23, 2023 09:44
@Licheam Licheam force-pushed the interval_canoicalization_opt branch from 2f2db21 to dd46d34 Compare July 23, 2023 13:50
BurntSushi pushed a commit that referenced this pull request Oct 6, 2023
This reduces or eliminates allocation when combining Unicode classes and should make some things faster. It's unlikely for these optimizations to matter much in practice, but they are likely to help in niche or pathological cases where there are a lot of ops in a class. Closes #1051
Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Some of this looks... quite gnarly. But I feel pretty confident with the test suite. I also tried some random mutations to provoke test failures, and it did. So that's good too.

This is really great work. I remember banging my head against the wall trying to write these routines without using extra memory and just gave up.

BurntSushi pushed a commit that referenced this pull request Oct 8, 2023
This reduces or eliminates allocation when combining Unicode classes and should make some things faster. It's unlikely for these optimizations to matter much in practice, but they are likely to help in niche or pathological cases where there are a lot of ops in a class. Closes #1051
BurntSushi pushed a commit that referenced this pull request Oct 8, 2023
This reduces or eliminates allocation when combining Unicode classes and should make some things faster. It's unlikely for these optimizations to matter much in practice, but they are likely to help in niche or pathological cases where there are a lot of ops in a class. Closes #1051
BurntSushi pushed a commit that referenced this pull request Oct 9, 2023
This reduces or eliminates allocation when combining Unicode classes and should make some things faster. It's unlikely for these optimizations to matter much in practice, but they are likely to help in niche or pathological cases where there are a lot of ops in a class. Closes #1051
@BurntSushi BurntSushi closed this in 6d2b09e Oct 9, 2023
BurntSushi added a commit that referenced this pull request Oct 11, 2023
This reverts commit 6d2b09e. Sadly I just don't have the time to fix this code myself. It's too subtle. So I'm just reverting it entirely for now. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63155 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63153 Ref #1051
BurntSushi added a commit that referenced this pull request Oct 11, 2023
This reverts commit 6d2b09e. Sadly I just don't have the time to fix this code myself. It's too subtle. So I'm just reverting it entirely for now. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63155 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63153 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63168 Ref #1051
@BurntSushi
Copy link
Member

FYI, I ended up needing to revert this change because of bugs caught by OSS-fuzz. See #1102 for more info.

BurntSushi added a commit that referenced this pull request Oct 11, 2023
This reverts commit 6d2b09e. Sadly I just don't have the time to fix this code myself. It's too subtle. So I'm just reverting it entirely for now. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63155 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63153 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63168 Ref #1051
BurntSushi added a commit that referenced this pull request Oct 11, 2023
This reverts commit 6d2b09e. Sadly I just don't have the time to fix this code myself. It's too subtle. So I'm just reverting it entirely for now. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63155 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63153 Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=63168 Ref #1051
BurntSushi added a commit that referenced this pull request Oct 12, 2023
Licheam added a commit to Licheam/regex that referenced this pull request Oct 12, 2023
@Licheam
Copy link
Contributor Author

Licheam commented Oct 12, 2023

FYI, I ended up needing to revert this change because of bugs caught by OSS-fuzz. See #1102 for more info.

Based on the test cases, it turns out that the implementation of push is not very thoughtful. I refractor the whole push function making it more direct and readable while fixing the bugs found. Sorry for my carelessness and thx for the test cases. I made a PR #1104 for the change.

Marwes added a commit to Marwes/regex that referenced this pull request Oct 20, 2025
Canonicalize is taking up a significant amount due to a regex with a huge amount of character ranges (generated by [lalrpop](https://github.com/lalrpop/lalrpop)'s lexer expanding multiple `\w` in a token). While this could perhaps be fixed in lalrpop I did notice the TODO in the code and after addressing this so we automatically union and compress on each push instead of re-canonicalizing on every push and that fixed the performance problem. I did see the earlier attempt at this rust-lang#1051 and it seems like that was reverted and regression tests were added so I hope that and the existing tests are enough (I don't have a clear idea on what tests might be missing).
Marwes added a commit to Marwes/regex that referenced this pull request Oct 20, 2025
Canonicalize is taking up a significant amount due to a regex with a huge amount of character ranges (generated by [lalrpop](https://github.com/lalrpop/lalrpop)'s lexer expanding multiple `\w` in a token). While this could perhaps be fixed in lalrpop I did notice the TODO in the code and after addressing this so we automatically union and compress on each push instead of re-canonicalizing on every push and that fixed the performance problem. I did see the earlier attempt at this rust-lang#1051 and it seems like that was reverted and regression tests were added so I hope that and the existing tests are enough (I don't have a clear idea on what tests might be missing).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants