jimczi
Contributor

@jimczi jimczi commented May 23, 2025

Starting with Lucene 10, `CharacterRunAutomaton` is no longer determinized automatically.
In Elasticsearch 9, we adapted to this by determinizing automata eagerly (via `Regex#simpleMatchToAutomaton`). However, this introduced a regression: operations like index template conflict checks, which only require intersection testing, now pay the cost of determinization, an expensive step that wasn’t needed before. In some cases, especially when many wildcard patterns are involved, determinization can even fail due to state explosion.

This change removes the unnecessary determinization from the index pattern conflict check, restoring the pre-9.0 behavior and allowing valid index templates with many patterns to be registered again.
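
To illustrate why a conflict check only needs intersection testing, here is a minimal, self-contained sketch. It is not the code changed in this PR and does not use Lucene; `WildcardOverlap` and `overlaps` are hypothetical names, and the sketch assumes patterns in which `*` is the only wildcard. It walks the product of the two pattern NFAs directly, so it visits at most `(n + 1) * (m + 1)` state pairs and never builds a DFA.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration only; not Elasticsearch or Lucene code.
final class WildcardOverlap {

    /** Returns true if at least one string matches both patterns ('*' matches any run of characters). */
    static boolean overlaps(String p, String q) {
        int n = p.length();
        int m = q.length();
        // seen[i][j]: the pair "position i in p, position j in q" has already been queued.
        boolean[][] seen = new boolean[n + 1][m + 1];
        Deque<int[]> work = new ArrayDeque<>();
        push(work, seen, 0, 0);
        while (!work.isEmpty()) {
            int[] state = work.pop();
            int i = state[0];
            int j = state[1];
            if (i == n && j == m) {
                return true; // both patterns fully consumed on the same input
            }
            boolean pStar = i < n && p.charAt(i) == '*';
            boolean qStar = j < m && q.charAt(j) == '*';
            // A '*' may match the empty string: skip it without consuming input.
            if (pStar) push(work, seen, i + 1, j);
            if (qStar) push(work, seen, i, j + 1);
            // A '*' on one side absorbs the literal on the other side.
            if (pStar && j < m && !qStar) push(work, seen, i, j + 1);
            if (qStar && i < n && !pStar) push(work, seen, i + 1, j);
            // Two literals must be the same character to advance together.
            if (i < n && j < m && !pStar && !qStar && p.charAt(i) == q.charAt(j)) {
                push(work, seen, i + 1, j + 1);
            }
        }
        return false; // the accepting pair is unreachable: the patterns cannot conflict
    }

    private static void push(Deque<int[]> work, boolean[][] seen, int i, int j) {
        if (!seen[i][j]) {
            seen[i][j] = true;
            work.push(new int[] {i, j});
        }
    }

    public static void main(String[] args) {
        System.out.println(overlaps("logs-*", "logs-app-*")); // true
        System.out.println(overlaps("logs-*", "metrics-*"));  // false
        System.out.println(overlaps("*-prod", "logs-*"));     // true, e.g. "logs-prod"
    }
}
```

The number of visited pairs grows with the product of the pattern lengths, whereas determinizing a union of many wildcard patterns can grow exponentially, which is the state explosion described above.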

closes: #127972

@jimczi jimczi requested a review from a team as a code owner May 23, 2025 10:13
@jimczi jimczi added >bug :Data Management/Indices APIs APIs to create and manage indices and templates v9.1.0 labels May 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label May 23, 2025
@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

Contributor

@pawankartik-elastic pawankartik-elastic left a comment

LGTM overall. Leaving 2 tiny comments.

Member

@dakrone dakrone left a comment

LGTM, thanks for tagging us

@jimczi jimczi merged commit 8312613 into elastic:main Jun 2, 2025
18 checks passed
@jimczi jimczi deleted the index_patterns_automaton branch June 2, 2025 08:39
@benwtrent
Member

This should be backported to 9.0.x, right? It's existed ever since the Lucene 10 upgrade.

@pawankartik-elastic @jimczi

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jun 2, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jun 2, 2025
…128362) (#128759)
mridula-s109 pushed a commit that referenced this pull request Jun 2, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 3, 2025
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
cbuescher added a commit to cbuescher/elasticsearch that referenced this pull request Sep 5, 2025
We already fixed an issue with this in elastic#128362, but apparently another instance of the unnecessary determinization was hiding elsewhere; in its current state it throws exceptions on complex patterns starting with Lucene 10. This change applies the same fix as elastic#128362 and adds a test that would have triggered this. Closes elastic#133652
cbuescher added a commit that referenced this pull request Sep 8, 2025
cbuescher added a commit to cbuescher/elasticsearch that referenced this pull request Sep 8, 2025
cbuescher added a commit to cbuescher/elasticsearch that referenced this pull request Sep 8, 2025
elasticsearchmachine pushed a commit that referenced this pull request Sep 8, 2025
elasticsearchmachine pushed a commit that referenced this pull request Sep 8, 2025
) * Fix exceptions in index pattern conflict checks (#134231) We already fixed an issue with this in #128362, but apparently another instance of the unnecessary determinization was hiding elsewhere; in its current state it throws exceptions on complex patterns starting with Lucene 10. This change applies the same fix as #128362 and adds a test that would have triggered this. Closes #133652 * Fix test compilation

Labels

>bug :Data Management/Indices APIs APIs to create and manage indices and templates Team:Data Management Meta label for data/management team v9.0.3 v9.1.0

6 participants