Skip to content

Conversation

@prwhelan
Copy link
Member

Bug: when the indexer state fails to save during a cluster state update, the Transform is stuck in STOPPING and cannot be restarted unless the user force stops to delete the task.

Fix: the task will continuously retry starting the indexer until the cluster state update can succeed.

Notes:

  • users can cancel the retry by force stopping the transform
  • the retry is displayed in the UI as "degraded" with a message as to why the transform is restarting
  • the transform now displays as STARTING rather than STOPPING until it successfully starts
  • the retry is audited so it displays in the Messages tab of the UI
  • the retry timer is randomly selected between 45s and 90s, this should help during rolling restarts for clusters that have a large amount of transforms

Fix #128221

Bug: when the indexer state fails to save during a cluster state update, the Transform is stuck in STOPPING and cannot be restarted unless the user force stops to delete the task. Fix: the task will continuously retry starting the indexer until the cluster state update can succeed. Notes: - users can cancel the retry by force stopping the transform - the retry is displayed in the UI as "degraded" with a message as to why the transform is restarting - the transform now displays as STARTING rather than STOPPING until it successfully starts - the retry is audited so it displays in the Messages tab of the UI - the retry timer is randomly selected between 45s and 90s, this should help during rolling restarts for clusters that have a large amount of transforms Fix elastic#128221
@prwhelan prwhelan added >bug :ml/Transform Transform Team:ML Meta label for the ML team v9.2.0 labels Jul 28, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

@prwhelan prwhelan marked this pull request as ready for review August 4, 2025 12:15
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

params.getId(),
Strings.format(
"Failed while starting Transform. Automatically retrying every [%s] seconds. "
+ "To cancel retries, force stop this transform. Failure: [%s]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about including the force stop command in the message here? Or is it expected that the user would do that via a UI button (I'm imagining it being done from the dev console)?

@prwhelan prwhelan merged commit 0f65d79 into elastic:main Aug 12, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :ml/Transform Transform Team:ML Meta label for the ML team v9.2.0

3 participants