Description
Affected version: 7.7 -
Problem
.ml-state-write is supposed to be an index alias, but it can accidentally be created as a concrete index. If .ml-state-write is a concrete index instead of an alias, starting a job can fail because of the index rollover introduced in #52356.
The reason why .ml-state-write can end up as an index instead of an alias is explained in #57645.
From 7.9 the job fails with: Detected a problem with the internal machine learning data: the state index alias ... exists as index but must be an alias.
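To check which situation a cluster is in, the parsed JSON body of `GET /.ml-state-write` can be inspected. The Get index API keys its response by concrete index name, so an alias only appears inside the `aliases` map of its backing index. The following is a minimal sketch, not code from Elasticsearch itself: the helper name `classify_name` and the backing index name `.ml-state-000001` in the sample data are assumptions made for illustration.

```python
from typing import Any, Dict

STATE_WRITE = ".ml-state-write"


def classify_name(name: str, get_index_response: Dict[str, Any]) -> str:
    """Classify `name` from the parsed JSON body of `GET /<name>`.

    Returns "index" if `name` is a top-level key (a concrete index),
    "alias" if it only appears in some index's "aliases" map,
    and "missing" otherwise.
    """
    if name in get_index_response:
        return "index"
    for body in get_index_response.values():
        if name in body.get("aliases", {}):
            return "alias"
    return "missing"


# Hand-written sample responses (illustrative, not captured from a cluster).
broken = {".ml-state-write": {"aliases": {}, "mappings": {}, "settings": {}}}
healthy = {".ml-state-000001": {"aliases": {".ml-state-write": {}}}}

print(classify_name(STATE_WRITE, broken))   # index
print(classify_name(STATE_WRITE, healthy))  # alias
```

A result of "index" corresponds to the problem described above.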
Mitigation
- if you are ok with re-creating ML models you can delete .ml-state-write
- if you want to preserve state:
  - reindex .ml-state-write to .ml-state:

        #---
        # reindex
        # - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
        #---
        POST _reindex
        {
          "source": {
            "index": ".ml-state-write",
            "size": 100
          },
          "dest": {
            "index": ".ml-state"
          }
        }

    After the successful reindex, delete the old index and create an alias:

        #---
        # delete .ml-state-write
        #---
        DELETE /.ml-state-write

Now you should be able to start the jobs.
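The two mitigation paths can be written down as an ordered list of REST requests, which is handy when the same fix has to be applied to several clusters. This is a sketch under assumptions: the function name `mitigation_requests` and the tuple layout are made up for illustration, and nothing is sent to a cluster. The text above mentions creating an alias after the delete but does not give the request for it, so that step is deliberately left out here rather than guessed.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]

BROKEN_INDEX = ".ml-state-write"
TARGET_INDEX = ".ml-state"


def mitigation_requests(preserve_state: bool, batch_size: int = 100) -> List[Request]:
    """Return the requests from the mitigation steps, in order.

    preserve_state=True  -> reindex into .ml-state, then delete the old index
    preserve_state=False -> delete the index only (models must be re-created)
    """
    steps: List[Request] = []
    if preserve_state:
        steps.append((
            "POST",
            "/_reindex",
            {
                "source": {"index": BROKEN_INDEX, "size": batch_size},
                "dest": {"index": TARGET_INDEX},
            },
        ))
    steps.append(("DELETE", "/" + BROKEN_INDEX, None))
    return steps


# Print the plan in a console-like form for review before running it by hand.
for method, path, body in mitigation_requests(preserve_state=True):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The delete must only run after the reindex has completed successfully, otherwise the model state is lost.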
Solution
The issues #57645 and #55267 discuss solutions for preventing .ml-state-write from being created as an index. This will solve the root cause of this issue.
For users that already have an .ml-state-write index by mistake, this won't help. Because reindex is an expensive operation, reindexing automatically in the background is not an option.
2 possible improvements I can think of:
A: improve log message
The log message isn't very descriptive and does not help users find a solution quickly. We can improve the message (concrete wording to be discussed): "Expected [.ml-state-write] to be an alias but it is an index, can't start the job. Please reindex [.ml-state-write] to [.ml-state]". It's not possible to write full instructions in a log message, but given that the message is part of this issue, users searching for it should find these instructions.
B: do not use ILM if .ml-state-write is an index
We could be lenient and simply fall back to the old non-ILM way. We added ILM for a reason, which makes this solution questionable; however, this only concerns 7.x. For upgrading to 8.0 we can require using an update tool and reindex as part of migrating to 8, so eventually the state index will be managed. This solution requires that a .ml-state-write index does not cause problems in other parts of the code.
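The difference between options A and B can be sketched as a single decision function. This is an illustration only, not the Elasticsearch implementation: `plan_state_writes`, `StateAliasError`, the `lenient` flag and the returned dictionary are all hypothetical names. The error text is the wording proposed under A, which is itself still open for discussion.

```python
STATE_WRITE = ".ml-state-write"
LEGACY_STATE = ".ml-state"


class StateAliasError(RuntimeError):
    """Raised when the state write alias exists as a concrete index."""


def plan_state_writes(kind: str, lenient: bool) -> dict:
    """Decide how to write job state.

    kind    -- "alias" or "index": what .ml-state-write currently is
    lenient -- False: option A (fail with a descriptive message)
               True:  option B (fall back to the non-ILM way, no rollover)
    """
    if kind == "alias":
        # Normal case: the alias is managed and can be rolled over.
        return {"write_to": STATE_WRITE, "use_rollover": True}
    if kind == "index":
        if lenient:
            # Option B: keep writing to the concrete index, skip rollover.
            return {"write_to": STATE_WRITE, "use_rollover": False}
        # Option A: refuse to start, but tell the user what to do.
        raise StateAliasError(
            f"Expected [{STATE_WRITE}] to be an alias but it is an index, "
            f"can't start the job. Please reindex [{STATE_WRITE}] to [{LEGACY_STATE}]"
        )
    raise ValueError(f"unknown kind: {kind!r}")
```

With `lenient=True` the job starts but the state index stays unmanaged, which is the trade-off discussed under B.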