[ML] Job fail to start with "Invalid alias name [.ml-state-write] ..." #58482

@hendrikmuhs

Affected versions: 7.7 and later

Problem

.ml-state-write is supposed to be an index alias, however by accident it can become an index. If .ml-state-write is a concrete index instead of an alias, starting a job can fail due to index rollover introduced in #52356.

The reason why .ml-state-write can end up as an index instead of an alias is explained in #57645.

From 7.9 the job fails with: Detected a problem with the internal machine learning data: the state index alias ... exists as index but must be an alias.
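
To tell which state a cluster is in, one can inspect the response of `GET /.ml-state-write`. The following is a minimal sketch, assuming the get-index response shape in which top-level keys are concrete index names and each entry lists its aliases under `aliases`. The function name `classify_state_write`, the backing index name `.ml-state-000001`, and both sample responses are hypothetical and only serve as illustration:

```python
from typing import Any, Dict


def classify_state_write(name: str, get_index_response: Dict[str, Any]) -> str:
    """Classify `name` as 'index', 'alias' or 'missing'.

    `get_index_response` is the parsed JSON body of `GET /<name>`.
    """
    # A concrete index shows up under its own name as a top-level key.
    if name in get_index_response:
        return "index"
    # An alias resolves to its backing indices, each of which lists the alias.
    for entry in get_index_response.values():
        if isinstance(entry, dict) and name in entry.get("aliases", {}):
            return "alias"
    # Anything else (for example a 404 error body) is treated as missing.
    return "missing"


# Hypothetical sample responses, trimmed to the relevant fields.
healthy = {
    ".ml-state-000001": {"aliases": {".ml-state-write": {"is_write_index": True}}}
}
broken = {".ml-state-write": {"aliases": {}}}

print(classify_state_write(".ml-state-write", healthy))
print(classify_state_write(".ml-state-write", broken))
```

A result of `index` corresponds to the failure described in this issue.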

Mitigation

  • if you are ok with re-creating ML models you can delete .ml-state-write
  • if you want to preserve state:
    • reindex .ml-state-write to .ml-state:
      #---
      # reindex
      # - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
      #---
      POST _reindex
      {
        "source": {
          "index": ".ml-state-write",
          "size": 100
        },
        "dest": {
          "index": ".ml-state"
        }
      }

After the successful reindex, delete the old index and create an alias:

    #---
    # delete .ml-state-write
    #---
    DELETE /.ml-state-write

Now you should be able to start the jobs.
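
For scripting the state-preserving path, the steps can be written down as an ordered list of requests. This is only a sketch: it builds the request descriptions and prints them, without contacting a cluster. The helper name `mitigation_requests` is hypothetical. The final alias step is not shown in this issue, so the `_aliases` add action below is an assumption about what "create an alias" means here; the issue does not say whether it has to be issued manually at all.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, optional JSON body)
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def mitigation_requests(
    bad_index: str = ".ml-state-write",
    target: str = ".ml-state",
    batch_size: int = 100,
) -> List[Request]:
    """Return the mitigation steps in the order they should be run."""
    return [
        # Step 1: copy the state documents out of the accidental index.
        (
            "POST",
            "/_reindex",
            {
                "source": {"index": bad_index, "size": batch_size},
                "dest": {"index": target},
            },
        ),
        # Step 2: only after a successful reindex, drop the accidental index.
        ("DELETE", "/" + bad_index, None),
        # Step 3 (assumption, not shown in the issue): recreate the name as an alias.
        (
            "POST",
            "/_aliases",
            {"actions": [{"add": {"index": target, "alias": bad_index}}]},
        ),
    ]


for method, path, body in mitigation_requests():
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The order matters: deleting before the reindex has completed successfully would lose the model state this path is meant to preserve.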

Solution

Issues #57645 and #55267 discuss ways to prevent .ml-state-write from being created as an index. That would address the root cause of this issue.

It won't help users who already have an .ml-state-write index by mistake. Because reindexing is an expensive operation, reindexing automatically in the background is not an option.

Two possible improvements I can think of:

A: improve log message

The log message isn't very descriptive and does not help users find a solution quickly. We can improve the message (concrete wording to be discussed): "Expected [.ml-state-write] to be an alias but it is an index, can't start the job. Please reindex [.ml-state-write] to [.ml-state]". It's not possible to write full instructions in a log message, but since the message text is quoted in this issue, users who search for it should find these instructions.

B: do not use ILM if .ml-state-write is an index

We could be lenient and simply fall back to the old non-ILM way. We added ILM for a reason, so this solution is questionable; however, it would only apply to 7.x. For upgrading to 8.0 we can require using an update tool that reindexes as part of the migration, so eventually the state index will be managed. This solution requires that a .ml-state-write index does not cause problems in other parts of the code.
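
The difference between options A and B can be illustrated with a small decision function. This is a hypothetical sketch in Python, not the actual Elasticsearch code; the function name `resolve_state_write_target`, the `lenient` flag, and the returned dictionary shape are all made up for illustration. The error text is the proposed wording from option A.

```python
from typing import Dict, Union

STATE_WRITE = ".ml-state-write"


def resolve_state_write_target(kind: str, lenient: bool) -> Dict[str, Union[str, bool]]:
    """Decide how to write state, given whether STATE_WRITE is an 'alias' or an 'index'."""
    if kind == "alias":
        # Expected setup: write through the alias and let ILM roll it over.
        return {"target": STATE_WRITE, "use_ilm": True}
    if kind == "index":
        if lenient:
            # Option B: keep writing to the concrete index and skip ILM rollover.
            return {"target": STATE_WRITE, "use_ilm": False}
        # Option A: fail, but with a message that points at the fix.
        raise RuntimeError(
            "Expected [.ml-state-write] to be an alias but it is an index, "
            "can't start the job. Please reindex [.ml-state-write] to [.ml-state]"
        )
    raise ValueError("kind must be 'alias' or 'index', got %r" % (kind,))


print(resolve_state_write_target("alias", lenient=False))
print(resolve_state_write_target("index", lenient=True))
```

With `lenient=False` and `kind="index"` the function raises instead of returning, which mirrors the current behaviour with a more actionable message.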
