Skip to content

Conversation

@kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Dec 18, 2024

Indexes configured with logsdb mode currently use sorting on [host.name, @timestamp] by default, with keyword host.name getting injected if not present. This can lead to problems, if e.g. host is a leaf field, while sorting on an empty host.name is wasteful.

To fix this, the logsdb index settings provider is extended to detect when field host.name is present and trigger sorting on it (in addition to @timestamp) by injecting an index setting that is later used in IndexSortConfig during IndexSettings initialization. Otherwise, logsdb indexes are only sorted on @timestamp by default.

The downside is that the provider may need to inspect mappings through MapperService more often, a costly operation. The impact will be monitored in nightlies, once this is submitted, and logic will be further optimized as needed.

Fixes #118686

@kkrik-es kkrik-es added >enhancement auto-backport Automatically create backport pull requests when merged v8.18.0 and removed >non-issue labels Dec 23, 2024
@kkrik-es kkrik-es marked this pull request as ready for review December 23, 2024 14:20
@kkrik-es kkrik-es requested a review from felixbarny December 23, 2024 14:31
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think introducing a setting that is set by index provider when we can index sort on hostname is more robust.

I left a few comments, otherwise looks good!

}
}

// TODO: remove this when _source.mode attribute has been removed:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this TODO here. After #119072 has been merged, let's try to optimize this in order to address the elastic/elasticsearch-benchmarks#2232 regression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we optimize it, though? With this change, we still need to check dynamic templates etc for the presence of host.name.

"index.logsdb.use_default_sort_config",
false,
Property.IndexScope,
Property.Final
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add Property.ServerlessPublic also here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iiuc Property.ServerlessPublic is used to allow users to set this. This setting should never be set by users, it's only introduced to have the logsdb settings provider control the sort config. Do I understand this correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in that case we should add Property.PrivateIndex? I think this would ensure that only we can use it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, done.

@martijnvg
Copy link
Member

I didn't realize the fact that more integrations are relying on dynamic mappings for host.name field. I think in that case we do want to inject the host.name field, so that we can configure host.name as primary index sort field. Then I think for this PR, we want to just check mappings for cases when injecting host.name field leads to mapping conflicts? I think just check that host field isn't a leaf is sufficient? (in case subobjects is enabled) If that is the case we then we should just sort by timestamp.

@felixbarny
Copy link
Member

Then I think for this PR, we want to just check mappings for cases when injecting host.name field leads to mapping conflicts?

Makes sense to me.

I think just check that host field isn't a leaf is sufficient? (in case subobjects is enabled)

Do we also need to check if host.name is already defined (possibly with a different mapping) and avoid injecting it in that case? Whether or not we can sort on that host.name mapping depends on how it's mapped (only boolean, numeric, date and keyword fields with doc_values can be sorted).

@kkrik-es
Copy link
Contributor Author

kkrik-es commented Jan 9, 2025

Do we also need to check if host.name is already defined (possibly with a different mapping) and avoid injecting it in that case? Whether or not we can sort on that host.name mapping depends on how it's mapped (only boolean, numeric, date and keyword fields with doc_values can be sorted).

Let me try to incorporate all this in the PR and add tests.

# Conflicts: #	rest-api-spec/build.gradle
@martijnvg
Copy link
Member

I think we can try to merge in the static host.name mapping using mapper service and check if it succeeds. If it does succeed, then we should sort on host.name and timestamp otherwise just on timestamp.

@felixbarny
Copy link
Member

Will the merge succeed if the users' mapping also maps host.name as a keyword field but disables doc_values? I guess in this case, we should only sort by timestamp.

@martijnvg
Copy link
Member

I guess in this case, we should only sort by timestamp.

Good point. We should check whether it has doc values and otherwise fallback to index sorting only by timestamp.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"index.logsdb.use_default_sort_config",
false,
Property.IndexScope,
Property.Final
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in that case we should add Property.PrivateIndex? I think this would ensure that only we can use it here.

@kkrik-es kkrik-es merged commit 5baf5af into elastic:main Jan 10, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 118968

@kkrik-es
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Jan 10, 2025
…8968) * Skip injecting `host.name` for incompatible mappings in logsdb mode * spotless * Update docs/changelog/118856.yaml * fix tsdb * Configure index sorting through index settings for logsdb * fix synthetic source usage * skip injecting host.name * fix test * fix compat * more tests * add index versioning * add index versioning * add index versioning * minor refactoring * Update docs/changelog/118968.yaml * address comments * inject host.name when possible * check subobjects * private settings (cherry picked from commit 5baf5af) # Conflicts: #	rest-api-spec/build.gradle #	server/src/main/java/org/elasticsearch/index/IndexVersions.java #	server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification #	x-pack/plugin/logsdb/src/main/java/org/elasticsearch/xpack/logsdb/LogsDBPlugin.java
@kkrik-es kkrik-es deleted the fix/118686-2 branch January 10, 2025 12:04
elasticsearchmachine pushed a commit that referenced this pull request Jan 10, 2025
) (#119936) * Configure index sorting through index settings for logsdb (#118968) * Skip injecting `host.name` for incompatible mappings in logsdb mode * spotless * Update docs/changelog/118856.yaml * fix tsdb * Configure index sorting through index settings for logsdb * fix synthetic source usage * skip injecting host.name * fix test * fix compat * more tests * add index versioning * add index versioning * add index versioning * minor refactoring * Update docs/changelog/118968.yaml * address comments * inject host.name when possible * check subobjects * private settings (cherry picked from commit 5baf5af) # Conflicts: #	rest-api-spec/build.gradle #	server/src/main/java/org/elasticsearch/index/IndexVersions.java #	server/src/main/resources/META-INF/services/org.elasticsearch.features.FeatureSpecification #	x-pack/plugin/logsdb/src/main/java/org/elasticsearch/xpack/logsdb/LogsDBPlugin.java * update index versions * [CI] Auto commit changes from spotless * update index versions --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4 participants