Skip to content

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Apr 3, 2025

This also covers annotated_text because AnnotatedTextFieldType inherits TextFieldType.

@elasticsearchmachine elasticsearchmachine added v9.1.0 needs:triage Requires assignment of a team area label labels Apr 3, 2025
// See TextFieldMapper.SyntheticSourceHelper#usingSyntheticSourceDelegate
// and TextFieldMapper#canUseSyntheticSourceDelegateForLoading().
boolean usingSyntheticSourceDelegate = docValues || store;
boolean canUseSyntheticSourceDelegateForLoading = usingSyntheticSourceDelegate && ignoreAbove == null && (index || store);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why do we require multi field to be indexed here (the actual impl is in TextFieldMapper).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is strange (only doc values / stored should matter). What failure occurs when we don't check for index here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll get different results sometimes, i can't remember now.

return KeywordFieldBlockLoaderTests.expectedValue(keywordMultiFieldMapping, value, params, testContext);
}

// Even if multi-field is not eligible for loading it can still be used to produce synthetic source
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole thing is due to the fact that not all syntheticSourceDelegate are eligible to be used by block loader.

// KeywordFieldBlockLoaderTest
// It is here since KeywordFieldBlockLoaderTest does not really need it
if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");
Copy link
Contributor Author

@lkts lkts Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If syntheticSourceDelegate is used than null_value of the multi field will be in the resulting synthetic source. This is good because we'll preserve such values during reindex. On the other hand this is inconsistent - synthetic source for text behaves differently depending on the presence of multi field and mapping parameters of multi field (text ignores nulls).

if (params.syntheticSource() && testContext.forceFallbackSyntheticSource() == false && usingSyntheticSourceDelegate) {
var nullValue = (String) keywordMultiFieldMapping.get("null_value");

// Due to how TextFieldMapper#blockReaderDisiLookup works this is complicated.
Copy link
Contributor Author

@lkts lkts Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another inconsistency - block loader returns a null_value from multi field instead of null only if text field itself is not indexed. If it is, it returns nothing.

@lkts lkts added >enhancement auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings v8.19.0 labels Apr 3, 2025
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Apr 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Sasha!

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

// See TextFieldMapper.SyntheticSourceHelper#usingSyntheticSourceDelegate
// and TextFieldMapper#canUseSyntheticSourceDelegateForLoading().
boolean usingSyntheticSourceDelegate = docValues || store;
boolean canUseSyntheticSourceDelegateForLoading = usingSyntheticSourceDelegate && ignoreAbove == null && (index || store);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is strange (only doc values / stored should matter). What failure occurs when we don't check for index here?

@lkts lkts merged commit 21ff72b into elastic:main Apr 7, 2025
17 checks passed
@lkts lkts deleted the synthetic_source_block_loader_text branch April 7, 2025 16:32
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126237

@lkts
Copy link
Contributor Author

lkts commented Apr 7, 2025

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

lkts added a commit that referenced this pull request Apr 7, 2025
…6430) (cherry picked from commit 21ff72b) # Conflicts: #	server/src/test/java/org/elasticsearch/index/mapper/blockloader/KeywordFieldBlockLoaderTests.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.0 v9.1.0

4 participants