
@jimczi (Contributor) commented Dec 5, 2024

This PR introduces a new highlighter, `semantic`, tailored for semantic text fields. It extracts the most relevant fragments by scoring nested chunks using the original semantic query.

In this initial version, the highlighter returns only the original chunks computed during ingestion. However, this is an implementation detail, and future enhancements could combine multiple chunks to generate the fragments.
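As a rough illustration of the idea described above (not the code in this PR, which runs inside Elasticsearch against nested chunk documents), the sketch below scores each ingested chunk against a query and returns the best chunks unchanged as fragments. It assumes dense embeddings and cosine similarity; the names `Chunk`, `cosine`, and `highlight_chunks` are hypothetical and exist only for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One chunk of a semantic text field, as computed at ingestion (hypothetical type)."""
    text: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def highlight_chunks(
    chunks: List[Chunk],
    query_embedding: List[float],
    number_of_fragments: int = 1,
) -> List[str]:
    """Return the text of the best-scoring chunks, highest score first.

    Assumption: the fragments are the original chunks, not merged or trimmed,
    mirroring the "initial version" behaviour described in the PR text.
    """
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(chunk.embedding, query_embedding),
        reverse=True,
    )
    return [chunk.text for chunk in ranked[:number_of_fragments]]
```

Because the fragments are whole chunks, a future version that combines neighbouring chunks would only need to change what `highlight_chunks` returns, not how chunks are scored.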

github-actions bot (Contributor) commented Dec 5, 2024

Documentation preview:

@elasticsearchmachine added the Team:Search Relevance label (Meta label for the Search Relevance team in Elasticsearch) Dec 5, 2024
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator)

Hi @jimczi, I've created a changelog YAML for you.

@Mikep86 (Contributor) left a comment

Looks great overall! Could we also add highlighting YAML tests?

@kderusso (Member) left a comment

Looks good to me overall.

@jimczi (Contributor, Author) commented Dec 6, 2024

@Mikep86 @kderusso I added the yml tests and addressed your other comments.

@Mikep86 (Contributor) left a comment

LGTM once CI is green

@kderusso (Member) left a comment

LGTM!

@jimczi added the auto-backport label (Automatically create backport pull requests when merged) Dec 6, 2024
@jimczi merged commit c580024 into elastic:main Dec 6, 2024
16 checks passed
@jimczi deleted the semantic_highlighter branch December 6, 2024 18:42
@elasticsearchmachine (Collaborator)

💚 Backport successful

Branch: 8.x
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 6, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 6, 2024
* Add Highlighter for Semantic Text Fields (#118064)
* Update x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighterTests.java
patrykkopycinski added a commit to elastic/kibana that referenced this pull request Jan 10, 2025
… of inner_hits (#204962)

## Summary

Switch to use elastic/elasticsearch#118064 when retrieving Knowledge base Index entry docs.

Followed testing instructions from #198020.

Results: [5 screenshots]
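For readers unfamiliar with the switch described in this commit, here is a minimal sketch of what asking for `semantic` highlights (rather than `inner_hits`) might look like from a client. It is not the Kibana change itself. The helper names `build_search_body` and `extract_fragments` and the field name are hypothetical, and applying `number_of_fragments` to this highlighter type is an assumption based on the generic highlighter options.

```python
from typing import Any, Dict, List


def build_search_body(field: str, query_text: str, number_of_fragments: int = 2) -> Dict[str, Any]:
    """Build a search request body that highlights a semantic text field.

    `field` is a hypothetical semantic text field name supplied by the caller.
    """
    return {
        "query": {"semantic": {"field": field, "query": query_text}},
        "highlight": {
            "fields": {
                field: {
                    "type": "semantic",  # the highlighter introduced by this PR
                    "number_of_fragments": number_of_fragments,  # assumed to apply here
                }
            }
        },
    }


def extract_fragments(response: Dict[str, Any], field: str) -> List[str]:
    """Collect highlighted fragments for `field` from every hit in a search response."""
    fragments: List[str] = []
    for hit in response.get("hits", {}).get("hits", []):
        # Hits without a highlight section for this field contribute nothing.
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments
```

The body returned by `build_search_body` can be passed to whichever client is in use; `extract_fragments` only relies on the standard `hits.hits[].highlight` shape of a search response.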
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Jan 10, 2025
… of inner_hits (elastic#204962) (cherry picked from commit 5539000)
patrykkopycinski added a commit to elastic/kibana that referenced this pull request Jan 14, 2025
…nstead of inner_hits (#204962) (#206509)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Security Assistant] Migrate semantic_text to use highlighter instead of inner_hits (#204962)](#204962)

### Questions ?

Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
viduni94 pushed a commit to viduni94/kibana that referenced this pull request Jan 23, 2025
… of inner_hits (elastic#204962)
@Imran-ml commented Jun 4, 2025

Does this semantic highlighter support dense vectors?


Labels

- auto-backport: Automatically create backport pull requests when merged
- >feature
- :Search Relevance/Highlighting: How a query matched a document
- Team:Search Relevance: Meta label for the Search Relevance team in Elasticsearch
- v8.18.0
- v9.0.0

5 participants