Allow scripted bulk updates on indices with semantic text fields #136074

@Mpdreamz

Description

Allow scripted bulk updates on indices with semantic text fields to determine whether a noop or full reindex of the doc is necessary.

Indexing semantic_text fields is a resource-heavy task, so a common access pattern would be to only update documents if they have changed.

Imagine a doc under GET /my-index/_doc/1

{ "hash": "SOME-HASH", "semantic": "TEXT" }

The following successfully results in a noop.

POST /my-index/_update/1
{
  "scripted_upsert": true,
  "script": {
    "source": """
      if (ctx.op != 'create') {
        if (ctx._source.hash == params.hash) {
          ctx.op = "noop"
        }
      }
      ctx._source = params.doc
    """,
    "params": {
      "hash": "SOME-HASH",
      "doc": { "hash": "SOME-HASH", "semantic": "TEXT" }
    }
  }
}
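To make the intent of the script explicit, here is a minimal plain-Python sketch of the same decision. It is only an illustration of the logic, not how Elasticsearch evaluates Painless; the function name decide_update is hypothetical, and it assumes an existing document starts with op "index" while a missing one starts with op "create".

```python
from typing import Any, Optional


def decide_update(
    existing: Optional[dict[str, Any]], params: dict[str, Any]
) -> tuple[str, dict[str, Any]]:
    """Mirror the script above: return (op, new_source)."""
    # Assumption: a missing document corresponds to ctx.op == 'create'.
    op = "create" if existing is None else "index"
    # Same hash on an existing document means nothing changed: skip the write.
    if existing is not None and existing.get("hash") == params["hash"]:
        op = "noop"
    # As in the script, the source is always assigned; a noop discards it.
    return op, params["doc"]
```

With the document from this issue, passing the stored source and matching params yields "noop", while a different stored hash yields "index".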

Doing the same through bulk:

POST /semantic-docs-dev/_bulk
{"update":{"_id":"1"}}
{ "scripted_upsert": true, "script": { "source": "if (ctx.op != 'create') { if (ctx._source.hash == params.hash ) { ctx.op = 'noop' } else { ctx._source = params.doc } }", "params": { "hash": "SOME-HASH", "doc": { "hash": "SOME-HASH-2", "semantic": "DIFFERENT TEXT" } } } }
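For anyone reproducing this from a client, the bulk body is newline-delimited JSON: one action line, one update line, each terminated by a newline. A small sketch of building that payload with only the standard library follows; the helper name bulk_update_payload is hypothetical and no Elasticsearch client is involved.

```python
import json
from typing import Any


def bulk_update_payload(doc_id: str, script_source: str, params: dict[str, Any]) -> str:
    """Build the two NDJSON lines for one scripted bulk update."""
    action = {"update": {"_id": doc_id}}
    body = {
        "scripted_upsert": True,
        "script": {"source": script_source, "params": params},
    }
    # json.dumps emits no raw newlines, so each object stays on a single line.
    # The bulk body must also end with a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"
```

The resulting string would be sent as the request body of POST /semantic-docs-dev/_bulk.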

Will result in:

{
  "errors": true,
  "took": 0,
  "items": [
    {
      "update": {
        "_index": "semantic-docs-dev",
        "_id": "/docs/reference/integrations/sonicwall_firewall",
        "status": 400,
        "error": {
          "type": "status_exception",
          "reason": "Cannot apply update with a script on indices that contain [semantic_text] field(s)"
        }
      }
    }
  ]
}
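Note that the bulk request itself does not fail as a whole; the rejection is reported per item. A rough sketch of how a client could pull these failures out of a response shaped like the one above (the function name failed_items is hypothetical, and the response is assumed to be already parsed into a dict):

```python
from typing import Any, Optional


def failed_items(response: dict[str, Any]) -> list[tuple[str, Optional[str], Optional[int], Optional[str]]]:
    """Return (action, _id, status, reason) for every failed bulk item."""
    failures = []
    # The top-level "errors" flag is true when at least one item failed.
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item is keyed by its action, e.g. "update".
        for action, result in item.items():
            error = result.get("error")
            if error:
                failures.append(
                    (action, result.get("_id"), result.get("status"), error.get("reason"))
                )
    return failures
```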

I presume this is because the optimizations listed here are harder or impossible to track when a script is involved: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#semantic-text-updates

It would be great to have a more controlled upsert through the bulk API to conditionally update semantic fields, for example by:

  • Requiring the script to set ctx.semantic_update = true or similar
  • Exposing this through new options on bulk update upserts directly, forgoing scripts altogether
