Create .synonyms system index #95548
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 95548 | ||
| summary: Create `.synonyms` system index | ||
| area: Analysis | ||
| type: enhancement | ||
| issues: [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,85 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0 and the Server Side Public License, v 1; you may not use this file except | ||
| * in compliance with, at your election, the Elastic License 2.0 or the Server | ||
| * Side Public License, v 1. | ||
| */ | ||
| | ||
| package org.elasticsearch.analysis.common.synonyms; | ||
| | ||
| import org.elasticsearch.Version; | ||
| import org.elasticsearch.cluster.metadata.IndexMetadata; | ||
| import org.elasticsearch.common.settings.Settings; | ||
| import org.elasticsearch.common.util.FeatureFlag; | ||
| import org.elasticsearch.indices.SystemIndexDescriptor; | ||
| import org.elasticsearch.xcontent.XContentBuilder; | ||
| | ||
| import java.io.IOException; | ||
| import java.io.UncheckedIOException; | ||
| | ||
| import static org.elasticsearch.index.mapper.MapperService.SINGLE_MAPPING_NAME; | ||
| import static org.elasticsearch.xcontent.XContentFactory.jsonBuilder; | ||
| | ||
| public class SynonymsManagementAPIService { | ||
| private static final FeatureFlag SYNONYMS_API_FEATURE_FLAG = new FeatureFlag("synonyms_api"); | ||
| public static final String SYNONYMS_INDEX = ".synonyms"; | ||
| public static final String SYNONYMS_ORIGIN = "synonyms"; | ||
| | ||
| public static SystemIndexDescriptor getSystemIndexDescriptor() { | ||
| return SystemIndexDescriptor.builder() | ||
| .setIndexPattern(SYNONYMS_INDEX + "*") | ||
| .setDescription("Synonyms index for synonyms managed through APIs") | ||
| .setPrimaryIndex(SYNONYMS_INDEX) | ||
| .setMappings(mappings()) | ||
| .setSettings(settings()) | ||
| .setVersionMetaKey("version") | ||
| .setOrigin(SYNONYMS_ORIGIN) | ||
| .build(); | ||
| } | ||
| | ||
| private static XContentBuilder mappings() { | ||
| try { | ||
| XContentBuilder builder = jsonBuilder(); | ||
| builder.startObject(); | ||
| { | ||
| builder.startObject(SINGLE_MAPPING_NAME); | ||
| { | ||
| builder.startObject("_meta"); | ||
| { | ||
| builder.field("version", Version.CURRENT.toString()); | ||
| } | ||
| builder.endObject(); | ||
| builder.field("dynamic", "strict"); | ||
| builder.startObject("properties"); | ||
| { | ||
| builder.startObject("synonyms"); | ||
| { | ||
| builder.field("type", "object"); | ||
| builder.field("enabled", "false"); | ||
Review comment: For my understanding, this looks like we're leaning towards storing synonyms source-only, that is e.g. in this format: Or maybe with more fields, but basically it would mean nothing in there is searchable and we'd have to load the whole source at once and parse stuff from a map? Also, for updates this means we'd need to modify the source for every rule that is changed, appended, etc.

Reply: The format we are currently leaning towards is this: { "synonyms" : [ "foo => bar", "lol, laughing out loud", "i-phone, i phone => iphone" ] } Indeed, synonyms are stored in source only, inside a single field. And indeed, for updates we modify the entire document (with its implications).

Reply: Great, so that would mean a slightly different mapping then, I guess (probably synonyms as a keyword type array?)

Reply: Indeed, it looks more like "synonyms": { "type": "keyword", "doc_values": false, "index" : false }. That mapping will still parse every synonym rule and do some extra work for indexing before returning with empty index options. WDYT?

Reply: Sounds good to me. I didn't think about not parsing the docs here at all, but it's a nice idea as long as we don't have to implement any fast filtering or search on anything. So the parsing logic would only live in whatever piece of code is reading the synonyms from the index instead of from a file like right now, and it basically treats source like a blob. Sounds good to me given the current scope of the UI etc.

Reply: Drive-by comment: should we consider properly disabling parsing for fields that don't have any Lucene footprint, like in this example, rather than using type object? The current solution looks OK to me, but I wonder if it's solid enough given we discussed rethinking how to disable fields in the past (see #63369). This should not block progress on the synonyms management API though.
| } | ||
| builder.endObject(); | ||
| } | ||
| builder.endObject(); | ||
| } | ||
| builder.endObject(); | ||
| } | ||
| builder.endObject(); | ||
| return builder; | ||
| } catch (IOException e) { | ||
| throw new UncheckedIOException("Failed to build mappings for " + SYNONYMS_INDEX, e); | ||
| } | ||
| } | ||
| | ||
| static Settings settings() { | ||
| return Settings.builder() | ||
| .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1) | ||
| .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 0) | ||
| .put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-all") | ||
| .build(); | ||
| } | ||
| | ||
| public static boolean isEnabled() { | ||
| return SYNONYMS_API_FEATURE_FLAG.isEnabled(); | ||
| } | ||
| } | ||
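The review thread on the `synonyms` mapping agrees that rules live in `_source` only and that parsing belongs to whatever code reads the index. As a rough illustration of that reader-side step, here is a minimal sketch that assumes the `{ "synonyms": [ ... ] }` document shape quoted in the thread. `SynonymSourceParser` and `Rule` are hypothetical names that do not appear in this PR, and the splitting logic is a simplification of the rule syntax shown in the thread, not the parser Elasticsearch actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: treats the document source as an
// opaque map and parses the rule strings only at read time.
final class SynonymSourceParser {

    // One parsed rule. "a, b => c" has inputs [a, b] and outputs [c];
    // "a, b" (equivalent synonyms) has inputs [a, b] and no outputs.
    static final class Rule {
        final List<String> inputs;
        final List<String> outputs;

        Rule(List<String> inputs, List<String> outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    // Reads the "synonyms" array from an already-deserialized source map.
    static List<Rule> parse(Map<String, Object> source) {
        Object raw = source.get("synonyms");
        if (!(raw instanceof List<?>)) {
            throw new IllegalArgumentException("expected a 'synonyms' array in source");
        }
        List<Rule> rules = new ArrayList<>();
        for (Object item : (List<?>) raw) {
            rules.add(parseRule(String.valueOf(item)));
        }
        return rules;
    }

    // Splits a single rule on "=>" and then on commas, trimming whitespace.
    static Rule parseRule(String rule) {
        String[] sides = rule.split("=>", -1);
        if (sides.length > 2) {
            throw new IllegalArgumentException("more than one '=>' in rule: " + rule);
        }
        List<String> inputs = splitTerms(sides[0]);
        List<String> outputs = sides.length == 2 ? splitTerms(sides[1]) : List.of();
        if (inputs.isEmpty() || (sides.length == 2 && outputs.isEmpty())) {
            throw new IllegalArgumentException("empty side in rule: " + rule);
        }
        return new Rule(inputs, outputs);
    }

    // Comma-separated terms; blank entries are dropped.
    private static List<String> splitTerms(String side) {
        List<String> terms = new ArrayList<>();
        for (String term : side.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }
}
```

Calling `SynonymSourceParser.parse(...)` on the three example rules from the thread would yield three `Rule` objects, with the second one (`lol, laughing out loud`) having an empty `outputs` list. Because the whole array sits in one document, any update under this design rewrites that document and the reader re-parses every rule, which is the trade-off the reviewers accept for the current scope.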