How to Explore Parent-Child Documents in Elasticsearch

Published: 2021-12-16 17:17:34 | Source: 亿速云 | Author: 柒染 | Category: Big Data

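Parent-child documents in Elasticsearch can be confusing if you have not worked with them before. This article summarizes how they work, which queries operate on them, and how to build and query a parent-child index, with the aim of helping you solve this problem.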

Official documentation:

2.x edition (Chinese)
7.9 edition

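Introduction

Parent-child documents are similar in nature to the nested model: both allow you to associate one entity with another. The main difference is that with nested objects all entities live inside the same document, whereas with parent-child relationships the parent and the children are completely independent documents.

The main purpose of the parent-child relationship is to associate documents of one type with documents of another type in a one-to-many relationship: one parent document can have many child documents. Compared with nested objects, the parent-child relationship has these main advantages:

The parent document can be updated without reindexing the children.
Child documents can be created, modified, or deleted without affecting the parent or the other children. This is especially useful when there are many child documents and they are created and changed frequently.
Child documents can be returned on their own as search results.

Elasticsearch maintains a map of which parents are associated with which children. Thanks to this map, parent-child join queries are fast. The map also imposes a restriction on the relationship: a parent document and all of its children must be stored on the same shard.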
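The same-shard restriction is easier to see with a small model of routing. The sketch below is an illustration only: shardFor is a hypothetical helper, and String.hashCode() stands in for the hash function Elasticsearch really applies to the routing value. The point it shows is that a child indexed with its parent's ID as the routing value is assigned to the same shard as the parent.

```java
import java.util.List;

// Simplified model of routing: shard = hash(routing) mod number_of_shards.
// String.hashCode() is only a stand-in for Elasticsearch's real hash function.
final class RoutingSketch {
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        String parentId = "1";
        // By default a document is routed by its own ID.
        int parentShard = shardFor(parentId, shards);
        // Children are indexed with ?routing=<parent id>, so the parent's ID
        // (not the child's own ID) decides the shard.
        for (String childId : List.of("3", "4")) {
            int childShard = shardFor(parentId, shards);
            System.out.println("child " + childId + " -> shard " + childShard
                    + ", parent " + parentId + " -> shard " + parentShard);
        }
    }
}
```

This is also why the child documents later in this article are indexed with a routing parameter equal to the parent ID.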

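The parent-child ID map is stored in doc values. When the map fits entirely in memory, doc values allow it to be processed quickly; when the map is very large, doc values can spill to disk, which gives it room to scale.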

Has child query

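Because it performs a join, the has_child query is slow compared with other queries. Its performance degrades as the number of matching child documents pointing to unique parent documents increases. Each has_child query in a search can significantly increase query time.
If you care about query performance, do not use this query. If you need to use the has_child query, use it as rarely as possible.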

To use the has_child query, your index must include a join field mapping. For example:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "parent": "child"
        }
      }
    }
  }
}

GET /_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "match_all": {}
      },
      "max_children": 10,
      "min_children": 2,
      "score_mode": "min"
    }
  }
}

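type: (Required, string) Name of the child relationship mapped for the join field.
query: (Required, query object) Query to run on child documents of the given type. If a child document matches the search, the query returns its parent document.
ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error when the type is unmapped. You can use this parameter to query multiple indices that may not contain the type.

max_children: (Optional, integer) Maximum number of child documents matching the query that a returned parent document may have. A parent document that exceeds this limit is excluded from the search results.
min_children: (Optional, integer) Minimum number of child documents matching the query that a parent document must have in order to be returned. A parent document that does not meet this limit is excluded from the search results.
score_mode: (Optional, string) Indicates how the scores of matching child documents affect the relevance score of the root parent document. Valid values are:

none (default): Do not use the relevance scores of matching child documents. The query assigns parent documents a score of 0.
avg: Use the mean relevance score of all matching child documents.
max: Use the highest relevance score of all matching child documents.
min: Use the lowest relevance score of all matching child documents.
sum: Add together the relevance scores of all matching child documents.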
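As a rough in-memory illustration of how min_children, max_children, and score_mode interact, the sketch below computes the score one parent would receive from the scores of its matching children. HasChildSketch and parentScore are hypothetical names used only for this illustration; the code models the parameter descriptions above and is not Elasticsearch's implementation.

```java
import java.util.List;
import java.util.OptionalDouble;

final class HasChildSketch {
    // Returns the parent's score, or empty when the parent is excluded because
    // its number of matching children falls outside [minChildren, maxChildren].
    static OptionalDouble parentScore(List<Double> childScores, int minChildren,
                                      int maxChildren, String scoreMode) {
        int n = childScores.size();
        if (n == 0 || n < minChildren || n > maxChildren) {
            return OptionalDouble.empty();
        }
        double sum = 0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double s : childScores) {
            sum += s;
            max = Math.max(max, s);
            min = Math.min(min, s);
        }
        switch (scoreMode) {
            case "avg": return OptionalDouble.of(sum / n);
            case "max": return OptionalDouble.of(max);
            case "min": return OptionalDouble.of(min);
            case "sum": return OptionalDouble.of(sum);
            default:    return OptionalDouble.of(0.0); // "none": child scores are ignored
        }
    }

    public static void main(String[] args) {
        List<Double> children = List.of(1.0, 3.0);
        for (String mode : List.of("none", "avg", "max", "min", "sum")) {
            System.out.println(mode + " -> " + parentScore(children, 2, 10, mode));
        }
        // Only one matching child, but min_children is 2: the parent is dropped.
        System.out.println(parentScore(List.of(1.0), 2, 10, "min").isPresent());
    }
}
```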

1. Sorting

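You cannot sort the results of a has_child query using the standard sort options. If you need to sort the returned documents by a field in their child documents, use a function_score query and sort by _score. For example, the following query sorts the returned documents by the click_count field of their child documents.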

GET /_search
{
  "query": {
    "has_child": {
      "type": "child",
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['click_count'].value"
          }
        }
      },
      "score_mode": "max"
    }
  }
}
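To make the effect of that query concrete, here is a small simulation of the resulting order. It assumes each child's base _score is 1.0 (so the script yields the click count itself) and that score_mode is max; rankParents and the sample data are hypothetical, for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SortByChildFieldSketch {
    // Parent score = max over its children of (_score * click_count),
    // then parents are ordered by that score, highest first.
    static List<String> rankParents(Map<String, List<Integer>> clickCountsByParent) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : clickCountsByParent.entrySet()) {
            double best = 0;
            for (int clicks : e.getValue()) {
                double childScore = 1.0 * clicks; // assumed base _score of 1.0
                best = Math.max(best, childScore);
            }
            scores.put(e.getKey(), best);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> data = new LinkedHashMap<>();
        data.put("p1", List.of(3, 12));
        data.put("p2", List.of(40));
        data.put("p3", List.of(7, 8));
        System.out.println(rankParents(data));
    }
}
```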

Has parent query

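Returns child documents whose joined parent document matches a provided query. You can create parent-child relationships between documents in the same index using a join field mapping.

Because it performs a join, the has_parent query is slow compared with other queries. Its performance degrades as the number of matching parent documents increases. Each has_parent query in a search can significantly increase query time.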

To use the has_parent query, your index must include a join field mapping. For example:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "parent": "child"
        }
      },
      "tag": {
        "type": "keyword"
      }
    }
  }
}

GET /my-index-000001/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "query": {
        "term": {
          "tag": {
            "value": "Elasticsearch"
          }
        }
      }
    }
  }
}

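parent_type: (Required, string) Name of the parent relationship mapped for the join field.
query: (Required, query object) Query to run on parent documents of the given parent_type. If a parent document matches the search, the query returns its child documents.
score: (Optional, Boolean) Indicates whether the relevance score of a matching parent document is aggregated into its child documents. Defaults to false.

If false, Elasticsearch ignores the relevance score of the parent document. Elasticsearch also assigns each child document a relevance score equal to the query's boost, which defaults to 1.
If true, the relevance score of the matching parent document is aggregated into the relevance scores of its child documents.

ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped parent_type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error when the parent_type is unmapped.
You can use this parameter to query multiple indices that may not contain the parent_type.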
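The behavior of the score parameter can be sketched in memory as follows. The names HasParentSketch and hasParent are hypothetical, and the code only models the description above: children are returned when their parent matched, and they either inherit the parent's score or receive the default boost of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HasParentSketch {
    // childToParent: child ID -> parent ID.
    // matchingParents: parent ID -> relevance score, for parents that matched the query.
    // Returns "childId=score" entries for every child whose parent matched.
    static List<String> hasParent(Map<String, String> childToParent,
                                  Map<String, Double> matchingParents,
                                  boolean score) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : childToParent.entrySet()) {
            Double parentScore = matchingParents.get(e.getValue());
            if (parentScore == null) {
                continue; // the parent did not match, so the child is not returned
            }
            double childScore = score ? parentScore : 1.0; // 1.0 is the default boost
            hits.add(e.getKey() + "=" + childScore);
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> childToParent = new TreeMap<>();
        childToParent.put("3", "1");
        childToParent.put("4", "1");
        childToParent.put("5", "2");
        Map<String, Double> matchingParents = Map.of("1", 2.5);
        System.out.println(hasParent(childToParent, matchingParents, true));
        System.out.println(hasParent(childToParent, matchingParents, false));
    }
}
```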

1. Sorting

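You cannot sort the results of a has_parent query using the standard sort options.

If you need to sort the returned documents by a field in their parent documents, use a function_score query and sort by _score. For example, the following query sorts the returned documents by the view_count field of their parent documents.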

GET /_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent",
      "score": true,
      "query": {
        "function_score": {
          "script_score": {
            "script": "_score * doc['view_count'].value"
          }
        }
      }
    }
  }
}

Parent ID query

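Returns child documents joined to a specific parent document. You can create parent-child relationships between documents in the same index using a join field mapping.

To use the parent_id query, your index must include a join field mapping. To see how to set up an index for the parent_id query, try the following example.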

Create an index with a join field mapping.

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "my-join-field": {
        "type": "join",
        "relations": {
          "my-parent": "my-child"
        }
      }
    }
  }
}

Index a parent document with an ID of 1.

PUT /my-index-000001/_doc/1?refresh
{
  "text": "This is a parent document.",
  "my-join-field": "my-parent"
}

Index a child document of the parent document. The field name must match the join field defined in the mapping (my-join-field).

PUT /my-index-000001/_doc/2?routing=1&refresh
{
  "text": "This is a child document.",
  "my-join-field": {
    "name": "my-child",
    "parent": "1"
  }
}

The following search returns the child documents of the parent document with an ID of 1.

GET /my-index-000001/_search
{
  "query": {
    "parent_id": {
      "type": "my-child",
      "id": "1"
    }
  }
}

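type: (Required, string) Name of the child relationship mapped for the join field.
id: (Required, string) ID of the parent document. The query returns the child documents of this parent document.
ignore_unmapped: (Optional, Boolean) Indicates whether to ignore an unmapped type and return no documents instead of an error. Defaults to false.

If false, Elasticsearch returns an error when the type is unmapped.
You can use this parameter to query multiple indices that may not contain the type.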
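Conceptually, parent_id is a filter on two values stored in the join field: the relation name and the parent ID. The sketch below models that over an in-memory list; the Doc class and parentIdQuery method are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentIdSketch {
    // Minimal stand-in for a document with a join field.
    static final class Doc {
        final String id;
        final String joinName;  // relation name, e.g. "my-parent" or "my-child"
        final String parentId;  // null for parent documents

        Doc(String id, String joinName, String parentId) {
            this.id = id;
            this.joinName = joinName;
            this.parentId = parentId;
        }
    }

    // Returns the IDs of documents of the given child type joined to the given parent.
    static List<String> parentIdQuery(List<Doc> docs, String type, String id) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (type.equals(d.joinName) && id.equals(d.parentId)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("1", "my-parent", null),
                new Doc("2", "my-child", "1"));
        System.out.println(parentIdQuery(docs, "my-child", "1"));
    }
}
```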

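Worked example

This approach differs from the "_parent" mapping used in older versions; Elasticsearch changed the syntax in later releases.

Conceptually, parent-child documents can be thought of as a join query, somewhat like a JOIN in MySQL, where the documents are related through a field. The main difference between parent-child documents and nested documents is that the parent and the children are independent documents, whereas nested objects are all stored inside the same document.

[Figure: parent-child documents compared with nested documents]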

1. Build the parent-child index

Create the settings:

PUT /test_doctor
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "index_ansj_analyzer": {
          "type": "custom",
          "tokenizer": "index_ansj",
          "filter": ["my_synonym", "asciifolding"]
        },
        "comma": {
          "type": "pattern",
          "pattern": ","
        },
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shingle_filter"]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      }
    }
  }
}

Create the mapping:

PUT /test_doctor/_mapping/_doc
{
  "_doc": {
    "properties": {
      "date": { "type": "date" },
      "name": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "comment": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "age": { "type": "long" },
      "body": {
        "type": "text",
        "analyzer": "index_ansj_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "title": {
        "type": "text",
        "analyzer": "index_ansj_analyzer",
        "fields": { "keyword": { "type": "keyword" } }
      },
      "relation": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      }
    }
  }
}

These requests create an index named test_doctor. The relation field is an ordinary field name of your choosing; its type is join, which makes it the field used for joining. The relations object defines the set of possible relationships within the documents, each written as a parent name mapped to a child name. Here the parent is question and the child is answer. To give one parent several kinds of children, use an array instead: "question": ["answer", "comment"]. Note that question and answer are user-defined relation names.

2. Insert data

To insert a parent document, set the relation field defined in the mapping above to question, which marks the document as a parent:

PUT test_doctor/_doc/1
{
  "title": "这是一篇文章",
  "body": "这是一篇文章，从哪里说起呢？ ... ...",
  "relation": "question"
}

PUT test_doctor/_doc/2
{
  "title": "这是一篇小说",
  "body": "这是一篇小说，从哪里说起呢？ ... ...",
  "relation": "question"
}

The relation value can also be written in object form: "relation": {"name": "question"}.

To insert a child document, use the routing parameter in the request URL to specify which parent it belongs to, and set the relation field to the child name (answer) together with the parent ID:

PUT test_doctor/_doc/3?routing=1
{
  "name": "张三",
  "comment": "写的不错",
  "age": 28,
  "date": "2020-05-04",
  "relation": {
    "name": "answer",
    "parent": 1
  }
}

PUT test_doctor/_doc/4?routing=1
{
  "name": "李四",
  "comment": "写的很好",
  "age": 20,
  "date": "2020-05-04",
  "relation": {
    "name": "answer",
    "parent": 1
  }
}

PUT test_doctor/_doc/5?routing=2
{
  "name": "王五",
  "comment": "这是一篇非常棒的小说",
  "age": 31,
  "date": "2020-05-01",
  "relation": {
    "name": "answer",
    "parent": 2
  }
}

PUT test_doctor/_doc/6?routing=2
{
  "name": "小六",
  "comment": "这是一篇非常棒的小说",
  "age": 31,
  "date": "2020-05-01",
  "relation": {
    "name": "answer",
    "parent": 2
  }
}

Parent document (Java):

// Maps is Guava's com.google.common.collect.Maps
Map<String, Object> drugMap = Maps.newHashMap();
drugMap.put("id", "2");
drugMap.put("title", "这是一篇小说");
drugMap.put("body", "这是一篇小说，从哪里说起呢？ ... ...");
drugMap.put("relation", "question"); // fixed value: marks the document as a parent

Child document (Java):

Map<String, Object> maps = Maps.newHashMap();
maps.put("name", "answer"); // fixed value: marks the document as a child
maps.put("parent", "2");    // ID of the parent document this child is bound to
Map<String, Object> doctorTeamMap = Maps.newHashMap();
doctorTeamMap.put("id", "6");
doctorTeamMap.put("name", "小六");
doctorTeamMap.put("comment", "这是一篇非常棒的小说");
doctorTeamMap.put("age", "31");
doctorTeamMap.put("date", "2020-05-01");
doctorTeamMap.put("relation", maps); // the join field holds the name/parent object

Java implementation:

/**
 * Adds a document to the BulkProcessor for batched indexing.
 * @param indexName  index name
 * @param jsonString document source as a field map
 * @param id         document ID
 */
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id) {
    IndexRequest request = new IndexRequest(indexName, "_doc", id);
    request.source(jsonString, XContentType.JSON);
    dataBulkProcessor.add(request);
    return true;
}

/**
 * Same as above, with a routing value (required for child documents).
 * @param indexName  index name
 * @param jsonString document source as a field map
 * @param id         document ID
 * @param routing    routing value, that is, the parent document ID
 * @return true once the request has been queued
 */
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id, String routing) {
    IndexRequest request = new IndexRequest(indexName, "_doc", id);
    request.source(jsonString, XContentType.JSON);
    request.routing(routing);
    dataBulkProcessor.add(request);
    return true;
}

3. Query data

Querying the relation field

Elasticsearch automatically creates an additional field that represents the relationship, named after the join field and the parent relation: relation#question. It can be read with a script field:

POST test_doctor/_search
{
  "script_fields": {
    "parent": {
      "script": {
        "source": "doc['relation#question']"
      }
    }
  }
}

Response:

{
  "took" : 124,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "fields" : { "parent" : [ "1" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "_routing" : "1", "fields" : { "parent" : [ "1" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_routing" : "1", "fields" : { "parent" : [ "1" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "4", "_score" : 1.0, "_routing" : "1", "fields" : { "parent" : [ "1" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "5", "_score" : 1.0, "fields" : { "parent" : [ "5" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "6", "_score" : 1.0, "_routing" : "5", "fields" : { "parent" : [ "5" ] } },
      { "_index" : "test_doctor", "_type" : "_doc", "_id" : "7", "_score" : 1.0, "_routing" : "1", "fields" : { "parent" : [ "1" ] } }
    ]
  }
}
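A hit with a _routing field is a child document, and its parent field holds the ID of its parent document. A hit without _routing is a parent document, and its parent field points to its own ID. Note that this sample response lists seven documents with different parent IDs than the six documents inserted in step 2, so it appears to come from a different data set.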
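The rule for reading the response can be written as a one-line check. This is only a sketch of the heuristic stated here: kindOf is a hypothetical helper, and the rule relies on the assumption that only child documents were indexed with a routing parameter, as in this example.

```java
import java.util.Map;

final class HitKindSketch {
    // Classifies a search hit by the presence of "_routing".
    // Assumption: parents were indexed without custom routing, children with
    // ?routing=<parent id>. The check is not valid if parents also use routing.
    static String kindOf(Map<String, ?> hit) {
        return hit.containsKey("_routing") ? "child" : "parent";
    }

    public static void main(String[] args) {
        System.out.println(kindOf(Map.of("_id", "1")));
        System.out.println(kindOf(Map.of("_id", "3", "_routing", "1")));
    }
}
```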

Querying child documents with parent_id

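Pass the parent document ID to a parent_id query:

POST test_doctor/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "5"
    }
  }
}

Java API:

// Name of the child relation
String child_type = "answer";
// Parent document ID
String id = "5";
// parent_id query
ParentIdQueryBuilder parentIdQueryBuilder = new ParentIdQueryBuilder(child_type, id);
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(parentIdQueryBuilder);
builder.from(0);
builder.size(10);

To fetch a child document directly by ID, you must also supply the routing value; without it the document cannot be found:

// childId is a placeholder for the ID of the child document to fetch
String childId = "6";
GetRequest getRequest = new GetRequest(indexName, "_doc", childId);
// The routing value (the parent ID) is required
getRequest.routing(id);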

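Querying by child documents: has_child

Use has_child to find parent documents based on the content of their children. The type parameter is the child relation name that was used when the child documents were indexed.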

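This query finds the parent documents that have particular child documents. It is expensive, so use it sparingly. The standard form is shown below; adding inner_hits returns the matching child documents together with each parent.

POST test_doctor/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "name": "张三"
        }
      },
      "inner_hits": {}
    }
  }
}

The optional max_children and min_children parameters set the maximum and minimum number of matching child documents a parent must have to be returned:

POST test_doctor/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match_all": {}
      },
      "max_children": 10,
      "min_children": 2,
      "score_mode": "min"
    }
  }
}

If you also want to filter on fields of the parent document, use a post filter:

POST test_doctor/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "match": {
          "name": "张三"
        }
      },
      "inner_hits": {}
    }
  },
  "post_filter": {
    "bool": {
      "must": [
        {
          "term": {
            "title": {
              "value": "文章",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

Java API:

// Query on the child documents (match query, as in the request above)
QueryBuilder matchQuery = QueryBuilders.matchQuery("name", "张三");
// How child scores are combined into the parent score (Total corresponds to sum)
ScoreMode scoreMode = ScoreMode.Total;
HasChildQueryBuilder childQueryBuilder = new HasChildQueryBuilder("answer", matchQuery, scoreMode);
childQueryBuilder.innerHit(new InnerHitBuilder());
// Filter on the parent documents, mirroring the post_filter above
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("title", "文章"));
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(childQueryBuilder);
builder.postFilter(boolQueryBuilder);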

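Querying by parent documents: has_parent

Use has_parent to find child documents based on the content of their parent.

POST test_doctor/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": {
          "title": "这是一篇文章"
        }
      }
    }
  }
}

Java API:

// Whether the parent's relevance score is passed on to the children
boolean score = true;
// boolQueryBuilder holds the query to run on the parent documents
HasParentQueryBuilder hasParentQueryBuilder = new HasParentQueryBuilder("question", boolQueryBuilder, score);
builder.query(hasParentQueryBuilder);
// Filter on the child documents; indextype is a field from the author's own index
// and does not appear in the mapping shown above
builder.postFilter(QueryBuilders.termQuery("indextype", "answer"));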

