This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Commit fbfe7a1

Author: Dronplane
Search: storedValues, conditionOptimization, primarySortCompression (#402)
1 parent 49452fe commit fbfe7a1

4 files changed: +163 -11 lines changed


3.7/aql/operations-search.md

Lines changed: 6 additions & 0 deletions
@@ -305,6 +305,12 @@ The `SEARCH` operation accepts an options object with the following attributes:
 
 - `collections` (array, _optional_): array of strings with collection names to
   restrict the search to certain source collections
+- `conditionOptimization` (string, _optional_): controls how search criteria
+  get optimized. Possible values:
+  - `"auto"` (default): convert conditions to disjunctive normal form (DNF) and
+    apply optimizations. Removes redundant or overlapping conditions, but can
+    take quite some time even for a low number of nested conditions.
+  - `"none"`: search the index without optimizing the conditions.
 
 **Examples**
 
3.7/arangosearch-views.md

Lines changed: 33 additions & 6 deletions
@@ -221,6 +221,9 @@ Note that the `primarySort` option is immutable: it can not be changed after
 View creation. It is therefore not possible to configure it through the Web UI.
 The View needs to be created via the HTTP or JavaScript API (arangosh) to set it.
 
+The primary sort data is LZ4 compressed by default (`primarySortCompression` is
+`"lz4"`). Set it to `"none"` on View creation to trade space for speed.
+
 View Definition/Modification
 ----------------------------
 
@@ -245,7 +248,7 @@ During view modification the following directives apply:
 ### Link Properties
 
 - **analyzers** (_optional_; type: `array`; subtype: `string`; default: `[
-  'identity' ]`)
+  "identity" ]`)
 
   A list of Analyzers, by name as defined via the [Analyzers](arangosearch-analyzers.html),
   that should be applied to values of processed document attributes.
@@ -271,11 +274,11 @@ During view modification the following directives apply:
 - **trackListPositions** (_optional_; type: `boolean`; default: `false`)
 
   If set to `true`, then for array values track the value position in arrays.
-  E.g., when querying for the input `{ attr: [ 'valueX', 'valueY', 'valueZ' ]
-  }`, the user must specify: `doc.attr[1] == 'valueY'`. Otherwise, all values in
+  E.g., when querying for the input `{ attr: [ "valueX", "valueY", "valueZ" ] }`,
+  the user must specify: `doc.attr[1] == "valueY"`. Otherwise, all values in
   an array are treated as equal alternatives. E.g., when querying for the input
-  `{ attr: [ 'valueX', 'valueY', 'valueZ' ] }`, the user must specify: `doc.attr
-  == 'valueY'`.
+  `{ attr: [ "valueX", "valueY", "valueZ" ] }`, the user must specify:
+  `doc.attr == "valueY"`.
 
 - **storeValues** (_optional_; type: `string`; default: `"none"`)
 
@@ -294,7 +297,31 @@ During view modification the following directives apply:
   iterates over all documents of a View, wants to sort them by attribute values
   and the (left-most) fields to sort by as well as their sorting direction match
   with the *primarySort* definition, then the `SORT` operation is optimized away.
-  Also see [Primary Sort Order](arangosearch-views.html#primary-sort-order)
+  Also see [Primary Sort Order](#primary-sort-order)
+
+- **primarySortCompression** (_optional_; type: `string`; default: `"lz4"`; _immutable_)
+
+  Defines how to compress the primary sort data (introduced in v3.7.0).
+  ArangoDB v3.5 and v3.6 always compress the index using LZ4.
+
+  - `"lz4"` (default): use LZ4 fast compression.
+  - `"none"`: disable compression to trade space for speed.
+
+- **storedValues** (_optional_; type: `array`; default: `[]`; _immutable_)
+
+  An array of objects to describe which document attributes to store in the
+  View index. The index can then cover search queries, which means the data can
+  be taken from the index directly and accessing the storage engine can be avoided.
+
+  Each object is expected in the form
+  `{ fields: [ "attr1", "attr2", ... "attrN" ], compression: "none" }`,
+  where the required `fields` attribute is an array of strings with one or more
+  document attribute paths. The specified attributes are placed into a single
+  column of the index. A column with all fields that are involved in common
+  search queries is ideal for performance. However, the column should not
+  include too many unneeded fields. The optional `compression` attribute defines
+  the compression type used for the internal column-store, which can be `"lz4"`
+  (LZ4 fast compression, default) or `"none"` (no compression).
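
The `storedValues` entry shape described above can be checked on the client before a View is created. The following is a minimal sketch in plain JavaScript, assuming only the documented shape (`fields` required and non-empty, `compression` optional with `"lz4"` as default). The helper name `normalizeStoredValues` is hypothetical and not part of any ArangoDB API.

```javascript
// Hypothetical helper, not part of ArangoDB: checks a `storedValues` array
// against the documented shape and fills in the default compression.
const COMPRESSION_TYPES = ["lz4", "none"];

function normalizeStoredValues(storedValues = []) {
  if (!Array.isArray(storedValues)) {
    throw new TypeError("storedValues must be an array");
  }
  return storedValues.map((column, i) => {
    const { fields, compression = "lz4" } = column;
    // `fields` is required: one or more attribute paths given as strings.
    const validFields =
      Array.isArray(fields) &&
      fields.length > 0 &&
      fields.every((f) => typeof f === "string" && f.length > 0);
    if (!validFields) {
      throw new TypeError(
        `storedValues[${i}].fields must be a non-empty array of attribute paths`
      );
    }
    if (!COMPRESSION_TYPES.includes(compression)) {
      throw new RangeError(
        `storedValues[${i}].compression must be "lz4" or "none"`
      );
    }
    // Each entry corresponds to one column of the View index.
    return { fields: [...fields], compression };
  });
}

const columns = normalizeStoredValues([
  { fields: ["title", "categories"] },
  { fields: ["publishedAt"], compression: "none" },
]);
console.log(JSON.stringify(columns));
```

Because `storedValues` is immutable, validating the definition before View creation avoids having to drop and recreate the View to correct a mistake.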
 
 An inverted index is the heart of ArangoSearch Views.
 The index consists of several independent segments and the index **segment**

3.7/highlights.md

Lines changed: 10 additions & 5 deletions
@@ -11,13 +11,18 @@ Version 3.7
 
 **All Editions**
 
-<!--
 - **ArangoSearch**:
-  Wildcard and fuzzy search (Levenshtein distance and n-gram based), enhanced
-  phrase and proximity search, improved late document materialization and Views
-  covering queries using their indexes without touching the storage engine,
-  SIMD-based index format for faster processing
+  [Wildcard](aql/functions-arangosearch.html#like) and fuzzy search
+  ([Levenshtein distance](aql/functions-string.html#levenshtein_distance) and
+  [n-gram based](aql/functions-arangosearch.html#ngram_match)),
+  [enhanced phrase](aql/functions-arangosearch.html#phrase) and
+  [proximity search](aql/functions-array.html#jaccard),
+  improved late document materialization and
+  [Views covering queries](release-notes-new-features37.html#covering-indexes)
+  using their indexes without touching the storage engine, as well as a new
+  SIMD-based index format for faster processing.
 
+<!--
 - **AQL**:
   Subquery and graph traversal performance improvements
 
3.7/release-notes-new-features37.md

Lines changed: 114 additions & 0 deletions
@@ -30,6 +30,77 @@ FOR doc IN viewName
 
 See [ArangoSearch functions](aql/functions-arangosearch.html#like)
 
+### Covering Indexes
+
+It is now possible to store the values of document attributes directly in View
+indexes via a new View property `storedValues` (not to be confused with
+the existing `storeValues`).
+
+View indexes may fully cover `SEARCH` queries for improved performance.
+While late document materialization reduces the number of fetched documents,
+this new optimization can avoid accessing the storage engine entirely.
+
+```json
+{
+  "links": {
+    "articles": {
+      "fields": {
+        "categories": {}
+      }
+    }
+  },
+  "primarySort": [
+    { "field": "publishedAt", "direction": "desc" }
+  ],
+  "storedValues": [
+    { "fields": [ "title", "categories" ] }
+  ],
+  ...
+}
+```
+
+In the above View definition, the document attribute *categories* is indexed for
+searching, *publishedAt* is used as primary sort order and *title* as well as
+*categories* are stored in the View using the new `storedValues` property.
+
+```js
+FOR doc IN articlesView
+  SEARCH doc.categories == "recipes"
+  SORT doc.publishedAt DESC
+  RETURN {
+    title: doc.title,
+    date: doc.publishedAt,
+    tags: doc.categories
+  }
+```
+
+The query searches for articles which contain a certain tag in the *categories*
+array and returns title, date and tags. All three values are stored in the View
+(`publishedAt` via `primarySort` and the other two via `storedValues`), thus
+no documents need to be fetched from the storage engine to answer the query.
+This is shown in the execution plan as a comment to the *EnumerateViewNode*:
+`/* view query without materialization */`
+
+```js
+Execution plan:
+ Id   NodeType            Est.   Comment
+  1   SingletonNode          1   * ROOT
+  2   EnumerateViewNode      1     - FOR doc IN articlesView SEARCH (doc.`categories` == "recipes") SORT doc.`publishedAt` DESC LET #1 = doc.`publishedAt` LET #7 = doc.`categories` LET #5 = doc.`title` /* view query without materialization */
+  5   CalculationNode        1     - LET #3 = { "title" : #5, "date" : #1, "tags" : #7 } /* simple expression */
+  6   ReturnNode             1     - RETURN #3
+
+Indexes used:
+ none
+
+Optimization rules applied:
+ Id   RuleName
+  1   move-calculations-up
+  2   move-calculations-up-2
+  3   handle-arangosearch-views
+```
+
+See [ArangoSearch Views](arangosearch-views.html#view-properties).
+
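
The covering condition from the example above (every returned attribute is available via `primarySort` or `storedValues`) can be illustrated with a small check. This is a simplified sketch in plain JavaScript. The function names are hypothetical, and it ignores any further conditions the query optimizer may apply before it skips document materialization.

```javascript
// Hypothetical helpers, not an ArangoDB API: collect the attribute paths
// that a View definition keeps in its index.
function coveredAttributes(viewDefinition) {
  const covered = new Set();
  for (const { field } of viewDefinition.primarySort ?? []) {
    covered.add(field);
  }
  for (const { fields } of viewDefinition.storedValues ?? []) {
    for (const f of fields) covered.add(f);
  }
  return covered;
}

// True if all attributes a query reads can be taken from the index.
function isCovered(viewDefinition, accessedAttributes) {
  const covered = coveredAttributes(viewDefinition);
  return accessedAttributes.every((attr) => covered.has(attr));
}

// Same definition as in the release note example.
const articlesView = {
  links: { articles: { fields: { categories: {} } } },
  primarySort: [{ field: "publishedAt", direction: "desc" }],
  storedValues: [{ fields: ["title", "categories"] }],
};

console.log(isCovered(articlesView, ["title", "publishedAt", "categories"])); // true
console.log(isCovered(articlesView, ["title", "author"])); // false
```

The second call returns `false` because the assumed attribute `author` is neither part of the primary sort nor stored in the View, so the document itself would have to be fetched.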
 ### Stemming support for more languages
 
 The Snowball library was updated to the latest version 2, adding stemming
@@ -71,6 +142,49 @@ db._query(`RETURN TOKENS("αυτοκινητουσ πρωταγωνιστούσ
 
 Also see [Analyzers: Supported Languages](arangosearch-analyzers.html#supported-languages)
 
+### Condition Optimization Option
+
+The `SEARCH` operation in AQL accepts a new option `conditionOptimization` to
+give users control over the search criteria optimization:
+
+```js
+FOR doc IN myView
+  SEARCH doc.val > 10 AND doc.val > 5 /* more conditions */
+  OPTIONS { conditionOptimization: "none" }
+  RETURN doc
+```
+
+By default, all conditions get converted into disjunctive normal form (DNF).
+Numerous optimizations can be applied, like removing redundant or overlapping
+conditions (in the example above, `doc.val > 5` is redundant because every
+value matching `doc.val > 10` also matches it).
+However, converting to DNF and optimizing the conditions can take quite some
+time even for a small number of nested conditions, because they can expand
+into dozens of conjunctions and disjunctions. It can be faster to just search
+the index without optimizations.
+
+See [SEARCH operation](aql/operations-search.html#search-options).
+
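
To see why DNF conversion can become expensive, consider `n` groups of the form `(a OR b)` joined by `AND`. Distributing `AND` over `OR` yields `2^n` conjunctions. The sketch below is a textbook DNF expansion in plain JavaScript, not ArangoDB's actual implementation. The condition model and the function names are assumptions made for illustration.

```javascript
// Conditions are modeled as nested objects (an assumption for this sketch):
//   { and: [...] }, { or: [...] }, or a string leaf such as "a0 == 1".
// toDnf returns an array of conjunctions, each an array of leaves.
function toDnf(cond) {
  if (typeof cond === "string") return [[cond]];
  // OR: concatenate the conjunctions of all operands.
  if (cond.or) return cond.or.flatMap(toDnf);
  // AND: cross product of the operands' conjunctions.
  return cond.and.map(toDnf).reduce(
    (acc, dnf) => acc.flatMap((left) => dnf.map((right) => [...left, ...right])),
    [[]]
  );
}

// Build n groups of (a_i OR b_i) joined by AND.
function nestedCondition(n) {
  const groups = [];
  for (let i = 0; i < n; i++) {
    groups.push({ or: [`a${i} == 1`, `b${i} == 1`] });
  }
  return { and: groups };
}

for (const n of [2, 4, 6, 10]) {
  console.log(n, toDnf(nestedCondition(n)).length);
}
// 2 4
// 4 16
// 6 64
// 10 1024
```

Six two-way groups already produce 64 conjunctions, which matches the note that a small number of nested conditions can expand into dozens of terms.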
+### Primary Sort Compression Option
+
+There is a new option `primarySortCompression` which can be set on View
+creation to disable the compression of the primary sort data:
+
+```json
+{
+  "primarySort": [
+    { "field": "date", "direction": "desc" },
+    { "field": "title", "direction": "asc" }
+  ],
+  "primarySortCompression": "none",
+  ...
+}
+```
+
+It defaults to LZ4 compression (`"lz4"`), which was already used in ArangoDB
+v3.5 and v3.6.
+
+See [ArangoSearch Views](arangosearch-views.html#view-properties).
+
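
Since `primarySort`, `primarySortCompression` and `storedValues` are documented as immutable, they have to be decided at View creation. A client could guard against accidental change requests with a check like the following. This is a sketch in plain JavaScript. The function `changedImmutableProperties` is hypothetical and not an ArangoDB API, and the comparison via `JSON.stringify` is a simplification that is sensitive to key order.

```javascript
// Property names are taken from the View documentation; the guard itself
// is a hypothetical client-side helper.
const IMMUTABLE_VIEW_PROPERTIES = [
  "primarySort",
  "primarySortCompression",
  "storedValues",
];

// Returns the names of immutable properties that `update` tries to change.
function changedImmutableProperties(current, update) {
  return IMMUTABLE_VIEW_PROPERTIES.filter(
    (name) =>
      name in update &&
      // Simplified structural comparison (key order matters).
      JSON.stringify(update[name]) !== JSON.stringify(current[name])
  );
}

// Definition from the example above, as it was set on View creation.
const current = {
  primarySort: [
    { field: "date", direction: "desc" },
    { field: "title", direction: "asc" },
  ],
  primarySortCompression: "none",
};

console.log(changedImmutableProperties(current, { primarySortCompression: "lz4" }));
// [ 'primarySortCompression' ]
console.log(changedImmutableProperties(current, { primarySortCompression: "none" }));
// []
```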
 SatelliteGraphs
 ---------------
 