This repository was archived by the owner on Dec 13, 2023. It is now read-only.
Merged
14 changes: 14 additions & 0 deletions 3.8/aql/functions-string.md
@@ -133,6 +133,20 @@ value used is 0xFFFFFFFF, and the final xor value is also 0xFFFFFFFF.
CRC32("foobar") // "D5F5C7F"
```

NORMALIZE_UTF8
--------------

`NORMALIZE_UTF8(text) → normalized`

In Unicode, the same glyph can have more than one representation.
This function uses [ICU normalization](http://www.unicode.org/reports/tr15/) to convert
the text to a single, consistent representation. It can also be useful to run this
normalization before writing documents into the database. If your strings use different
representations, functions like [`FIND_FIRST`](#FIND_FIRST) may not find all the matches you expect.

- **text** (string): a UTF-8 string
- returns **normalized** (string): the normalized UTF-8 string
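
The effect of different representations can be sketched outside the database. The Python snippet below is only an illustration, not the AQL function itself: it uses the standard `unicodedata` module and assumes the NFC normalization form, which this section does not state explicitly. The variable names are hypothetical.

```python
import unicodedata

# Two representations of the same glyph "é":
composed = "\u00e9"      # single code point U+00E9
decomposed = "e\u0301"   # "e" followed by the combining acute accent U+0301

# The strings render identically but compare as different.
print(composed == decomposed)  # False

# After normalization (NFC is an assumption here) they are equal.
nfc_composed = unicodedata.normalize("NFC", composed)
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
print(nfc_composed == nfc_decomposed)  # True

# A plain substring search misses the other representation
# until the text is normalized.
text = "caf" + decomposed
print(text.find(composed))                                # -1
print(unicodedata.normalize("NFC", text).find(composed))  # 3
```

The last two lines mirror the `FIND_FIRST` caveat: the search only succeeds once both strings share one representation.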

ENCODE_URI_COMPONENT()
-----------

14 changes: 13 additions & 1 deletion 3.8/indexing-index-basics.md
@@ -402,6 +402,19 @@ not be enabled for other types of queries or conditions.
For advanced full-text search capabilities consider [ArangoSearch](arangosearch.html).
{% endhint %}


Indexes and non-ASCII texts
---------------------------
Before strings are put into an index, they are
[normalized by using ICU](http://www.unicode.org/reports/tr15/). Unicode contains several
characters that have a similar meaning. So that a query returns all of these variants,
the strings are normalized for the index.
As a result, `FILTER` statements with `==` comparisons can behave slightly differently
on indexed attributes than on non-indexed document attributes. The index is still useful,
but it may fetch a few more results than you actually want to work with. To make sure the
final result contains only the exact values you are looking for, you can add an
additional `FILTER MD5(doc.attr) == MD5(@comparisonstring)`.
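
The two-step approach (a normalized lookup followed by an exact comparison) can be modeled in a few lines of Python. This is a rough sketch of the idea, not ArangoDB's implementation: the list `stored` stands in for indexed attribute values, NFC is an assumed normalization form, and the helper names `normalized` and `md5_hex` are hypothetical.

```python
import hashlib
import unicodedata


def normalized(s: str) -> str:
    # Assumed normalization form (NFC); the index may use a different one.
    return unicodedata.normalize("NFC", s)


def md5_hex(s: str) -> str:
    # Hash the raw UTF-8 bytes, so different representations hash differently.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


stored = ["caf\u00e9", "cafe\u0301", "cafe"]  # stand-in for attribute values
wanted = "caf\u00e9"

# Step 1: normalized lookup, which also matches the decomposed variant.
candidates = [s for s in stored if normalized(s) == normalized(wanted)]

# Step 2: exact filter, comparable to the additional MD5 FILTER.
exact = [s for s in candidates if md5_hex(s) == md5_hex(wanted)]

print(len(candidates), len(exact))  # 2 1
```

The normalized lookup returns both spellings of "café", and the hash comparison then keeps only the byte-identical one.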

Indexing attributes and sub-attributes
--------------------------------------

@@ -689,4 +702,3 @@ become unsustainable if this list grows to tens of millions of entries.

Building an index is always a write-heavy operation (internally), so it is a good idea to build indexes
during times with less load.