| 
 | 1 | +---  | 
 | 2 | +title: Vector search functions in AQL  | 
 | 3 | +menuTitle: Vector  | 
 | 4 | +weight: 60  | 
 | 5 | +description: >-  | 
 | 6 | + The functions for vector search let you quickly find semantically similar  | 
 | 7 | + documents utilizing indexed vector embeddings  | 
 | 8 | +---  | 
 | 9 | +<small>Introduced in: v3.12.4</small>  | 
 | 10 | + | 
 | 11 | +To use vector search, you need to have vector embeddings stored in documents  | 
 | 12 | +and the attribute that stores them needs to be indexed by a  | 
 | 13 | +[vector index](../../index-and-search/indexing/working-with-indexes/vector-indexes.md).  | 
 | 14 | + | 
 | 15 | +You can calculate vector embeddings using [ArangoDB's GraphML](../../data-science/arangographml/_index.md)  | 
 | 16 | +capabilities (available in ArangoGraph) or using external tools.  | 
 | 17 | + | 
 | 18 | +{{< warning >}}  | 
 | 19 | +The vector index is an experimental feature that you need to enable for the  | 
 | 20 | +ArangoDB server with the `--experimental-vector-index` startup option.  | 
 | 21 | +Once enabled for a deployment, it cannot be disabled anymore because it  | 
 | 22 | +permanently changes how the data is managed by the RocksDB storage engine  | 
 | 23 | +(it adds an additional column family).  | 
 | 24 | + | 
 | 25 | +To restore a dump that contains vector indexes, the `--experimental-vector-index`  | 
 | 26 | +startup option needs to be enabled on the deployment you want to restore to.  | 
 | 27 | +{{< /warning >}}  | 
 | 28 | + | 
 | 29 | +## Vector similarity functions  | 
 | 30 | + | 
 | 31 | +In order to utilize a vector index, you need to do the following in an AQL query:  | 
 | 32 | + | 
 | 33 | +- Use one of the following vector similarity functions in a query.  | 
 | 34 | +- `SORT` by the similarity so that the most similar documents come first.  | 
 | 35 | +- Specify the maximum number of documents to retrieve with a `LIMIT` operation.  | 
 | 36 | + | 
 | 37 | +As a result, you get up to the specified number of documents whose vector embeddings  | 
 | 38 | +are the most similar to the reference vector embedding you provided in the query,  | 
 | 39 | +as approximated by the vector index.  | 
 | 40 | + | 
 | 41 | +Example:  | 
 | 42 | + | 
 | 43 | +```aql  | 
 | 44 | +FOR doc IN coll  | 
 | 45 | +  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC  | 
 | 46 | +  LIMIT 5  | 
 | 47 | +  RETURN doc  | 
 | 48 | +```  | 
 | 49 | + | 
 | 50 | +For this query, a vector index over the `vector` attribute and with the `cosine`  | 
 | 51 | +metric is required. The `@q` bind variable needs to be a vector (array of numbers)  | 
 | 52 | +with the dimension as specified in the vector index. It defines the point at  | 
 | 53 | +which to look for similar documents (up to `5` in this case). How many documents can  | 
 | 54 | +be found depends on the data as well as the search effort (see the `nProbe` option).  | 
 | 55 | + | 
 | 56 | +{{< info >}}  | 
 | 57 | +- If there is more than one suitable vector index over the same attribute, it is  | 
 | 58 | + undefined which one is selected.  | 
 | 59 | +- You cannot have any `FILTER` operation between `FOR` and `LIMIT` for  | 
 | 60 | + pre-filtering.  | 
 | 61 | +{{< /info >}}  | 
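 |  | + | 
 |  | +A filter can still be applied after the `LIMIT` operation (post-filtering).  | 
 |  | +The following is a minimal sketch that assumes a hypothetical `category`  | 
 |  | +attribute and `@category` bind variable, and that the vector index is still  | 
 |  | +used because nothing is placed between `FOR` and `LIMIT`. The filter only sees  | 
 |  | +the up to `5` candidates already selected, so the query may return fewer  | 
 |  | +documents than the limit:  | 
 |  | + | 
 |  | +```aql  | 
 |  | +FOR doc IN coll  | 
 |  | +  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC  | 
 |  | +  LIMIT 5  | 
 |  | +  // Post-filter: applied to the candidates found via the vector index  | 
 |  | +  FILTER doc.category == @category  | 
 |  | +  RETURN doc  | 
 |  | +```  | 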
 | 62 | + | 
 | 63 | +### APPROX_NEAR_COSINE()  | 
 | 64 | + | 
 | 65 | +`APPROX_NEAR_COSINE(vector1, vector2, options) → similarity`  | 
 | 66 | + | 
 | 67 | +Retrieve the approximate angular similarity using the cosine metric, accelerated  | 
 | 68 | +by a matching vector index.  | 
 | 69 | + | 
 | 70 | +The cosine similarity ranges from -1 to 1. The closer the value is to 1, the  | 
 | 71 | +more similar the two vectors are. A value close to 0 means the vectors are  | 
 | 72 | +unrelated (orthogonal), and a negative value indicates that they point in  | 
 | 73 | +opposing directions. You need to sort in descending order so that the most similar  | 
 | 74 | +documents come first, which is what a vector index using the `cosine` metric  | 
 | 75 | +can provide.  | 
 | 76 | + | 
 | 77 | +- **vector1** (array of numbers): The first vector. Either this parameter or  | 
 | 78 | + `vector2` needs to reference a stored attribute holding the vector embedding.  | 
 | 79 | +- **vector2** (array of numbers): The second vector. Either this parameter or  | 
 | 80 | + `vector1` needs to reference a stored attribute holding the vector embedding.  | 
 | 81 | +- **options** (object, _optional_):  | 
 | 82 | + - **nProbe** (number, _optional_): How many neighboring centroids (or  | 
 | 83 | + closest Voronoi cells) to consider for the search results. The larger the number,  | 
 | 84 | + the slower the search but the better the search results. If not specified, the  | 
 | 85 | + `defaultNProbe` value of the vector index is used.  | 
 | 86 | +- returns **similarity** (number): The approximate angular similarity between  | 
 | 87 | + both vectors.  | 
 | 88 | + | 
 | 89 | +**Examples**  | 
 | 90 | + | 
 | 91 | +Return up to `10` similar documents based on their closeness to the vector  | 
 | 92 | +`@q` according to the cosine metric:  | 
 | 93 | + | 
 | 94 | +```aql  | 
 | 95 | +FOR doc IN coll  | 
 | 96 | +  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC  | 
 | 97 | +  LIMIT 10  | 
 | 98 | +  RETURN doc  | 
 | 99 | +```  | 
 | 100 | + | 
 | 101 | +Return up to `5` similar documents as well as the similarity value,  | 
 | 102 | +considering `20` neighboring centroids (or closest Voronoi cells):  | 
 | 103 | + | 
 | 104 | +```aql  | 
 | 105 | +FOR doc IN coll  | 
 | 106 | +  LET similarity = APPROX_NEAR_COSINE(doc.vector, @q, { nProbe: 20 })  | 
 | 107 | +  SORT similarity DESC  | 
 | 108 | +  LIMIT 5  | 
 | 109 | +  RETURN MERGE({ similarity }, doc)  | 
 | 110 | +```  | 
 | 111 | + | 
 | 112 | +Return the similarity value and the document keys of up to `3` similar documents  | 
 | 113 | +for multiple input vectors using a subquery. In this example, the input vectors  | 
 | 114 | +are taken from ten arbitrary documents of the same collection:  | 
 | 115 | + | 
 | 116 | +```aql  | 
 | 117 | +FOR docOuter IN coll  | 
 | 118 | +  LIMIT 10  | 
 | 119 | +  LET neighbors = (  | 
 | 120 | +    FOR docInner IN coll  | 
 | 121 | +      LET similarity = APPROX_NEAR_COSINE(docInner.vector, docOuter.vector)  | 
 | 122 | +      SORT similarity DESC  | 
 | 123 | +      LIMIT 3  | 
 | 124 | +      RETURN { key: docInner._key, similarity }  | 
 | 125 | +  )  | 
 | 126 | +  RETURN { key: docOuter._key, neighbors }  | 
 | 127 | +```  | 
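 |  | + | 
 |  | +To get a feeling for the value range, you can compute exact cosine similarities  | 
 |  | +for literal vectors with the regular `COSINE_SIMILARITY()` function. This is  | 
 |  | +only an illustrative sketch: the function does not use a vector index, and the  | 
 |  | +two-dimensional vectors are made-up values:  | 
 |  | + | 
 |  | +```aql  | 
 |  | +RETURN {  | 
 |  | +  sameDirection: COSINE_SIMILARITY([1, 0], [2, 0]),      // 1  | 
 |  | +  orthogonal: COSINE_SIMILARITY([1, 0], [0, 1]),         // 0  | 
 |  | +  oppositeDirection: COSINE_SIMILARITY([1, 0], [-1, 0])  // -1  | 
 |  | +}  | 
 |  | +```  | 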
 | 128 | + | 
 | 129 | +### APPROX_NEAR_L2()  | 
 | 130 | + | 
 | 131 | +`APPROX_NEAR_L2(vector1, vector2, options) → similarity`  | 
 | 132 | + | 
 | 133 | +Retrieve the approximate distance using the L2 (Euclidean) metric, accelerated  | 
 | 134 | +by a matching vector index.  | 
 | 135 | + | 
 | 136 | +The closer the distance is to 0, the more similar the two vectors are. The higher  | 
 | 137 | +the value, the more different they are. You need to sort in ascending order  | 
 | 138 | +so that the most similar documents come first, which is what a vector index using  | 
 | 139 | +the `l2` metric can provide.  | 
 | 140 | + | 
 | 141 | +- **vector1** (array of numbers): The first vector. Either this parameter or  | 
 | 142 | + `vector2` needs to reference a stored attribute holding the vector embedding.  | 
 | 143 | +- **vector2** (array of numbers): The second vector. Either this parameter or  | 
 | 144 | + `vector1` needs to reference a stored attribute holding the vector embedding.  | 
 | 145 | +- **options** (object, _optional_):  | 
 | 146 | + - **nProbe** (number, _optional_): How many neighboring centroids to consider  | 
 | 147 | + for the search results. The larger the number, the slower the search but the  | 
 | 148 | + better the search results. If not specified, the `defaultNProbe` value of  | 
 | 149 | + the vector index is used.  | 
 | 150 | +- returns **similarity** (number): The approximate L2 (Euclidean) distance between  | 
 | 151 | + both vectors.  | 
 | 152 | + | 
 | 153 | +**Examples**  | 
 | 154 | + | 
 | 155 | +Return up to `10` similar documents based on their closeness to the vector  | 
 | 156 | +`@q` according to the L2 (Euclidean) metric:  | 
 | 157 | + | 
 | 158 | +```aql  | 
 | 159 | +FOR doc IN coll  | 
 | 160 | +  SORT APPROX_NEAR_L2(doc.vector, @q)  | 
 | 161 | +  LIMIT 10  | 
 | 162 | +  RETURN doc  | 
 | 163 | +```  | 
 | 164 | + | 
 | 165 | +Return up to `5` similar documents as well as the similarity value,  | 
 | 166 | +considering `20` neighboring centroids (or closest Voronoi cells):  | 
 | 167 | + | 
 | 168 | +```aql  | 
 | 169 | +FOR doc IN coll  | 
 | 170 | +  LET similarity = APPROX_NEAR_L2(doc.vector, @q, { nProbe: 20 })  | 
 | 171 | +  SORT similarity  | 
 | 172 | +  LIMIT 5  | 
 | 173 | +  RETURN MERGE({ similarity }, doc)  | 
 | 174 | +```  | 
 | 175 | + | 
 | 176 | +Return the similarity value and the document keys of up to `3` similar documents  | 
 | 177 | +for multiple input vectors using a subquery. In this example, the input vectors  | 
 | 178 | +are taken from ten arbitrary documents of the same collection:  | 
 | 179 | + | 
 | 180 | +```aql  | 
 | 181 | +FOR docOuter IN coll  | 
 | 182 | +  LIMIT 10  | 
 | 183 | +  LET neighbors = (  | 
 | 184 | +    FOR docInner IN coll  | 
 | 185 | +      LET similarity = APPROX_NEAR_L2(docInner.vector, docOuter.vector)  | 
 | 186 | +      SORT similarity  | 
 | 187 | +      LIMIT 3  | 
 | 188 | +      RETURN { key: docInner._key, similarity }  | 
 | 189 | +  )  | 
 | 190 | +  RETURN { key: docOuter._key, neighbors }  | 
 | 191 | +```  | 
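 |  | + | 
 |  | +To see why ascending order puts the most similar documents first, you can  | 
 |  | +compute exact distances for literal vectors with the regular `L2_DISTANCE()`  | 
 |  | +function. This is only an illustrative sketch: the function does not use a  | 
 |  | +vector index, and the two-dimensional vectors are made-up values:  | 
 |  | + | 
 |  | +```aql  | 
 |  | +RETURN {  | 
 |  | +  identical: L2_DISTANCE([1, 1], [1, 1]),  // 0  | 
 |  | +  apart: L2_DISTANCE([0, 0], [3, 4])       // 5  | 
 |  | +}  | 
 |  | +```  | 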