This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Commit c4e2c06

Merge pull request #93 from arangodb/smartjoins-in-a-nutshell
2 parents aa34532 + 7a65a57 commit c4e2c06


5 files changed: 61 additions and 49 deletions


3.4/smart-joins.md renamed to 3.4/smartjoins.md

Lines changed: 23 additions & 16 deletions
@@ -1,9 +1,12 @@
 ---
 layout: default
-description: Introduced in
+description: SmartJoins allow to execute co-located join operations among identically sharded collections.
+title: SmartJoins for ArangoDB Clusters
+redirect_from:
+- /3.4/smart-joins.html
 ---
-Smart Joins
-===========
+SmartJoins
+==========
 
 <small>Introduced in: v3.4.5, v3.5.0</small>
 
@@ -12,8 +15,12 @@ This feature is only available in the
 [**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/){:target="_blank"}
 {% endhint %}
 
-When doing joins in an ArangoDB cluster, data has to be exchanged between different servers.
+SmartJoins allow to execute co-located join operations among identically sharded collections.
+
+Cluster joins without being smart
+---------------------------------
 
+When doing joins in an ArangoDB cluster, data has to be exchanged between different servers.
 Joins between different collections in a cluster normally require roundtrips between the
 shards of these collections for fetching the data. Requests are routed through an extra
 coordinator hop.
@@ -140,8 +147,8 @@ This is a precondition for running joins locally, and thanks to the effects of
 `distributeShardsLike` it is now satisfied!
 
 
-Smart joins using distributeShardsLike
---------------------------------------
+SmartJoins using distributeShardsLike
+-------------------------------------
 
 With the two collections in place like this, an AQL query that uses a FILTER condition
 that refers from the shard key of the one collection to the shard key of the other collection
@@ -166,7 +173,7 @@ As can be seen above, the extra hop via the coordinator is gone here, which will
 less cluster-internal traffic and a faster response time.
 
 
-Smart joins will also work if the shard key of the second collection is not *_key*,
+SmartJoins will also work if the shard key of the second collection is not *_key*,
 and even for non-unique shard key values, e.g.:
 
 arangosh> db._create("c1", {numberOfShards: 4, shardKeys: ["_key"]});
@@ -194,13 +201,13 @@ and even for non-unique shard key values, e.g.:
 6 ReturnNode COOR 2000 - RETURN doc1
 
 {% hint 'tip' %}
-All above examples used two collections only. Smart joins will also work when joining
+All above examples used two collections only. SmartJoins will also work when joining
 more than two collections which have the same data distribution enforced via their
 `distributeShardsLike` attribute and using the shard keys as the join criteria as shown above.
 {% endhint %}
 
-Smart joins using smartJoinAttribute
-------------------------------------
+SmartJoins using smartJoinAttribute
+-----------------------------------
 
 In case the join on the second collection must be performed on a non-shard key
 attribute, there is the option to specify a *smartJoinAttribute* for the collection.
@@ -252,8 +259,8 @@ The join can now be performed via the collection's *smartJoinAttribute*:
 6 ReturnNode COOR 101 - RETURN doc1
 
 
-Restricting smart joins to a single shard
------------------------------------------
+Restricting SmartJoins to a single shard
+----------------------------------------
 
 If a FILTER condition is used on one of the shard keys, the optimizer will also try
 to restrict the queries to just the required shards:
@@ -277,12 +284,12 @@ to restrict the queries to just the required shards:
 Limitations
 -----------
 
-In ArangoDB 3.4, the smart join optimization must explicitly be turned on in the
+In ArangoDB 3.4, the SmartJoin optimization must explicitly be turned on in the
 server configuration, using the startup option `--query.smart-joins true`. If that
-configuration is not set, the smart join optimization will not be performed.
+configuration is not set, the SmartJoin optimization will not be performed.
 Future versions ArangoDB will lift that requirement.
 
-The smart join optimization is currently triggered only for data selection queries,
+The SmartJoin optimization is currently triggered only for data selection queries,
 but not for any data-manipulation operations such as INSERT, UPDATE, REPLACE, REMOVE
 or UPSERT, neither traversals or subqueries.
 
@@ -294,5 +301,5 @@ It is restricted to be used with simple shard key attributes (such as `_key`, `p
 but not with nested attributes (e.g. `name.first`). There should be exactly one shard
 key attribute defined for each collection.
 
-Finally, the smart join optimization requires that the collections are joined on their
+Finally, the SmartJoin optimization requires that the collections are joined on their
 shard key attributes (or smartJoinAttribute) using an equality comparison.
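
Reviewer note: the equality requirement in the hunk above follows from how identical sharding works. If both collections map a shard key value to a shard number in the same way, an equality match on that value can only pair documents that live on the same shard. The sketch below illustrates this with plain JavaScript. The hash function `toyShardOf` is a made-up stand-in and is not ArangoDB's actual sharding function; the collection contents and the helper `allJoinPartnersColocated` are hypothetical as well.

```javascript
// Toy stand-in for a shard hash. This is NOT ArangoDB's real hashing,
// it only needs to be deterministic for the illustration.
function toyShardOf(value, numberOfShards) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep h in unsigned 32-bit range
  }
  return h % numberOfShards;
}

// Hypothetical data: products sharded by _key, orders sharded by productId,
// both with the same number of shards (as distributeShardsLike would enforce).
const numberOfShards = 4;
const products = [{ _key: "p1" }, { _key: "p2" }, { _key: "p3" }];
const orders = [
  { _key: "o1", productId: "p1" },
  { _key: "o2", productId: "p3" },
  { _key: "o3", productId: "p1" },
];

// For every order, check that its matching product sits on the same shard.
function allJoinPartnersColocated() {
  return orders.every((order) => {
    const product = products.find((p) => p._key === order.productId);
    return (
      product !== undefined &&
      toyShardOf(product._key, numberOfShards) ===
        toyShardOf(order.productId, numberOfShards)
    );
  });
}

console.log(allJoinPartnersColocated());
```

A range comparison or a join on a nested or non-shard-key attribute would not give this guarantee, because the compared values would no longer be the ones that determine shard placement.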

3.5/release-notes-new-features35.md

Lines changed: 15 additions & 15 deletions
@@ -73,12 +73,12 @@ paths of increasing length from a start vertex to a target vertex. For more deta
 see the [k Shortest Paths documentation](aql/graphs-kshortest-paths.html).
 
 
-Smart Joins
------------
+SmartJoins
+----------
 
-The "smart joins" feature available in the ArangoDB Enterprise Edition allows running
-joins between two sharded collections with performance close to that of a local join
-operation.
+The SmartJoins feature available in the ArangoDB Enterprise Edition allows
+running joins between two sharded collections with performance close to that
+of a local join operation.
 
 The prerequisite for this is that the two collections have an identical sharding setup,
 established via the `distributeShardsLike` attribute of one of the collections.
@@ -90,15 +90,15 @@ Quick example setup for two collections with identical sharding:
 > db.orders.ensureIndex({ type: "hash", fields: ["productId"] });
 
 Now an AQL query that joins the two collections via their shard keys will benefit from
-the smart join optimization, e.g.
+the SmartJoin optimization, e.g.
 
 FOR p IN products
 FOR o IN orders
 FILTER p._key == o.productId
 RETURN o
 
 In this query's execution plan, the extra hop via the coordinator can be saved
-that is normally there for generic joins. Thanks to the smart join optimization,
+that is normally there for generic joins. Thanks to the SmartJoin optimization,
 the query's execution is as simple as:
 
 Execution plan:
@@ -110,7 +110,7 @@ the query's execution is as simple as:
 11 GatherNode COOR 0 - GATHER
 6 ReturnNode COOR 0 - RETURN o
 
-Without the smart join optimization, there will be an extra hop via the
+Without the SmartJoin optimization, there will be an extra hop via the
 coordinator for shipping the data from each shard of the one collection to
 each shard of the other collection, which will be a lot more expensive:
 
@@ -127,25 +127,25 @@ each shard of the other collection, which will be a lot more expensive:
 11 GatherNode COOR 3 - GATHER
 6 ReturnNode COOR 3 - RETURN o
 
-In the end, smart joins can optimize away a lot of the inter-node network
+In the end, SmartJoins can optimize away a lot of the inter-node network
 requests normally required for performing a join between sharded collections.
-The performance advantage of smart joins compared to regular joins will grow
+The performance advantage of SmartJoins compared to regular joins will grow
 with the number of shards of the underlying collections.
 
 In general, for two collections with `n` shards each, the minimal number of
-network requests for the general join (_no_ smart joins optimization) will be
+network requests for the general join (_no_ SmartJoins optimization) will be
 `n * (n + 2)`. The number of network requests increases quadratically with the
 number of shards.
 
-Smart joins can get away with a minimal number of `n` requests here, which scales
+SmartJoins can get away with a minimal number of `n` requests here, which scales
 linearly with the number of shards.
 
-Smart joins will also be especially advantageous for queries that have to ship a lot
+SmartJoins will also be especially advantageous for queries that have to ship a lot
 of data around for performing the join, but that will filter out most of the data
-after the join. In this case smart joins should greatly outperform the general join,
+after the join. In this case SmartJoins should greatly outperform the general join,
 as they will eliminate most of the inter-node data shipping overhead.
 
-Also see the [Smart Joins](smart-joins.html) page.
+Also see the [SmartJoins](smartjoins.html) page.
 
 
 Background Index Creation
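
Reviewer note: the request counts quoted in the release notes above (`n * (n + 2)` for the general join, `n` for SmartJoins) are easy to tabulate. The following is a small sketch in plain JavaScript. The function names `generalJoinRequests` and `smartJoinRequests` are made up for this illustration, and the formulas are taken from the text as stated rather than measured.

```javascript
// Minimal request counts as given in the release notes text:
// general join (no SmartJoins optimization): n * (n + 2)
// SmartJoin: n
function generalJoinRequests(n) {
  return n * (n + 2);
}

function smartJoinRequests(n) {
  return n;
}

// Print both counts for a few shard counts to show quadratic vs. linear growth.
for (const n of [2, 4, 8, 16]) {
  console.log(
    `${n} shards: general join = ${generalJoinRequests(n)} requests, ` +
      `SmartJoin = ${smartJoinRequests(n)} requests`
  );
}
```

Under these formulas, doubling the shard count roughly quadruples the minimal request count of the general join, while the SmartJoin count only doubles.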

3.5/smart-joins.md renamed to 3.5/smartjoins.md

Lines changed: 19 additions & 14 deletions
@@ -1,9 +1,10 @@
 ---
 layout: default
-description: Introduced in
+description: SmartJoins allow to execute co-located join operations among identically sharded collections.
+title: SmartJoins for ArangoDB Clusters
 ---
-Smart Joins
-===========
+SmartJoins
+==========
 
 <small>Introduced in: v3.4.5, v3.5.0</small>
 
@@ -12,8 +13,12 @@ This feature is only available in the
 [**Enterprise Edition**](https://www.arangodb.com/why-arangodb/arangodb-enterprise/){:target="_blank"}
 {% endhint %}
 
-When doing joins in an ArangoDB cluster, data has to be exchanged between different servers.
+SmartJoins allow to execute co-located join operations among identically sharded collections.
+
+Cluster joins without being smart
+---------------------------------
 
+When doing joins in an ArangoDB cluster, data has to be exchanged between different servers.
 Joins between different collections in a cluster normally require roundtrips between the
 shards of these collections for fetching the data. Requests are routed through an extra
 coordinator hop.
@@ -140,8 +145,8 @@ This is a precondition for running joins locally, and thanks to the effects of
 `distributeShardsLike` it is now satisfied!
 
 
-Smart joins using distributeShardsLike
---------------------------------------
+SmartJoins using distributeShardsLike
+-------------------------------------
 
 With the two collections in place like this, an AQL query that uses a FILTER condition
 that refers from the shard key of the one collection to the shard key of the other collection
@@ -166,7 +171,7 @@ As can be seen above, the extra hop via the coordinator is gone here, which will
 less cluster-internal traffic and a faster response time.
 
 
-Smart joins will also work if the shard key of the second collection is not *_key*,
+SmartJoins will also work if the shard key of the second collection is not *_key*,
 and even for non-unique shard key values, e.g.:
 
 arangosh> db._create("c1", {numberOfShards: 4, shardKeys: ["_key"]});
@@ -194,13 +199,13 @@ and even for non-unique shard key values, e.g.:
 6 ReturnNode COOR 2000 - RETURN doc1
 
 {% hint 'tip' %}
-All above examples used two collections only. Smart joins will also work when joining
+All above examples used two collections only. SmartJoins will also work when joining
 more than two collections which have the same data distribution enforced via their
 `distributeShardsLike` attribute and using the shard keys as the join criteria as shown above.
 {% endhint %}
 
-Smart joins using smartJoinAttribute
-------------------------------------
+SmartJoins using smartJoinAttribute
+-----------------------------------
 
 In case the join on the second collection must be performed on a non-shard key
 attribute, there is the option to specify a *smartJoinAttribute* for the collection.
@@ -252,8 +257,8 @@ The join can now be performed via the collection's *smartJoinAttribute*:
 6 ReturnNode COOR 101 - RETURN doc1
 
 
-Restricting smart joins to a single shard
------------------------------------------
+Restricting SmartJoins to a single shard
+----------------------------------------
 
 If a FILTER condition is used on one of the shard keys, the optimizer will also try
 to restrict the queries to just the required shards:
@@ -277,7 +282,7 @@ to restrict the queries to just the required shards:
 Limitations
 -----------
 
-The smart join optimization is currently triggered only for data selection queries,
+The SmartJoin optimization is currently triggered only for data selection queries,
 but not for any data-manipulation operations such as INSERT, UPDATE, REPLACE, REMOVE
 or UPSERT, neither traversals, subqueries or views.
 
@@ -289,5 +294,5 @@ It is restricted to be used with simple shard key attributes (such as `_key`, `p
 but not with nested attributes (e.g. `name.first`). There should be exactly one shard
 key attribute defined for each collection.
 
-Finally, the smart join optimization requires that the collections are joined on their
+Finally, the SmartJoin optimization requires that the collections are joined on their
 shard key attributes (or smartJoinAttribute) using an equality comparison.

_data/3.4-manual.yml

Lines changed: 2 additions & 2 deletions
@@ -498,8 +498,8 @@
 href: foxx-migrating2x-queries.html
 - text: Satellite Collections
 href: satellites.html
-- text: Smart Joins
-href: smart-joins.html
+- text: SmartJoins
+href: smartjoins.html
 - subtitle: OPERATIONS
 - text: Installation
 href: installation.html

_data/3.5-manual.yml

Lines changed: 2 additions & 2 deletions
@@ -504,8 +504,8 @@
 href: foxx-migrating2x-queries.html
 - text: Satellite Collections
 href: satellites.html
-- text: Smart Joins
-href: smart-joins.html
+- text: SmartJoins
+href: smartjoins.html
 - subtitle: OPERATIONS
 - text: Installation
 href: installation.html
