This repository was archived by the owner on Dec 13, 2023. It is now read-only.
DOC-43/Pregel algorithms - restructured content #1047
Merged
Changes from 1 commit
Commits
4e2bb12 Pregel algorithms - restructured content (nerpaula)
f088c98 Merge branch 'main' into pregel-restructure (ansoboleva)
04cf309 Update graphs-pregel-algorithms.md (ansoboleva)
e4ebf09 Update 3.9-manual.yml (ansoboleva)
5de407a fixed broken links (data science 3.9) (nerpaula)
2616881 fixed other broken links (nerpaula)
eaaa2e4 Apply suggestions from code review (nerpaula)
793c0b5 applied other suggestions (nerpaula)
368ab68 fixed broken link (nerpaula)
92e7941 Apply suggestions from code review (nerpaula)
4ac01dd Apply suggestions from code review (nerpaula)
001d1b0 Update 3.9/graphs-pregel-algorithms.md (nerpaula)
aff2a0a applied changes to 3.10 (nerpaula)
9228da0 fix broken link (nerpaula)
d29d901 Merge branch 'main' into pregel-restructure (nerpaula)
516b2b2 Update graphs-pregel-algorithms.md (ansoboleva)
63d1423 Update graphs-pregel-algorithms.md (ansoboleva)
Apply suggestions from code review
Co-authored-by: Roman Rabinovich <89132743+romanatarango@users.noreply.github.com>
commit eaaa2e4887b99f3a160d1aa38daf2d2dd1563b01
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -56,7 +56,13 @@ PageRank is a well known algorithm to rank documents in a graph. The algorithm | |
| runs until the execution converges. Specify a custom threshold with the | ||
| parameter `threshold`, to run for a fixed number of iterations use the | ||
| `maxGSS` parameter. | ||
| PageRank is a well-known algorithm to rank vertices in a graph: the more important a vertex, the higher its rank. It goes back to L. Page and S. Brin's [paper](http://infolab.stanford.edu/pub/papers/google.pdf) and is used to rank pages in search engines (hence the name). | ||
| | ||
| The rank of a vertex is a positive real number. Initially, every vertex has the same rank (one divided by the number of vertices) and sends its rank to its out-neighbors. The computation proceeds in iterations. In each iteration, the new rank of a vertex is computed according to the formula `(0.15 / total number of vertices) + (0.85 * the sum of all incoming ranks)`. The value sent to each of the out-neighbors is the new rank divided by the number of those neighbors, so every out-neighbor gets the same share of the new rank. | ||
| | ||
| The algorithm stops when at least one of the following two conditions is satisfied: | ||
| - The maximum number of iterations is reached. This is the same parameter `maxGSS` as for the other algorithms. | ||
| - Every vertex changes its rank in the last iteration by less than a certain threshold. The default threshold is 0.00001; a custom value can be set with the parameter `threshold`. | ||
| ```js | ||
| var pregel = require("@arangodb/pregel"); | ||
| pregel.start("pagerank", "graphname", {maxGSS: 100, threshold: 0.00000001, resultField: "rank"}) | ||
| | @@ -77,15 +83,15 @@ pregel.start("pagerank", "graphname", {maxGSS: 20, threshold: 0.00000001, source | |
| | ||
| ### Single-Source Shortest Path | ||
| | ||
| Calculates the shortest path length between the source and all other vertices. | ||
| Calculates the shortest path length between the given source and all other vertices, called _targets_. The result is written to the specified property of the respective target. | ||
| The distance to the source vertex itself is returned as `0` and a length above | ||
| `9007199254740991` (max safe integer) means that there is no connection between | ||
| `9007199254740991` (max safe integer) means that there is no path from the source to the vertex in the graph. | ||
| a pair of vertices. | ||
| | ||
| The algorithm runs until it converges, the iterations are bound by the | ||
| The algorithm runs until all distances are computed. The number of iterations is bounded by the | ||
| diameter (the longest shortest path) of your graph. | ||
| | ||
| Requires a `source` document ID parameter. The result field needs to be | ||
| A call to the algorithm requires the `source` parameter, whose value is the document ID of the source vertex. The result field needs to be | ||
| specified in `_resultField` (note the underscore). | ||
| | ||
| ```js | ||
| | @@ -97,30 +103,30 @@ pregel.start("sssp", "graphname", {source: "vertices/1337", _resultField: "dista | |
| | ||
| There are three algorithms to find connected components in a graph: | ||
| | ||
| 1. If your graph is effectively undirected (you have edges in both directions | ||
| between vertices) then the simple **connected components** algorithm named | ||
| 1. If your graph is effectively undirected (for every edge from vertex A to vertex B there is also an edge from B to A), | ||
| then the simple **connected components** algorithm named | ||
| `"connectedcomponents"` is suitable. | ||
| | ||
| It is a very simple and fast algorithm, but only works correctly on | ||
| It is a very simple and fast algorithm, but it only works correctly on | ||
| undirected graphs. Your results on directed graphs may vary, depending on | ||
| how connected your components are. | ||
| Contributor comment: I don't like this sentence. The result will just not be correct, one should be discouraged to use the algorithm on such graphs. (Also, in our setting undirected graphs are a special case of directed graphs with the property I wrote into a comment, so, strictly speaking, the sentence is not correct.) | ||
| | ||
| In an undirected graph, a _connected component_ is a subgraph | ||
| - where there is a path between every pair of vertices from this component and | ||
| - which is maximal with this property: adding any other vertex would destroy it. In other words, there is no path between any vertex from the component and any vertex not in the component. | ||
| 2. To find **weakly connected components** (WCC) you can use the algorithm | ||
| named `"wcc"`. Weakly connected means that there exists a path from every | ||
| vertex pair in that component. | ||
| A _weakly connected component_ in a directed graph is a maximal subgraph such that there is a path between each pair of vertices, where _edges may also be traversed against their direction_. More formally, it is a connected component (see the definition above) in the _underlying undirected graph_, i.e., in the undirected graph obtained by adding an edge from vertex B to vertex A (if it does not already exist) whenever there is an edge from vertex A to vertex B. | ||
| | ||
| This algorithm works on directed graphs but requires a greater amount of | ||
| This algorithm works on directed graphs but, in general, requires a greater amount of | ||
| traffic between your DB-Servers. | ||
| | ||
| 3. To find **strongly connected components** (SCC) you can use the algorithm | ||
| named `"scc"`. Strongly connected means every vertex is reachable from any | ||
| other vertex in the same component. | ||
| named `"scc"`. A _strongly connected component_ is a maximal subgraph where, for every two vertices, there is a path from each of them to the other. It is thus defined like a weakly connected component, except that paths are not allowed to run against the edge directions. | ||
| | ||
| The algorithm is more complex than the WCC algorithm and requires more | ||
| memory, because each vertex needs to store much more state. Consider using | ||
| WCC if you think your data may be suitable for it. | ||
| The algorithm is more complex than the WCC algorithm and, in general, requires more | ||
| memory. | ||
| | ||
| All above algorithms will assign a component ID to each vertex. | ||
| All of the above algorithms assign a component ID to each vertex, a number which is written into the specified `resultField`. All vertices from the same component obtain the same component ID, and any two vertices from different components obtain different IDs. | ||
| | ||
| ```js | ||
| var pregel = require("@arangodb/pregel"); | ||
| | @@ -264,10 +270,11 @@ distribution of the initial IDs over the vertices. | |
| | ||
| Then, in each iteration, a vertex sends its current Community | ||
| ID to all its neighbor vertices. After that each vertex adopts the Community ID it | ||
| received most frequently in the last step. If a vertex obtains more than one | ||
| most frequent IDs, it chooses the lowest number (as IDs are numbers). If no ID arrived more | ||
| than once and the ID of the vertex from the previous step is less than the | ||
| lowest obtained ID number, the old ID is kept. | ||
| received most frequently in the last step. | ||
| | ||
| The details are somewhat subtle. If a vertex obtains only one ID and the ID of the vertex from the previous step, its old ID, is less than the obtained ID, the old ID is kept. (IDs are numbers and thus comparable to each other.) If a vertex obtains more than one ID, its new ID is the lowest ID among the most frequently obtained IDs. (For example, if the obtained IDs are 1, 2, 2, 3, 3, then 2 is the new ID.) If, however, no ID arrives more than once, the new ID is the minimum of the lowest obtained ID and the old ID. (For example, if the old ID is 5 and the obtained IDs are 3, 4, 6, then the new ID is 3. If the old ID is 2, it is kept.) | ||
| | ||
| If a vertex keeps its ID 20 times or more in a row, it does not send its ID. Vertices that did not obtain any IDs do not update their ID and do not send it. | ||
| | ||
| The algorithm runs until it converges, which likely never really happens on | ||
| large graphs. Therefore you need to specify a maximum iteration bound. | ||
| | ||
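
The label propagation update rule described in the last hunk can be sketched as a small standalone function. This is only an illustration of the rule as worded in the diff, not ArangoDB's actual implementation: the function name `nextCommunityId` and its parameters are hypothetical, and it assumes that "obtains only one ID" means that exactly one message arrived in the step.

```js
// Hypothetical helper illustrating the Community ID update rule from the text.
// oldId: the vertex's ID from the previous step (a number).
// receivedIds: the IDs sent by neighbor vertices in the last step.
function nextCommunityId(oldId, receivedIds) {
  // Vertices that did not obtain any IDs do not update their ID.
  if (receivedIds.length === 0) {
    return oldId;
  }

  // Count how often each ID arrived.
  const counts = new Map();
  for (const id of receivedIds) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }

  // Determine the highest frequency and the lowest ID that reaches it.
  let maxCount = 0;
  for (const count of counts.values()) {
    maxCount = Math.max(maxCount, count);
  }
  let lowestMostFrequent = Infinity;
  for (const [id, count] of counts) {
    if (count === maxCount && id < lowestMostFrequent) {
      lowestMostFrequent = id;
    }
  }

  // No ID arrived more than once (this includes the single-message case,
  // by assumption): take the minimum of the lowest obtained ID and the old ID.
  if (maxCount === 1) {
    return Math.min(oldId, lowestMostFrequent);
  }

  // Otherwise adopt the lowest ID among the most frequently obtained ones.
  return lowestMostFrequent;
}
```

With the examples from the text, `nextCommunityId(9, [1, 2, 2, 3, 3])` yields `2`, `nextCommunityId(5, [3, 4, 6])` yields `3`, and `nextCommunityId(2, [3, 4, 6])` keeps `2`. The rule that a vertex stops sending after keeping its ID 20 times in a row is deliberately left out of this sketch.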