description: >-
aliases:
  - graphml
---
Traditional Machine Learning (ML) overlooks the connections and relationships
between data points, which is where graph machine learning excels. However,
accessibility to GraphML has been limited to sizable enterprises equipped with
specialized teams of data scientists. ArangoGraphML simplifies the use of
GraphML, enabling a broader range of personas to extract profound insights
from their data.

## How GraphML works

Graph machine learning leverages the inherent structure of graph data, where
entities (nodes) and their relationships (edges) form a network. Unlike
traditional ML, which primarily operates on tabular data, GraphML applies
specialized algorithms like Graph Neural Networks (GNNs), node embeddings, and
link prediction to uncover complex patterns and insights.

1. **Graph Construction**:
   Raw data is transformed into a graph structure, defining nodes and edges
   based on real-world relationships.
2. **Featurization**:
   Nodes and edges are enriched with features that help in training predictive
   models.
3. **Model Training**:
   Machine learning techniques are applied to GNNs to identify patterns and
   make predictions.
4. **Inference & Insights**:
   The trained model is used to classify nodes, detect anomalies, recommend
   items, or predict future connections.

ArangoGraphML streamlines these steps, providing an intuitive and scalable
framework to integrate GraphML into various applications, from fraud detection
to recommendation systems.

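The four steps above can be sketched in miniature with plain Python. This is a toy illustration only: the graph, the degree-based features, and the single neighbor-averaging step standing in for GNN training are all invented for the example and are not the ArangoGraphML API.

```python
# Toy walk-through of the four GraphML steps. Data and "model" are
# invented for illustration; a real pipeline would train a GNN.

# 1. Graph construction: nodes and edges from raw relationship data.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
nodes = sorted({n for e in edges for n in e})

# Build an undirected adjacency map.
adj = {n: set() for n in nodes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# 2. Featurization: attach a numeric feature to each node
#    (here simply the normalized node degree).
max_deg = max(len(adj[n]) for n in nodes)
features = {n: len(adj[n]) / max_deg for n in nodes}

# 3. "Model training" stand-in: one message-passing step that averages
#    each node's feature with its neighbors' features.
smoothed = {
    n: (features[n] + sum(features[m] for m in adj[n])) / (1 + len(adj[n]))
    for n in nodes
}

# 4. Inference: use the representation, e.g. flag low-connectivity nodes.
peripheral = [n for n in nodes if smoothed[n] < 0.8]
print(peripheral)
```

The point of the sketch is only that step 3 mixes structure (neighbors) into each node's representation, which is what distinguishes GraphML from tabular ML.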
![GraphML Embeddings](../../../images/GraphML-Embeddings.webp)

![GraphML Workflow](../../../images/GraphML-How-it-works.webp)

It is no longer necessary to understand the complexities involved with graph

The platform comes preloaded with all the tools needed to prepare your graph
for machine learning, high-accuracy training, and persisting predictions back
to the database for application use.

## Supported Tasks

### Node Classification

Node classification is a **supervised learning** task where the goal is to
predict the label of a node based on both its own features and its
relationships within the graph. It requires a set of labeled nodes to train a
model, which then classifies unlabeled nodes based on learned patterns.

**How it works in ArangoGraphML**

- A portion of the nodes in a graph is labeled for training.
- The model learns patterns from both **node features** and
  **structural relationships** (neighboring nodes and connections).
- It predicts labels for unlabeled nodes based on these learned patterns.

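To make the idea concrete, here is a deliberately simplified stand-in for the trained model: a majority vote over labeled neighbors. The graph and labels are invented, and ArangoGraphML's actual classifier is a GNN rather than this rule; the sketch only shows how graph structure informs the prediction.

```python
# Toy node classification: predict a label from labeled neighbors.
# This majority-vote rule is a stand-in for a trained GNN.
from collections import Counter

adj = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u3"],
    "u3": ["u1", "u2", "u4"],
    "u4": ["u3", "u5"],
    "u5": ["u4"],
}
# Partial ground truth: u3 is unlabeled and must be predicted.
labels = {"u1": "fraud", "u2": "fraud", "u4": "legit", "u5": "legit"}

def predict(node):
    """Majority label among the node's labeled neighbors."""
    votes = Counter(labels[m] for m in adj[node] if m in labels)
    return votes.most_common(1)[0][0] if votes else None

print(predict("u3"))
```

Because two of u3's three labeled neighbors are fraudulent, structure alone already suggests a label, even before node features enter the picture.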
**Example Use Cases**

- **Fraud Detection in Financial Networks**
  - **Problem:** Fraudsters often create multiple accounts or interact within
    suspicious clusters to evade detection.
  - **Solution:** A transaction graph is built where nodes represent users and
    edges represent transactions. The model learns patterns from labeled
    fraudulent and legitimate users, detecting hidden fraud rings based on
    **both user attributes and transaction relationships**.

- **Customer Segmentation in E-Commerce & Social Media**
  - **Problem:** Businesses need to categorize customers based on purchasing
    behavior and engagement.
  - **Solution:** A graph is built where nodes represent customers and edges
    represent interactions (purchases, reviews, social connections). The model
    predicts the category of each user based on how similar they are to other
    users, **not just by their personal data, but also by how they are
    connected to others**.

- **Disease Classification in Biomedical Networks**
  - **Problem:** Identifying proteins or genes associated with a disease.
  - **Solution:** A protein interaction graph is built where nodes are proteins
    and edges represent biochemical interactions. The model classifies unknown
    proteins based on their interactions with known disease-related proteins,
    rather than just their individual properties.

### Node Embedding Generation

Node embedding is an **unsupervised learning** technique that converts nodes
into numerical vector representations, preserving their **structural
relationships** within the graph. Unlike simple feature aggregation, node
embeddings **capture the influence of neighboring nodes and graph topology**,
making them powerful for downstream tasks like clustering, anomaly detection,
and link prediction. Consider using
[ArangoDB's Vector Search](https://arangodb.com/2024/11/vector-search-in-arangodb-practical-insights-and-hands-on-examples/)
capabilities to find similar nodes based on their embeddings.

**Feature Embeddings versus Node Embeddings**

**Feature Embeddings** are vector representations derived from the attributes
or features associated with nodes. These embeddings aim to capture the
inherent characteristics of the data. For example, in a social network, a
feature embedding might encode user attributes like age, location, and
interests. Techniques like **Word2Vec**, **TF-IDF**, or **autoencoders** are
commonly used to generate such embeddings.

In the context of graphs, **Node Embeddings** are a **combination of a node's
feature embedding and the structural information from its connected edges**.
Essentially, they aggregate both the node's attributes and the connectivity
patterns within the graph. This fusion helps capture not only the individual
properties of a node but also its position and role within the network.

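The distinction can be shown with a toy sketch: a node embedding built by concatenating a node's own feature vector with an aggregate of its neighbors' vectors. The vectors and the simple mean aggregation are invented for illustration; real systems learn these representations with GNNs or random-walk methods rather than a fixed formula.

```python
# Feature embedding: derived from a node's own attributes only.
# Node embedding (toy version): own features combined with an
# aggregate of neighbor features, so structure is encoded too.

feature_emb = {          # invented attribute vectors
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

def node_embedding(n):
    """Concatenate own features with the mean of neighbor features."""
    nbrs = adj[n]
    mean = [
        sum(feature_emb[m][i] for m in nbrs) / len(nbrs)
        for i in range(len(feature_emb[n]))
    ]
    return feature_emb[n] + mean

print(node_embedding("a"))
```

Note that two nodes with identical attributes but different neighborhoods would get different node embeddings, which is exactly what a pure feature embedding cannot express.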
**How it works in ArangoGraphML**

- The model learns an embedding (a vector representation) for each node based
  on its **position within the graph and its connections**.
- It **does not rely on labeled data**; instead, it captures structural
  patterns through graph traversal and aggregation of neighbor information.
- These embeddings can be used for similarity searches, clustering, and
  predictive tasks.

**Example Use Cases**

- **Recommendation Systems (E-commerce & Streaming Platforms)**
  - **Problem:** Platforms like Amazon, Netflix, and Spotify need to recommend
    products, movies, or songs.
  - **Solution:** A user-item interaction graph is built where nodes are users
    and products, and edges represent interactions (purchases, ratings,
    listens). **Embeddings encode relationships**, allowing the system to
    recommend similar items based on user behavior and network influence
    rather than just individual preferences.

- **Anomaly Detection in Cybersecurity & Finance**
  - **Problem:** Detecting unusual activity (e.g., cyber attacks, money
    laundering) in complex networks.
  - **Solution:** A network of IP addresses, users, and transactions is
    represented as a graph. Nodes with embeddings that significantly deviate
    from normal patterns are flagged as potential threats. The key advantage
    here is that anomalies are detected based on **network structure, not just
    individual activity logs**.

- **Link Prediction (Social & Knowledge Graphs)**
  - **Problem:** Predicting new relationships, such as suggesting friends on
    social media or forecasting research paper citations.
  - **Solution:** A social network graph is created where nodes are users, and
    edges represent friendships. **Embeddings capture the likelihood of
    connections forming based on shared neighborhoods and structural
    similarities, even if users have never interacted before**.

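Once embeddings exist, similarity search and link scoring reduce to vector arithmetic. The sketch below uses invented embedding vectors and plain cosine similarity; in practice, ArangoDB's Vector Search can perform this kind of lookup at scale inside the database.

```python
# Rank candidate nodes by cosine similarity over (invented) embeddings,
# the core operation behind similarity search and link scoring.
import math

emb = {
    "user1": [0.9, 0.1, 0.0],
    "user2": [0.8, 0.2, 0.1],
    "user3": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(node):
    """The other node whose embedding is closest to this node's."""
    return max(
        (m for m in emb if m != node),
        key=lambda m: cosine(emb[node], emb[m]),
    )

print(most_similar("user1"))
```

A high similarity between two unconnected nodes is then read as a likely future link (friend suggestion, citation, recommendation).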
### Key Differences

| Feature | Node Classification | Node Embedding Generation |
|---------|---------------------|---------------------------|
| **Learning Type** | Supervised | Unsupervised |
| **Input Data** | Labeled nodes | Graph structure & features |
| **Output** | Predicted labels | Node embeddings (vectors) |
| **Key Advantage** | Learns labels based on node connections and attributes | Learns structural patterns and node relationships |
| **Use Cases** | Fraud detection, customer segmentation, disease classification | Recommendations, anomaly detection, link prediction |

ArangoGraphML provides the infrastructure to efficiently train and apply these
models, helping users extract meaningful insights from complex graph data.

## Metrics and Compliance

ArangoGraphML supports tracking your ML pipeline by storing all relevant
metadata and metrics in a graph called ArangoPipe. It is only available to you
and is never viewable by ArangoDB. This metadata graph links all experiments
to the source data, feature generation activities, training runs, and
prediction jobs, allowing you to track the entire ML pipeline without having
to leave ArangoDB.

### Security

Each deployment that uses ArangoGraphML has an `arangopipe` database created,
which houses all ML metadata information. Since this data lives within the
deployment, it benefits from the ArangoGraph SOC 2 compliance and Enterprise
security features. All ArangoGraphML services live alongside the ArangoGraph
deployment and are only accessible within that organization.