- Notifications
You must be signed in to change notification settings - Fork 155
Description
1. Overview
1.1 Algorithm Background
Louvain is an efficient multi-level community detection algorithm that discovers community structures in graphs by optimizing the modularity metric. The algorithm has the following characteristics:
- Efficiency: Time complexity of O(n log n), suitable for large-scale graph processing
- Multi-level: Supports multi-resolution community detection
- Modularity Optimization: Maximizes the modularity metric of graphs
- Distributed Adaptation: Easy to implement on distributed frameworks
1.2 Implementation Framework
- AlgorithmUserFunction Interface: Standard user-defined graph algorithm interface
- Message Passing Model: Supports asynchronous communication between vertices
- Iterative Execution Framework: Built-in iteration control and convergence detection
- Distributed Execution Capability: Supports distributed graph computing
1.3 Core Objectives
- Implement the first phase of the standard Louvain algorithm (local optimization)
- Support both weighted and unweighted graphs
- Pass all unit tests and integration tests
- Provide clear code comments and documentation
- Reserve interfaces for future multi-level extensions
2. System Architecture
2.1 Core Class Structure
geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/org/apache/geaflow/dsl/udf/graph/ ├── Louvain.java # Main algorithm implementation class ├── LouvainMessage.java # Message passing class ├── LouvainVertexValue.java # Vertex state value class ├── LouvainAggregator.java # Community aggregator class ├── LouvainCommunityInfo.java # Community information class └── LouvainMessageCombiner.java # Message combiner class 2.2 Algorithm Flow Design
graph TD A[Algorithm Entry] --> B[Iteration 1] B --> B1[Initialize: Each vertex is independent community] B1 --> B2[Broadcast community info to neighbors] B2 --> C[Iteration 2-N] C --> C1[Receive neighbor community information] C1 --> C2[Calculate ΔQ for moving to each community] C2 --> C3[Select optimal community] C3 --> C4{Converged?} C4 -->|No| C5[Update community assignment] C5 --> C2 C4 -->|Yes| D[Output final community result] 2.3 Component Interaction Flow
┌─────────────────────────────────────────────────────┐ │ Louvain.process() │ ├─────────────────────────────────────────────────────┤ │ │ │ 1. deserializeVertexValue() ← Get vertex state │ │ 2. loadEdges(BOTH) ← Load neighbor edges │ │ 3. getCurrentIterationId() ← Determine iteration │ │ │ │ IF Iteration 1: │ │ ├─ initializeVertex() │ │ │ ├─ Calculate vertex total weight │ │ │ └─ Broadcast initial community info │ │ └─ updateVertexValue() ← Save state │ │ │ │ ELSE IF Iterating: │ │ ├─ optimizeVertexCommunity() │ │ │ ├─ Aggregate neighbor messages │ │ │ ├─ Calculate ΔQ │ │ │ ├─ Select best community │ │ │ └─ Broadcast update info │ │ └─ updateVertexValue() ← Save state │ │ │ └─────────────────────────────────────────────────────┘ 3. Data Structure Details
3.1 LouvainVertexValue (Vertex Value)
Stores community information and weight statistics for each vertex:
public class LouvainVertexValue implements Serializable { // Current community ID that this vertex belongs to private Object communityId; // Total weight (degree) of edges connected to this vertex private double totalWeight; // Weight of edges within the community private double internalWeight; // Mapping of neighbor community ID to weight to that community private Map<Object, Double> neighborCommunityWeights; }Key Field Meanings:
| Field | Meaning | Purpose |
|---|---|---|
communityId | Community ID that vertex currently belongs to | Identify community membership |
totalWeight | Total weight of all edges connected to vertex | Used as ki in modularity calculation |
internalWeight | Weight of internal edges within community | Community internal cohesion indicator |
neighborCommunityWeights | Weight mapping to neighboring communities | Evaluate candidate community moves |
3.2 LouvainMessage (Message Class)
Messages transmitted between vertices to exchange community information and weights:
public class LouvainMessage implements Serializable { // Community ID that the message sender belongs to private Object communityId; // Edge weight from sender to receiver private double edgeWeight; // Message type enumeration private MessageType messageType; // COMMUNITY_INFO or WEIGHT_UPDATE }Message Flow:
- Vertex v initialization: Send
LouvainMessage(v.id, 1.0)to all neighbors - Neighbor u reception: Accumulate weight from v to
neighborCommunityWeights[v.communityId] - Calculate ΔQ: Evaluate movement benefit based on neighbor community weights
- Broadcast update: If community changes, resend new community info to neighbors
3.3 LouvainAggregator (Community Aggregator)
Handles community-level statistics and aggregation:
public class LouvainAggregator { // Community ID to community information mapping private Map<Object, LouvainCommunityInfo> communityMap; // Supergraph edge weight: (community1, community2) -> weight private Map<String, Double> newEdgeWeights; // Supernodes list (aggregated communities) private List<Object> superNodes; // Total edge weight of the graph private double totalEdgeWeight; }Key Methods:
addVertexToCommunity(): Add vertex to communityaddEdgeBetweenCommunities(): Record edge weight between communitiescalculateModularityContributions(): Calculate modularity contribution of each communitygetTotalModularity(): Calculate total modularity
3.4 LouvainCommunityInfo (Community Information)
Encapsulates statistical information for a single community:
public class LouvainCommunityInfo implements Serializable { // Unique identifier of the community private Object communityId; // Set of vertices contained in the community private Set<Object> memberVertices; // Total weight of internal edges within the community private double internalWeight; // Total degree of community members private double totalWeight; // Mapping of edge weight between community and external communities private Map<Object, Double> externalWeights; }4. Core Algorithm Implementation
4.1 Initialization Phase (Iteration 1)
Purpose: Initialize each vertex as an independent community, calculate basic statistics
Implementation Steps:
private void initializeVertex(RowVertex vertex, LouvainVertexValue vertexValue, List<RowEdge> edges) { // Step 1: Calculate vertex's total weight (degree) double totalWeight = 0.0; for (RowEdge edge : edges) { double weight = getEdgeWeight(edge); // For weighted graphs extract edge weight, for unweighted return 1.0 totalWeight += weight; } // Step 2: Set initial community to vertex itself vertexValue.setCommunityId(vertex.getId()); vertexValue.setTotalWeight(totalWeight); vertexValue.setInternalWeight(0.0); // No internal edges initially // Step 3: Broadcast initial community information to all neighbors sendCommunityInfoToNeighbors(vertex, edges, vertexValue); // Neighbors receive this vertex's community info in iteration 2 }Key Calculation:
- Total weight = Σ(weight of edges connected to vertex)
- For unweighted graphs, each edge weight is 1.0
- For weighted graphs, extract weight from edge's value field
4.2 Optimization Phase (Iteration 2-N)
Purpose: Move vertices to optimal communities based on modularity gain
Core Logic:
private void optimizeVertexCommunity(RowVertex vertex, LouvainVertexValue vertexValue, List<RowEdge> edges, Iterator<LouvainMessage> messages) { // Step 1: Aggregate community information from neighbors LouvainMessageCombiner combiner = new LouvainMessageCombiner(); Map<Object, Double> aggregatedWeights = combiner.combineMessages(messages); // Result: aggregatedWeights[communityId] = total weight from vertex to that community // Step 2: Iterate through all adjacent communities, calculate modularity gain of moving double maxDeltaQ = 0.0; Object bestCommunity = vertexValue.getCommunityId(); // Default: no move for (Object communityId : aggregatedWeights.keySet()) { double deltaQ = calculateModularityGain(vertex.getId(), vertexValue, communityId, edges); if (deltaQ > maxDeltaQ) { maxDeltaQ = deltaQ; bestCommunity = communityId; } } // Step 3: If better community found, update community assignment if (!bestCommunity.equals(vertexValue.getCommunityId())) { vertexValue.setCommunityId(bestCommunity); // Note: In actual implementation, may need threshold judgment for convergence } // Step 4: Broadcast updated community information to neighbors sendCommunityInfoToNeighbors(vertex, edges, vertexValue); }4.3 Modularity Gain Calculation
Formula:
ΔQ = [Σin + ki,in / 2m] - [Σtot + ki / 2m]² - [Σin / 2m - (Σtot / 2m)² - (ki / 2m)²] Parameter Explanation:
| Parameter | Meaning |
|---|---|
| m | Total number of edges in graph (for weighted graphs, total weight) |
| ki | Degree of vertex i (for weighted graphs, sum of weights) |
| ki,in | Edge weight from vertex i to target community |
| Σin | Internal edge weight of target community |
| Σtot | Total weight of all vertices in target community |
Implementation Code:
private double calculateModularityGain(Object vertexId, LouvainVertexValue vertexValue, Object targetCommunity, List<RowEdge> edges) { // Ensure total edge weight is calculated if (totalEdgeWeight == 0) { for (RowEdge edge : edges) { totalEdgeWeight += getEdgeWeight(edge); } } double m = totalEdgeWeight; double ki = vertexValue.getTotalWeight(); // Vertex total weight double kiIn = vertexValue.getNeighborCommunityWeights() .getOrDefault(targetCommunity, 0.0); // Weight to target community // Simplified implementation: sigmaTot and sigmaIn set to 0 // Full implementation needs to maintain community-level statistics double sigmaTot = 0.0; double sigmaIn = 0.0; if (m == 0) { return 0.0; // Empty graph has no modularity gain } // Calculate ΔQ double a = (kiIn + sigmaIn / (2 * m)) - ((sigmaTot + ki) / (2 * m)) * ((sigmaTot + ki) / (2 * m)); double b = (kiIn / (2 * m)) - (sigmaTot / (2 * m)) * (sigmaTot / (2 * m)) - (ki / (2 * m)) * (ki / (2 * m)); return a - b; }Note: Current implementation uses simplified modularity calculation. Complete multi-level implementation requires maintaining community-level statistics.
4.4 Message Aggregation (LouvainMessageCombiner)
Purpose: Merge duplicate community information from multiple neighbors, reduce subsequent computation
Implementation:
public class LouvainMessageCombiner { /** * Merge multiple messages from the same community into a single weight value */ public Map<Object, Double> combineMessages(Iterator<LouvainMessage> messages) { Map<Object, Double> combined = new HashMap<>(); while (messages.hasNext()) { LouvainMessage msg = messages.next(); Object communityId = msg.getCommunityId(); double weight = msg.getEdgeWeight(); // Accumulate weights with same community ID combined.put(communityId, combined.getOrDefault(communityId, 0.0) + weight); } return combined; } }Advantages:
- Reduce iteration count in subsequent forEach loops
- Avoid duplicate modularity gain calculations for same community
- Improve computation efficiency for high-degree vertices
4.5 Convergence Detection
Current Implementation:
@Override public void finishIteration(long iterationId) { // Reserved interface for global convergence detection // Full implementation can add: // - Count vertices moved in this iteration // - Global aggregation convergence check // - Voting termination mechanism }Future Optimization:
Use framework-provided voting termination mechanism:
// If no vertices moved in this iteration, vote for termination if (noVertexMoved) { context.voteToTerminate("CONVERGED", 1); }5. Integration and Registration
5.1 Framework Registration
Register algorithm in BuildInSqlFunctionTable.java:
// Import import org.apache.geaflow.dsl.udf.graph.Louvain; // Add in UDGA registration section .add(GeaFlowFunction.of(Louvain.class))5.2 Annotation Marking
Mark algorithm using @Description annotation:
@Description( name = "louvain", description = "built-in udga for Louvain community detection" ) public class Louvain implements AlgorithmUserFunction<Object, LouvainMessage> { // ... }5.3 SQL Invocation Method
-- Basic invocation (use all default parameters) CALL louvain() YIELD (vid, community) RETURN vid, community; -- Invocation with parameters CALL louvain(10, 0.001, false) YIELD (vid, community) RETURN vid, community; -- Parameters: maxIterations=10, modularity=0.001, isWeighted=false6. Testing Strategy
6.1 Test File Locations
geaflow/geaflow-dsl/geaflow-dsl-runtime/src/test/ ├── java/org/apache/geaflow/dsl/runtime/query/ │ └── GQLAlgorithmTest.java # Test class └── resources/ ├── query/gql_algorithm_louvain.sql # Test query └── expect/gql_algorithm_louvain.txt # Expected output 6.2 Test Query (SQL)
-- Create test graph g4 CREATE GRAPH IF NOT EXISTS g4 ( Vertex v4 (vid varchar ID, vvalue int), Edge e4 (srcId varchar SOURCE ID, targetId varchar DESTINATION ID) ); -- Load test data from file INSERT INTO g4.v4(vid, vvalue) SELECT v_id, v_value FROM v_source; INSERT INTO g4.e4(srcId, targetId) SELECT src_id, dst_id FROM e_source; -- Execute Louvain algorithm INSERT INTO tbl_result(v_id, community_id) CALL louvain() YIELD (vid, community) RETURN vid, community;6.3 Test Data
Vertex Set (test_vertex):
1, 1 2, 2 3, 3 4, 4 5, 5 6, 6 Edge Set (test_edge):
1-2, 1-3, 1-4, 1-5 2-1, 2-3, 2-5 3-1, 3-2, 3-4, 3-5 4-1, 4-3, 4-6 5-1, 5-2, 5-3, 5-6 6-4, 6-5 Graph Characteristics:
- 6 vertices, 9 edges (undirected)
- Dense connectivity topology
- Forms single community structure
6.4 Expected Output
For the test graph above, all vertices are expected to belong to the same community (community ID=1):
1,1 5,1 3,1 4,1 2,1 6,1 6.5 Test Execution
Run Tests:
# Run single test method mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \ -Dtest=GQLAlgorithmTest#testAlgorithmLouvain # Run all algorithm tests mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \ -Dtest=GQLAlgorithmTest7. Parameter Details
7.1 Initialization Parameters
public void init(AlgorithmRuntimeContext<Object, LouvainMessage> context, Object[] parameters)Parameter Description:
| Position | Parameter Name | Type | Default | Description |
|---|---|---|---|---|
| 0 | maxIterations | int | 20 | Maximum number of iterations |
| 1 | modularity | double | 0.001 | Modularity convergence threshold |
| 2 | minCommunitySize | int | 1 | Minimum community size |
| 3 | isWeighted | boolean | false | Whether graph is weighted |
Usage Examples:
-- Default parameters CALL louvain(); -- Custom maximum iterations CALL louvain(30); -- Custom iterations and convergence threshold CALL louvain(30, 0.0001); -- Full parameters CALL louvain(30, 0.0001, 2, true);7.2 Edge Weight Handling
Unweighted Graphs (isWeighted=false):
- All edge weights default to 1.0
- Whether edges have value field doesn't affect calculation
Weighted Graphs (isWeighted=true):
- Extract weight from edge's value field
- Support Double type weight values
- Fallback to 1.0 if extraction fails
8. Code Standards and Quality
8.1 Coding Conventions
-
Comment Standards:
- Classes and methods use JavaDoc format
- Add inline comments for complex logic
- Keep comments in English
-
Naming Conventions:
- Class names use PascalCase (LouvainVertexValue)
- Method names use camelCase (calculateModularityGain)
- Constants use UPPER_CASE (COMMUNITY_INFO)
-
Code Style:
- Follow Apache code style
- Use 4-space indentation
- Single line length not exceeding 100 characters
8.2 Serialization and Deserialization
Vertex Value Serialization:
// Serialize to Row object (for storage) private Row serializeVertexValue(LouvainVertexValue value) { return ObjectRow.create( value.getCommunityId(), // Field 0: Community ID value.getTotalWeight(), // Field 1: Total weight value.getInternalWeight() // Field 2: Internal weight ); } // Deserialize from Row object private LouvainVertexValue deserializeVertexValue(Row row) { Object communityId = row.getField(0, ObjectType.INSTANCE); Object totalWeightObj = row.getField(1, DoubleType.INSTANCE); Object internalWeightObj = row.getField(2, DoubleType.INSTANCE); double totalWeight = totalWeightObj instanceof Number ? ((Number) totalWeightObj).doubleValue() : 0.0; double internalWeight = internalWeightObj instanceof Number ? ((Number) internalWeightObj).doubleValue() : 0.0; LouvainVertexValue value = new LouvainVertexValue(); value.setCommunityId(communityId); value.setTotalWeight(totalWeight); value.setInternalWeight(internalWeight); return value; }8.3 Exception Handling
Edge Weight Extraction Fault Tolerance:
private double getEdgeWeight(RowEdge edge) { if (isWeighted) { try { Row value = edge.getValue(); if (value != null) { Object weightObj = value.getField(0, ObjectType.INSTANCE); if (weightObj instanceof Number) { return ((Number) weightObj).doubleValue(); } } } catch (Exception e) { // If extraction fails, silently fallback to default weight // This improves robustness } } return 1.0; // Default weight }9. Performance Characteristics
9.1 Time Complexity
Single Iteration Time Complexity:
- Initialization: O(n + m), where n is number of vertices, m is number of edges
- Message processing: O(avg_degree), average adjacency list size
- Modularity calculation: O(avg_community_neighbors)
- Overall: O(n + m + c*avg_degree), where c is number of communities
Total Iterations:
- Typical graphs: 3-5 iterations for convergence
- Worst case: 20 iterations (default upper limit)
9.2 Space Complexity
Vertex Storage:
- Per vertex: ~200 bytes (communityId, weights, neighbor community map)
- Neighbor community map: O(avg_degree)
Message Storage:
- Per message: ~48 bytes (communityId, edgeWeight, messageType)
- Total messages: O(m)
Overall: O(n + m) space for storing vertices and messages
9.3 Optimization Techniques
- Message Aggregation: Use LouvainMessageCombiner to reduce subsequent processing
- Weight Caching: Cache repeatedly accessed weights during computation
- Conditional Judgment: Avoid unnecessary floating-point operations (e.g., when deltaQ=0)
10. Extension Directions
10.1 Multi-level Implementation
Second Phase of Complete Louvain:
// 1. Collect community statistics in finishIteration @Override public void finishIteration(long iterationId) { // Use global aggregation to collect: // - Number of vertices moved in this iteration // - Modularity contribution of each community // - Global modularity value } // 2. Check if second phase needed if (globalModularityImproved && currentLevel < maxLevels) { // Trigger community aggregation, generate new supergraph // Re-enter optimization phase }10.2 Weighted Graph Optimization
Current modularity formula needs complete community statistics for weighted graphs:
// Complete weighted modularity calculation private double calculateModularityGainWeighted( Object vertexId, LouvainVertexValue vertexValue, Object targetCommunity, LouvainAggregator aggregator) { double m = totalEdgeWeight; double ki = vertexValue.getTotalWeight(); double kiIn = vertexValue.getNeighborCommunityWeights() .getOrDefault(targetCommunity, 0.0); // Get community-level statistics from aggregator LouvainCommunityInfo community = aggregator.getCommunityInfo(targetCommunity); double sigmaTot = community != null ? community.getTotalWeight() : 0.0; double sigmaIn = community != null ? community.getInternalWeight() : 0.0; // Calculate complete ΔQ // ... }10.3 Dynamic Graph Support
Support incremental updates for graphs:
// 1. Check if vertex is newly added in process if (vertex.getValue() == null) { // Special initialization for new vertex initializeNewVertex(vertex); } // 2. Check for edge changes in process List<RowEdge> dynamicEdges = context.loadDynamicEdges(EdgeDirection.BOTH); if (!dynamicEdges.isEmpty()) { // Process newly added or modified edges handleDynamicEdgeChanges(dynamicEdges); }10.4 Visualization Support
Output additional debug information for visualization:
// Extend output type with more information @Override public StructType getOutputType(GraphSchema graphSchema) { return new StructType( new TableField("id", graphSchema.getIdType(), false), new TableField("community", graphSchema.getIdType(), false), new TableField("level", IntegerType.INSTANCE, false), // Community level new TableField("modularity", DoubleType.INSTANCE, false) // Modularity ); }11. Common Issues and Solutions
Issue 1: All Vertices Belong to Single Community
Cause: Graph has dense connections or complete connected subgraph
Verification Method:
Check graph structure: - Number of edges close to vertex count squared? - Shape of minimum spanning tree? Solution:
- Verify if it matches expectations (some graphs indeed are single community)
- Adjust modularity convergence threshold
- Check if weight calculation is correct
Issue 2: Too Many Iterations Without Convergence
Causes:
- Vertex oscillates between two communities
- Modularity gain calculation error
- Iteration limit set too small
Solution:
// Add anti-oscillation mechanism private Set<String> previousStates = new HashSet<>(); private boolean isOscillating(String currentState) { if (previousStates.contains(currentState)) { return true; // Detect state repetition } previousStates.add(currentState); return false; }Issue 3: High Memory Consumption
Causes:
- neighborCommunityWeights map too large (high-degree vertices)
- Temporary data structures not cleaned up timely
- Messages accumulated without merging
Optimization Strategy:
// Periodically clean temporary data @Override public void finishIteration(long iterationId) { if (iterationId % 5 == 0) { // Clean expired data vertexValue.clearNeighborCommunityWeights(); } } // Limit neighbor community map size private static final int MAX_COMMUNITIES_PER_VERTEX = 100; if (neighborCommunityWeights.size() > MAX_COMMUNITIES_PER_VERTEX) { // Keep only top N communities by weight keepTopNCommunities(N); }12. Summary and Best Practices
12.1 Implementation Characteristics
Advantages:
- Clear code structure and comments
- Support both weighted and unweighted graphs
- Include test cases and expected outputs
- Well-integrated with framework
Limitations:
- Current implementation covers first phase only, no community aggregation
- Simplified modularity calculation (sigmaTot and sigmaIn are 0)
- Requires maintaining community-level statistics for complete functionality
12.2 Usage Recommendations
-
Graph Scale:
- Small graphs (<1000 vertices): Use directly, no performance pressure
- Medium graphs (1000-1M): Pay attention to memory usage, adjust parameters
- Large graphs (>1M): Consider distributed execution or sampling
-
Parameter Tuning:
- Default parameters work for most graphs
- Social networks: maxIterations=20, modularity=0.001
- Biological networks: Can increase modularity value to 0.01
-
Performance Optimization:
- Use message merging mechanism
- For high-degree vertices, limit neighbor community count
- Consider sampling or hierarchical sampling for acceleration
Appendix A: Key Classes Detailed API
Louvain.java
public void process()
- Called once per vertex per iteration
- Handle initialization or community optimization
- Responsible for message broadcasting and state updates
public void finish()
- Called after all iterations complete
- Output final community assignment result
- Use context.take() to send output
public StructType getOutputType()
- Define schema of output result
- Return structure containing vertex ID and community ID
- Used for framework validation and storage
LouvainVertexValue.java
public void addNeighborCommunityWeight(Object communityId, double weight)
- Accumulate weight to adjacent community
- Base data for modularity calculation
public void clearNeighborCommunityWeights()
- Clear neighbor community weight map
- Called before each iteration to prepare for new round
LouvainMessage.java
public LouvainMessage(Object communityId, double edgeWeight)
- Create standard community information message
- communityId: Sender's community
- edgeWeight: Weight from sender to receiver
Appendix B: Reference Resources
Standard Louvain Paper:
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.