Skip to content

feat: support louvin algorithm #690

@kitalkuyo-gita

Description

@kitalkuyo-gita

1. Overview

1.1 Algorithm Background

Louvain is an efficient multi-level community detection algorithm that discovers community structures in graphs by optimizing the modularity metric. The algorithm has the following characteristics:

  • Efficiency: Time complexity of O(n log n), suitable for large-scale graph processing
  • Multi-level: Supports multi-resolution community detection
  • Modularity Optimization: Maximizes the modularity metric of graphs
  • Distributed Adaptation: Easy to implement on distributed frameworks

1.2 Implementation Framework

  • AlgorithmUserFunction Interface: Standard user-defined graph algorithm interface
  • Message Passing Model: Supports asynchronous communication between vertices
  • Iterative Execution Framework: Built-in iteration control and convergence detection
  • Distributed Execution Capability: Supports distributed graph computing

1.3 Core Objectives

  • Implement the first phase of the standard Louvain algorithm (local optimization)
  • Support both weighted and unweighted graphs
  • Pass all unit tests and integration tests
  • Provide clear code comments and documentation
  • Reserve interfaces for future multi-level extensions

2. System Architecture

2.1 Core Class Structure

geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/org/apache/geaflow/dsl/udf/graph/ ├── Louvain.java # Main algorithm implementation class ├── LouvainMessage.java # Message passing class ├── LouvainVertexValue.java # Vertex state value class ├── LouvainAggregator.java # Community aggregator class ├── LouvainCommunityInfo.java # Community information class └── LouvainMessageCombiner.java # Message combiner class 

2.2 Algorithm Flow Design

graph TD A[Algorithm Entry] --> B[Iteration 1] B --> B1[Initialize: Each vertex is independent community] B1 --> B2[Broadcast community info to neighbors] B2 --> C[Iteration 2-N] C --> C1[Receive neighbor community information] C1 --> C2[Calculate ΔQ for moving to each community] C2 --> C3[Select optimal community] C3 --> C4{Converged?} C4 -->|No| C5[Update community assignment] C5 --> C2 C4 -->|Yes| D[Output final community result] 
Loading

2.3 Component Interaction Flow

┌─────────────────────────────────────────────────────┐ │ Louvain.process() │ ├─────────────────────────────────────────────────────┤ │ │ │ 1. deserializeVertexValue() ← Get vertex state │ │ 2. loadEdges(BOTH) ← Load neighbor edges │ │ 3. getCurrentIterationId() ← Determine iteration │ │ │ │ IF Iteration 1: │ │ ├─ initializeVertex() │ │ │ ├─ Calculate vertex total weight │ │ │ └─ Broadcast initial community info │ │ └─ updateVertexValue() ← Save state │ │ │ │ ELSE IF Iterating: │ │ ├─ optimizeVertexCommunity() │ │ │ ├─ Aggregate neighbor messages │ │ │ ├─ Calculate ΔQ │ │ │ ├─ Select best community │ │ │ └─ Broadcast update info │ │ └─ updateVertexValue() ← Save state │ │ │ └─────────────────────────────────────────────────────┘ 

3. Data Structure Details

3.1 LouvainVertexValue (Vertex Value)

Stores community information and weight statistics for each vertex:

public class LouvainVertexValue implements Serializable { // Current community ID that this vertex belongs to private Object communityId; // Total weight (degree) of edges connected to this vertex private double totalWeight; // Weight of edges within the community private double internalWeight; // Mapping of neighbor community ID to weight to that community private Map<Object, Double> neighborCommunityWeights; }

Key Field Meanings:

Field Meaning Purpose
communityId Community ID that vertex currently belongs to Identify community membership
totalWeight Total weight of all edges connected to vertex Used as ki in modularity calculation
internalWeight Weight of internal edges within community Community internal cohesion indicator
neighborCommunityWeights Weight mapping to neighboring communities Evaluate candidate community moves

3.2 LouvainMessage (Message Class)

Messages transmitted between vertices to exchange community information and weights:

public class LouvainMessage implements Serializable { // Community ID that the message sender belongs to private Object communityId; // Edge weight from sender to receiver private double edgeWeight; // Message type enumeration private MessageType messageType; // COMMUNITY_INFO or WEIGHT_UPDATE }

Message Flow:

  1. Vertex v initialization: Send LouvainMessage(v.id, 1.0) to all neighbors
  2. Neighbor u reception: Accumulate weight from v to neighborCommunityWeights[v.communityId]
  3. Calculate ΔQ: Evaluate movement benefit based on neighbor community weights
  4. Broadcast update: If community changes, resend new community info to neighbors

3.3 LouvainAggregator (Community Aggregator)

Handles community-level statistics and aggregation:

public class LouvainAggregator { // Community ID to community information mapping private Map<Object, LouvainCommunityInfo> communityMap; // Supergraph edge weight: (community1, community2) -> weight private Map<String, Double> newEdgeWeights; // Supernodes list (aggregated communities) private List<Object> superNodes; // Total edge weight of the graph private double totalEdgeWeight; }

Key Methods:

  • addVertexToCommunity(): Add vertex to community
  • addEdgeBetweenCommunities(): Record edge weight between communities
  • calculateModularityContributions(): Calculate modularity contribution of each community
  • getTotalModularity(): Calculate total modularity

3.4 LouvainCommunityInfo (Community Information)

Encapsulates statistical information for a single community:

public class LouvainCommunityInfo implements Serializable { // Unique identifier of the community private Object communityId; // Set of vertices contained in the community private Set<Object> memberVertices; // Total weight of internal edges within the community private double internalWeight; // Total degree of community members private double totalWeight; // Mapping of edge weight between community and external communities private Map<Object, Double> externalWeights; }

4. Core Algorithm Implementation

4.1 Initialization Phase (Iteration 1)

Purpose: Initialize each vertex as an independent community, calculate basic statistics

Implementation Steps:

private void initializeVertex(RowVertex vertex, LouvainVertexValue vertexValue, List<RowEdge> edges) { // Step 1: Calculate vertex's total weight (degree) double totalWeight = 0.0; for (RowEdge edge : edges) { double weight = getEdgeWeight(edge); // For weighted graphs extract edge weight, for unweighted return 1.0 totalWeight += weight; } // Step 2: Set initial community to vertex itself vertexValue.setCommunityId(vertex.getId()); vertexValue.setTotalWeight(totalWeight); vertexValue.setInternalWeight(0.0); // No internal edges initially // Step 3: Broadcast initial community information to all neighbors sendCommunityInfoToNeighbors(vertex, edges, vertexValue); // Neighbors receive this vertex's community info in iteration 2 }

Key Calculation:

  • Total weight = Σ(weight of edges connected to vertex)
  • For unweighted graphs, each edge weight is 1.0
  • For weighted graphs, extract weight from edge's value field

4.2 Optimization Phase (Iteration 2-N)

Purpose: Move vertices to optimal communities based on modularity gain

Core Logic:

private void optimizeVertexCommunity(RowVertex vertex, LouvainVertexValue vertexValue, List<RowEdge> edges, Iterator<LouvainMessage> messages) { // Step 1: Aggregate community information from neighbors LouvainMessageCombiner combiner = new LouvainMessageCombiner(); Map<Object, Double> aggregatedWeights = combiner.combineMessages(messages); // Result: aggregatedWeights[communityId] = total weight from vertex to that community // Step 2: Iterate through all adjacent communities, calculate modularity gain of moving double maxDeltaQ = 0.0; Object bestCommunity = vertexValue.getCommunityId(); // Default: no move for (Object communityId : aggregatedWeights.keySet()) { double deltaQ = calculateModularityGain(vertex.getId(), vertexValue, communityId, edges); if (deltaQ > maxDeltaQ) { maxDeltaQ = deltaQ; bestCommunity = communityId; } } // Step 3: If better community found, update community assignment if (!bestCommunity.equals(vertexValue.getCommunityId())) { vertexValue.setCommunityId(bestCommunity); // Note: In actual implementation, may need threshold judgment for convergence } // Step 4: Broadcast updated community information to neighbors sendCommunityInfoToNeighbors(vertex, edges, vertexValue); }

4.3 Modularity Gain Calculation

Formula:

ΔQ = [Σin + ki,in / 2m] - [Σtot + ki / 2m]² - [Σin / 2m - (Σtot / 2m)² - (ki / 2m)²] 

Parameter Explanation:

Parameter Meaning
m Total number of edges in graph (for weighted graphs, total weight)
ki Degree of vertex i (for weighted graphs, sum of weights)
ki,in Edge weight from vertex i to target community
Σin Internal edge weight of target community
Σtot Total weight of all vertices in target community

Implementation Code:

private double calculateModularityGain(Object vertexId, LouvainVertexValue vertexValue, Object targetCommunity, List<RowEdge> edges) { // Ensure total edge weight is calculated if (totalEdgeWeight == 0) { for (RowEdge edge : edges) { totalEdgeWeight += getEdgeWeight(edge); } } double m = totalEdgeWeight; double ki = vertexValue.getTotalWeight(); // Vertex total weight double kiIn = vertexValue.getNeighborCommunityWeights() .getOrDefault(targetCommunity, 0.0); // Weight to target community // Simplified implementation: sigmaTot and sigmaIn set to 0 // Full implementation needs to maintain community-level statistics double sigmaTot = 0.0; double sigmaIn = 0.0; if (m == 0) { return 0.0; // Empty graph has no modularity gain } // Calculate ΔQ double a = (kiIn + sigmaIn / (2 * m)) - ((sigmaTot + ki) / (2 * m)) * ((sigmaTot + ki) / (2 * m)); double b = (kiIn / (2 * m)) - (sigmaTot / (2 * m)) * (sigmaTot / (2 * m)) - (ki / (2 * m)) * (ki / (2 * m)); return a - b; }

Note: Current implementation uses simplified modularity calculation. Complete multi-level implementation requires maintaining community-level statistics.

4.4 Message Aggregation (LouvainMessageCombiner)

Purpose: Merge duplicate community information from multiple neighbors, reduce subsequent computation

Implementation:

public class LouvainMessageCombiner { /**  * Merge multiple messages from the same community into a single weight value  */ public Map<Object, Double> combineMessages(Iterator<LouvainMessage> messages) { Map<Object, Double> combined = new HashMap<>(); while (messages.hasNext()) { LouvainMessage msg = messages.next(); Object communityId = msg.getCommunityId(); double weight = msg.getEdgeWeight(); // Accumulate weights with same community ID combined.put(communityId, combined.getOrDefault(communityId, 0.0) + weight); } return combined; } }

Advantages:

  • Reduce iteration count in subsequent forEach loops
  • Avoid duplicate modularity gain calculations for same community
  • Improve computation efficiency for high-degree vertices

4.5 Convergence Detection

Current Implementation:

@Override public void finishIteration(long iterationId) { // Reserved interface for global convergence detection // Full implementation can add: // - Count vertices moved in this iteration // - Global aggregation convergence check // - Voting termination mechanism }

Future Optimization:

Use framework-provided voting termination mechanism:

// If no vertices moved in this iteration, vote for termination if (noVertexMoved) { context.voteToTerminate("CONVERGED", 1); }

5. Integration and Registration

5.1 Framework Registration

Register algorithm in BuildInSqlFunctionTable.java:

// Import import org.apache.geaflow.dsl.udf.graph.Louvain; // Add in UDGA registration section .add(GeaFlowFunction.of(Louvain.class))

5.2 Annotation Marking

Mark algorithm using @Description annotation:

@Description( name = "louvain", description = "built-in udga for Louvain community detection" ) public class Louvain implements AlgorithmUserFunction<Object, LouvainMessage> { // ... }

5.3 SQL Invocation Method

-- Basic invocation (use all default parameters) CALL louvain() YIELD (vid, community) RETURN vid, community; -- Invocation with parameters CALL louvain(10, 0.001, false) YIELD (vid, community) RETURN vid, community; -- Parameters: maxIterations=10, modularity=0.001, isWeighted=false

6. Testing Strategy

6.1 Test File Locations

geaflow/geaflow-dsl/geaflow-dsl-runtime/src/test/ ├── java/org/apache/geaflow/dsl/runtime/query/ │ └── GQLAlgorithmTest.java # Test class └── resources/ ├── query/gql_algorithm_louvain.sql # Test query └── expect/gql_algorithm_louvain.txt # Expected output 

6.2 Test Query (SQL)

-- Create test graph g4 CREATE GRAPH IF NOT EXISTS g4 ( Vertex v4 (vid varchar ID, vvalue int), Edge e4 (srcId varchar SOURCE ID, targetId varchar DESTINATION ID) ); -- Load test data from file INSERT INTO g4.v4(vid, vvalue) SELECT v_id, v_value FROM v_source; INSERT INTO g4.e4(srcId, targetId) SELECT src_id, dst_id FROM e_source; -- Execute Louvain algorithm INSERT INTO tbl_result(v_id, community_id) CALL louvain() YIELD (vid, community) RETURN vid, community;

6.3 Test Data

Vertex Set (test_vertex):

1, 1 2, 2 3, 3 4, 4 5, 5 6, 6 

Edge Set (test_edge):

1-2, 1-3, 1-4, 1-5 2-1, 2-3, 2-5 3-1, 3-2, 3-4, 3-5 4-1, 4-3, 4-6 5-1, 5-2, 5-3, 5-6 6-4, 6-5 

Graph Characteristics:

  • 6 vertices, 9 edges (undirected)
  • Dense connectivity topology
  • Forms single community structure

6.4 Expected Output

For the test graph above, all vertices are expected to belong to the same community (community ID=1):

1,1 5,1 3,1 4,1 2,1 6,1 

6.5 Test Execution

Run Tests:

# Run single test method mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \ -Dtest=GQLAlgorithmTest#testAlgorithmLouvain # Run all algorithm tests mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \ -Dtest=GQLAlgorithmTest

7. Parameter Details

7.1 Initialization Parameters

public void init(AlgorithmRuntimeContext<Object, LouvainMessage> context, Object[] parameters)

Parameter Description:

Position Parameter Name Type Default Description
0 maxIterations int 20 Maximum number of iterations
1 modularity double 0.001 Modularity convergence threshold
2 minCommunitySize int 1 Minimum community size
3 isWeighted boolean false Whether graph is weighted

Usage Examples:

-- Default parameters CALL louvain(); -- Custom maximum iterations CALL louvain(30); -- Custom iterations and convergence threshold CALL louvain(30, 0.0001); -- Full parameters CALL louvain(30, 0.0001, 2, true);

7.2 Edge Weight Handling

Unweighted Graphs (isWeighted=false):

  • All edge weights default to 1.0
  • Whether edges have value field doesn't affect calculation

Weighted Graphs (isWeighted=true):

  • Extract weight from edge's value field
  • Support Double type weight values
  • Fallback to 1.0 if extraction fails

8. Code Standards and Quality

8.1 Coding Conventions

  1. Comment Standards:

    • Classes and methods use JavaDoc format
    • Add inline comments for complex logic
    • Keep comments in English
  2. Naming Conventions:

    • Class names use PascalCase (LouvainVertexValue)
    • Method names use camelCase (calculateModularityGain)
    • Constants use UPPER_CASE (COMMUNITY_INFO)
  3. Code Style:

    • Follow Apache code style
    • Use 4-space indentation
    • Single line length not exceeding 100 characters

8.2 Serialization and Deserialization

Vertex Value Serialization:

// Serialize to Row object (for storage) private Row serializeVertexValue(LouvainVertexValue value) { return ObjectRow.create( value.getCommunityId(), // Field 0: Community ID value.getTotalWeight(), // Field 1: Total weight value.getInternalWeight() // Field 2: Internal weight ); } // Deserialize from Row object private LouvainVertexValue deserializeVertexValue(Row row) { Object communityId = row.getField(0, ObjectType.INSTANCE); Object totalWeightObj = row.getField(1, DoubleType.INSTANCE); Object internalWeightObj = row.getField(2, DoubleType.INSTANCE); double totalWeight = totalWeightObj instanceof Number ? ((Number) totalWeightObj).doubleValue() : 0.0; double internalWeight = internalWeightObj instanceof Number ? ((Number) internalWeightObj).doubleValue() : 0.0; LouvainVertexValue value = new LouvainVertexValue(); value.setCommunityId(communityId); value.setTotalWeight(totalWeight); value.setInternalWeight(internalWeight); return value; }

8.3 Exception Handling

Edge Weight Extraction Fault Tolerance:

private double getEdgeWeight(RowEdge edge) { if (isWeighted) { try { Row value = edge.getValue(); if (value != null) { Object weightObj = value.getField(0, ObjectType.INSTANCE); if (weightObj instanceof Number) { return ((Number) weightObj).doubleValue(); } } } catch (Exception e) { // If extraction fails, silently fallback to default weight // This improves robustness } } return 1.0; // Default weight }

9. Performance Characteristics

9.1 Time Complexity

Single Iteration Time Complexity:

  • Initialization: O(n + m), where n is number of vertices, m is number of edges
  • Message processing: O(avg_degree), average adjacency list size
  • Modularity calculation: O(avg_community_neighbors)
  • Overall: O(n + m + c*avg_degree), where c is number of communities

Total Iterations:

  • Typical graphs: 3-5 iterations for convergence
  • Worst case: 20 iterations (default upper limit)

9.2 Space Complexity

Vertex Storage:

  • Per vertex: ~200 bytes (communityId, weights, neighbor community map)
  • Neighbor community map: O(avg_degree)

Message Storage:

  • Per message: ~48 bytes (communityId, edgeWeight, messageType)
  • Total messages: O(m)

Overall: O(n + m) space for storing vertices and messages

9.3 Optimization Techniques

  1. Message Aggregation: Use LouvainMessageCombiner to reduce subsequent processing
  2. Weight Caching: Cache repeatedly accessed weights during computation
  3. Conditional Judgment: Avoid unnecessary floating-point operations (e.g., when deltaQ=0)

10. Extension Directions

10.1 Multi-level Implementation

Second Phase of Complete Louvain:

// 1. Collect community statistics in finishIteration @Override public void finishIteration(long iterationId) { // Use global aggregation to collect: // - Number of vertices moved in this iteration // - Modularity contribution of each community // - Global modularity value } // 2. Check if second phase needed if (globalModularityImproved && currentLevel < maxLevels) { // Trigger community aggregation, generate new supergraph // Re-enter optimization phase }

10.2 Weighted Graph Optimization

Current modularity formula needs complete community statistics for weighted graphs:

// Complete weighted modularity calculation private double calculateModularityGainWeighted( Object vertexId, LouvainVertexValue vertexValue, Object targetCommunity, LouvainAggregator aggregator) { double m = totalEdgeWeight; double ki = vertexValue.getTotalWeight(); double kiIn = vertexValue.getNeighborCommunityWeights() .getOrDefault(targetCommunity, 0.0); // Get community-level statistics from aggregator LouvainCommunityInfo community = aggregator.getCommunityInfo(targetCommunity); double sigmaTot = community != null ? community.getTotalWeight() : 0.0; double sigmaIn = community != null ? community.getInternalWeight() : 0.0; // Calculate complete ΔQ // ... }

10.3 Dynamic Graph Support

Support incremental updates for graphs:

// 1. Check if vertex is newly added in process if (vertex.getValue() == null) { // Special initialization for new vertex initializeNewVertex(vertex); } // 2. Check for edge changes in process List<RowEdge> dynamicEdges = context.loadDynamicEdges(EdgeDirection.BOTH); if (!dynamicEdges.isEmpty()) { // Process newly added or modified edges handleDynamicEdgeChanges(dynamicEdges); }

10.4 Visualization Support

Output additional debug information for visualization:

// Extend output type with more information @Override public StructType getOutputType(GraphSchema graphSchema) { return new StructType( new TableField("id", graphSchema.getIdType(), false), new TableField("community", graphSchema.getIdType(), false), new TableField("level", IntegerType.INSTANCE, false), // Community level new TableField("modularity", DoubleType.INSTANCE, false) // Modularity ); }

11. Common Issues and Solutions

Issue 1: All Vertices Belong to Single Community

Cause: Graph has dense connections or complete connected subgraph

Verification Method:

Check graph structure: - Number of edges close to vertex count squared? - Shape of minimum spanning tree? 

Solution:

  • Verify if it matches expectations (some graphs indeed are single community)
  • Adjust modularity convergence threshold
  • Check if weight calculation is correct

Issue 2: Too Many Iterations Without Convergence

Causes:

  1. Vertex oscillates between two communities
  2. Modularity gain calculation error
  3. Iteration limit set too small

Solution:

// Add anti-oscillation mechanism private Set<String> previousStates = new HashSet<>(); private boolean isOscillating(String currentState) { if (previousStates.contains(currentState)) { return true; // Detect state repetition } previousStates.add(currentState); return false; }

Issue 3: High Memory Consumption

Causes:

  1. neighborCommunityWeights map too large (high-degree vertices)
  2. Temporary data structures not cleaned up timely
  3. Messages accumulated without merging

Optimization Strategy:

// Periodically clean temporary data @Override public void finishIteration(long iterationId) { if (iterationId % 5 == 0) { // Clean expired data vertexValue.clearNeighborCommunityWeights(); } } // Limit neighbor community map size private static final int MAX_COMMUNITIES_PER_VERTEX = 100; if (neighborCommunityWeights.size() > MAX_COMMUNITIES_PER_VERTEX) { // Keep only top N communities by weight keepTopNCommunities(N); }

12. Summary and Best Practices

12.1 Implementation Characteristics

Advantages:

  • Clear code structure and comments
  • Support both weighted and unweighted graphs
  • Include test cases and expected outputs
  • Well-integrated with framework

Limitations:

  • Current implementation covers first phase only, no community aggregation
  • Simplified modularity calculation (sigmaTot and sigmaIn are 0)
  • Requires maintaining community-level statistics for complete functionality

12.2 Usage Recommendations

  1. Graph Scale:

    • Small graphs (<1000 vertices): Use directly, no performance pressure
    • Medium graphs (1000-1M): Pay attention to memory usage, adjust parameters
    • Large graphs (>1M): Consider distributed execution or sampling
  2. Parameter Tuning:

    • Default parameters work for most graphs
    • Social networks: maxIterations=20, modularity=0.001
    • Biological networks: Can increase modularity value to 0.01
  3. Performance Optimization:

    • Use message merging mechanism
    • For high-degree vertices, limit neighbor community count
    • Consider sampling or hierarchical sampling for acceleration

Appendix A: Key Classes Detailed API

Louvain.java

public void process()

  • Called once per vertex per iteration
  • Handle initialization or community optimization
  • Responsible for message broadcasting and state updates

public void finish()

  • Called after all iterations complete
  • Output final community assignment result
  • Use context.take() to send output

public StructType getOutputType()

  • Define schema of output result
  • Return structure containing vertex ID and community ID
  • Used for framework validation and storage

LouvainVertexValue.java

public void addNeighborCommunityWeight(Object communityId, double weight)

  • Accumulate weight to adjacent community
  • Base data for modularity calculation

public void clearNeighborCommunityWeights()

  • Clear neighbor community weight map
  • Called before each iteration to prepare for new round

LouvainMessage.java

public LouvainMessage(Object communityId, double edgeWeight)

  • Create standard community information message
  • communityId: Sender's community
  • edgeWeight: Weight from sender to receiver

Appendix B: Reference Resources

Standard Louvain Paper:

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. 

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions