-
Notifications
You must be signed in to change notification settings - Fork 154
Description
1. Overview
1.1 Algorithm Background
Louvain is an efficient multi-level community detection algorithm that discovers community structures in graphs by optimizing the modularity metric. The algorithm has the following characteristics:
- Efficiency: Time complexity of O(n log n), suitable for large-scale graph processing
- Multi-level: Supports multi-resolution community detection
- Modularity Optimization: Maximizes the modularity metric of graphs
- Distributed Adaptation: Easy to implement on distributed frameworks
1.2 Implementation Framework
- AlgorithmUserFunction Interface: Standard user-defined graph algorithm interface
- Message Passing Model: Supports asynchronous communication between vertices
- Iterative Execution Framework: Built-in iteration control and convergence detection
- Distributed Execution Capability: Supports distributed graph computing
1.3 Core Objectives
- Implement the first phase of the standard Louvain algorithm (local optimization)
- Support both weighted and unweighted graphs
- Pass all unit tests and integration tests
- Provide clear code comments and documentation
- Reserve interfaces for future multi-level extensions
2. System Architecture
2.1 Core Class Structure
geaflow/geaflow-dsl/geaflow-dsl-plan/src/main/java/org/apache/geaflow/dsl/udf/graph/
├── Louvain.java # Main algorithm implementation class
├── LouvainMessage.java # Message passing class
├── LouvainVertexValue.java # Vertex state value class
├── LouvainAggregator.java # Community aggregator class
├── LouvainCommunityInfo.java # Community information class
└── LouvainMessageCombiner.java # Message combiner class
2.2 Algorithm Flow Design
graph TD
A[Algorithm Entry] --> B[Iteration 1]
B --> B1[Initialize: Each vertex is independent community]
B1 --> B2[Broadcast community info to neighbors]
B2 --> C[Iteration 2-N]
C --> C1[Receive neighbor community information]
C1 --> C2[Calculate ΔQ for moving to each community]
C2 --> C3[Select optimal community]
C3 --> C4{Converged?}
C4 -->|No| C5[Update community assignment]
C5 --> C2
C4 -->|Yes| D[Output final community result]
2.3 Component Interaction Flow
┌─────────────────────────────────────────────────────┐
│ Louvain.process() │
├─────────────────────────────────────────────────────┤
│ │
│ 1. deserializeVertexValue() ← Get vertex state │
│ 2. loadEdges(BOTH) ← Load neighbor edges │
│ 3. getCurrentIterationId() ← Determine iteration │
│ │
│ IF Iteration 1: │
│ ├─ initializeVertex() │
│ │ ├─ Calculate vertex total weight │
│ │ └─ Broadcast initial community info │
│ └─ updateVertexValue() ← Save state │
│ │
│ ELSE IF Iterating: │
│ ├─ optimizeVertexCommunity() │
│ │ ├─ Aggregate neighbor messages │
│ │ ├─ Calculate ΔQ │
│ │ ├─ Select best community │
│ │ └─ Broadcast update info │
│ └─ updateVertexValue() ← Save state │
│ │
└─────────────────────────────────────────────────────┘
3. Data Structure Details
3.1 LouvainVertexValue (Vertex Value)
Stores community information and weight statistics for each vertex:
public class LouvainVertexValue implements Serializable {
// Current community ID that this vertex belongs to
private Object communityId;
// Total weight (degree) of edges connected to this vertex
private double totalWeight;
// Weight of edges within the community
private double internalWeight;
// Mapping of neighbor community ID to weight to that community
private Map<Object, Double> neighborCommunityWeights;
}Key Field Meanings:
| Field | Meaning | Purpose |
|---|---|---|
communityId |
Community ID that vertex currently belongs to | Identify community membership |
totalWeight |
Total weight of all edges connected to vertex | Used as ki in modularity calculation |
internalWeight |
Weight of internal edges within community | Community internal cohesion indicator |
neighborCommunityWeights |
Weight mapping to neighboring communities | Evaluate candidate community moves |
3.2 LouvainMessage (Message Class)
Messages transmitted between vertices to exchange community information and weights:
public class LouvainMessage implements Serializable {
// Community ID that the message sender belongs to
private Object communityId;
// Edge weight from sender to receiver
private double edgeWeight;
// Message type enumeration
private MessageType messageType; // COMMUNITY_INFO or WEIGHT_UPDATE
}Message Flow:
- Vertex v initialization: Send
LouvainMessage(v.id, 1.0)to all neighbors - Neighbor u reception: Accumulate weight from v to
neighborCommunityWeights[v.communityId] - Calculate ΔQ: Evaluate movement benefit based on neighbor community weights
- Broadcast update: If community changes, resend new community info to neighbors
3.3 LouvainAggregator (Community Aggregator)
Handles community-level statistics and aggregation:
public class LouvainAggregator {
// Community ID to community information mapping
private Map<Object, LouvainCommunityInfo> communityMap;
// Supergraph edge weight: (community1, community2) -> weight
private Map<String, Double> newEdgeWeights;
// Supernodes list (aggregated communities)
private List<Object> superNodes;
// Total edge weight of the graph
private double totalEdgeWeight;
}Key Methods:
addVertexToCommunity(): Add vertex to communityaddEdgeBetweenCommunities(): Record edge weight between communitiescalculateModularityContributions(): Calculate modularity contribution of each communitygetTotalModularity(): Calculate total modularity
3.4 LouvainCommunityInfo (Community Information)
Encapsulates statistical information for a single community:
public class LouvainCommunityInfo implements Serializable {
// Unique identifier of the community
private Object communityId;
// Set of vertices contained in the community
private Set<Object> memberVertices;
// Total weight of internal edges within the community
private double internalWeight;
// Total degree of community members
private double totalWeight;
// Mapping of edge weight between community and external communities
private Map<Object, Double> externalWeights;
}4. Core Algorithm Implementation
4.1 Initialization Phase (Iteration 1)
Purpose: Initialize each vertex as an independent community, calculate basic statistics
Implementation Steps:
private void initializeVertex(RowVertex vertex, LouvainVertexValue vertexValue,
List<RowEdge> edges) {
// Step 1: Calculate vertex's total weight (degree)
double totalWeight = 0.0;
for (RowEdge edge : edges) {
double weight = getEdgeWeight(edge); // For weighted graphs extract edge weight, for unweighted return 1.0
totalWeight += weight;
}
// Step 2: Set initial community to vertex itself
vertexValue.setCommunityId(vertex.getId());
vertexValue.setTotalWeight(totalWeight);
vertexValue.setInternalWeight(0.0); // No internal edges initially
// Step 3: Broadcast initial community information to all neighbors
sendCommunityInfoToNeighbors(vertex, edges, vertexValue);
// Neighbors receive this vertex's community info in iteration 2
}Key Calculation:
- Total weight = Σ(weight of edges connected to vertex)
- For unweighted graphs, each edge weight is 1.0
- For weighted graphs, extract weight from edge's value field
4.2 Optimization Phase (Iteration 2-N)
Purpose: Move vertices to optimal communities based on modularity gain
Core Logic:
private void optimizeVertexCommunity(RowVertex vertex, LouvainVertexValue vertexValue,
List<RowEdge> edges,
Iterator<LouvainMessage> messages) {
// Step 1: Aggregate community information from neighbors
LouvainMessageCombiner combiner = new LouvainMessageCombiner();
Map<Object, Double> aggregatedWeights = combiner.combineMessages(messages);
// Result: aggregatedWeights[communityId] = total weight from vertex to that community
// Step 2: Iterate through all adjacent communities, calculate modularity gain of moving
double maxDeltaQ = 0.0;
Object bestCommunity = vertexValue.getCommunityId(); // Default: no move
for (Object communityId : aggregatedWeights.keySet()) {
double deltaQ = calculateModularityGain(vertex.getId(), vertexValue,
communityId, edges);
if (deltaQ > maxDeltaQ) {
maxDeltaQ = deltaQ;
bestCommunity = communityId;
}
}
// Step 3: If better community found, update community assignment
if (!bestCommunity.equals(vertexValue.getCommunityId())) {
vertexValue.setCommunityId(bestCommunity);
// Note: In actual implementation, may need threshold judgment for convergence
}
// Step 4: Broadcast updated community information to neighbors
sendCommunityInfoToNeighbors(vertex, edges, vertexValue);
}4.3 Modularity Gain Calculation
Formula:
ΔQ = [Σin + ki,in / 2m] - [Σtot + ki / 2m]²
- [Σin / 2m - (Σtot / 2m)² - (ki / 2m)²]
Parameter Explanation:
| Parameter | Meaning |
|---|---|
| m | Total number of edges in graph (for weighted graphs, total weight) |
| ki | Degree of vertex i (for weighted graphs, sum of weights) |
| ki,in | Edge weight from vertex i to target community |
| Σin | Internal edge weight of target community |
| Σtot | Total weight of all vertices in target community |
Implementation Code:
private double calculateModularityGain(Object vertexId, LouvainVertexValue vertexValue,
Object targetCommunity, List<RowEdge> edges) {
// Ensure total edge weight is calculated
if (totalEdgeWeight == 0) {
for (RowEdge edge : edges) {
totalEdgeWeight += getEdgeWeight(edge);
}
}
double m = totalEdgeWeight;
double ki = vertexValue.getTotalWeight(); // Vertex total weight
double kiIn = vertexValue.getNeighborCommunityWeights()
.getOrDefault(targetCommunity, 0.0); // Weight to target community
// Simplified implementation: sigmaTot and sigmaIn set to 0
// Full implementation needs to maintain community-level statistics
double sigmaTot = 0.0;
double sigmaIn = 0.0;
if (m == 0) {
return 0.0; // Empty graph has no modularity gain
}
// Calculate ΔQ
double a = (kiIn + sigmaIn / (2 * m)) -
((sigmaTot + ki) / (2 * m)) * ((sigmaTot + ki) / (2 * m));
double b = (kiIn / (2 * m)) -
(sigmaTot / (2 * m)) * (sigmaTot / (2 * m)) -
(ki / (2 * m)) * (ki / (2 * m));
return a - b;
}Note: Current implementation uses simplified modularity calculation. Complete multi-level implementation requires maintaining community-level statistics.
4.4 Message Aggregation (LouvainMessageCombiner)
Purpose: Merge duplicate community information from multiple neighbors, reduce subsequent computation
Implementation:
public class LouvainMessageCombiner {
/**
* Merge multiple messages from the same community into a single weight value
*/
public Map<Object, Double> combineMessages(Iterator<LouvainMessage> messages) {
Map<Object, Double> combined = new HashMap<>();
while (messages.hasNext()) {
LouvainMessage msg = messages.next();
Object communityId = msg.getCommunityId();
double weight = msg.getEdgeWeight();
// Accumulate weights with same community ID
combined.put(communityId,
combined.getOrDefault(communityId, 0.0) + weight);
}
return combined;
}
}Advantages:
- Reduce iteration count in subsequent forEach loops
- Avoid duplicate modularity gain calculations for same community
- Improve computation efficiency for high-degree vertices
4.5 Convergence Detection
Current Implementation:
@Override
public void finishIteration(long iterationId) {
// Reserved interface for global convergence detection
// Full implementation can add:
// - Count vertices moved in this iteration
// - Global aggregation convergence check
// - Voting termination mechanism
}Future Optimization:
Use framework-provided voting termination mechanism:
// If no vertices moved in this iteration, vote for termination
if (noVertexMoved) {
context.voteToTerminate("CONVERGED", 1);
}5. Integration and Registration
5.1 Framework Registration
Register algorithm in BuildInSqlFunctionTable.java:
// Import
import org.apache.geaflow.dsl.udf.graph.Louvain;
// Add in UDGA registration section
.add(GeaFlowFunction.of(Louvain.class))5.2 Annotation Marking
Mark algorithm using @Description annotation:
@Description(
name = "louvain",
description = "built-in udga for Louvain community detection"
)
public class Louvain implements AlgorithmUserFunction<Object, LouvainMessage> {
// ...
}5.3 SQL Invocation Method
-- Basic invocation (use all default parameters)
CALL louvain() YIELD (vid, community) RETURN vid, community;
-- Invocation with parameters
CALL louvain(10, 0.001, false)
YIELD (vid, community) RETURN vid, community;
-- Parameters: maxIterations=10, modularity=0.001, isWeighted=false6. Testing Strategy
6.1 Test File Locations
geaflow/geaflow-dsl/geaflow-dsl-runtime/src/test/
├── java/org/apache/geaflow/dsl/runtime/query/
│ └── GQLAlgorithmTest.java # Test class
└── resources/
├── query/gql_algorithm_louvain.sql # Test query
└── expect/gql_algorithm_louvain.txt # Expected output
6.2 Test Query (SQL)
-- Create test graph g4
CREATE GRAPH IF NOT EXISTS g4 (
Vertex v4 (vid varchar ID, vvalue int),
Edge e4 (srcId varchar SOURCE ID, targetId varchar DESTINATION ID)
);
-- Load test data from file
INSERT INTO g4.v4(vid, vvalue)
SELECT v_id, v_value FROM v_source;
INSERT INTO g4.e4(srcId, targetId)
SELECT src_id, dst_id FROM e_source;
-- Execute Louvain algorithm
INSERT INTO tbl_result(v_id, community_id)
CALL louvain() YIELD (vid, community)
RETURN vid, community;6.3 Test Data
Vertex Set (test_vertex):
1, 1
2, 2
3, 3
4, 4
5, 5
6, 6
Edge Set (test_edge):
1-2, 1-3, 1-4, 1-5
2-1, 2-3, 2-5
3-1, 3-2, 3-4, 3-5
4-1, 4-3, 4-6
5-1, 5-2, 5-3, 5-6
6-4, 6-5
Graph Characteristics:
- 6 vertices, 9 edges (undirected)
- Dense connectivity topology
- Forms single community structure
6.4 Expected Output
For the test graph above, all vertices are expected to belong to the same community (community ID=1):
1,1
5,1
3,1
4,1
2,1
6,1
6.5 Test Execution
Run Tests:
# Run single test method
mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \
-Dtest=GQLAlgorithmTest#testAlgorithmLouvain
# Run all algorithm tests
mvn test -pl geaflow/geaflow-dsl/geaflow-dsl-runtime \
-Dtest=GQLAlgorithmTest7. Parameter Details
7.1 Initialization Parameters
public void init(AlgorithmRuntimeContext<Object, LouvainMessage> context,
Object[] parameters)Parameter Description:
| Position | Parameter Name | Type | Default | Description |
|---|---|---|---|---|
| 0 | maxIterations | int | 20 | Maximum number of iterations |
| 1 | modularity | double | 0.001 | Modularity convergence threshold |
| 2 | minCommunitySize | int | 1 | Minimum community size |
| 3 | isWeighted | boolean | false | Whether graph is weighted |
Usage Examples:
-- Default parameters
CALL louvain();
-- Custom maximum iterations
CALL louvain(30);
-- Custom iterations and convergence threshold
CALL louvain(30, 0.0001);
-- Full parameters
CALL louvain(30, 0.0001, 2, true);7.2 Edge Weight Handling
Unweighted Graphs (isWeighted=false):
- All edge weights default to 1.0
- Whether edges have value field doesn't affect calculation
Weighted Graphs (isWeighted=true):
- Extract weight from edge's value field
- Support Double type weight values
- Fallback to 1.0 if extraction fails
8. Code Standards and Quality
8.1 Coding Conventions
-
Comment Standards:
- Classes and methods use JavaDoc format
- Add inline comments for complex logic
- Keep comments in English
-
Naming Conventions:
- Class names use PascalCase (LouvainVertexValue)
- Method names use camelCase (calculateModularityGain)
- Constants use UPPER_CASE (COMMUNITY_INFO)
-
Code Style:
- Follow Apache code style
- Use 4-space indentation
- Single line length not exceeding 100 characters
8.2 Serialization and Deserialization
Vertex Value Serialization:
// Serialize to Row object (for storage)
private Row serializeVertexValue(LouvainVertexValue value) {
return ObjectRow.create(
value.getCommunityId(), // Field 0: Community ID
value.getTotalWeight(), // Field 1: Total weight
value.getInternalWeight() // Field 2: Internal weight
);
}
// Deserialize from Row object
private LouvainVertexValue deserializeVertexValue(Row row) {
Object communityId = row.getField(0, ObjectType.INSTANCE);
Object totalWeightObj = row.getField(1, DoubleType.INSTANCE);
Object internalWeightObj = row.getField(2, DoubleType.INSTANCE);
double totalWeight = totalWeightObj instanceof Number
? ((Number) totalWeightObj).doubleValue() : 0.0;
double internalWeight = internalWeightObj instanceof Number
? ((Number) internalWeightObj).doubleValue() : 0.0;
LouvainVertexValue value = new LouvainVertexValue();
value.setCommunityId(communityId);
value.setTotalWeight(totalWeight);
value.setInternalWeight(internalWeight);
return value;
}8.3 Exception Handling
Edge Weight Extraction Fault Tolerance:
private double getEdgeWeight(RowEdge edge) {
if (isWeighted) {
try {
Row value = edge.getValue();
if (value != null) {
Object weightObj = value.getField(0, ObjectType.INSTANCE);
if (weightObj instanceof Number) {
return ((Number) weightObj).doubleValue();
}
}
} catch (Exception e) {
// If extraction fails, silently fallback to default weight
// This improves robustness
}
}
return 1.0; // Default weight
}9. Performance Characteristics
9.1 Time Complexity
Single Iteration Time Complexity:
- Initialization: O(n + m), where n is number of vertices, m is number of edges
- Message processing: O(avg_degree), average adjacency list size
- Modularity calculation: O(avg_community_neighbors)
- Overall: O(n + m + c*avg_degree), where c is number of communities
Total Iterations:
- Typical graphs: 3-5 iterations for convergence
- Worst case: 20 iterations (default upper limit)
9.2 Space Complexity
Vertex Storage:
- Per vertex: ~200 bytes (communityId, weights, neighbor community map)
- Neighbor community map: O(avg_degree)
Message Storage:
- Per message: ~48 bytes (communityId, edgeWeight, messageType)
- Total messages: O(m)
Overall: O(n + m) space for storing vertices and messages
9.3 Optimization Techniques
- Message Aggregation: Use LouvainMessageCombiner to reduce subsequent processing
- Weight Caching: Cache repeatedly accessed weights during computation
- Conditional Judgment: Avoid unnecessary floating-point operations (e.g., when deltaQ=0)
10. Extension Directions
10.1 Multi-level Implementation
Second Phase of Complete Louvain:
// 1. Collect community statistics in finishIteration
@Override
public void finishIteration(long iterationId) {
// Use global aggregation to collect:
// - Number of vertices moved in this iteration
// - Modularity contribution of each community
// - Global modularity value
}
// 2. Check if second phase needed
if (globalModularityImproved && currentLevel < maxLevels) {
// Trigger community aggregation, generate new supergraph
// Re-enter optimization phase
}10.2 Weighted Graph Optimization
Current modularity formula needs complete community statistics for weighted graphs:
// Complete weighted modularity calculation
private double calculateModularityGainWeighted(
Object vertexId,
LouvainVertexValue vertexValue,
Object targetCommunity,
LouvainAggregator aggregator) {
double m = totalEdgeWeight;
double ki = vertexValue.getTotalWeight();
double kiIn = vertexValue.getNeighborCommunityWeights()
.getOrDefault(targetCommunity, 0.0);
// Get community-level statistics from aggregator
LouvainCommunityInfo community = aggregator.getCommunityInfo(targetCommunity);
double sigmaTot = community != null ? community.getTotalWeight() : 0.0;
double sigmaIn = community != null ? community.getInternalWeight() : 0.0;
// Calculate complete ΔQ
// ...
}10.3 Dynamic Graph Support
Support incremental updates for graphs:
// 1. Check if vertex is newly added in process
if (vertex.getValue() == null) {
// Special initialization for new vertex
initializeNewVertex(vertex);
}
// 2. Check for edge changes in process
List<RowEdge> dynamicEdges = context.loadDynamicEdges(EdgeDirection.BOTH);
if (!dynamicEdges.isEmpty()) {
// Process newly added or modified edges
handleDynamicEdgeChanges(dynamicEdges);
}10.4 Visualization Support
Output additional debug information for visualization:
// Extend output type with more information
@Override
public StructType getOutputType(GraphSchema graphSchema) {
return new StructType(
new TableField("id", graphSchema.getIdType(), false),
new TableField("community", graphSchema.getIdType(), false),
new TableField("level", IntegerType.INSTANCE, false), // Community level
new TableField("modularity", DoubleType.INSTANCE, false) // Modularity
);
}11. Common Issues and Solutions
Issue 1: All Vertices Belong to Single Community
Cause: Graph has dense connections or complete connected subgraph
Verification Method:
Check graph structure:
- Number of edges close to vertex count squared?
- Shape of minimum spanning tree?
Solution:
- Verify if it matches expectations (some graphs indeed are single community)
- Adjust modularity convergence threshold
- Check if weight calculation is correct
Issue 2: Too Many Iterations Without Convergence
Causes:
- Vertex oscillates between two communities
- Modularity gain calculation error
- Iteration limit set too small
Solution:
// Add anti-oscillation mechanism
private Set<String> previousStates = new HashSet<>();
private boolean isOscillating(String currentState) {
if (previousStates.contains(currentState)) {
return true; // Detect state repetition
}
previousStates.add(currentState);
return false;
}Issue 3: High Memory Consumption
Causes:
- neighborCommunityWeights map too large (high-degree vertices)
- Temporary data structures not cleaned up timely
- Messages accumulated without merging
Optimization Strategy:
// Periodically clean temporary data
@Override
public void finishIteration(long iterationId) {
if (iterationId % 5 == 0) {
// Clean expired data
vertexValue.clearNeighborCommunityWeights();
}
}
// Limit neighbor community map size
private static final int MAX_COMMUNITIES_PER_VERTEX = 100;
if (neighborCommunityWeights.size() > MAX_COMMUNITIES_PER_VERTEX) {
// Keep only top N communities by weight
keepTopNCommunities(N);
}12. Summary and Best Practices
12.1 Implementation Characteristics
Advantages:
- Clear code structure and comments
- Support both weighted and unweighted graphs
- Include test cases and expected outputs
- Well-integrated with framework
Limitations:
- Current implementation covers first phase only, no community aggregation
- Simplified modularity calculation (sigmaTot and sigmaIn are 0)
- Requires maintaining community-level statistics for complete functionality
12.2 Usage Recommendations
-
Graph Scale:
- Small graphs (<1000 vertices): Use directly, no performance pressure
- Medium graphs (1000-1M): Pay attention to memory usage, adjust parameters
- Large graphs (>1M): Consider distributed execution or sampling
-
Parameter Tuning:
- Default parameters work for most graphs
- Social networks: maxIterations=20, modularity=0.001
- Biological networks: Can increase modularity value to 0.01
-
Performance Optimization:
- Use message merging mechanism
- For high-degree vertices, limit neighbor community count
- Consider sampling or hierarchical sampling for acceleration
Appendix A: Key Classes Detailed API
Louvain.java
public void process()
- Called once per vertex per iteration
- Handle initialization or community optimization
- Responsible for message broadcasting and state updates
public void finish()
- Called after all iterations complete
- Output final community assignment result
- Use context.take() to send output
public StructType getOutputType()
- Define schema of output result
- Return structure containing vertex ID and community ID
- Used for framework validation and storage
LouvainVertexValue.java
public void addNeighborCommunityWeight(Object communityId, double weight)
- Accumulate weight to adjacent community
- Base data for modularity calculation
public void clearNeighborCommunityWeights()
- Clear neighbor community weight map
- Called before each iteration to prepare for new round
LouvainMessage.java
public LouvainMessage(Object communityId, double edgeWeight)
- Create standard community information message
- communityId: Sender's community
- edgeWeight: Weight from sender to receiver
Appendix B: Reference Resources
Standard Louvain Paper:
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008).
Fast unfolding of communities in large networks.
Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.