Skip to content

Code review issues embedding generator with cluster visualization

License

Notifications You must be signed in to change notification settings

ecorkran/embedding-cluster

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Embedding Cluster Analysis

Semantic clustering tool for finding patterns in AI-generated code review feedback.

What it does

  • Embeds text using sentence-transformers (all-MiniLM-L6-v2)
  • Computes pairwise cosine similarity
  • Clusters using three methods: graph-based, agglomerative, HDBSCAN
  • Visualizes with t-SNE projections
  • Includes threshold sweep for parameter tuning

Why

Built to identify duplicate/similar issues in large sets of AI-generated code review comments. Reduces noise, surfaces patterns.

Usage

python main.py "data/tasks.*.md" --cluster-threshold 0.75 --sweep --dump

Tech

Python, NumPy, scikit-learn, HDBSCAN, NetworkX, sentence-transformers, matplotlib

About

Code review issues embedding generator with cluster visualization

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages