Embedding Cluster Analysis

Semantic clustering tool for finding patterns in AI-generated code review feedback.

What it does

Built to identify duplicate or near-duplicate issues in large sets of AI-generated code review comments. Grouping similar comments reduces noise and surfaces recurring patterns.

Example usage:

python main.py "data/tasks.*.md" --cluster-threshold 0.75 --sweep --dump
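The README does not say how `--cluster-threshold` or `--sweep` are applied internally. One plausible reading is that the threshold is a cosine-similarity cutoff and that a sweep reruns the grouping at several cutoffs. The sketch below illustrates that idea only, under those assumptions. It uses hand-written toy vectors in place of real sentence embeddings, and the helper names `cosine` and `cluster_by_threshold` are hypothetical, not taken from this repository. It links any two comments whose similarity meets the cutoff and reports the connected groups, which may differ from what the tool's HDBSCAN or NetworkX code does.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_threshold(vectors, threshold=0.75):
    """Group vector indices whose pairwise similarity chains meet the threshold.

    Hypothetical helper: builds connected components with a small union-find.
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy stand-ins for comment embeddings (assumed data, not from the project).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.2],
    [0.0, 0.0, 1.0],
]

clusters = cluster_by_threshold(toy_vectors, threshold=0.75)

# A simple "sweep": count clusters at several candidate thresholds.
sweep = {t: len(cluster_by_threshold(toy_vectors, t)) for t in (0.1, 0.75, 0.999)}

print(clusters)
print(sweep)
```

In this toy data a low cutoff merges everything into one group and a very high cutoff leaves every comment on its own, which is the trade-off a threshold sweep is meant to expose.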

Built with: Python, NumPy, scikit-learn, HDBSCAN, NetworkX, sentence-transformers, matplotlib

Repository contents

project-documents
.gitignore
LICENSE
README.md
clusters.py
dump_utils.py
embeddings.py
pyproject.toml
uv.lock