A Python library for clustering customer questions using large language models.
QCluster is a Python library for making sense of large volumes of customer feedback. It uses Large Language Models (LLMs) to group similar customer questions automatically, so you can identify trends, pain points, and frequently asked questions.
This project provides a complete pipeline for:
- Extracting customer questions from your data sources.
- Generating embeddings for each question.
- Clustering the questions based on their semantic similarity.
- Evaluating the quality of the clustering results.
- Generating insightful reports.
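The embedding and clustering steps above can be pictured with a small, dependency-free sketch. It is not QCluster's API: the toy vectors stand in for LLM embeddings, and `cosine_similarity`, `greedy_cluster`, and the 0.8 threshold are hypothetical names and values chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings, threshold=0.8):
    """Assign each vector to the first cluster whose seed vector is
    similar enough; otherwise start a new cluster seeded by that vector."""
    seeds = []
    labels = []
    for vec in embeddings:
        for cluster_id, seed in enumerate(seeds):
            if cosine_similarity(vec, seed) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


questions = [
    "How do I reset my password?",
    "I forgot my password, what now?",
    "Where is my order?",
    "When will my package arrive?",
]
# Hand-written toy vectors (assumption): real embeddings would come from a model.
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]

labels = greedy_cluster(toy_embeddings)
for label, question in zip(labels, questions):
    print(label, question)
```

Questions that share a label are treated as one group. A real pipeline would likely use a more robust clustering algorithm than this greedy pass, which depends on input order.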
Follow these simple steps to get your local copy of QCluster up and running.
This project was tested on macOS with Apple Silicon, but it should work on other
systems as well.
- Python 3.12+
- uv: A fast Python package installer and resolver.
- ollama: Run large language models locally.
  - You will also need the qwen2.5:3b model, but you can configure other models as well.
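If you want to confirm the prerequisites before installing, a short check like the following may help. It is a sketch, not part of QCluster: `missing_prerequisites` is a hypothetical helper, and it only checks the interpreter running the script, which may differ from the one uv selects for the project.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 12)          # minimum version listed above
REQUIRED_TOOLS = ("uv", "ollama")  # executables expected on PATH


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if tuple(version_info[:2]) < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print(issue)
    if not issues:
        print("All prerequisites found.")
```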
- Clone the repo
  git clone https://github.com/dbudaghyan/qcluster.git
  cd qcluster
- Set up the environment variables
  cp .env.example .env
  You can modify the .env file to change the default settings.
- Install ollama
  - Using Homebrew (on macOS): brew install ollama
  - Or download the binary directly from the official website.
- Pull the LLM model
  ollama pull qwen2.5:3b
  If you have defined other models in your .env file, make sure to pull them as well.
- Start the ollama server
  ollama serve
- Install the Python dependencies
  uv sync
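Once the server is running, you can check that the model was pulled by querying Ollama's `/api/tags` endpoint, which lists local models. The sketch below assumes the server listens on Ollama's default address, `http://localhost:11434`; the helper names are hypothetical and not part of QCluster.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
MODEL = "qwen2.5:3b"                   # model named in the steps above


def model_names(tags_payload):
    """Extract model names from a decoded /api/tags response."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def is_model_available(tags_payload, model=MODEL):
    """True if the given model appears in the /api/tags response."""
    return model in model_names(tags_payload)


def fetch_tags(base_url=OLLAMA_URL, timeout=5):
    """Ask the local Ollama server which models it has."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        payload = fetch_tags()
    except (urllib.error.URLError, OSError, ValueError) as exc:
        print(f"Could not reach Ollama at {OLLAMA_URL}: {exc}")
    else:
        if is_model_available(payload):
            print(f"{MODEL} is available.")
        else:
            print(f"{MODEL} is missing; try: ollama pull {MODEL}")
```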
You can run the clustering pipeline either as a simple Python script or through a Jupyter Notebook for a more interactive experience.
To run the pipeline as a script:
uv run qcluster.pipeline
To use the notebook instead, add the project root to the Python path:
export PYTHONPATH=$(pwd)
Then run Jupyter Lab:
uv run --with jupyter jupyter-lab
The notebook is located at notebooks/pipeline.ipynb.
The reports will be saved in the EVALUATION_RESULTS_DIR defined in the .env file.
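To find the reports programmatically, you could read `EVALUATION_RESULTS_DIR` from the `.env` file. The following is a minimal sketch with a deliberately simple parser (no `export` prefixes or inline comments); `read_env_file` and `results_dir` are hypothetical helpers, and treating a relative value as relative to the current directory is an assumption.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def results_dir(env_path=".env"):
    """Return the configured results directory, or None if it is not set."""
    value = read_env_file(env_path).get("EVALUATION_RESULTS_DIR")
    return Path(value).expanduser() if value else None


if __name__ == "__main__":
    if Path(".env").exists():
        directory = results_dir()
        print(f"Reports directory: {directory}")
        if directory is not None and directory.is_dir():
            for entry in sorted(directory.iterdir()):
                print(" ", entry.name)
    else:
        print("No .env file found in the current directory.")
```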
cd qcluster
cp .env.example .env
# Modify the .env file if needed
brew install ollama
ollama pull qwen2.5:3b
# pull other models if needed (if defined in the .env file)
ollama serve
uv sync
uv run qcluster.pipeline
# or
# export PYTHONPATH=$(pwd)
# uv run --with jupyter jupyter-lab
# and open notebooks/pipeline.ipynb and run the cells
The reports will be saved in the EVALUATION_RESULTS_DIR defined in the .env file.
See the Evaluation Results for more details on the results
structure.
Distributed under the GPL-2.0 License. See LICENSE for more information.