dbudaghyan/qcluster

QCluster

A Python library for clustering customer questions using large language models.

Table of Contents
  1. 🎯 About The Project
  2. 🚀 Getting Started
  3. ▶️ Usage
  4. 📄 License

🎯 About The Project

QCluster is a Python library for making sense of large volumes of customer feedback. It uses large language models (LLMs) to group similar customer questions automatically, which helps you identify trends, pain points, and frequently asked questions.

This project provides a complete pipeline for:

  • Extracting customer questions from your data sources.
  • Generating embeddings for each question.
  • Clustering the questions based on their semantic similarity.
  • Evaluating the quality of the clustering results.
  • Generating reports on the results.
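As a rough illustration of the embedding and clustering steps above, the sketch below groups questions by cosine similarity. It is not QCluster's implementation or API. It substitutes a toy bag-of-words vector for an LLM embedding, and it uses a simple greedy threshold rule (the hypothetical `SIMILARITY_THRESHOLD`) in place of whatever clustering algorithm the pipeline actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical threshold: a question joins a cluster only if it is at least
# this similar to that cluster's first ("seed") question.
SIMILARITY_THRESHOLD = 0.5


def embed(text: str) -> Counter:
    """Toy stand-in for an LLM embedding: a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_questions(
    questions: list[str], threshold: float = SIMILARITY_THRESHOLD
) -> list[list[str]]:
    """Greedy single-pass clustering: put each question in the cluster whose
    seed it is most similar to, or start a new cluster if no seed clears
    the threshold."""
    clusters: list[tuple[Counter, list[str]]] = []
    for question in questions:
        vector = embed(question)
        best_index, best_score = -1, threshold
        for i, (seed, _) in enumerate(clusters):
            score = cosine(vector, seed)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index == -1:
            clusters.append((vector, [question]))
        else:
            clusters[best_index][1].append(question)
    return [members for _, members in clusters]


if __name__ == "__main__":
    questions = [
        "How do I reset my password?",
        "How can I reset my password?",
        "Where is my order?",
        "Where is my order right now?",
    ]
    for group in cluster_questions(questions):
        print(group)
```

With these four questions, the two password questions form one group and the two order questions form another. Real embeddings also capture paraphrases that share no words, which a bag-of-words vector cannot.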

🚀 Getting Started

Follow these simple steps to get your local copy of QCluster up and running.

🛠️ Prerequisites

This project was tested on macOS with Apple Silicon, but it should work on other systems as well.

  • Python 3.12+
  • uv: A fast Python package installer and resolver.
  • ollama: Run large language models locally.
    • You will also need the qwen2.5:3b model, but you can configure other models as well.

📦 Installation

  1. Clone the repo
    git clone https://github.com/dbudaghyan/qcluster.git
    cd qcluster
  2. Set up the environment variables
    cp .env.example .env
    You can modify the .env file to change the default settings.
  3. Install ollama
    • Using Homebrew (on macOS):
      brew install ollama
    • Or download the binary directly from the official website.
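  4. Start the ollama server
    ollama serve
    The server runs in the foreground, so leave it running and use a separate terminal for the remaining steps.
  5. Pull the LLM model
    ollama pull qwen2.5:3b
    If you have defined other models in your .env file, make sure to pull them as well.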
  6. Install the Python dependencies
    uv sync
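Before running the pipeline, you may want to confirm that the ollama server is reachable and that the model has been pulled. The sketch below queries ollama's `/api/tags` endpoint, which lists locally available models, at ollama's default address `http://localhost:11434`. Both the address and the `qwen2.5:3b` default are assumptions; adjust them to match your setup and .env file.

```python
import json
import urllib.request

# Assumption: ollama's default local address; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names (e.g. "qwen2.5:3b") from the JSON returned by GET /api/tags."""
    return {model["name"] for model in tags_payload.get("models", [])}


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Ask the running ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


def check_model(model: str = "qwen2.5:3b") -> bool:
    """Print a short diagnosis and return True if the model is ready to use."""
    try:
        payload = fetch_tags()
    except OSError as error:  # URLError and timeouts are OSError subclasses
        print(f"Could not reach ollama at {OLLAMA_URL}: {error}")
        return False
    if model not in installed_models(payload):
        print(f"Model {model!r} not found; run: ollama pull {model}")
        return False
    print(f"ollama is running and {model!r} is available.")
    return True


if __name__ == "__main__":
    check_model()
```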

▶️ Usage

You can run the clustering pipeline either as a simple Python script or through a Jupyter Notebook for a more interactive experience.

Option 1: Python Script

uv run qcluster.pipeline

Option 2: Jupyter Notebook

From the repository root, add it to the Python path:

export PYTHONPATH=$(pwd)

Then run Jupyter Lab:

uv run --with jupyter jupyter-lab

The notebook is located at notebooks/pipeline.ipynb. The reports will be saved in the EVALUATION_RESULTS_DIR defined in the .env file.

TL;DR

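git clone https://github.com/dbudaghyan/qcluster.git
cd qcluster
cp .env.example .env
# Modify the .env file if needed
brew install ollama  # macOS (Homebrew); otherwise download ollama from the official website
ollama serve  # runs in the foreground; run the remaining commands in a separate terminal
ollama pull qwen2.5:3b
# Pull other models if needed (if defined in the .env file)
uv sync
uv run qcluster.pipeline
# or
# export PYTHONPATH=$(pwd)
# uv run --with jupyter jupyter-lab
# and open notebooks/pipeline.ipynb and run the cells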

The reports will be saved in the EVALUATION_RESULTS_DIR defined in the .env file. See the Evaluation Results for more details on the results structure.

📄 License

Distributed under the GPL-2.0 License. See LICENSE for more information.

About

A Python library for experimenting with clustering pipelines.
