Local ChatBot

A local chatbot application that runs HuggingFace LLMs on your machine. It includes a native macOS app, a Streamlit web UI, and a command-line interface.

Features

  • Load any HuggingFace model - Paste a model name and run it locally
  • Streaming responses - See responses as they're generated
  • Thinking/reasoning support - Models with <think> tags display reasoning in a collapsible section
  • RTL language support - Automatic right-to-left text detection for Hebrew, Arabic, etc.
  • Conversation history - Maintains full context across messages
  • Markdown rendering - Responses render with proper formatting
  • Quantization support - 4-bit and 8-bit quantization for large models (CUDA only)
  • Export chat - Download conversation history as markdown
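
As an illustration of the RTL detection idea mentioned above (this is a standalone sketch based on Unicode bidirectional categories, not the project's own implementation, which lives under LocalChatBotApp/.../Utilities/):

import unicodedata

def looks_rtl(text: str) -> bool:
    """Heuristic: treat text as right-to-left if its first strongly
    directional character is an RTL one (Hebrew, Arabic, etc.)."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):   # strong right-to-left
            return True
        if direction == "L":           # strong left-to-right
            return False
    return False

print(looks_rtl("שלום, מה שלומך?"))  # True
print(looks_rtl("Hello there"))      # False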

Installation

# Clone or navigate to the project directory
cd Local_ChatBot

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate  # Linux/macOS
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
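
To confirm the core dependencies installed correctly (assuming requirements.txt covers torch, transformers, and streamlit as listed under Requirements below), a quick sanity check:

python -c "import torch, transformers, streamlit; print(torch.__version__, transformers.__version__, streamlit.__version__)"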

Usage

Native macOS App

The native macOS app requires running the API server first:

# Install server dependencies
pip install -r requirements-server.txt

# Start the API server
python run_server.py

Then open the Xcode project and run:

open LocalChatBotApp/LocalChatBotApp.xcodeproj

Or build and run from command line:

cd LocalChatBotApp
xcodebuild -scheme LocalChatBotApp -configuration Debug build

The macOS app connects to http://127.0.0.1:8000 by default.
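
To verify the server is reachable before launching the app, you can query FastAPI's auto-generated docs page (assuming the default /docs route has not been disabled in server/main.py):

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/docs
# prints 200 when the server is up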

Streamlit Web UI

streamlit run app.py

Then open your browser to http://localhost:8501

  1. Enter a HuggingFace model name in the sidebar (e.g., TinyLlama/TinyLlama-1.1B-Chat-v1.0)
  2. Click Load Model and wait for download
  3. Start chatting!
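
Optionally, you can pre-download a model before clicking Load Model so the first load is faster (assuming your installed huggingface_hub version provides the huggingface-cli download command):

huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0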

Debug Mode

CHATBOT_DEBUG=true streamlit run app.py

Command Line Interface

Single Prompt

python cli.py -m microsoft/DialoGPT-medium -p "Hello, how are you?"

Interactive Mode

python cli.py -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 -i

With System Prompt

python cli.py -m MODEL -i -s "You are a helpful coding assistant."

With Quantization (CUDA only)

python cli.py -m dicta-il/DictaLM-3.0-24B-Thinking -i --4bit

CLI Options

Option               Description
-m, --model          HuggingFace model name (required)
-p, --prompt         Single prompt to send
-i, --interactive    Run in interactive chat mode
-s, --system-prompt  System prompt to set behavior
--max-tokens         Maximum tokens to generate (default: 512)
--temperature        Sampling temperature (default: 0.7)
--top-p              Top-p sampling threshold (default: 0.9)
--device             Device: cuda, mps, or cpu
--4bit               Load in 4-bit quantization
--8bit               Load in 8-bit quantization
--debug              Enable debug logging
--show-thinking      Show model's thinking process
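
For example, several of these options can be combined in a single invocation:

python cli.py -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 -i \
  -s "You are a helpful coding assistant." \
  --max-tokens 256 --temperature 0.3 --device mps --show-thinking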

Interactive Commands

Command    Description
/quit      Exit the chat
/clear     Clear conversation history
/history   Show conversation history
/think     Toggle thinking display

Project Structure

Local_ChatBot/
├── app.py                  # Streamlit web UI
├── cli.py                  # Command-line interface
├── chatbot.py              # Core ChatBot class
├── run_server.py           # FastAPI server entry point
├── requirements.txt        # Base dependencies
├── requirements-server.txt # Server dependencies (includes FastAPI)
├── server/                 # FastAPI backend
│   ├── main.py             # FastAPI app
│   ├── dependencies.py     # Singleton ChatBot
│   ├── schemas.py          # Pydantic models
│   └── routes/
│       ├── model.py        # Model management endpoints
│       ├── chat.py         # Chat endpoints
│       └── websocket.py    # Streaming WebSocket
├── LocalChatBotApp/        # Native macOS SwiftUI app
│   ├── LocalChatBotApp.xcodeproj
│   └── LocalChatBotApp/
│       ├── Models/         # Data models
│       ├── ViewModels/     # State management
│       ├── Views/          # SwiftUI views
│       ├── Services/       # API & WebSocket clients
│       └── Utilities/      # RTL detection, etc.
└── README.md

Supported Models

Any HuggingFace causal language model should work. Some tested examples:

  • TinyLlama/TinyLlama-1.1B-Chat-v1.0 - Small, fast model
  • microsoft/DialoGPT-medium - Conversational model
  • dicta-il/DictaLM-3.0-24B-Thinking - Hebrew model with reasoning
  • meta-llama/Llama-2-7b-chat-hf - Llama 2 (requires access)

Device Support

  • CUDA - NVIDIA GPUs with full quantization support
  • MPS - Apple Silicon (M1/M2/M3)
  • CPU - Any system (slower)
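
As a rough sketch of how device selection and 4-bit loading work with the Hugging Face transformers API (a standalone illustration under these assumptions, not the code in chatbot.py):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Pick the best available device: CUDA > MPS > CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# 4-bit quantization needs bitsandbytes and only works on CUDA
quant_config = BitsAndBytesConfig(load_in_4bit=True) if device == "cuda" else None

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto" if device == "cuda" else None,
)
if device != "cuda":
    model = model.to(device)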

Thinking Models

Models that output reasoning in <think>...</think> tags are automatically handled:

  • Thinking content appears in a collapsible "Thinking" section
  • Only the final answer is shown during streaming
  • Toggle visibility in sidebar or with /think command
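
A minimal sketch of how <think> content can be separated from the final answer (assuming a single, well-formed tag pair; the project's actual parsing may differ):

import re

# Hypothetical raw model output containing a reasoning block
raw = "<think>The user greeted me, so I should greet back.</think>Hello! How can I help?"

match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
thinking = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print("Thinking:", thinking)
print("Answer:", answer)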

Requirements

  • Python 3.9+
  • PyTorch 2.0+
  • Transformers 4.36+
  • Streamlit 1.28+

See requirements.txt for full dependencies.

License

MIT
