A local chatbot application that runs HuggingFace LLM models on your machine. It includes a native macOS app, a Streamlit web UI, and a command-line interface.
- Load any HuggingFace model - Paste a model name and run it locally
- Streaming responses - See responses as they're generated
- Thinking/reasoning support - Models with `<think>` tags display reasoning in a collapsible section
- RTL language support - Automatic right-to-left text detection for Hebrew, Arabic, etc. (see the sketch after this list)
- Conversation history - Maintains full context across messages
- Markdown rendering - Responses render with proper formatting
- Quantization support - 4-bit and 8-bit quantization for large models (CUDA only)
- Export chat - Download conversation history as markdown
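RTL detection can be as simple as checking the bidirectional class of the first strongly directional character. Below is a minimal sketch of that idea; it is illustrative only and not the project's actual RTL utility (which lives in the macOS app's Utilities):

```python
import unicodedata

def is_rtl(text: str) -> bool:
    """Guess text direction from the first strongly directional character.

    Illustrative sketch only, not the project's actual RTL detection code.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew, Arabic, and other RTL letters
            return True
        if bidi == "L":           # Latin and other LTR scripts
            return False
    return False                  # no strongly directional characters found

print(is_rtl("שלום, מה שלומך?"))   # True
print(is_rtl("Hello there"))       # False
```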
```bash
# Clone or navigate to the project directory
cd Local_ChatBot

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate    # Linux/macOS
# or
venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt
```

The native macOS app requires running the API server first:
```bash
# Install server dependencies
pip install -r requirements-server.txt

# Start the API server
python run_server.py
```

Then open the Xcode project and run:
```bash
open LocalChatBotApp/LocalChatBotApp.xcodeproj
```

Or build and run from the command line:
```bash
cd LocalChatBotApp
xcodebuild -scheme LocalChatBotApp -configuration Debug build
```

The macOS app connects to http://127.0.0.1:8000 by default.
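Once the server is running, you can confirm it is reachable before launching the app. FastAPI publishes its schema at /openapi.json by default, so a quick check might look like this (the project's own chat and model routes may use different paths):

```python
import requests

# FastAPI serves its OpenAPI schema at /openapi.json by default.
resp = requests.get("http://127.0.0.1:8000/openapi.json", timeout=5)
resp.raise_for_status()

print("Server is up. Registered endpoints:")
for path in sorted(resp.json().get("paths", {})):
    print(" ", path)
```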
To run the Streamlit web UI instead:

```bash
streamlit run app.py
```

Then open your browser to http://localhost:8501.
- Enter a HuggingFace model name in the sidebar (e.g., `TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
- Click Load Model and wait for the download
- Start chatting!
To enable debug logging in the web UI:

```bash
CHATBOT_DEBUG=true streamlit run app.py
```

The command-line interface can send a single prompt, run an interactive chat, set a system prompt, and load quantized models:

```bash
# Send a single prompt
python cli.py -m microsoft/DialoGPT-medium -p "Hello, how are you?"

# Interactive chat mode
python cli.py -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 -i

# Interactive mode with a system prompt
python cli.py -m MODEL -i -s "You are a helpful coding assistant."

# Interactive mode with 4-bit quantization
python cli.py -m dicta-il/DictaLM-3.0-24B-Thinking -i --4bit
```

| Option | Description |
|---|---|
| `-m, --model` | HuggingFace model name (required) |
| `-p, --prompt` | Single prompt to send |
| `-i, --interactive` | Run in interactive chat mode |
| `-s, --system-prompt` | System prompt to set behavior |
| `--max-tokens` | Maximum tokens to generate (default: 512) |
| `--temperature` | Sampling temperature (default: 0.7) |
| `--top-p` | Top-p sampling threshold (default: 0.9) |
| `--device` | Device: `cuda`, `mps`, or `cpu` |
| `--4bit` | Load in 4-bit quantization |
| `--8bit` | Load in 8-bit quantization |
| `--debug` | Enable debug logging |
| `--show-thinking` | Show model's thinking process |
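Under the hood, these options map onto standard transformers loading and generation parameters. Below is a rough, self-contained sketch of the equivalent Python; it is not chatbot.py itself, and the `--4bit`/`--8bit` paths additionally require bitsandbytes on CUDA:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"        # -m / --model
device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")                                     # --device

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
).to(device)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},  # -s
    {"role": "user", "content": "Hello, how are you?"},                    # -p
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output = model.generate(
    input_ids,
    max_new_tokens=512,   # --max-tokens
    do_sample=True,
    temperature=0.7,      # --temperature
    top_p=0.9,            # --top-p
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```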
In interactive mode, the following commands are available:

| Command | Description |
|---|---|
| `/quit` | Exit the chat |
| `/clear` | Clear conversation history |
| `/history` | Show conversation history |
| `/think` | Toggle thinking display |
The project is laid out as follows:

```
Local_ChatBot/
├── app.py # Streamlit web UI
├── cli.py # Command-line interface
├── chatbot.py # Core ChatBot class
├── run_server.py # FastAPI server entry point
├── requirements.txt # Base dependencies
├── requirements-server.txt # Server dependencies (includes FastAPI)
├── server/ # FastAPI backend
│ ├── main.py # FastAPI app
│ ├── dependencies.py # Singleton ChatBot
│ ├── schemas.py # Pydantic models
│ └── routes/
│ ├── model.py # Model management endpoints
│ ├── chat.py # Chat endpoints
│ └── websocket.py # Streaming WebSocket
├── LocalChatBotApp/ # Native macOS SwiftUI app
│ ├── LocalChatBotApp.xcodeproj
│ └── LocalChatBotApp/
│ ├── Models/ # Data models
│ ├── ViewModels/ # State management
│ ├── Views/ # SwiftUI views
│ ├── Services/ # API & WebSocket clients
│ └── Utilities/ # RTL detection, etc.
└── README.md
```
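server/dependencies.py keeps one ChatBot instance alive for the whole server so the model is loaded only once. Here is a hedged sketch of that singleton-dependency pattern in FastAPI; the class and route names are illustrative, not the project's actual code:

```python
from functools import lru_cache
from typing import Optional

from fastapi import Depends, FastAPI

class ChatBot:
    """Stand-in for chatbot.ChatBot; the real class loads and runs the model."""
    def __init__(self) -> None:
        self.model_name: Optional[str] = None

    def generate(self, prompt: str) -> str:
        return f"(reply from {self.model_name or 'no model loaded'}) {prompt}"

@lru_cache(maxsize=1)
def get_chatbot() -> ChatBot:
    # lru_cache turns this factory into a process-wide singleton.
    return ChatBot()

app = FastAPI()

@app.post("/chat")
def chat(prompt: str, bot: ChatBot = Depends(get_chatbot)) -> dict:
    # Every request shares the same ChatBot (and thus the same loaded model).
    return {"response": bot.generate(prompt)}
```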
Any HuggingFace causal language model should work. Some tested examples:
- `TinyLlama/TinyLlama-1.1B-Chat-v1.0` - Small, fast model
- `microsoft/DialoGPT-medium` - Conversational model
- `dicta-il/DictaLM-3.0-24B-Thinking` - Hebrew model with reasoning
- `meta-llama/Llama-2-7b-chat-hf` - Llama 2 (requires access)
Supported devices:

- CUDA - NVIDIA GPUs with full quantization support (see the sketch after this list)
- MPS - Apple Silicon (M1/M2/M3)
- CPU - Any system (slower)
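On CUDA, 4-bit and 8-bit loading typically goes through bitsandbytes via transformers' BitsAndBytesConfig. A hedged sketch of what the `--4bit` path might look like (not necessarily how chatbot.py configures it):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# --4bit: NF4 quantization with fp16 compute (requires CUDA + bitsandbytes)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# For --8bit, use BitsAndBytesConfig(load_in_8bit=True) instead.

model = AutoModelForCausalLM.from_pretrained(
    "dicta-il/DictaLM-3.0-24B-Thinking",
    quantization_config=quant_config,
    device_map="auto",   # place layers on the available GPU(s)
)
```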
Models that output reasoning in `<think>...</think>` tags are automatically handled:
- Thinking content appears in a collapsible "Thinking" section
- Only the final answer is shown during streaming
- Toggle visibility in the sidebar or with the `/think` command
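Extracting the reasoning amounts to splitting the generated text on the tag pair. A small illustrative sketch (the app may well parse incrementally while streaming instead):

```python
import re

def split_thinking(text: str) -> tuple:
    """Return (thinking, answer) for output that may contain <think>...</think>."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer

thinking, answer = split_thinking("<think>4 squared is 16.</think>The answer is 16.")
print(thinking)  # 4 squared is 16.
print(answer)    # The answer is 16.
```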
Requirements:

- Python 3.9+
- PyTorch 2.0+
- Transformers 4.36+
- Streamlit 1.28+
See requirements.txt for full dependencies.
License: MIT