!! Under active development !! Do not clone this repo expecting something stable; something will surely break.
This repository allows you to create a chatbot based on someone's writing that you can interact with over Discord or through the command line. You can read blog posts about earlier iterations of the codebase here.
This code has been tested with Python 3.9.13.
- Activate your virtual env
- Install PyTorch via the installation instructions given here: https://pytorch.org/get-started/locally/
- Install the remaining requirements: `pip install -r requirements.txt`
1. Set your OpenAI key in:
   - `configs/embedding/text-embedding-3-small.json`
   - `configs/llm/gpt-4o-mini.json`
2. Serve the ground-truth store: `python -m src.scripts.serve_retrieval --config configs/retrieval/zef_demo_gt.json --port 5000`
3. In another terminal window: `python -m src.scripts.serve_retrieval --config configs/retrieval/zef_demo_conv_history.json --port 5001`
4. In yet another window: `python -m src.scripts.chat --bot_config_path configs/bot/zef_demo.json`
Step 4 will drop you into a command-line chat loop with a bot based on the contents of `data/zef.txt`. Read on to learn how to change your bot's source data, prompt, and LLM backend, add MCP servers, and more.
The bot can be configured to retrieve from two vector stores:
- A store of the cloning target's ground-truth writing, for example the Facebook status updates in `data/zef.txt`.
- A store of previous chatbot conversations, which can be updated over time in order to allow the chatbot to learn new things.
Each of these stores is optional for the bot's operation, but you'll probably want to use them in order to give the bot some prior context to go off of.
The bot interacts with vector stores as servers: see `src/bot/rag_module.py`. I've included an implementation of a vector store server in `src.retrieval`; what follows are instructions for running my implementation.
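To make the client/server split concrete, here's a minimal sketch of what querying a running store over HTTP could look like. Everything past the import is an assumption: the `/retrieve` endpoint, the `query`/`k` fields, and the response shape are invented for illustration, and `src/bot/rag_module.py` defines the actual protocol.

```python
# Hypothetical sketch only: the /retrieve endpoint, payload fields, and
# response shape are invented for illustration -- see src/bot/rag_module.py
# for the protocol the bot actually speaks.
import requests

resp = requests.post(
    "http://localhost:5000/retrieve",  # the ground-truth store served above
    json={"query": "What did you get up to last summer?", "k": 5},
)
for passage in resp.json():  # assumed: a list of retrieved passages
    print(passage)
```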
In order to produce vectors to put in these stores, we need an embedding model. My implementation lets you specify either a local embedding model from Hugging Face (see `configs/embedding/bge-large-en-v1.5.json`) or a hosted embedding model (see `configs/embedding/text-embedding-3-small.json`).
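For a rough sense of shape only, a hosted embedding config might look something like this; the field names below are assumptions, so treat the shipped files in `configs/embedding` as the reference:

```json
{
  "provider": "openai",
  "model_name": "text-embedding-3-small"
}
```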
Once you've created your embedding model config, you'll set its path as the value of `embedding_config_path` in your retrieval config.
`configs/retrieval` contains examples of configs for the two types of vector store:
- `configs/retrieval/zef_demo_gt.json` is a ground-truth store for the contents of the document `data/zef.txt`. It does not permit updates.
- `configs/retrieval/zef_demo_conv_history.json` is a chatbot conversation store. It starts out empty and gets updated with new messages over time.
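As a sketch, a custom retrieval config could look something like the following. `embedding_config_path` is the field named above; the other field names are assumptions invented for illustration, so copy from the shipped examples rather than from this snippet:

```json
{
  "embedding_config_path": "configs/embedding/text-embedding-3-small.json",
  "data_path": "data/zef.txt",
  "allow_updates": false
}
```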
If you want to use your own data, you can either:
- create a `.txt` file in the same format as `data/zef.txt`, with individual samples separated by the string `\n-----\n`
- create a parquet document (or folder of parquet documents) where each entry has the field `text` (specifying the text to embed) and an optional dictionary field `meta` (specifying metadata associated with the entry)
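For example, a two-sample `.txt` corpus in the `data/zef.txt` format looks like:

```
First writing sample goes here.
-----
Second writing sample goes here.
```

And a parquet file with the documented `text` and `meta` fields can be built with any parquet writer; here's a sketch using pandas (the choice of pandas and the output path are my assumptions):

```python
# Sketch: build a parquet corpus with the documented "text" and optional "meta"
# fields. pandas + pyarrow are an assumption here -- any parquet writer works.
import pandas as pd

df = pd.DataFrame(
    {
        "text": ["First writing sample goes here.", "Second writing sample goes here."],
        "meta": [{"source": "facebook"}, {"source": "blog"}],
    }
)
df.to_parquet("data/my_corpus.parquet")
```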
Once you have a retrieval config that you're satisfied with, you can serve it using `python -m src.scripts.serve_retrieval --config configs/retrieval/my_store.json`.
You can specify the format to use when presenting information to your model with a Jinja template. `configs/prompt_templates` contains two examples of such templates:
- `zef_completion.j2` is designed to be used with base models like Mixtral-8x7B-v0.1, which try to continue the output of whatever input they got.
- `zef_instruct.j2` is designed to be used with instruction models like OpenAI's GPT or Anthropic's Claude.
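To give a flavor of what such a template can do, here's a hypothetical fragment in Jinja syntax; the variable names are invented for illustration, so consult the two shipped templates for the real ones:

```jinja
{# Hypothetical fragment -- variable names are invented for illustration. #}
Here are some things {{ bot_name }} has written:
{% for passage in retrieved_passages %}
- {{ passage }}
{% endfor %}

Continue the conversation below in {{ bot_name }}'s voice.
{{ conversation }}
```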
You can use any inference endpoint which implements the OpenAI Chat Completions spec (which includes many non-OpenAI providers, like Together AI) or Anthropic's messages API.
See `configs/llm` for examples.
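As a sketch only, an LLM config pointing at an OpenAI-compatible endpoint might contain something like the following; every field name here is an assumption, so treat the files in `configs/llm` as the reference:

```json
{
  "model": "gpt-4o-mini",
  "api_base": "https://api.openai.com/v1",
  "temperature": 0.7
}
```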
`configs/bot/` contains examples of a config designed for base model inference and a config designed for instruct inference.
Once you have a bot config that you're satisfied with, you can chat with it from the command line with `python -m src.scripts.chat --bot_config_path configs/bot/my_config.json`.
If your chosen LLM endpoint supports tool use, you can set `tool_use` to `true` in your bot config to allow your bot to add reactions to messages and use MCP servers of your choice. To add an MCP server to a bot config, extend the `mcp_servers` field of your bot config like so:
"mcp_servers": [
{
"name": "My MCP server",
"command": "uv",
"args": [
"--directory",
"/absolute/path/to/script",
"run",
"my_mcp_server.py"
]
}
]
Here `command` and `args` follow the Claude Desktop MCP server format for running Python or JavaScript MCP servers.
To chat with your bot on Discord, you'll need to make a Discord bot account and acquire a token. Then create a Discord bot config: see `configs/discord` for an example. You'll need to specify the following fields:
- `channels`: a list of channel names that the bot can talk in, in any server that it's invited to. The bot will also respond to any DMs that you send it.
- `clear_command`: typing this string in a channel that the bot can access will clear its recent conversational memory, which is useful if it's become stuck in a loop.
- `token`: the token for your bot's account
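Putting those fields together, a minimal Discord config might look like this (the field names come from the list above; the values are placeholders, and the example in `configs/discord` is the authoritative reference):

```json
{
  "channels": ["bot-chat", "general"],
  "clear_command": "!clear",
  "token": "YOUR_DISCORD_BOT_TOKEN"
}
```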
Once you have a config, run your bot with `python -m src.scripts.run_discord_bot --bot_config_path configs/bot/my_config.json --discord_config_path configs/bot/my_discord_config.json`.
When testing out bots, you may want to run different configurations on some standard set of questions to compare outputs. `src/scripts/qa_eval` will let you do this given as input either:
- a `.json` file containing a list of entries with the fields `author`, `question`, and `response`, where `author` is a string representing the question's author and `response` is a ground-truth answer that you'd consider "correct"
- a `.tsv` file containing columns representing author, question, and response
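For example, a minimal `.json` input could look like this (the field names are documented above; the entries themselves are placeholders):

```json
[
  {
    "author": "alice",
    "question": "What's your favorite book?",
    "response": "Placeholder ground-truth answer."
  },
  {
    "author": "bob",
    "question": "Where did you grow up?",
    "response": "Another placeholder ground-truth answer."
  }
]
```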
It will then output a JSON or TSV file (depending on command-line args) that allows you to compare each generated answer to the specified ground-truth response.
You can specify a database for saving the messages that the bot sends and receives via the `-db` argument to `src.scripts.chat` and `src.scripts.run_discord_bot`. This codebase only supports storing to a local SQLite database for now; see `configs/database/sqlite_example.json` for an example.
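As a sketch, a SQLite database config might contain something like the following; the field names are assumptions, so check `configs/database/sqlite_example.json` for the real schema:

```json
{
  "type": "sqlite",
  "path": "data/messages.db"
}
```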