This repository runs automated conversations between two frontier language models over multiple turns and saves both:
- a readable transcript
- a structured state file for transparency and partial reproducibility
It currently supports:
- Anthropic Claude models
- Alibaba GLM models via DashScope
- Alibaba Qwen models via DashScope
- Direct Z.AI GLM models
- Google Gemini models
- Moonshot Kimi models
- OpenRouter Kimi and GLM models
- OpenAI GPT models
- xAI Grok models
The project is shared as an experiment framework rather than a polished product.
For each run, the program:
- starts a conversation between Model A and Model B
- gives Model A an initial facilitator message
- alternates turns between the two models
- optionally asks each model a final facilitator question
- prints the exchange live in the terminal
- saves a `.txt` transcript for readability
- saves a `.state.json` snapshot with structured run state
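The turn-alternation loop described above can be sketched roughly as follows. The names here (`run_conversation`, `call_model`) are illustrative, not the repo's actual API:

```python
# Hypothetical sketch of the alternating turn loop: Model A gets the
# facilitator's kickoff, then the two models take turns replying.
def run_conversation(call_model, start_message, max_turns):
    """call_model(speaker, history) -> reply string; speaker is 'a' or 'b'."""
    history = [("facilitator", start_message)]
    transcript = []
    for turn in range(max_turns):
        speaker = "a" if turn % 2 == 0 else "b"
        reply = call_model(speaker, history)
        history.append((speaker, reply))
        transcript.append(f"Model {speaker.upper()}: {reply}")
    return transcript
```

The real loop also handles streaming output, state snapshots, and the optional final facilitator question.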
Each run produces files in transcripts/:
- `...txt`
- `...state.json`
The transcript is the readable artifact.
The state file is the transparency artifact. It includes:
- run status and phase
- system prompts
- run configuration
- provider/model metadata
- usage and cost metrics
- `convo_a` and `convo_b`, the full API-formatted conversation snapshots each model would see at that checkpoint
- git commit hash
- error details for failed runs
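A state file can be inspected with a few lines of Python. The field names used below (`status`, `convo_a`, `convo_b`, `usage`) are assumptions based on the description above, not a documented schema:

```python
import json

def summarize_state(state):
    """Summarize an already-parsed .state.json dict (keys assumed)."""
    return {
        "status": state.get("status"),
        "turns_a": len(state.get("convo_a", [])),
        "turns_b": len(state.get("convo_b", [])),
        "usage": state.get("usage"),
    }

def load_state(path):
    """Load a .state.json snapshot from transcripts/."""
    with open(path) as f:
        return json.load(f)
```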
- Create and activate a virtual environment:

  ```
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Create a `.env` file with whichever provider keys you need:

  ```
  ANTHROPIC_API_KEY=...
  OPENAI_API_KEY=...
  GOOGLE_API_KEY=...
  MOONSHOT_API_KEY=...
  OPEN_ROUTER_API_KEY=...
  ZAI_API_KEY=...
  DASHSCOPE_API_KEY=...
  # or ALI_BABA_API_KEY=...
  # or BAILIAN_API_KEY=...
  XAI_API_KEY=...
  ```

Only the keys for the models you actually run are required.
For DashScope Qwen and GLM, DASHSCOPE_API_KEY, ALI_BABA_API_KEY, or BAILIAN_API_KEY all work.
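One way to implement that fallback is a simple ordered lookup; the repo's actual lookup order may differ:

```python
import os

def dashscope_key():
    """Return the first DashScope-compatible API key found in the environment."""
    for name in ("DASHSCOPE_API_KEY", "ALI_BABA_API_KEY", "BAILIAN_API_KEY"):
        value = os.environ.get(name)
        if value:
            return value
    raise RuntimeError("no DashScope-compatible API key found")
```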
For OpenRouter Kimi/GLM, use OPEN_ROUTER_API_KEY (or OPENROUTER_API_KEY).
The default OpenRouter base URL used by the repo is:
https://openrouter.ai/api/v1
For direct Z.AI GLM, use ZAI_API_KEY.
The default Z.AI base URL used by the repo is:
https://api.z.ai/api/paas/v4/
For DashScope Qwen, the default OpenAI-compatible base URL used by the repo is:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1
For glm-5, the default base URL used by the repo is:
https://dashscope.aliyuncs.com/compatible-mode/v1
Alibaba documents glm-5 as a Chinese Mainland deployment-mode model, so it uses the mainland DashScope endpoint rather than the default international Qwen endpoint.
You can override the Qwen endpoint with DASHSCOPE_BASE_URL and the GLM endpoint with GLM_BASE_URL.
Run with the defaults from params.py:
```
python3 main.py
```

See all CLI options:

```
python3 main.py --help
```

List models:

```
python3 list_models.py
python3 list_models.py --google --details
```

The repository includes a public prompt directory:
prompts/public/
The default shipped prompt is:
prompts/public/standard.txt
If you want local-only prompts, put them in:
prompts/private/
That directory is intended for private prompts and is ignored by git.
The easiest way to change defaults is to edit params.py.
Key settings include:
- `MODEL_A` / `MODEL_B`
- `SYSTEM_PROMPT_A` / `SYSTEM_PROMPT_B`
- `MAX_TURNS`
- `TEMPERATURE_A` / `TEMPERATURE_B`
- `THINKING_BUDGET_A` / `THINKING_BUDGET_B`
- `START_MESSAGE_A` / `START_MESSAGE_B`
- `FINAL_QUESTION_A` / `FINAL_QUESTION_B`
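Put together, a params.py configuration might look like the following. The model names and values are placeholders, not the repo's shipped defaults; use list_models.py to find real model IDs:

```python
# Illustrative params.py fragment; values are placeholders.
MODEL_A = "example-model-a"
MODEL_B = "example-model-b"
SYSTEM_PROMPT_A = "You are Model A."
SYSTEM_PROMPT_B = "You are Model B."
MAX_TURNS = 10
TEMPERATURE_A = 1.0
TEMPERATURE_B = 1.0
THINKING_BUDGET_A = 0
THINKING_BUDGET_B = 0
START_MESSAGE_A = "You are about to speak with another LLM. Please begin the conversation."
START_MESSAGE_B = None
FINAL_QUESTION_A = None
FINAL_QUESTION_B = None
```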
The default kickoff currently sends Model A:
```
You are about to speak with another LLM. Please begin the conversation.
```
Thinking behavior is provider-specific.
Current notable cases:
- Anthropic:
- most thinking-capable Claude models use a fixed thinking budget
- `claude-opus-4-6` uses adaptive thinking, with the repo mapping the budget setting onto effort levels
- Google Gemini:
- Gemini 3 models use `thinkingLevel=HIGH` with thoughts included
- OpenRouter Kimi / GLM:
- use OpenRouter's `reasoning` parameter
- the repo captures streamed `reasoning_details` and carries them forward in assistant history
- Direct Z.AI GLM:
- uses Z.AI's OpenAI-compatible API
- captures `reasoning_content`
- OpenAI GPT-5 family:
- uses the Responses API
- default setup uses medium reasoning effort
- `temperature` is not sent for GPT-5-family models in this path
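A hedged sketch of this kind of provider-specific request shaping. The function and key names are illustrative, not the repo's actual code:

```python
def request_kwargs(model, temperature, reasoning_effort="medium"):
    """Build request parameters: GPT-5-family requests skip temperature
    and carry a reasoning-effort setting instead."""
    kwargs = {"model": model}
    if model.startswith("gpt-5"):
        kwargs["reasoning"] = {"effort": reasoning_effort}
    else:
        kwargs["temperature"] = temperature
    return kwargs
```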
This repo can estimate per-turn and total run cost when usage data is available.
To enable that:
- Copy the pricing template:
  ```
  cp pricing.example.json pricing.json
  ```

- Fill in the models you care about.

- Run with:

  ```
  python3 main.py --pricing-file pricing.json
  ```

If no pricing file is provided, or a model is missing from the file, token usage may still appear but cost will show as N/A.
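The N/A fallback amounts to a lookup that degrades gracefully. The per-million-token rate schema shown here is an assumption; check pricing.example.json for the actual field names:

```python
def estimate_cost(pricing, model, input_tokens, output_tokens):
    """Return estimated USD cost, or "N/A" if the model isn't priced.
    Assumes per-million-token rates (schema hypothetical)."""
    rates = pricing.get(model)
    if rates is None:
        return "N/A"
    return (input_tokens * rates["input_per_mtok"]
            + output_tokens * rates["output_per_mtok"]) / 1_000_000
```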
If you want to publish selected transcripts, use a curated folder rather than the full working archive.
This repo includes:
shared_transcripts/
That folder is intended for transcripts you are comfortable publishing.
Core files:
- `main.py` - conversation loop, transcript/state output, CLI
- `api_client.py` - provider routing and API calls
- `conversation.py` - history shaping utilities
- `costing.py` - token normalization and cost estimation
- `params.py` - default run configuration
Other useful files:
- `requirements.txt`
- `pricing.example.json`
- `list_models.py`
- `list_gemini_models.py`
- The repo is designed for experimentation, not strict determinism.
- Even with the same prompts and models, runs can diverge because sampling and provider behavior are stochastic.
- Some transcripts and state files may include failures, quota errors, or other operational artifacts by design.
MIT