A real-time voice AI that joins Jitsi video calls and has actual conversations.
No cloud APIs. Runs on a laptop. Fully self-hosted.
- Joins a Jitsi Meet room as a participant named "Enki"
- Listens to speech via WebRTC audio capture
- Transcribes in real-time using Whisper (~0.6s latency)
- Responds with synthesized voice using Edge TTS
- All processing happens locally
You: "Hello, can you hear me?"
Enki: "Hello! Nice to hear from you!"
You: "What's your name?"
Enki: "I am Enki, god of wisdom and water. I'm your AI assistant."
| Component | Technology |
|---|---|
| Video Conferencing | Self-hosted Jitsi Meet (Docker) |
| Bot Browser | Puppeteer (Headless Chrome) |
| Speech-to-Text | faster-whisper (tiny.en model) |
| Text-to-Speech | edge-tts (Microsoft Neural Voices) |
| Audio Routing | PulseAudio virtual sinks |
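These components chain into a simple capture → transcribe → think → speak loop. Below is a minimal sketch of that shape in Node.js; every function body is a placeholder standing in for the real component, not the actual code in voice-bot.js.

```js
// Conceptual shape of the bot's main loop. Each placeholder stands in for a
// component from the table above (WebRTC capture, faster-whisper, OpenClaw,
// edge-tts); the real wiring lives in voice-bot.js.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function captureChunk() { await sleep(4000); return Buffer.alloc(0); } // ~4 s audio chunk (placeholder)
async function transcribe(chunk) { return ""; }                              // faster-whisper STT (placeholder)
async function askOpenClaw(text) { return `You said: ${text}`; }             // OpenClaw LLM reply (placeholder)
async function speak(reply) { console.log("TTS:", reply); }                  // edge-tts playback (placeholder)

async function conversationLoop() {
  for (;;) {
    const chunk = await captureChunk();
    const text = (await transcribe(chunk)).trim();
    if (!text) continue;                   // nothing transcribed, keep listening
    const reply = await askOpenClaw(text);
    await speak(reply);
  }
}

conversationLoop();
```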
- Linux (tested on Ubuntu 24.04 / WSL2)
- Node.js 18+
- Python 3.10+
- Docker & Docker Compose
- PulseAudio
- FFmpeg
```bash
# Clone Jitsi Docker setup
git clone https://github.com/jitsi/docker-jitsi-meet.git jitsi
cd jitsi

# Configure
cp env.example .env
# Edit .env and set:
# - JVB_ADVERTISE_IPS=127.0.0.1,<your-ip>
# - ENABLE_GUESTS=1

# Start
docker compose up -d
```

```bash
# Clone this repo
git clone https://github.com/yaxzone/openclaw-talks-back.git
cd openclaw-talks-back

# Install Node dependencies
npm install

# Create Python venv and install Whisper
python3 -m venv venv
source venv/bin/activate
pip install faster-whisper edge-tts
```

Create a PulseAudio virtual sink so TTS audio can be routed into the call:

```bash
pactl load-module module-null-sink sink_name=VirtualMic sink_properties=device.description=VirtualMic
```

Then start the bot:

```bash
node voice-bot.js
```

Open https://localhost:8443/Enki in your browser and start talking!
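Why the virtual sink matters: headless Chrome has no real microphone, so the bot plays its TTS output into VirtualMic and the sink's audio ends up in the WebRTC stream (PulseAudio null sinks expose a monitor that applications can record from). Below is a rough sketch of that playback leg, assuming edge-tts, ffmpeg, and paplay are on the PATH; the voice name, temp file paths, and the ffmpeg conversion step are illustrative, not necessarily what voice-bot.js does.

```js
// Sketch: synthesize a reply with edge-tts, convert it to WAV with ffmpeg,
// and play it into the VirtualMic sink so the WebRTC stream carries it.
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { tmpdir } from "node:os";
import { join } from "node:path";

const run = promisify(execFile);

async function speak(text) {
  const mp3 = join(tmpdir(), "enki-reply.mp3");
  const wav = join(tmpdir(), "enki-reply.wav");
  // edge-tts writes MP3 by default; the voice name here is an assumption
  await run("edge-tts", ["--voice", "en-US-GuyNeural", "--text", text, "--write-media", mp3]);
  // paplay needs an uncompressed format, so convert with ffmpeg first
  await run("ffmpeg", ["-y", "-i", mp3, wav]);
  // play into the null sink created with pactl above
  await run("paplay", ["--device=VirtualMic", wav]);
}

speak("Hello! Nice to hear from you!").catch(console.error);
```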
Environment variables:
| Variable | Default | Description |
|---|---|---|
| `ROOM` | `Enki` | Jitsi room name to join |
| `WHISPER_VENV` | `./venv/bin/python` | Path to Python with faster-whisper |
| `EDGE_TTS` | `edge-tts` | Path to edge-tts binary |
| `OPENCLAW_URL` | `http://localhost:18789/v1/responses` | OpenClaw API endpoint |
| `OPENCLAW_TOKEN` | (required) | OpenClaw gateway auth token |
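In the bot these map to ordinary environment lookups. A sketch of the defaulting logic (the `config` object and its field names are illustrative; only the variable names and defaults come from the table above):

```js
// Read configuration from the environment, falling back to the documented defaults.
const config = {
  room: process.env.ROOM || "Enki",
  whisperVenv: process.env.WHISPER_VENV || "./venv/bin/python",
  edgeTts: process.env.EDGE_TTS || "edge-tts",
  openclawUrl: process.env.OPENCLAW_URL || "http://localhost:18789/v1/responses",
  openclawToken: process.env.OPENCLAW_TOKEN, // required; no default
};

if (!config.openclawToken) {
  console.error("OPENCLAW_TOKEN is required");
  process.exit(1);
}
```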
The bot connects to OpenClaw's API for intelligent responses. This means you get a real AI conversation, not canned responses.
- Enable the responses endpoint in OpenClaw config:

  ```js
  {
    gateway: {
      http: {
        endpoints: {
          responses: { enabled: true }
        }
      }
    }
  }
  ```

- Set your token:

  ```bash
  export OPENCLAW_TOKEN="your-gateway-token"
  ```

- Run the bot. It will now route speech to OpenClaw and speak the AI's response!
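A sketch of the request the bot sends for each transcript. The endpoint and token come from the configuration above; the request body field (`input`) and the response field (`output_text`) are assumptions modeled on OpenAI-style responses endpoints, so check OpenClaw's API docs for the exact shape.

```js
// POST the transcript to OpenClaw and return the reply text to be spoken.
async function askOpenClaw(transcript) {
  const res = await fetch(process.env.OPENCLAW_URL || "http://localhost:18789/v1/responses", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENCLAW_TOKEN}`,
    },
    body: JSON.stringify({ input: transcript }), // field name is an assumption
  });
  if (!res.ok) throw new Error(`OpenClaw request failed: ${res.status}`);
  const data = await res.json();
  return data.output_text ?? JSON.stringify(data); // response field is an assumption
}

askOpenClaw("What's your name?").then(console.log).catch(console.error);
```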
See ARCHITECTURE.md for detailed technical documentation including:
- System diagram
- Data flow (STT and TTS)
- Key challenges and solutions
- Performance metrics
- **ICE Connection Failures**: The bot connected via localhost, but the JVB advertised a different IP. Fixed by adding `127.0.0.1` to `JVB_ADVERTISE_IPS`.
- **Slow Whisper**: Reloading the model for every chunk was too slow. Fixed by running a persistent server process that keeps the model in memory.
- **TTS Audio Routing**: Chrome's fake-media-stream only sends test patterns. Fixed by using PulseAudio virtual sinks to route TTS audio into the WebRTC stream.
- **Echo Loop**: The bot was transcribing its own TTS output. Fixed by adding an `isSpeaking` flag that skips transcription during playback (see the sketch below).
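The echo-loop fix is essentially a guard around the transcription path. A minimal sketch, where `speak` and `transcribe` are placeholders for the real TTS and Whisper calls in voice-bot.js:

```js
// Placeholders for the real TTS playback and Whisper transcription calls.
const speak = async (text) => new Promise((resolve) => setTimeout(resolve, 1500));
const transcribe = async (chunk) => "";

// Guard: while our own TTS is playing, ignore incoming audio so the bot
// doesn't transcribe (and answer) itself.
let isSpeaking = false;

async function speakGuarded(text) {
  isSpeaking = true;
  try {
    await speak(text);
  } finally {
    isSpeaking = false; // always clear the flag, even if playback fails
  }
}

async function onAudioChunk(chunk) {
  if (isSpeaking) return null; // skip transcription during playback
  return transcribe(chunk);
}
```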
| Metric | Value |
|---|---|
| STT Latency | ~0.6s per 4s chunk |
| TTS Generation | ~1-2s |
| End-to-end | ~5-6s |
| Memory (Chrome) | ~250MB |
| Memory (Whisper) | ~200MB |
For persistent operation (auto-start, auto-restart on crash):
```bash
# Copy service file
sudo cp jitsi-voice-bot.service /etc/systemd/system/

# Edit the service file to set your OPENCLAW_TOKEN
sudo nano /etc/systemd/system/jitsi-voice-bot.service

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable jitsi-voice-bot
sudo systemctl start jitsi-voice-bot

# View logs
sudo journalctl -u jitsi-voice-bot -f
```

The bot will now:
- Auto-start on system boot
- Auto-restart if it crashes
- Always be waiting in the configured room
- Wake word detection ("Hey Enki")
- Streaming STT for lower latency
- ~~Integration with LLM for intelligent responses~~ ✅ Done via OpenClaw API
- ~~Systemd service for persistence~~ ✅ Done
- Multiple room support
MIT
Built with 🔱 — February 2026