
openclaw-talks-back 🔱

A real-time voice AI that joins Jitsi video calls and has actual conversations.

No cloud APIs. Runs on a laptop. Fully self-hosted.

What it does

  • Joins a Jitsi Meet room as a participant named "Enki"
  • Listens to speech via WebRTC audio capture
  • Transcribes speech in real time using Whisper (~0.6 s latency per chunk)
  • Responds with synthesized voice using Edge TTS
  • All processing happens locally
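The loop described above can be sketched as one processing pass. This is a minimal sketch: `captureChunk`, `transcribe`, `askOpenClaw`, and `speak` are placeholder names for the four stages, not the repo's actual API.

```javascript
// One pass of the listen -> transcribe -> respond loop described above.
// All four dependencies are injected placeholders, not the repo's real API.
async function handleChunk({ captureChunk, transcribe, askOpenClaw, speak }) {
  const pcm = await captureChunk();      // ~4 s of audio from the WebRTC capture
  const text = await transcribe(pcm);    // faster-whisper transcription
  if (!text.trim()) return null;         // silence: nothing to respond to
  const reply = await askOpenClaw(text); // get an answer from the LLM
  await speak(reply);                    // edge-tts into the virtual mic
  return { heard: text, said: reply };
}
```

Injecting the stages as functions keeps each one (capture, STT, LLM, TTS) independently swappable.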

Demo

You: "Hello, can you hear me?"
Enki: "Hello! Nice to hear from you!"

You: "What's your name?"
Enki: "I am Enki, god of wisdom and water. I'm your AI assistant."

Stack

| Component | Technology |
|---|---|
| Video Conferencing | Self-hosted Jitsi Meet (Docker) |
| Bot Browser | Puppeteer (Headless Chrome) |
| Speech-to-Text | faster-whisper (tiny.en model) |
| Text-to-Speech | edge-tts (Microsoft Neural Voices) |
| Audio Routing | PulseAudio virtual sinks |

Requirements

  • Linux (tested on Ubuntu 24.04 / WSL2)
  • Node.js 18+
  • Python 3.10+
  • Docker & Docker Compose
  • PulseAudio
  • FFmpeg

Quick Start

1. Set up Jitsi Meet

# Clone Jitsi Docker setup
git clone https://github.com/jitsi/docker-jitsi-meet.git jitsi
cd jitsi

# Configure
cp env.example .env
# Edit .env and set:
# - JVB_ADVERTISE_IPS=127.0.0.1,<your-ip>
# - ENABLE_GUESTS=1

# Start
docker compose up -d

2. Set up the bot

# Clone this repo
git clone https://github.com/yaxzone/openclaw-talks-back.git
cd openclaw-talks-back

# Install Node dependencies
npm install

# Create Python venv and install Whisper
python3 -m venv venv
source venv/bin/activate
pip install faster-whisper edge-tts

3. Set up PulseAudio virtual sink

pactl load-module module-null-sink sink_name=VirtualMic sink_properties=device.description=VirtualMic

This creates a null sink named VirtualMic; anything played into it can be captured from its monitor source (VirtualMic.monitor), which is how the bot's TTS audio reaches the call.
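To route synthesized speech into that sink, the bot plays audio at the VirtualMic device. One way to express the chain as spawnable commands (a sketch: the repo's actual flags and temp paths may differ, and `en-US-GuyNeural` is just an example voice):

```javascript
// Build the three commands that carry TTS audio into the null sink:
// edge-tts synthesizes MP3, ffmpeg decodes it to WAV, paplay plays it
// into VirtualMic so the captured "mic" hears it.
function ttsPipeline(text, { voice = 'en-US-GuyNeural', sink = 'VirtualMic' } = {}) {
  return [
    ['edge-tts', '--voice', voice, '--text', text, '--write-media', '/tmp/tts.mp3'],
    ['ffmpeg', '-y', '-i', '/tmp/tts.mp3', '/tmp/tts.wav'],
    ['paplay', '--device', sink, '/tmp/tts.wav'],
  ];
}
```

Each argv array can be handed to `child_process.spawn` in sequence.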

4. Run the bot

node voice-bot.js

5. Join the call

Open https://localhost:8443/Enki in your browser and start talking!

Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| ROOM | Enki | Jitsi room name to join |
| WHISPER_VENV | ./venv/bin/python | Path to a Python interpreter with faster-whisper installed |
| EDGE_TTS | edge-tts | Path to the edge-tts binary |
| OPENCLAW_URL | http://localhost:18789/v1/responses | OpenClaw API endpoint |
| OPENCLAW_TOKEN | (required) | OpenClaw gateway auth token |
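The defaults in the table can be resolved like this (an illustrative sketch, not the repo's actual config loader; the field names are placeholders):

```javascript
// Resolve configuration from environment variables, falling back to the
// defaults listed in the table above. OPENCLAW_TOKEN has no default.
function loadConfig(env = process.env) {
  const config = {
    room: env.ROOM || 'Enki',
    whisperPython: env.WHISPER_VENV || './venv/bin/python',
    edgeTts: env.EDGE_TTS || 'edge-tts',
    openclawUrl: env.OPENCLAW_URL || 'http://localhost:18789/v1/responses',
    openclawToken: env.OPENCLAW_TOKEN,
  };
  if (!config.openclawToken) {
    throw new Error('OPENCLAW_TOKEN is required');
  }
  return config;
}
```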

OpenClaw Integration

The bot connects to OpenClaw's API for intelligent responses. This means you get a real AI conversation, not canned responses.

  1. Enable the responses endpoint in OpenClaw config:

{
  gateway: {
    http: {
      endpoints: {
        responses: { enabled: true }
      }
    }
  }
}

  2. Set your token:

export OPENCLAW_TOKEN="your-gateway-token"

  3. Run the bot — it will now route speech to OpenClaw and speak the AI's response!
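Under the hood the exchange is a plain HTTP POST with the gateway token as a bearer credential. A sketch of building that request (the `{ input: ... }` payload shape is an assumption here; check your OpenClaw version's API docs for the exact schema):

```javascript
// Build a fetch()-ready request for the responses endpoint, authenticated
// with the gateway token. The payload field name is assumed.
function buildOpenClawRequest(text, { url, token }) {
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ input: text }),
    },
  };
}

// Usage sketch:
//   const { url, options } = buildOpenClawRequest('Hello?', {
//     url: process.env.OPENCLAW_URL, token: process.env.OPENCLAW_TOKEN });
//   const res = await fetch(url, options);
```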

Architecture

See ARCHITECTURE.md for detailed technical documentation including:

  • System diagram
  • Data flow (STT and TTS)
  • Key challenges and solutions
  • Performance metrics

Key Challenges Solved

  1. ICE Connection Failures: The bot connected via localhost, but the JVB advertised a different IP. Fixed by adding 127.0.0.1 to JVB_ADVERTISE_IPS.

  2. Slow Whisper: Reloading the model for every chunk was too slow. Fixed by running a persistent server that keeps the model in memory.

  3. TTS Audio Routing: Chrome's fake media stream only sends test patterns. Fixed by using PulseAudio virtual sinks to route TTS audio into the WebRTC stream.

  4. Echo Loop: The bot was transcribing its own TTS output. Fixed by adding an isSpeaking flag that skips transcription during playback.
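The echo fix in item 4 can be sketched as a small guard. `isSpeaking` matches the flag named above; everything else (the wrapper, the method names) is illustrative.

```javascript
// Guard that suppresses transcription while the bot's own TTS is playing,
// so the bot never hears (and answers) itself.
function makeEchoGuard() {
  let isSpeaking = false;
  return {
    // Wrap TTS playback; the flag is cleared even if playback fails.
    async speak(playAudio) {
      isSpeaking = true;
      try {
        await playAudio();
      } finally {
        isSpeaking = false;
      }
    },
    // The STT loop checks this before sending a chunk to Whisper.
    shouldTranscribe() {
      return !isSpeaking;
    },
  };
}
```

The `finally` matters: if playback throws, the flag still resets, so the bot doesn't go permanently deaf.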

Performance

| Metric | Value |
|---|---|
| STT Latency | ~0.6 s per 4 s chunk |
| TTS Generation | ~1-2 s |
| End-to-end | ~5-6 s |
| Memory (Chrome) | ~250 MB |
| Memory (Whisper) | ~200 MB |

Running as a Service

For persistent operation (auto-start, auto-restart on crash):

# Copy service file
sudo cp jitsi-voice-bot.service /etc/systemd/system/

# Edit the service file to set your OPENCLAW_TOKEN
sudo nano /etc/systemd/system/jitsi-voice-bot.service

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable jitsi-voice-bot
sudo systemctl start jitsi-voice-bot

# View logs
sudo journalctl -u jitsi-voice-bot -f
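If the repo's jitsi-voice-bot.service doesn't match your layout, a minimal unit looks like this. The paths, user, and node binary location are placeholders; adjust them to where you cloned the repo.

```ini
[Unit]
Description=Jitsi voice bot (Enki)
After=network-online.target docker.service

[Service]
WorkingDirectory=/opt/openclaw-talks-back
Environment=OPENCLAW_TOKEN=your-gateway-token
ExecStart=/usr/bin/node voice-bot.js
Restart=on-failure
RestartSec=5
User=enki

[Install]
WantedBy=multi-user.target
```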

The bot will now:

  • Auto-start on system boot
  • Auto-restart if it crashes
  • Always be waiting in the configured room

Future Ideas

  • Wake word detection ("Hey Enki")
  • Streaming STT for lower latency
  • Integration with LLM for intelligent responses ✅ Done via OpenClaw API
  • Systemd service for persistence ✅ Done
  • Multiple room support

License

MIT


Built with 🔱 — February 2026
