A local-first, real-time meeting transcription and summarization agent.
This project aims to build an assistant that listens to conversations in real time, transcribes them, detects trigger keywords (such as "Hey Assistant"), and dynamically dispatches tasks such as summarization and post-transcription correction, all running on your own machine without relying on cloud services.
- 🎙️ Real-time speech-to-text (multi-language, cross-lingual)
- 🧠 Agent framework to dispatch tasks dynamically
- 🗣️ Wake word detection ("Hey Assistant") to trigger specific actions
- ✍️ Summarization and post-transcription correction
- 💬 Seamless real-time text input into applications (via Chrome extension or native app)
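Wake word detection here works on transcribed text, and ASR output rarely matches the phrase verbatim: casing and punctuation vary ("Okay, hey Assistant!"). The following is a minimal sketch, assuming the wake phrase is "hey assistant" and that whatever follows it is the command. `normalize` and `extract_command` are hypothetical helper names, not part of RealtimeSTT or WhisperLive.

```python
import re
from typing import Optional

WAKE_PHRASE = "hey assistant"  # assumed trigger phrase; make it configurable in practice


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def extract_command(transcript: str, wake_phrase: str = WAKE_PHRASE) -> Optional[str]:
    """Return the words spoken after the wake phrase, or None if it is absent.

    Matches whole words, so "hey assistants" does not trigger.
    """
    words = normalize(transcript).split()
    wake = normalize(wake_phrase).split()
    n = len(wake)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == wake:
            return " ".join(words[i + n:])
    return None


print(extract_command("Okay, hey Assistant! Summarize the last ten minutes."))
# -> summarize the last ten minutes
print(extract_command("Let's move on to the budget."))
# -> None
```

Matching on word lists rather than raw substrings keeps false triggers down. Because `\w` is Unicode-aware in Python 3, the same normalization also works for non-English transcripts.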
- Set up real-time transcription using RealtimeSTT or WhisperLive.
- Build an agent that:
  - Receives live transcriptions
  - Detects trigger keywords
  - Buffers conversations
  - Calls summarization or correction functions
- Output summarized or corrected text to the console.
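The Phase 1 agent loop described above could look roughly like the sketch below. It keeps a bounded buffer of transcribed segments and a registry that maps command keywords to task functions, and it dispatches when the wake phrase appears. This is a hypothetical sketch, not the actual contents of `agent/agent.py`. `MeetingAgent`, `register`, and `on_transcript` are illustrative names, and the `summarize`/`correct` handlers are stubs standing in for LLM calls. Wake-phrase matching is a plain substring check, so real ASR output with punctuation inside the phrase would need extra normalization.

```python
import re
from collections import deque
from typing import Callable, Deque, Dict, Optional

Handler = Callable[[str], str]


class MeetingAgent:
    """Buffers live transcript segments and dispatches tasks on a wake phrase."""

    def __init__(self, wake_phrase: str = "hey assistant", max_segments: int = 200):
        self.wake_phrase = wake_phrase.lower()
        # Bounded buffer: the oldest segments are dropped once max_segments is reached.
        self.buffer: Deque[str] = deque(maxlen=max_segments)
        self.handlers: Dict[str, Handler] = {}

    def register(self, keyword: str, handler: Handler) -> None:
        """Map a command keyword (e.g. "summarize") to a task function."""
        self.handlers[keyword.lower()] = handler

    def on_transcript(self, segment: str) -> Optional[str]:
        """Called for every new transcription segment from the ASR engine."""
        lowered = segment.lower()
        pos = lowered.find(self.wake_phrase)
        if pos == -1:
            self.buffer.append(segment)  # ordinary speech: keep it as context
            return None
        command = lowered[pos + len(self.wake_phrase):]
        context = " ".join(self.buffer)
        for keyword, handler in self.handlers.items():
            if keyword in command:
                return handler(context)
        return None  # wake phrase heard, but no known command


# Stub tasks standing in for LLM-backed summarization and correction.
def summarize(text: str) -> str:
    return f"[summary of {len(text.split())} words]"


def correct(text: str) -> str:
    return re.sub(r"\bi\b", "I", text)  # toy fix: capitalize a standalone "i"


agent = MeetingAgent()
agent.register("summarize", summarize)
agent.register("correct", correct)
for seg in ["We agreed to ship on Friday.", "i will update the docs.", "Hey Assistant, summarize that."]:
    result = agent.on_transcript(seg)
    if result is not None:
        print(result)  # prints: [summary of 11 words]
```

Keeping handlers in a dictionary means new tasks (for example, action-item extraction) can be added without touching the dispatch logic.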
- Set up a WebSocket server to broadcast live transcriptions.
- Build a Chrome extension to:
  - Connect to the local WebSocket server
  - Autofill active input fields with live transcription
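The core job of the WebSocket server is fan-out: every transcription segment must reach every connected client, such as the Chrome extension. The following is a minimal asyncio sketch of that fan-out with the transport omitted. Each client gets its own queue, and a WebSocket connection handler (for example one written with a library such as `websockets`) would subscribe, then forward queue items over its socket. `Broadcaster`, `subscribe`, `unsubscribe`, and `publish` are hypothetical names. The queue size of 100 and the drop-oldest policy are assumptions.

```python
import asyncio
from typing import Set


class Broadcaster:
    """Fan-out hub: every published transcript goes to every subscriber queue."""

    def __init__(self) -> None:
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, text: str) -> None:
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(text)
            except asyncio.QueueFull:
                # A slow client should not stall live transcription: drop its oldest item.
                queue.get_nowait()
                queue.put_nowait(text)


async def demo() -> None:
    hub = Broadcaster()
    client_a, client_b = hub.subscribe(), hub.subscribe()
    hub.publish("hello everyone")
    print(await client_a.get(), "|", await client_b.get())
    # -> hello everyone | hello everyone


asyncio.run(demo())
```

`publish` never awaits, so the agent can call it directly from its transcription callback. A client that falls behind loses old text rather than blocking everyone else.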
- Integrate local LLMs (e.g., Mistral 7B, OpenHermes) for summarization.
- Implement post-transcription correction (grammar, spelling).
- Fine-tune summarization prompts for meeting notes.
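Once a local LLM is integrated, both summarization and post-transcription correction reduce to building a prompt from the buffered transcript. Below is a model-agnostic sketch of such prompt builders. The template wording, the section names, and the 12,000-character budget are all assumptions to be tuned, not tested prompts. `build_prompt` is a hypothetical helper whose output would be passed to whichever local model is used (e.g. via llama.cpp).

```python
SUMMARY_TEMPLATE = """You are a meeting assistant. Summarize the transcript below as meeting notes.
Use exactly these sections:
## Decisions
## Action items (owner: task)
## Open questions
Do not invent facts that are not in the transcript.

Transcript:
{transcript}
"""

CORRECTION_TEMPLATE = """Fix spelling, grammar, and obvious speech-recognition errors in the text below.
Keep the meaning and wording otherwise unchanged. Return only the corrected text.

Text:
{transcript}
"""


def build_prompt(task: str, transcript: str, max_chars: int = 12000) -> str:
    """Fill the template for `task`, keeping only the most recent max_chars of transcript."""
    templates = {"summarize": SUMMARY_TEMPLATE, "correct": CORRECTION_TEMPLATE}
    if task not in templates:
        raise ValueError(f"unknown task: {task!r}")
    # Small local models have short context windows, so prefer the latest speech.
    trimmed = transcript[max(0, len(transcript) - max_chars):]
    return templates[task].format(transcript=trimmed)


print(build_prompt("correct", "i think we shud ship on friday"))
```

Fixing the output sections in the prompt makes the notes easier to post-process and compare across models. The trimming keeps the end of the meeting because that is usually what a mid-meeting "summarize" request refers to.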
- Build a macOS native input method (InputMethodKit) for system-wide text input.
- Add speaker diarization to separate notes per speaker.
- Optimize low-latency real-time correction during transcription.
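Diarization output typically arrives as time-stamped segments, each tagged with a speaker label. Separating notes per speaker then becomes a grouping step. This sketch assumes the diarizer (the project has not chosen one) yields segments with a speaker label, start and end times in seconds, and text. The `Segment` type, labels like `"SPEAKER_00"`, and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    speaker: str  # label from the diarizer, e.g. "SPEAKER_00"
    start: float  # seconds
    end: float    # seconds
    text: str


def notes_per_speaker(segments: List[Segment]) -> Dict[str, List[str]]:
    """Group transcript text by speaker, merging consecutive turns by the same speaker."""
    notes: Dict[str, List[str]] = {}
    previous = None
    for seg in sorted(segments, key=lambda s: s.start):
        turns = notes.setdefault(seg.speaker, [])
        if seg.speaker == previous and turns:
            turns[-1] += " " + seg.text  # same speaker kept talking: extend the turn
        else:
            turns.append(seg.text)
        previous = seg.speaker
    return notes


def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Total seconds spoken per speaker."""
    totals: Dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


segments = [
    Segment("SPEAKER_00", 0.0, 2.5, "Let's review the roadmap."),
    Segment("SPEAKER_00", 2.5, 4.0, "Phase one is done."),
    Segment("SPEAKER_01", 4.0, 6.0, "Great, what about the extension?"),
    Segment("SPEAKER_00", 6.0, 7.5, "Next sprint."),
]
print(notes_per_speaker(segments))
print(talk_time(segments))  # -> {'SPEAKER_00': 5.5, 'SPEAKER_01': 2.0}
```

Merging consecutive turns gives the summarizer coherent per-speaker utterances instead of ASR-sized fragments.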
- ASR Engine: RealtimeSTT / WhisperLive (Whisper-based)
- Programming Language: Python 3.9+
- Agent Framework: Lightweight custom agent
- Frontend: Chrome Extension (Manifest V3, WebSocket client)
- Optional: OpenAI API (for early summarization tasks)
- Future: Local LLM (Mistral 7B, OpenHermes, llama.cpp)
realtime-meeting-assistant/
├── agent/ # Core agent to manage transcription and tasks
│ ├── agent.py
│ └── ws_server.py
├── asr/ # Setup and documentation for ASR runner
│ └── runner.md
├── frontend/ # Chrome extension for real-time input
│ └── chrome_extension/
├── requirements.txt # Python dependencies
├── README.md # Project introduction and guide
└── .gitignore # Ignore temp files
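This layout separates the agent (`agent.py`) from the broadcast server (`ws_server.py`), so they and the Chrome extension need an agreed message format. The project does not define one yet. Below is a hypothetical JSON envelope, assuming the ASR emits both in-progress ("partial") and stable ("final") text. The `TranscriptMessage` class and all of its field names are illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TranscriptMessage:
    type: str          # "partial" (may still change) or "final" (stable segment)
    text: str
    timestamp: float   # seconds since the session started
    speaker: str = ""  # filled in once diarization exists

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-English text readable on the wire.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, raw: str) -> "TranscriptMessage":
        data = json.loads(raw)
        if data.get("type") not in ("partial", "final"):
            raise ValueError(f"unexpected message type: {data.get('type')!r}")
        return cls(**data)


msg = TranscriptMessage(type="final", text="Bonjour à tous", timestamp=12.4)
wire = msg.to_json()
print(wire)  # -> {"type": "final", "text": "Bonjour à tous", "timestamp": 12.4, "speaker": ""}
print(TranscriptMessage.from_json(wire) == msg)  # -> True
```

Distinguishing partial from final text lets the extension overwrite in-progress text in the input field and only commit it once the segment is stable.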
git clone https://github.com/your-username/realtime-meeting-assistant.git
cd realtime-meeting-assistant
pip install -r requirements.txt
(Run RealtimeSTT or WhisperLive locally as the transcription server; see `asr/runner.md` for setup.)
python agent/agent.py
The agent connects to the transcription server, detects trigger keywords, and dispatches tasks such as summarization and correction.
(Chrome extension setup instructions are coming soon, after the frontend MVP!)
- Native speaker separation and diarization
- Low-latency incremental summarization
- Full offline summarization using local models
- Mobile device integration
- Windows/Linux support for system-wide typing
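Low-latency incremental summarization usually means the whole meeting is not re-summarized on every request. One common pattern is a rolling summary, in which each new chunk of transcript is folded into the previous summary. This is an assumed approach, not a design the project has committed to. In the sketch, `RollingSummarizer` is hypothetical, `summarize_fn` is a placeholder for a local-LLM call, and chunk size is measured in words.

```python
from typing import Callable, List


class RollingSummarizer:
    """Folds new transcript text into a running summary, one chunk at a time."""

    def __init__(self, summarize_fn: Callable[[str, str], str], chunk_words: int = 150):
        if chunk_words <= 0:
            raise ValueError("chunk_words must be positive")
        self.summarize_fn = summarize_fn  # (previous_summary, new_text) -> new summary
        self.chunk_words = chunk_words
        self.pending: List[str] = []
        self.summary = ""

    def add(self, segment: str) -> bool:
        """Buffer a segment and fold in every full chunk. Returns True if the summary changed."""
        self.pending.extend(segment.split())
        updated = False
        while len(self.pending) >= self.chunk_words:
            chunk = " ".join(self.pending[:self.chunk_words])
            self.pending = self.pending[self.chunk_words:]
            self.summary = self.summarize_fn(self.summary, chunk)
            updated = True
        return updated

    def flush(self) -> str:
        """Summarize any leftover words (e.g. when the meeting ends)."""
        if self.pending:
            self.summary = self.summarize_fn(self.summary, " ".join(self.pending))
            self.pending = []
        return self.summary


def fake_llm(previous: str, new_text: str) -> str:
    # Stand-in for a local LLM call: just records how many words each update saw.
    return (previous + " | " if previous else "") + f"{len(new_text.split())} words"


s = RollingSummarizer(fake_llm, chunk_words=4)
for seg in ["we need to", "ship the beta", "by Friday and", "update docs"]:
    s.add(seg)
print(s.flush())  # -> 4 words | 4 words | 3 words
```

Each model call sees only one chunk plus the current summary, so latency stays roughly constant as the meeting grows. The trade-off is that details dropped from an earlier summary cannot be recovered later.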
This project is licensed under the MIT License.
Feel free to fork, modify, or contribute to it, for personal or commercial use!
Issues and pull requests are welcome!
Please submit detailed bug reports or feature suggestions via GitHub Issues.
Built with ❤️ to make meetings smarter and more efficient.