Local speech-to-text with AMD ROCm GPU acceleration
A browser-based voice transcription app powered by OpenAI's Whisper model, optimized for AMD GPUs using ROCm. Built specifically for the new AMD Ryzen AI processors with Radeon 800M series integrated graphics.
- Real-time audio visualization with cyberpunk-themed UI
- Press-and-hold recording (mouse or touch)
- GPU-accelerated transcription via ROCm/CUDA
- One-click copy to clipboard
- Mobile-responsive design
- Automatic language detection
Tested on:
- AMD Ryzen AI 9 HX 370 + Radeon 890M (Strix Point / gfx1150)
- TUXEDO laptop running Ubuntu 24.04
Compatible with:
- AMD GPUs with ROCm support (gfx1150, gfx1100, etc.)
- NVIDIA GPUs (CUDA)
- CPU fallback (slower, but works; see the device-selection sketch below)
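The snippet below is a minimal sketch of how that selection typically looks (it is not copied from App.py): ROCm builds of PyTorch expose the GPU through the same `cuda` device API as NVIDIA builds, so a single check covers both, with the CPU as a fallback.

```python
import torch
import whisper

# ROCm and CUDA builds of PyTorch both report the GPU via torch.cuda,
# so one check covers AMD and NVIDIA; otherwise use the slower CPU path.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Transcribing on: {device}")

# The model name here is only an example; see the model table further down.
model = whisper.load_model("medium", device=device)
```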
```bash
git clone https://github.com/M64GitHub/whisper-rocm.git
cd whisper-rocm
python -m venv venv
source venv/bin/activate
```

For AMD Radeon 890M / 880M (gfx1150 - Strix Point):

```bash
pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ torch
```

For other AMD GPUs, check available builds at: https://repo.amd.com/rocm/whl/
For NVIDIA GPUs or CPU, see: https://pytorch.org/get-started/locally/

```bash
pip install -r requirements.txt
python App.py
```

Open http://localhost:8000 in your browser.
How to use:
- Click and hold the "HOLD TO RECORD" button
- Speak clearly into your microphone
- Release the button to transcribe
- Click "COPY" or use the auto-selected text
Edit App.py line 20 to change the model:
```python
model = whisper.load_model("medium", device=device)  # Options: tiny, base, small, medium, large
```

| Model | Parameters | VRAM | Speed | Accuracy |
|---|---|---|---|---|
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Basic |
| base | 74M | ~1GB | Fast | Good |
| small | 244M | ~2GB | Medium | Better |
| medium | 769M | ~5GB | Slower | Great |
| large | 1550M | ~10GB | Slowest | Best |
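If you would rather pick a model automatically instead of hard-coding one, a possible approach (not part of App.py, just a sketch using PyTorch's memory query and the approximate VRAM figures above) is:

```python
import torch

# Approximate VRAM needs from the table above, in GB, largest model first.
MODEL_VRAM_GB = [("large", 10), ("medium", 5), ("small", 2), ("base", 1), ("tiny", 1)]

def pick_model() -> str:
    """Return the largest Whisper model expected to fit in GPU memory."""
    if not torch.cuda.is_available():
        return "base"  # conservative default for CPU-only transcription
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    for name, need_gb in MODEL_VRAM_GB:
        if total_gb >= need_gb + 1:  # keep ~1 GB of headroom for audio buffers
            return name
    return "tiny"

print(pick_model())
```

On integrated GPUs such as the Radeon 890M the reported memory is shared with system RAM, so treat the result as a starting point rather than a guarantee.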
- Backend: FastAPI + Uvicorn (see the sketch after this list)
- ML Model: OpenAI Whisper
- GPU Acceleration: PyTorch + ROCm 7.10
- Frontend: Vanilla HTML/CSS/JavaScript
- Audio: Web Audio API + MediaRecorder
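To show how these pieces fit together, here is a heavily simplified sketch of a Whisper-backed FastAPI upload endpoint; the route name, temp-file handling, and response shape are illustrative assumptions, not a copy of App.py.

```python
import tempfile

import torch
import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium", device=device)

@app.post("/transcribe")  # hypothetical route name, for illustration only
async def transcribe(audio: UploadFile):
    # MediaRecorder sends the recording as a webm/ogg blob; Whisper decodes files
    # through ffmpeg, so writing the upload to a temp file and passing its path works.
    with tempfile.NamedTemporaryFile(suffix=".webm") as tmp:
        tmp.write(await audio.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"], "language": result["language"]}
```

In this repo the server is started with `python App.py`; a standalone sketch like the one above could instead be served with `uvicorn`.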
To verify that PyTorch detects your GPU:

```bash
source venv/bin/activate
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"
```

Make sure your browser has microphone access.
MIT License — see LICENSE for details.
- OpenAI Whisper — the speech recognition model
- AMD ROCm — GPU compute platform
- FastAPI — modern Python web framework