whisper-rocm

Local speech-to-text with AMD ROCm GPU acceleration

A browser-based voice transcription app powered by OpenAI's Whisper model, optimized for AMD GPUs using ROCm. Built specifically for the new AMD Ryzen AI processors with Radeon 800M series integrated graphics.


Features

  • Real-time audio visualization with cyberpunk-themed UI
  • Press-and-hold recording (mouse or touch)
  • GPU-accelerated transcription via ROCm/CUDA
  • One-click copy to clipboard
  • Mobile-responsive design
  • Automatic language detection

Hardware

Tested on:

  • AMD Ryzen AI 9 HX 370 + Radeon 890M (Strix Point / gfx1150)
  • TUXEDO laptop running Ubuntu 24.04

Compatible with:

  • AMD GPUs with ROCm support (gfx1150, gfx1100, etc.)
  • NVIDIA GPUs (CUDA)
  • CPU fallback (slower, but works)

Installation

1. Clone the repository

git clone https://github.com/M64GitHub/whisper-rocm.git
cd whisper-rocm

2. Create a virtual environment

python -m venv venv
source venv/bin/activate

3. Install PyTorch with ROCm support

For AMD Radeon 890M / 880M (gfx1150 - Strix Point):

pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ torch

For other AMD GPUs, check available builds at: https://repo.amd.com/rocm/whl/

For NVIDIA GPUs or CPU, see: https://pytorch.org/get-started/locally/
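
To confirm the ROCm build is actually in use (rather than the default CPU-only wheel), a quick check inside the venv can help; torch.version.hip is a version string on ROCm builds and None otherwise:

python -c "import torch; print(torch.__version__); print(torch.version.hip)"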

4. Install dependencies

pip install -r requirements.txt

Usage

python App.py

Open http://localhost:8000 in your browser.

How to use:

  1. Click and hold the "HOLD TO RECORD" button
  2. Speak clearly into your microphone
  3. Release the button to transcribe
  4. Click "COPY" or use the auto-selected text

Configuration

Whisper Model Size

Edit App.py line 20 to change the model:

model = whisper.load_model("medium", device=device)  # Options: tiny, base, small, medium, large

Model     Parameters   VRAM    Speed     Accuracy
tiny      39M          ~1GB    Fastest   Basic
base      74M          ~1GB    Fast      Good
small     244M         ~2GB    Medium    Better
medium    769M         ~5GB    Slower    Great
large     1550M        ~10GB   Slowest   Best
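
If VRAM is tight (the Radeon 890M uses shared system memory), a smaller model trades some accuracy for footprint, and the language can be pinned instead of auto-detected. Both are standard Whisper options; a hedged sketch (App.py does not necessarily expose these):

import whisper

# "small" needs roughly 2GB instead of ~5GB for "medium"; accuracy drops accordingly.
model = whisper.load_model("small", device="cuda")
# Pinning the language skips auto-detection.
result = model.transcribe("recording.wav", language="en")
print(result["text"])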

Tech Stack

  • Backend: FastAPI + Uvicorn
  • ML Model: OpenAI Whisper
  • GPU Acceleration: PyTorch + ROCm 7.10
  • Frontend: Vanilla HTML/CSS/JavaScript
  • Audio: Web Audio API + MediaRecorder
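
How the pieces fit together: the frontend captures audio with MediaRecorder and posts it to the FastAPI backend, which runs Whisper on the GPU. A minimal sketch of such an endpoint (route, field name, and temp-file handling are illustrative, not necessarily what App.py does):

import os
import tempfile

import torch
import whisper
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium", device=device)

@app.post("/transcribe")  # hypothetical route
async def transcribe(audio: UploadFile = File(...)):
    # Whisper decodes audio via ffmpeg, so buffer the upload to a real file first.
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
        tmp.write(await audio.read())
        path = tmp.name
    try:
        result = model.transcribe(path)
    finally:
        os.remove(path)
    return {"text": result["text"], "language": result["language"]}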

Troubleshooting

Check if ROCm detects your GPU

source venv/bin/activate
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"

Microphone permissions

Make sure your browser has microphone access.

License

MIT License — see LICENSE for details.

Acknowledgments