粵文字幕生成器 Cantonese Subtitle Transcript Service

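This is a Cantonese subtitle generator: it takes an audio file (.mp3, .wav, .webm, .flac, etc.) as input and outputs an .srt subtitle file.

Cantonese transcription uses FunAudioLLM/SenseVoiceSmall, with Silero VAD for segmentation. The subtitle text then goes through OpenCC for Traditional/Simplified Chinese conversion and rule-based corrections.

Tutorial

Preparation

After cloning this repo locally, run the commands below to install the dependencies. Note that you must use Python 3.12 or earlier; Python 3.13 will raise errors.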

apt install ffmpeg
pip install -r requirements.txt
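
Because of the Python version constraint above, it can help to check the interpreter before installing anything. The following is a minimal sketch, not part of this repo; the helper name is_supported is hypothetical, and it only encodes the upper bound stated above (3.12), since no minimum version is given.

```python
import sys

# Assumption: 3.12 is the newest supported minor version, per the note above.
# No lower bound is checked because the README does not state one.
MAX_SUPPORTED_MINOR = 12


def is_supported(version=None):
    """Return True if the (major, minor) version is Python 3.12 or earlier."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor <= MAX_SUPPORTED_MINOR
```

Calling is_supported() with no argument checks the running interpreter; passing a tuple such as (3, 13) checks a specific version.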

Next, prepare the audio file you want to transcribe. To transcribe a single file, run:

python cli.py audio.mp3 --output_dir output

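If you pass a directory path instead of a specific file, every audio file under that path is transcribed automatically:

# Automatically transcribe every audio file in audio/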
python cli.py ./audio/ --output_dir output
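
How a file-or-directory argument might be resolved into a list of audio files can be sketched as follows. This is an illustration only, not the code in cli.py: the function name find_audio_files, the extension set, and the non-recursive directory scan are all assumptions.

```python
from pathlib import Path

# Assumption: an illustrative extension set. The README lists
# .mp3, .wav, .webm and .flac "and so on", so the real set may be larger.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".flac"}


def find_audio_files(path):
    """Return the audio files to transcribe for a file or directory path."""
    target = Path(path)
    if target.is_file():
        # A single file is passed through unchanged.
        return [target]
    # Assumption: only the top level of the directory is scanned.
    return sorted(
        p for p in target.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Matching on the lowercased suffix means files such as talk.MP3 are picked up as well.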

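If you want to download the audio of a YouTube video, install yt-dlp with pip install yt-dlp, then run the command below:

# This command downloads audio only, without video; to download the video as well, remove -f ba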
yt-dlp -f ba https://youtu.be/rIBD6A4lnLQ

Introduction

This service generates Cantonese subtitles: SenseVoice with VAD handles transcription, and OpenCC handles conversion to Traditional Chinese together with rule-based fixes.

This version supports local files via CLI and a simple web UI; the API includes a YouTube helper to download audio if needed.

1. Download the audio file from a YouTube video URL.
2. Use the VAD model to split the audio file into short clips.
3. Use the SenseVoice model to generate the Cantonese transcript and timestamps for each clip.
4. Because SenseVoice outputs Simplified Chinese, use OpenCC to convert the text to Traditional Chinese, then apply rule-based Cantonese fixes.
5. Generate the SRT file from the Cantonese transcript.
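
The final step, writing timestamped text out as SRT, can be sketched as below. This is a minimal illustration of the SRT format rather than this repo's implementation; the function names and the (start_seconds, end_seconds, text) tuple layout are assumptions.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT text.

    Assumption: segments are already in playback order and the text has
    already been converted and corrected.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

Each cue consists of a 1-based index, a start --> end line with comma-separated milliseconds, and the subtitle text.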

Models

Models are loaded dynamically via FunASR and Silero VAD at runtime; no ONNX export is required.

Prerequisites

sudo apt install ffmpeg
pip install -r requirements.txt

Usage

Prerequisites

Ensure ffmpeg and Python dependencies are installed. Models are downloaded automatically by FunASR/Silero on first run.

You can run the following command to download the audio of a YouTube video. Make sure yt-dlp is installed first (pip install yt-dlp).

# Download the audio from a YouTube video URL; to download the video as well, remove -f ba
yt-dlp -f ba https://youtu.be/rIBD6

Transcribe

Run the CLI (OpenCC corrector is enabled by default). You can increase context and merge small pauses for better quality.

A single file can be transcribed directly:

$ python cli.py your_audio.mp3 --output-dir output --max-length 30 --merge-gap-ms 200
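
The effect of merging small pauses can be illustrated with the sketch below. The README does not document the exact semantics of --max-length and --merge-gap-ms, so this assumes that --merge-gap-ms is the longest pause (in milliseconds) that may be bridged between two neighbouring speech segments and that --max-length caps a merged segment's duration in seconds. The function name merge_segments is hypothetical.

```python
def merge_segments(segments, merge_gap_ms=200, max_length_s=30.0):
    """Merge adjacent (start, end) segments, in seconds, across short pauses.

    Assumed semantics (not confirmed by the README): two neighbours are
    joined when the pause between them is at most merge_gap_ms and the
    joined segment is no longer than max_length_s.
    """
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap_ms = (start - prev_end) * 1000
            if gap_ms <= merge_gap_ms and end - prev_start <= max_length_s:
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged
```

Under these assumptions, longer merged segments give the recognizer more context per clip, while the length cap keeps any single clip from growing without bound.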

Or in batch:

# Auto transcribe all audio files under the audio/ directory
python cli.py ./audio/ --output_dir output

Or run the web API service:

$ python app.py
