AI Sub: AI-Powered Subtitle Generation with Translation

Overview

AI Sub is a command-line tool that leverages Google's Gemini models to generate high-quality, audio-synchronized subtitles. It is designed to produce precise English and Japanese subtitles by analyzing both audio and visual cues.

Key Features:

- Multimodal Understanding: Uses video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- Dual-Language Support: Generates verbatim transcriptions and translations for English and Japanese.
- Automatic Segmentation: Splits long videos into smaller segments for efficient processing.

Showcase

For examples of subtitles generated by AI Sub, please visit ai-sub-showcase.

How It Works

1. Preprocessing: The input video is segmented into smaller chunks to fit within API context windows and file size limits.
2. AI Processing: Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
3. Compilation: Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
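
The compilation step can be illustrated with a minimal sketch. It is not AI Sub's actual implementation: the `Cue` class, `format_timestamp`, and `merge_segments` are hypothetical names, and the sketch assumes each segment's cue times are relative to the start of that segment, so they must be shifted by the segment's offset before sorting.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int  # cue start, relative to the start of its segment
    end_ms: int    # cue end, relative to the start of its segment
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_segments(segments: list[tuple[int, list[Cue]]]) -> str:
    """Merge per-segment cues into one SRT string.

    Each entry is (segment_offset_ms, cues_in_that_segment).
    """
    # Shift every cue onto the timeline of the full video.
    absolute = [
        Cue(offset + c.start_ms, offset + c.end_ms, c.text)
        for offset, cues in segments
        for c in cues
    ]
    # Sort chronologically so segments may arrive in any order.
    absolute.sort(key=lambda c: (c.start_ms, c.end_ms))
    # Number cues from 1, as the SRT format expects.
    blocks = [
        f"{i}\n{format_timestamp(c.start_ms)} --> {format_timestamp(c.end_ms)}\n{c.text}\n"
        for i, c in enumerate(absolute, start=1)
    ]
    return "\n".join(blocks)
```

Sorting after the offset shift means segments can finish processing in any order and still produce a correctly ordered file.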

Installation

Prerequisites: Python 3.10 or higher.

Set up a Python virtual environment:
```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
```

Install AI Sub:
```
pip install --upgrade ai-sub
```

Usage

You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.

Option 1: Using Google AI Studio API Key

Obtain your API Key:
- Sign in to Google AI Studio.
- Click "Create API Key".
- Copy and securely store your key. Never disclose your API key publicly.
Run the application:
```
ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
```
Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.
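
To keep the key out of scripts and shell history, the command above can be wrapped in a small launcher. This is a sketch under stated assumptions: `AI_SUB_API_KEY` is a hypothetical environment variable name chosen for this example (AI Sub is not known to read it), and `build_command` and `run_ai_sub` are illustrative helpers. Only the flags shown in the command above are used.

```python
import os
import subprocess


def build_command(video_path: str, api_key: str,
                  model: str = "google-gla:gemini-3-flash-preview") -> list[str]:
    """Assemble the ai-sub invocation from Option 1 as an argument list."""
    return ["ai-sub", "--ai.google.key", api_key, f"--ai.model={model}", video_path]


def run_ai_sub(video_path: str) -> int:
    """Run ai-sub with a key taken from the environment; return its exit code."""
    # AI_SUB_API_KEY is a hypothetical name used only by this sketch.
    api_key = os.environ.get("AI_SUB_API_KEY")
    if not api_key:
        raise RuntimeError("Set AI_SUB_API_KEY before running")
    # An argument list (no shell=True) avoids quoting problems with paths
    # that contain spaces.
    return subprocess.run(build_command(video_path, api_key), check=False).returncode
```

The key is still passed to `ai-sub` as a command-line argument, so it may be visible to other users on the same machine while the process runs.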

Option 2: Using Gemini CLI

Install and Authenticate Gemini CLI:
- Install: npm install -g @google/gemini-cli
- Authenticate: Follow instructions at gemini-cli.
Run the application:
```
ai-sub --ai.model=gemini-cli:gemini-3-pro-preview --split.re-encode.enabled=True --thread.subtitles=1 "path/to/your/video.mp4"
```
Important Notes for CLI Mode:
- No API key is required; the tool uses your authenticated Gemini CLI instance.
- The additional arguments are required to split and re-encode the video, because the Gemini CLI has a 20 MB upload limit per chunk.
- Re-encoding is resource-intensive and will increase processing time.
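
As a rough guide to how the upload limit constrains chunk length, the arithmetic can be sketched as follows. This is an estimate, not something AI Sub is known to compute: `max_segment_seconds` is a hypothetical helper, the limit is assumed to mean 20 MiB, and the safety margin for container overhead is an arbitrary choice.

```python
def max_segment_seconds(total_bitrate_kbps: float,
                        limit_bytes: int = 20 * 1024 * 1024,
                        safety_margin: float = 0.9) -> float:
    """Estimate the longest chunk, in seconds, that stays under a size limit.

    total_bitrate_kbps is the combined video and audio bitrate after
    re-encoding, in kilobits per second (1 kbps = 1000 bits per second).
    """
    if total_bitrate_kbps <= 0:
        raise ValueError("bitrate must be positive")
    # Reserve part of the budget for container overhead (assumed margin).
    usable_bits = limit_bytes * 8 * safety_margin
    return usable_bits / (total_bitrate_kbps * 1000)
```

Halving the bitrate doubles the chunk length that fits under the limit, which is why re-encoding at a lower bitrate allows longer chunks.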

Known Limitations

Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate. This is a limitation of the underlying Gemini model. Shorter video segments generally yield better accuracy.
AI Hallucinations: Like all LLMs, Gemini may occasionally "hallucinate" and produce inaccurate subtitles.

If you encounter issues, consider re-processing specific video segments as detailed below.

Advanced: Re-processing Segments

Intermediate files are stored in a temporary directory (default: tmp_<input_file_name>). You can customize this location using the --dir.tmp flag.

To re-process a specific segment:

1. Navigate to the temporary directory.
2. Locate and delete the corresponding part_XXX.json file.
3. Re-run the script. It will automatically detect missing files and re-process only those segments.
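
The deletion step can also be scripted. The sketch below is an illustration only: `remove_segment_results` is a hypothetical helper, and it takes the temporary directory and the exact file names as arguments instead of guessing how segments are numbered.

```python
import fnmatch
from pathlib import Path


def remove_segment_results(tmp_dir: str, filenames: list[str]) -> list[str]:
    """Delete the named part_*.json files in tmp_dir; return the names removed."""
    removed = []
    for name in filenames:
        # Refuse paths and anything that is not a segment result file.
        if Path(name).name != name or not fnmatch.fnmatch(name, "part_*.json"):
            raise ValueError(f"not a segment result file name: {name!r}")
        target = Path(tmp_dir) / name
        # Skip files that are already missing; they will be re-processed anyway.
        if target.is_file():
            target.unlink()
            removed.append(name)
    return removed
```

After the files are removed, re-running AI Sub on the same video re-processes only the segments whose result files are missing.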

FlippFuzz/ai-sub
