AI Sub is a command-line tool that leverages Google's Gemini models to generate high-quality, audio-synchronized subtitles. It is designed to produce precise English and Japanese subtitles by analyzing both audio and visual cues.
Key Features:
- Multimodal Understanding: Utilizes video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- Dual-Language Support: Generates verbatim transcriptions and translations for English and Japanese.
- Automatic Segmentation: Splits long videos into smaller segments for efficient processing.
For examples of subtitles generated by AI Sub, please visit ai-sub-showcase.
- Preprocessing: The input video is segmented into smaller chunks to fit within API context windows and file size limits.
- AI Processing: Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
- Compilation: Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
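The compilation step can be pictured with a minimal sketch. It assumes each segment yields cues timed relative to the segment's own start, plus a known offset into the full video. The `Cue` class and both function names are hypothetical and are not taken from AI Sub's source.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds, relative to the start of its segment
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_segments(segments: list[tuple[float, list[Cue]]]) -> str:
    """Shift each segment's cues by its offset, sort them, and emit SRT text."""
    shifted = [
        Cue(offset + cue.start, offset + cue.end, cue.text)
        for offset, cues in segments
        for cue in cues
    ]
    # Sorting makes the output chronological even if segments arrive out of order.
    shifted.sort(key=lambda cue: (cue.start, cue.end))
    blocks = [
        f"{index}\n{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n{cue.text}\n"
        for index, cue in enumerate(shifted, start=1)
    ]
    return "\n".join(blocks)
```

Sorting after applying the offsets means segments processed in parallel can finish in any order without affecting the final file.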
Prerequisites: Python 3.10 or higher.
- Set up a Python virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
- Install AI Sub:
    pip install --upgrade ai-sub
You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.
- Obtain your API Key:
  - Sign in to Google AI Studio.
  - Click "Create API Key".
  - Copy and securely store your key. Never disclose your API key publicly.
- Run the application:
    ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
  Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.
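To keep the key out of scripts and shell history, one option is a small wrapper that reads it from an environment variable and builds the same command. This is only a sketch: the variable name `AI_SUB_API_KEY` and both function names are hypothetical, and the only flags used are the ones shown above.

```python
import os
import subprocess


def build_command(video_path: str, api_key: str,
                  model: str = "google-gla:gemini-3-flash-preview") -> list[str]:
    """Assemble the ai-sub invocation using only the flags shown in this README."""
    return ["ai-sub", "--ai.google.key", api_key, f"--ai.model={model}", video_path]


def run(video_path: str) -> None:
    # AI_SUB_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get("AI_SUB_API_KEY")
    if not api_key:
        raise SystemExit("Set AI_SUB_API_KEY before running.")
    # check=True raises CalledProcessError if ai-sub exits with a non-zero status.
    subprocess.run(build_command(video_path, api_key), check=True)
```

The key is still passed as a command-line argument, so it may be visible to other processes on the same machine while the command runs.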
- Install and Authenticate Gemini CLI:
  - Install:
      npm install -g @google/gemini-cli
  - Authenticate: Follow instructions at gemini-cli.
- Run the application:
    ai-sub --ai.model=gemini-cli:gemini-3-pro-preview --split.re-encode.enabled=True --thread.subtitles=1 "path/to/your/video.mp4"
Important Notes for CLI Mode:
- No API key is required; the tool uses your authenticated Gemini CLI instance.
- Additional arguments are required to split and re-encode the video because the Gemini CLI has a 20MB upload limit per chunk.
- Re-encoding is resource-intensive and will increase processing time.
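To see why re-encoding is needed under a 20MB limit, a back-of-the-envelope calculation helps. The sketch below estimates the highest combined audio and video bitrate a chunk of a given length could use. It assumes 1 MB = 1,000,000 bytes and a 10% safety margin for container overhead; both are assumptions, and the function is not AI Sub's actual sizing logic.

```python
def max_bitrate_kbps(segment_seconds: float, limit_mb: float = 20.0,
                     margin: float = 0.9) -> int:
    """Estimate the highest total bitrate (kbit/s) that keeps a chunk under the limit."""
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    # Assumption: 1 MB = 1,000,000 bytes; margin leaves headroom for container overhead.
    budget_bits = limit_mb * 1_000_000 * 8 * margin
    return int(budget_bits / segment_seconds / 1000)
```

Under these assumptions a 60-second chunk has a budget of roughly 2,400 kbit/s, and doubling the chunk length halves it, which is why longer chunks need heavier compression.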
- Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate. This is an inherent characteristic of the Gemini AI model. Shorter video segments generally yield better accuracy.
- AI Hallucinations: Like all LLMs, Gemini may occasionally produce "hallucinations" or inaccurate information.
If you encounter issues, consider re-processing specific video segments as detailed below.
Intermediate files are stored in a temporary directory (default: tmp_<input_file_name>). You can customize this location using the --dir.tmp flag.
To re-process a specific segment:
- Navigate to the temporary directory.
- Locate and delete the corresponding part_XXX.json file.
- Re-run the script. It will automatically detect missing files and re-process only those segments.
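The first two steps above can also be scripted. This sketch lists the segment result files in the temporary directory and deletes one by name. It assumes only that results match the `part_*.json` pattern described above; both function names are hypothetical, and the exact file name must be taken from your own directory listing.

```python
from pathlib import Path


def list_segment_results(tmp_dir: str) -> list[str]:
    """Return the names of part_*.json files currently in the temporary directory."""
    return sorted(path.name for path in Path(tmp_dir).glob("part_*.json"))


def delete_segment_result(tmp_dir: str, file_name: str) -> bool:
    """Delete one segment result file; return True if a file was removed."""
    target = Path(tmp_dir) / file_name
    if target.is_file():
        target.unlink()
        return True
    return False
```

After deleting a file this way, re-running AI Sub on the same input should re-process only the segment whose result is missing, as described above.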
