DeepDub is an automated pipeline that dubs video content into other languages while preserving the original speaker's voice. It uses a chain of state-of-the-art AI models to transcribe, translate (with cultural nuance), and clone voices locally.
The pipeline consists of four distinct modules:
- The Ear (Transcription): Uses Faster-Whisper to transcribe the extracted audio track and generate precise timestamps.
- The Brain (Translation): Uses Llama 3.2 (via Ollama) with a custom system prompt to perform context-aware translation (e.g., understanding that "chop" in a kitchen means "cut," not "pork chop").
- The Voice (Cloning): Uses Coqui XTTS v2 to clone the original speaker's timbre and generate speech in the target language (Spanish, Hindi, etc.).
- The Editor (Assembly): Uses FFmpeg to surgically insert the new audio segments at the correct timestamps, mixing them with the original background noise.
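The hand-off between these four stages can be sketched as a single record that each stage enriches. The field and function names below are illustrative, not DeepDub's actual schema, and the translation/synthesis calls are stand-ins:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One transcribed utterance as it moves through the pipeline."""
    start: float           # seconds into the video (from The Ear)
    end: float
    text: str              # original transcript
    translation: str = ""  # filled in by The Brain
    audio_path: str = ""   # cloned clip written by The Voice

def translate(text: str) -> str:
    # Stand-in for the Llama 3.2 call made by The Brain.
    return f"[es] {text}"

def synthesize(text: str) -> str:
    # Stand-in for XTTS v2 synthesis; would return a clip path.
    return "segment.wav"

def dub(segments: list[Segment]) -> list[Segment]:
    """Run each segment through translation and cloning; The Editor
    then drops every clip back in at segment.start."""
    for seg in segments:
        seg.translation = translate(seg.text)
        seg.audio_path = synthesize(seg.translation)
    return segments
```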
- Language: Python
- Transcription: `faster-whisper` (an optimized OpenAI Whisper implementation)
- Translation: `ollama` running `llama3.2` (local LLM)
- Voice Cloning: `TTS` (Coqui XTTS v2)
- Media Processing: `ffmpeg-python`
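As a concrete example of the transcription layer, a minimal `faster-whisper` call might look like the sketch below. The model size and `int8` quantization are assumptions, not the project's actual settings; the timestamp helper is a plain utility for subtitle-style output:

```python
def transcribe(audio_path: str, model_size: str = "small"):
    """Yield (start, end, text) tuples using faster-whisper."""
    # Imported lazily so the timestamp helper below stays dependency-free.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, compute_type="int8")
    segments, _info = model.transcribe(audio_path, beam_size=5)
    for seg in segments:
        yield seg.start, seg.end, seg.text

def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```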
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/DeepDub.git
  cd DeepDub
  ```

- Install dependencies (note: Microsoft C++ Build Tools are required for Coqui TTS on Windows):

  ```bash
  pip install -r requirements.txt
  ```

- Install the local LLM: download Ollama and pull the lightweight model:

  ```bash
  ollama pull llama3.2
  ```
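Once the model is pulled, the translation step can talk to Ollama's local REST API (`POST /api/chat` on port 11434). A sketch of the request such a script might build — the system prompt here is illustrative, not DeepDub's actual prompt:

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are a dubbing translator. Translate the user's line into {lang}, "
    "preserving tone and domain context (e.g. in a kitchen scene, 'chop' "
    "means 'cut', not 'pork chop'). Reply with the translation only."
)

def build_chat_request(text: str, target_lang: str = "Spanish") -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": "llama3.2",
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT.format(lang=target_lang)},
            {"role": "user", "content": text},
        ],
    }

def translate(text: str, target_lang: str = "Spanish") -> str:
    """Send the request to a locally running Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(build_chat_request(text, target_lang)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```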
- Place your video file in the root folder and rename it to `input_video.mp4`.
- Step 1: Extract & Transcribe

  ```bash
  python 1_transcribe.py
  ```

- Step 2: Smart Translation

  ```bash
  python 2_translate_llm.py
  ```

- Step 3: Generate Voice Clones

  ```bash
  python 3_clone.py
  ```

- Step 4: Merge Video

  ```bash
  python 4_merge.py
  ```

- Done! Check `final_dubbed_video.mp4` for the result.
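Under the hood, the merge step boils down to delaying each cloned clip to its original timestamp and mixing it over a quieted original track. A sketch of the ffmpeg command such a step might construct — the file names and the `volume=0.2` ducking level are assumptions:

```python
def build_merge_cmd(video: str, clips: list[tuple[str, float]], out: str) -> list[str]:
    """Construct an ffmpeg command that overlays each dubbed clip
    (path, start time in seconds) on top of a quieted original track."""
    cmd = ["ffmpeg", "-y", "-i", video]
    for path, _start in clips:
        cmd += ["-i", path]

    # Lower the original audio, then delay each clip into position.
    parts = ["[0:a]volume=0.2[bg]"]
    labels = ["[bg]"]
    for i, (_path, start) in enumerate(clips):
        ms = int(start * 1000)
        parts.append(f"[{i + 1}:a]adelay={ms}|{ms}[d{i}]")
        labels.append(f"[d{i}]")
    parts.append(f"{''.join(labels)}amix=inputs={len(labels)}:duration=first[mix]")

    cmd += [
        "-filter_complex", ";".join(parts),
        "-map", "0:v",   # keep the original video stream untouched
        "-map", "[mix]",
        "-c:v", "copy",
        out,
    ]
    return cmd
```

Running the result is then a `subprocess.run(cmd, check=True)`; the project itself uses `ffmpeg-python`, which generates an equivalent filtergraph programmatically.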
- Lip Sync: Implement `Wav2Lip` to match mouth movements to the new language.
- Background Noise Separation: Use `Spleeter` to isolate the voice from music for cleaner mixing.
- GUI: Build a Streamlit interface for drag-and-drop dubbing.