A professional-grade voice cloning and text-to-speech application written in Python and built on the Coqui TTS XTTS v2 model. Clone your voice and generate natural-sounding speech in multiple languages.
- Voice Cloning: Record or upload your voice and clone it instantly
- Multi-language Support: Auto-detects and supports 13+ languages
- Real-time Generation: Generate speech in your cloned voice in seconds
- Quality Control: Audio normalization, pitch, and volume adjustments
- Pitch Adjustment: Fine-tune voice pitch (0.5x to 2.0x)
- Volume Control: Adjust output volume (0.1x to 3.0x)
- Gender Modulation: Toggle between natural and female voice characteristics
- Device Selection: Choose specific input/output audio devices
- Dark/Light Themes: CustomTkinter-based modern interface
- Real-time Monitoring: Live audio level and duration display
- Hardware Detection: Automatic GPU/CPU detection and optimization
- Model Selection: Switch between different TTS engines
- GPU Acceleration: CUDA support for NVIDIA GPUs (GTX 1650+ optimized)
- Low VRAM Mode: Automatic optimization for 4GB VRAM systems
- Batch Processing: Handles long texts with automatic chunking
- Audio Formats: Supports WAV, MP3, FLAC, M4A
- CPU: Intel i5-3470 (4C/4T) | 3.2GHz
- RAM: 16.0GB | Used: 8.2GB (51.2%)
- GPU: NVIDIA GeForce GTX 1650 | 4.0GB VRAM | CUDA 13.1
- OS: Windows 10/11
- Python 3.10 (64-bit)
- NVIDIA GPU (GTX 1650 or better recommended)
- NVIDIA Drivers (latest)
- CUDA Toolkit 12.1 (optional, for GPU acceleration)
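
Because the GPU and CUDA Toolkit are optional, it can help to check what PyTorch actually sees before launching the app. The following is a minimal sketch, not the application's own detection code: the function names `choose_device` and `detect_hardware` and the 4 GB threshold constant are assumptions made for illustration, based only on the "Low VRAM Mode" note in the feature list.

```python
from typing import Optional, Tuple

# Assumption: threshold taken from the "4GB VRAM" note in the feature list.
LOW_VRAM_THRESHOLD_GB = 4.0


def choose_device(cuda_available: bool, vram_gb: Optional[float]) -> Tuple[str, bool]:
    """Return (device, low_vram_mode) from already-detected hardware facts."""
    if not cuda_available or vram_gb is None:
        return "cpu", False
    return "cuda", vram_gb <= LOW_VRAM_THRESHOLD_GB


def detect_hardware() -> Tuple[bool, Optional[float]]:
    """Ask PyTorch about the first CUDA GPU; report no GPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return False, None
    if not torch.cuda.is_available():
        return False, None
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return True, total_bytes / (1024 ** 3)


if __name__ == "__main__":
    device, low_vram = choose_device(*detect_hardware())
    print(f"device={device} low_vram_mode={low_vram}")
```

Keeping the decision (`choose_device`) separate from the probing (`detect_hardware`) means the selection logic can be checked on a machine without a GPU.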
Install and run (Windows):

    git clone https://github.com/SyntX34/Python-Voice-TTS.git
    cd Python-Voice-TTS
    python -m venv voice_tts
    .\voice_tts\Scripts\activate
    pip install -r requirements.txt
    python app.py
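
Once the dependencies are installed, the cloning and long-text chunking features can also be approximated directly against the Coqui TTS Python API. The sketch below is an illustration under assumptions, not the code in `app.py`: `chunk_text`, `synthesize_chunks`, the 200-character chunk size, and the output file naming are all hypothetical. Only the `TTS(...)` constructor, `.to(device)`, and `tts_to_file(...)` calls come from the Coqui TTS library, and the language is passed explicitly here rather than auto-detected.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of up to max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-word.
    The 200-character default is an arbitrary illustrative value.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_chunks(text: str, speaker_wav: str, language: str = "en",
                      out_prefix: str = "chunk", device: str = "cpu") -> List[str]:
    """Synthesize each chunk in the voice of speaker_wav; return the WAV paths."""
    # Imported lazily so chunk_text stays usable without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    paths = []
    for index, chunk in enumerate(chunk_text(text)):
        path = f"{out_prefix}_{index:03d}.wav"
        tts.tts_to_file(text=chunk, speaker_wav=speaker_wav,
                        language=language, file_path=path)
        paths.append(path)
    return paths
```

A call such as `synthesize_chunks(long_text, "my_voice.wav", device="cuda")` would write one WAV file per chunk; `my_voice.wav` stands in for any short reference recording of the voice to clone.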