A Python tool that automatically renames MP3 files based on their speech content using AI-powered Speech-to-Text technology.
- Scans a directory for MP3 files
- Converts each MP3 to WAV format (temporarily)
- Uses OpenAI's Whisper model for high-quality transcription (with Google Speech API as fallback)
- Extracts the first sentence or a specified number of words from the transcription
- Renames the MP3 file based on this extracted text
- Cleans filenames to ensure they are valid
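
The text-handling steps in the list above (taking the first sentence or the first N words, then cleaning the result) can be sketched roughly as follows. This is a minimal illustration, not the code in `mp3_renamer.py`: the function names `extract_title` and `clean_filename`, the 80-character limit, and the `untitled` fallback are all assumptions made for the example.

```python
import re
from typing import Optional

# Characters that are invalid in filenames on Windows, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def extract_title(transcript: str, first_n_words: Optional[int] = None) -> str:
    """Return the first sentence, or the first N words if requested (hypothetical helper)."""
    text = " ".join(transcript.split())  # collapse runs of whitespace and newlines
    if first_n_words is not None:
        return " ".join(text.split()[:first_n_words])
    # Split at whitespace that follows the first '.', '!' or '?'.
    return re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]


def clean_filename(text: str, max_length: int = 80) -> str:
    """Remove unsafe characters and trim the result (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub("", text)
    cleaned = cleaned.strip(" .")  # no leading/trailing spaces or dots
    cleaned = cleaned[:max_length].rstrip(" .")
    return cleaned or "untitled"  # assumed fallback when nothing usable remains
```

For example, `clean_filename(extract_title("Welcome to the show. Today we talk about audio."))` returns `Welcome to the show`.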

Requirements:
- Python 3.6 or later
- Required Python packages (install with pip install -r requirements.txt):
  - openai-whisper (for state-of-the-art speech recognition)
  - torch (required for Whisper)
  - SpeechRecognition (fallback recognition)
  - pydub (audio file manipulation)
  - PyAudio (for audio processing)
- FFmpeg (for MP3 conversion, used by pydub)

Installation:
- Clone or download this repository
- Install required packages:

      pip install -r requirements.txt

- Install FFmpeg (if not already installed):
  - macOS: brew install ffmpeg
  - Linux: apt-get install ffmpeg
  - Windows: Download from ffmpeg.org

Run the script with the path to the directory containing MP3 files:

    python mp3_renamer.py /path/to/mp3/folder

To pass options:

    python mp3_renamer.py /path/to/mp3/folder [options]

Available options:
- -d, --duration SECONDS: Process only a specific duration (default: 10 seconds)
- -s, --start SECONDS: Start processing from this time (default: 0 seconds)
- -f, --first N: Use the first N words for the filename instead of a sentence
- -v, --verbose: Enable verbose output for debugging
- -e, --engine {whisper,google,both}: Choose the speech recognition engine (default: whisper)
- -m, --model {tiny,base,small,medium,large}: Select Whisper model size (default: base)
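
The options above map naturally onto `argparse`. The sketch below mirrors the documented flags and defaults; it is an illustration only, and details the README does not state (the `build_parser` name, the `float` type for the time values, the help strings) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="mp3_renamer.py",
        description="Rename MP3 files based on their speech content.",
    )
    parser.add_argument("directory", help="directory containing MP3 files")
    # Using float for the time values is an assumption; the README only says SECONDS.
    parser.add_argument("-d", "--duration", type=float, default=10.0, metavar="SECONDS",
                        help="process only this many seconds (default: 10)")
    parser.add_argument("-s", "--start", type=float, default=0.0, metavar="SECONDS",
                        help="start processing from this time (default: 0)")
    parser.add_argument("-f", "--first", type=int, default=None, metavar="N",
                        help="use the first N words instead of a sentence")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable verbose output for debugging")
    parser.add_argument("-e", "--engine", choices=["whisper", "google", "both"],
                        default="whisper", help="speech recognition engine")
    parser.add_argument("-m", "--model",
                        choices=["tiny", "base", "small", "medium", "large"],
                        default="base", help="Whisper model size")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Restricting `--engine` and `--model` with `choices` makes `argparse` reject unknown values before any audio is processed.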

Use Whisper with a larger model for better accuracy:

    python mp3_renamer.py /path/to/mp3/folder --model medium

Process only the first 5 seconds of each file:

    python mp3_renamer.py /path/to/mp3/folder --duration 5

Use the first 10 words instead of trying to detect a sentence:

    python mp3_renamer.py /path/to/mp3/folder --first 10

How it works:
- The script scans the specified directory for MP3 files
- For each MP3 file:
  - Converts it to WAV format (needed for speech recognition)
  - Uses OpenAI's Whisper model to transcribe the audio
  - Extracts the first sentence or specified number of words
  - Cleans the text to make it a valid filename
  - Renames the MP3 file with the new name
  - Removes the temporary WAV file
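
The per-file steps above could look roughly like the sketch below. It uses `AudioSegment.from_mp3` and `export` from pydub and `transcribe` from openai-whisper, but it is not the script's actual code: the function names, the numeric suffix on name collisions, and the assumption that `new_stem` has already been cleaned are all illustrative choices.

```python
import os
import tempfile


def find_mp3_files(directory):
    """Return the sorted paths of all .mp3 files directly inside the directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".mp3")
    )


def transcribe_clip(model, mp3_path, start=0.0, duration=10.0):
    """Export a slice of the MP3 as a temporary WAV and transcribe it.

    `model` is expected to be the object returned by whisper.load_model(...).
    """
    # Imported lazily so the file helpers work without pydub/FFmpeg installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_mp3(mp3_path)
    start_ms = int(start * 1000)
    clip = audio[start_ms:start_ms + int(duration * 1000)]  # pydub slices in milliseconds

    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        clip.export(wav_path, format="wav")
        return model.transcribe(wav_path)["text"].strip()
    finally:
        os.remove(wav_path)  # always remove the temporary WAV file


def rename_mp3(mp3_path, new_stem):
    """Rename to '<new_stem>.mp3'; on a collision append ' (n)' (an assumed policy)."""
    folder = os.path.dirname(mp3_path)
    target = os.path.join(folder, new_stem + ".mp3")
    counter = 1
    while os.path.exists(target) and os.path.abspath(target) != os.path.abspath(mp3_path):
        target = os.path.join(folder, "{} ({}).mp3".format(new_stem, counter))
        counter += 1
    os.rename(mp3_path, target)
    return target
```

A caller would load the model once with `whisper.load_model("base")`, then loop over `find_mp3_files(directory)`, pass each path to `transcribe_clip`, turn the text into a valid filename, and hand it to `rename_mp3`. Loading the model once rather than once per file avoids repeating the slowest setup step.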

Whisper model sizes:
- tiny: Fastest, lowest accuracy, minimal resource usage
- base: Good balance of speed and accuracy (default)
- small: Better accuracy, moderate resource usage
- medium: High accuracy, higher resource usage
- large: Best accuracy, significant resource usage

If you encounter issues:
- Try running with --verbose to see detailed logs
- Use --model small or --model medium for better transcription
- Adjust the --start and --duration parameters to capture the correct part of the audio
- Use --first N to bypass sentence detection if it's not working well