A small, end-to-end project that finetunes a Hugging Face Wav2Vec2 model on the classic GTZAN dataset to classify music genres.
- Self-supervised Wav2Vec2 finetuned for 10 GTZAN genres
- Best validation accuracy: 87%
- Training with transformers.Trainer (HF)
- Gradio web UI to upload an audio file and get predictions

from transformers import pipeline
model_id = "hangnguyen25/wav2vec2-base-finetuned-gtzan"
pipe = pipeline(
    "audio-classification",
    model=model_id
)
preds = pipe("path/to/your/audio.wav")
print(preds)

- Dataset: GTZAN
- Libraries: Hugging Face transformers, datasets, evaluate, accelerate; librosa; gradio; PyTorch
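
The Gradio upload-and-predict flow mentioned above could be wired up roughly as follows. This is a minimal sketch, not the project's actual UI code: the helper names `to_label_scores` and `build_demo`, the interface title, and the choice to show the top 3 genres are all assumptions made for illustration. It reuses the same `audio-classification` pipeline and model id as the snippet above.

```python
def to_label_scores(preds):
    """Turn pipeline output [{"label": ..., "score": ...}, ...] into {label: score}.

    gr.Label accepts a dict mapping class names to confidences.
    (Hypothetical helper, not part of the original project.)
    """
    return {p["label"]: float(p["score"]) for p in preds}


def build_demo(model_id="hangnguyen25/wav2vec2-base-finetuned-gtzan"):
    """Build a Gradio interface around the fine-tuned checkpoint (illustrative)."""
    # Imported here so the helper above can be used without these packages.
    import gradio as gr
    from transformers import pipeline

    pipe = pipeline("audio-classification", model=model_id)

    def classify(audio_path):
        # With type="filepath", Gradio passes the uploaded file's path as a string,
        # which the pipeline can read directly.
        return to_label_scores(pipe(audio_path))

    return gr.Interface(
        fn=classify,
        inputs=gr.Audio(type="filepath"),
        outputs=gr.Label(num_top_classes=3),  # assumption: show the top 3 genres
        title="GTZAN genre classifier",       # assumed title
    )
```

Calling `build_demo().launch()` should start a local web server where an audio file can be uploaded and classified. Loading the pipeline downloads the checkpoint on first use, and decoding a file path typically requires ffmpeg to be installed.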