tobystic/audio-corpora-builder

 
 

audio corpora builder

Build large audio corpora in various languages → {Yorùbá, Urhobo, Ẹ̀dó, Èʋe, Ị̀gbò}

Audio Corpora

Curate language-specific corpora from the wealth of good-quality audio available on YouTube. The process is as follows:

  • Locate a list of existing playlists, e.g. OrisunTV Iroyin
  • Alternatively, create a new playlist with a custom set of YouTube videos
  • Update yoruba_sources.yml with the reference to the playlist
  • Execute $ python download_youtube.py --output ./audio/
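The download step above can be sketched roughly as follows. This is a minimal illustration, not the contents of download_youtube.py: it assumes the playlist URLs have already been read from yoruba_sources.yml (whose schema is not shown here), that the youtube-dl command-line tool and ffmpeg are installed, and that WAV is the desired output format. The function names build_command and download_all and the placeholder playlist URL are hypothetical.

```python
import os
import subprocess


def build_command(playlist_url, output_dir):
    """Return a youtube-dl argument list that saves a playlist's audio as WAV.

    Assumption: the real download_youtube.py may use a different tool,
    format, or file layout; this only illustrates the idea.
    """
    # One folder per playlist, one file per video named by its YouTube id,
    # so that rerunning the download does not produce name collisions.
    template = os.path.join(output_dir, "%(playlist_title)s", "%(id)s.%(ext)s")
    return [
        "youtube-dl",
        "--yes-playlist",        # treat the URL as a playlist
        "--ignore-errors",       # skip unavailable videos, keep going
        "--extract-audio",       # keep audio only (needs ffmpeg)
        "--audio-format", "wav",
        "--output", template,
        playlist_url,
    ]


def download_all(playlist_urls, output_dir, dry_run=True):
    """Build one command per playlist; run them unless dry_run is set."""
    commands = [build_command(url, output_dir) for url in playlist_urls]
    if not dry_run:
        for command in commands:
            subprocess.check_call(command)
    return commands


if __name__ == "__main__":
    # Hypothetical placeholder; substitute the playlists from the sources file.
    urls = ["https://www.youtube.com/playlist?list=PLACEHOLDER"]
    for command in download_all(urls, "./audio/", dry_run=True):
        print(" ".join(command))
```

With dry_run=True the sketch only prints the commands it would run, which makes it easy to check the playlist list before starting a long download.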

Install dependencies

  • Python 3.5, 3.6 or 3.7
  • pip install -r requirements.txt
