Stars
Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
Press shortcut → speak → get text. Free and open source. More local-first apps soon ❤️
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Unofficial PyTorch implementation of Image Super-Resolution via Iterative Refinement
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Python class that generates pixel art from images
The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.
Model parallel transformers in JAX and Haiku
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Project Page of 'GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction' [CVPR2019]
Boring avatars is an open source React library that generates custom, SVG-based avatars from any username and color palette.
A German pronunciation dictionary modeled after CMUdict with transcriptions based on the ARPAbet symbol set.
Avatars for Zoom, Skype and other video-conferencing apps.
Reference code for "Motion-supervised Co-Part Segmentation" paper
shawwn / gpt-2
Forked from nshepperd/gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
A tiny (~650 B) & modern library for keybindings.
Recurrent neural network for audio noise reduction
Neural network-based singing voice synthesis library for research
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
TensorFlow CNN for fast style transfer ⚡🖥🎨🖼
spring-media / ForwardTacotron
Forked from fatchord/WaveRNN
⏩ Generating speech in a single forward pass without any attention!
TensorFlow implementation of Learning-based Video Motion Magnification
Collection of Docker images with headless VNC environments



