Stars
PyTorch-based source code for analyzing electronic health records (EHR)
Paper reproduction of Google's SCoRe (Training Language Models to Self-Correct via Reinforcement Learning)
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
A collection of papers on diffusion models for 3D generation.
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at low cost, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.
Instant voice cloning by MIT and MyShell. Audio foundation model.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
✨✨Latest Advances on Multimodal Large Language Models
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama models.
Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are available on the project's demo page.
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
The world's largest GitHub Repository for LLMs + Robotics
A comprehensive list of Implicit Representations and NeRF papers relating to Robotics/RL domain, including papers, codes, and related websites
TidyBot: Personalized Robot Assistance with Large Language Models
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor/tokenizer, along with MusicGen, a simple and controllable music generation model (a minimal usage sketch follows this list).
The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
Source code and demo for INTERSPEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model
SoftVC VITS Singing Voice Conversion
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
This repo provides a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
Source Code for "Adaptive Transfer Learning with Deep CNN for EEG Motor Imagery Classification".
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
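Since the Audiocraft entry above names MusicGen explicitly, here is a minimal usage sketch following the pattern shown in the Audiocraft README; the checkpoint name, prompt, and generation parameters are illustrative assumptions, not something this list prescribes.

```python
# Minimal sketch: text-to-music generation with Audiocraft's MusicGen.
# Checkpoint, prompt, and duration below are illustrative choices.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # small pretrained checkpoint
model.set_generation_params(duration=8)                     # seconds of audio per prompt

prompts = ["lo-fi hip hop beat with warm piano"]
wav = model.generate(prompts)  # tensor of shape [batch, channels, samples]

for i, one_wav in enumerate(wav):
    # Write each generated sample as a loudness-normalized audio file.
    audio_write(f"musicgen_sample_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```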