- London
- @maxilevi__
- https://maxilevi.com
Stars
OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation [SIGGRAPH 2025]
CADAM is the open source text-to-CAD web application
A unified inference and post-training framework for accelerated video generation.
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
Official implementation of Inductive Moment Matching
Here you'll find a growing collection of 3D models, textures, and images from inside NASA.
Official code for paper: Scaling Mesh Generation via Compressive Tokenization [CVPR'25]
A generative world for general-purpose robotics & embodied AI learning.
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
A C++ ggml port of "VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech" for use on mobile devices. This is my undergraduate project.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Lightweight, standalone C++ inference engine for Google's Gemma models.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
An Open Source text-to-speech system built by inverting Whisper.
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.
A library to inspect and extract intermediate layers of PyTorch models.
CjangCjengh / vits
Forked from jaywalnut310/vits
VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai
Turn expensive prompts into cheap fine-tuned models
Ads97 / WhatsApp-Llama
Forked from meta-llama/llama-cookbook
Fine-tune an LLM to speak like you based on your WhatsApp conversations
Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice with Mimic2
Local voice recording for creating Piper datasets
An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
SoftVC VITS Singing Voice Conversion





