Stars
slime is an LLM post-training framework for RL scaling.
Awesome LLM compression research papers and tools.
QLoRA: Efficient Finetuning of Quantized LLMs
A unified library of SOTA model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks …
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
