- OpenAI
- San Francisco
- efrantar.github.io
- @elias_frantar
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Solve Rubik's Cube in less than 19 moves on average with Python.
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".
A generic Rubik's Cube solver.
The world's fastest Lego Rubik's Cube solving robot, averaging 1 second flat.
Use Python 3 to program your LEGO EV3. Communicate via Bluetooth, Wi-Fi, or USB. Send direct commands.
Code for the ICML 2022 paper "SPDY: Accurate Pruning with Speedup Guarantees".
Efficient reference implementations of the static and dynamic M-FAC algorithms (for pruning and optimization).
The first ever Lego robot to solve a random Rubik's Cube in under 1 second.

