Haojun Xia Summer-Summer

PhD Candidate, LLM Researcher, Machine Learning System, GPU, FPGA

63 followers · 36 following

University of Sydney
Sydney NSW, Australia
https://summer-summer.github.io/

Pinned

Kitty Public

Algorithm-System Co-design: accurate and efficient 2-bit KV cache quantization for LLM inference.

Python · 4 stars · 2 forks
usyd-fsalab/fp6_llm Public

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda · 276 stars · 22 forks
AlibabaResearch/flash-llm Public

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 231 stars · 22 forks
SpInfer Public

Forked from xxyux/SpInfer

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs

Cuda
ComputerArchitectureLab Public

This repository releases the experimental assignments for the Computer Architecture course at USTC.

Verilog · 39 stars · 14 forks