
Baizx98 | 今夜白

Website · Email · ResearchGate · Research Focus

About Me

I am a Computer Science PhD student working on efficient large language model inference systems. My recent work sits at the intersection of KV Cache optimization, hierarchical memory management, and practical serving-system design.

I am especially interested in building methods that are not only effective on paper, but also honest under real system constraints such as memory fragmentation, bandwidth pressure, and latency-throughput tradeoffs.

Current Interests

  • 🧠 KV Cache pruning, compression, and offloading
  • 📏 Long-context inference optimization
  • 🚀 vLLM-style serving systems and performance tuning
  • 📱 Edge-side deployment for resource-constrained devices
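
To make the offloading interest above a bit more concrete, here is a toy sketch of the core idea: keep a hot window of recent KV entries in fast (GPU) memory and spill older ones to a slower (CPU) tier. The class name, the dict-based stores, and the window policy are all illustrative assumptions for this sketch; real serving systems such as vLLM manage paged GPU blocks, not Python dicts.

```python
from collections import OrderedDict


class TinyKVCache:
    """Toy KV cache with a hot GPU window and a cold CPU store.

    Purely illustrative: the 'gpu' and 'cpu' tiers here are plain
    Python dicts standing in for device and host memory.
    """

    def __init__(self, window: int = 4):
        self.window = window
        self.gpu = OrderedDict()  # hot entries: position -> (K, V)
        self.cpu = {}             # cold, offloaded entries

    def append(self, pos, kv):
        self.gpu[pos] = kv
        # Offload the oldest hot entry once the window overflows.
        while len(self.gpu) > self.window:
            old_pos, old_kv = self.gpu.popitem(last=False)
            self.cpu[old_pos] = old_kv

    def get(self, pos):
        # Fetching a cold entry promotes it back into the hot window.
        if pos in self.gpu:
            return self.gpu[pos]
        kv = self.cpu.pop(pos)
        self.append(pos, kv)
        return kv


cache = TinyKVCache(window=2)
for t in range(5):
    cache.append(t, (f"K{t}", f"V{t}"))
# Now positions 3 and 4 are hot; 0, 1, 2 have been offloaded.
```

The interesting design questions start exactly where this sketch stops: what to evict (recency vs. attention-score importance), when to prefetch, and how to amortize PCIe transfer cost.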

Research Vibe

I like research that feels a bit like system detective work:

🔍 Find where the real bottleneck is hiding.
🧠 Figure out whether the cost comes from memory, movement, or scheduling.
⚙️ Turn that pain point into something measurable and optimizable.
📊 Test whether the idea still holds under realistic workloads.
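
The detective loop above can be sketched in a few lines: measure first, then compare. This is a toy stand-in for real profilers (py-spy, perf, Nsight); the `profile` helper and the quadratic example are assumptions made up for illustration.

```python
import time
import tracemalloc


def profile(fn, *args):
    """Measure wall time and peak Python-heap allocation of one call.

    Illustrative only: real bottleneck hunting would use a sampling
    profiler and hardware counters, not tracemalloc.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def concat_strings(n):
    # Quadratic-time string building: a classic hidden bottleneck.
    s = ""
    for _ in range(n):
        s += "x"
    return s


# Compare the naive version against an O(n) alternative under the
# same measurement harness before deciding what to optimize.
_, t_slow, mem_slow = profile(concat_strings, 50_000)
_, t_fast, mem_fast = profile(lambda n: "x" * n, 50_000)
```

The point is the workflow, not the numbers: make the cost measurable first, then check whether a proposed fix still wins under a realistic workload.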

Toolbox

Python C++ Linux vLLM Profiling Agents

Python for fast prototyping, C++ for systems work, Linux for getting close to the machine, and agent tools for making research workflows a little less manual.

Beyond Research

  • ✍️ I enjoy explaining system ideas as much as building them
  • 🚴 I spend time on badminton, cycling, photography, and sci-fi / mystery reading
  • ✨ I like projects that feel rigorous, useful, and a little elegant

GitHub Snapshot

GitHub Stats
Top Languages
GitHub Trophies

Motto

Optimize what matters. Keep the system honest.

Pinned

  1. vllm (Public)

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python