I am an AI Infrastructure Engineer focused on building scalable systems for large-scale model training and inference.
My work centers on GPU clusters, distributed training, and cloud-native AI platforms, helping accelerate the development and deployment of LLMs and other AI systems.
⚡ Focus Areas
- Large-scale LLM Training Infrastructure
- GPU Cluster Scheduling
- Kubernetes-Native AI Platforms
- Distributed Systems for AI Workloads
AI / Cloud Infrastructure
- Kubernetes
- GPU Scheduling
- Distributed Training
- Cloud-Native Infrastructure
I actively contribute to open-source projects in the AI infrastructure and cloud-native ecosystems.
- ⚡ Volcano: Kubernetes batch scheduler for AI/HPC workloads
- ⚡ Kubeflow: Machine learning platform on Kubernetes
- ⚡ AReaL: Distributed RL infrastructure for large models
- ⚡ OpenKruise: Advanced workload management for Kubernetes
- ⚡ SkyWalking Python Agent: Python agent for the Apache SkyWalking APM platform
- Distributed Systems
- LLM Infrastructure
- Reinforcement Learning Systems
- Cloud-Native AI Platforms
🏀 Basketball lover
📝 Enjoy writing technical articles
💡 Think Twice, Code Once

