AI Infra Engineer focused on Large Language Model (LLM) pre-/post-training and inference optimization. Currently developing expertise across the full training tech stack and GPU kernel optimization (CUDA/Triton). Combines solid ML theory, PyTorch engineering, and system-level optimization to build scalable, high-efficiency AI solutions.
"Building a full-stack understanding, from model math to the GPU execution pipeline, and from training to inference"
| Domain | Skills / Tools |
|---|---|
| LLM Training | PyTorch, LLaMA-Factory, verl, Megatron, DeepSpeed |
| Inference Optimization | vLLM, CUDA, Tensor Cores, Nsight Systems |
| GPU & Systems | CUDA C++, PTX profiling, memory hierarchy tuning, CUTLASS, cuBLAS, cuDNN |
| Efficient Computing | Pruning, Quantization, Knowledge Distillation, Kernel Fusion |
| Math Foundations | Linear Algebra, Probability, Information Theory (KL divergence, Cross-Entropy), Optimization |
