ML Engineer | Training → Inference → Production
8+ years building ML systems end-to-end: from training custom models to optimizing inference at the GPU kernel level.
- Inference Optimization: custom Triton kernels (Flash Attention, LayerNorm, GELU) with a 1.85x speedup over the HuggingFace baseline
- LLM Applications: production RAG/agents with LangGraph, 60-70% cost reduction
- Model Training: fine-tuning transformers (BERT, T5, Llama) with LoRA/QLoRA (see the sketch after this list)
- Production: GCP/AWS deployments serving millions of requests
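
For the LoRA/QLoRA work, the setup typically looks like the minimal sketch below, using Hugging Face PEFT; the checkpoint, rank, and target modules are illustrative assumptions, not a specific project config.

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT (illustrative values).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # placeholder: any causal-LM checkpoint
    torch_dtype=torch.bfloat16,
)
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # typically well under 1% trainable
```

The wrapped model drops into a standard training loop unchanged; only the adapter weights receive gradients, which is what makes fine-tuning 7B+ models tractable on a single GPU.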
Triton-GPT2 | GPU Kernel Development
GPT-2 inference engine with custom Triton kernels: 275 tokens/s vs. 149 tokens/s for the HuggingFace baseline (1.85x speedup)
- Fused Flash Attention matching PyTorch SDPA
- Custom LayerNorm, GELU, and Softmax kernels (LayerNorm sketched below)
- KV-cache for autoregressive decoding
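
To give a flavor of these kernels, here is a minimal sketch of a fused LayerNorm forward pass in Triton; the launch setup and block-size handling are illustrative, not the project's actual kernel.

```python
# Fused LayerNorm forward in Triton: one program instance per row of (M, N).
import torch
import triton
import triton.language as tl

@triton.jit
def layernorm_fwd(X, Y, W, B, stride, N, eps, BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    x = tl.load(X + row * stride + cols, mask=mask, other=0.0).to(tl.float32)
    # Mean and variance computed in registers, in fp32 for stability.
    mean = tl.sum(x, axis=0) / N
    diff = tl.where(mask, x - mean, 0.0)
    var = tl.sum(diff * diff, axis=0) / N
    rstd = 1.0 / tl.sqrt(var + eps)
    # Normalize and apply the learned scale/shift in the same kernel.
    w = tl.load(W + cols, mask=mask, other=0.0)
    b = tl.load(B + cols, mask=mask, other=0.0)
    y = (x - mean) * rstd * w + b
    tl.store(Y + row * stride + cols, y, mask=mask)

def layernorm(x, weight, bias, eps=1e-5):
    M, N = x.shape                        # assumes a contiguous 2D CUDA tensor
    y = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(N)
    layernorm_fwd[(M,)](x, y, weight, bias, x.stride(0), N, eps, BLOCK_SIZE=BLOCK_SIZE)
    return y
```

Fusing mean, variance, normalization, and the affine transform into one kernel avoids the extra global-memory round trips a naive composition of PyTorch ops would incur.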
Meditations RAG | Production LLM Application
Agentic RAG with LangGraph: Controller → Retriever → Generator → Evaluator loop (sketched below)
Live Demo | Sub-500ms latency | 500+ RPS
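
A minimal sketch of that loop with LangGraph follows; the state fields, the grading stubs (`search_index`, `llm_answer`, `grounded`), and the retry cap are illustrative assumptions, not the production graph.

```python
# Controller → Retriever → Generator → Evaluator loop in LangGraph.
from typing import List, TypedDict
from langgraph.graph import END, StateGraph

class RAGState(TypedDict):
    question: str
    docs: List[str]
    answer: str
    verdict: str
    retries: int

def search_index(q: str) -> List[str]:              # stand-in for the vector store
    return [f"passage about {q}"]

def llm_answer(q: str, docs: List[str]) -> str:     # stand-in for the LLM call
    return f"answer to {q!r} from {len(docs)} passages"

def grounded(answer: str, docs: List[str]) -> bool: # stand-in for the grader
    return bool(docs)

def controller(state: RAGState) -> dict:
    return {"retries": state.get("retries", 0)}

def retriever(state: RAGState) -> dict:
    return {"docs": search_index(state["question"])}

def generator(state: RAGState) -> dict:
    return {"answer": llm_answer(state["question"], state["docs"])}

def evaluator(state: RAGState) -> dict:
    ok = grounded(state["answer"], state["docs"])
    return {"verdict": "done" if ok else "retry", "retries": state["retries"] + 1}

def route(state: RAGState) -> str:
    # Loop back to retrieval until the answer is grounded or retries run out.
    return "done" if state["verdict"] == "done" or state["retries"] >= 2 else "retry"

graph = StateGraph(RAGState)
graph.add_node("controller", controller)
graph.add_node("retriever", retriever)
graph.add_node("generator", generator)
graph.add_node("evaluator", evaluator)
graph.set_entry_point("controller")
graph.add_edge("controller", "retriever")
graph.add_edge("retriever", "generator")
graph.add_edge("generator", "evaluator")
graph.add_conditional_edges("evaluator", route, {"retry": "retriever", "done": END})
app = graph.compile()

print(app.invoke({"question": "What does Marcus Aurelius say about anger?"}))
```

The evaluator-gated edge is what makes the graph agentic: a weak answer routes back through retrieval instead of being returned to the user.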
GPU/Inference: Triton, CUDA, vLLM, TGI, Quantization (GPTQ, AWQ)
LLM Apps: LangGraph, LangChain, LlamaIndex, RAG
Training: PyTorch, LoRA/QLoRA, Mixed Precision, DeepSpeed
Production: GCP, Docker, Kubernetes, FastAPI, Redis
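
As a quick taste of the inference side of this stack, a minimal vLLM offline-generation sketch (the checkpoint and sampling settings are placeholders):

```python
# Minimal offline batch inference with vLLM (placeholder model and settings).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")   # any HF causal-LM checkpoint
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```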
Open to remote opportunities (contract or full-time) | Flexible across US/EU time zones

