MrudhuhasM/README.md

Hi, I'm Mrudhuhas 👋

ML Engineer | Training → Inference → Production

8+ years building ML systems end-to-end: from training custom models to optimizing inference at the GPU kernel level.

What I Do

🔥 Inference Optimization: Custom Triton kernels (Flash Attention, LayerNorm, GELU) achieving a 1.85x speedup
🤖 LLM Applications: Production RAG/Agents with LangGraph, 60-70% cost reduction
🧠 Model Training: Fine-tuning transformers (BERT, T5, Llama) with LoRA/QLoRA
☁️ Production: GCP/AWS deployments serving millions of requests

Featured Projects

Triton-GPT2: GPU Kernel Development

GPT-2 inference engine with custom Triton kernels: 275 tokens per second (TPS) versus 149 TPS for HuggingFace (1.85x speedup).

  • Fused Flash Attention matching PyTorch SDPA
  • Custom LayerNorm, GELU, Softmax kernels
  • KV-cache for autoregressive decoding
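The KV-cache idea from the list above can be sketched in plain Python. This is a minimal illustration, not the engine's code: it assumes a single attention head, uses made-up toy vectors, and the `KVCache` class and helper names are hypothetical. Each decoding step appends only the new token's key and value, then attends over everything cached, so earlier keys and values are never rebuilt.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

class KVCache:
    """Hypothetical single-head cache: one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append only the new token's key/value, then attend over the cache.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

def full_recompute(queries, keys, values):
    # Baseline without a cache: causal attention rebuilt from scratch per position.
    return [attend(queries[t], keys[:t + 1], values[:t + 1])
            for t in range(len(queries))]

# Toy projections for three tokens (made-up numbers, not real model activations).
queries = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cache = KVCache()
cached_outputs = [cache.step(q, k, v) for q, k, v in zip(queries, keys, values)]
baseline_outputs = full_recompute(queries, keys, values)
```

Both paths produce the same outputs. The saving is that a cached step reuses the stored keys and values, where the baseline rebuilds the whole prefix at every position. A real engine would hold the cache as preallocated GPU tensors per layer and head.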

Meditations RAG: Production LLM Application

Agentic RAG with LangGraph: Controller → Retriever → Generator → Evaluator loop
Live Demo | Sub-500ms latency | 500+ RPS
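
The Controller → Retriever → Generator → Evaluator loop can be sketched without any framework. This is an illustrative outline only: it does not use the LangGraph API, and every name in it (`State`, `CORPUS`, the node functions, `run`) is hypothetical. The retriever and generator are keyword-matching placeholders for a vector store and an LLM call, and the corpus entries are placeholder text.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    question: str
    query: str = ""
    documents: list = field(default_factory=list)
    answer: str = ""
    approved: bool = False
    attempts: int = 0

# Hypothetical in-memory corpus standing in for a real vector store.
CORPUS = {
    "anger": "Anger harms the one who holds it more than its target.",
    "change": "Everything changes; accept what you cannot control.",
}

def controller(state):
    # Decide what to search for; on a retry, broaden to the whole question.
    words = state.question.lower().replace("?", "").split()
    state.query = words[-1] if state.attempts == 0 else " ".join(words)
    state.attempts += 1
    return state

def retriever(state):
    # Keyword lookup as a placeholder for embedding search.
    state.documents = [text for key, text in CORPUS.items() if key in state.query]
    return state

def generator(state):
    # Placeholder for an LLM call: answer only from the retrieved passages.
    state.answer = " ".join(state.documents) if state.documents else ""
    return state

def evaluator(state):
    # Accept the answer only if it is grounded in at least one passage.
    state.approved = bool(state.answer)
    return state

def run(question, max_attempts=2):
    # Cycle through the four nodes until the evaluator approves or attempts run out.
    state = State(question=question)
    while not state.approved and state.attempts < max_attempts:
        for node in (controller, retriever, generator, evaluator):
            state = node(state)
    return state

result = run("How should I deal with anger?")  # approved on the first pass
retry = run("Is anger useful today?")          # first query misses, retry succeeds
```

The evaluator's verdict decides whether control returns to the controller, which is what makes the pipeline a loop and not a single pass. The attempt cap keeps a question with no matching passage from cycling forever.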

Tech Stack

GPU/Inference: Triton, CUDA, vLLM, TGI, Quantization (GPTQ, AWQ)
LLM Apps: LangGraph, LangChain, LlamaIndex, RAG
Training: PyTorch, LoRA/QLoRA, Mixed Precision, DeepSpeed
Production: GCP, Docker, Kubernetes, FastAPI, Redis


📫 Open to remote opportunities (contract or full-time) | Flexible on US/EU timezones

LinkedIn • Email

Pinned

  1. meditations-rag (Public)

    A production RAG application

    Python

  2. gpt2-engine (Public)

    Python

  3. PMPP (Public)

    This repo contains all the code from the time I started reading PMPP

    CUDA

  4. triton-code (Public)

    Python