"Outcomes over demos. Architecture over hype."
I bridge the gap between fragile AI research demos and resilient enterprise systems. I don't just write prompts; I engineer stateful, observable, and governed architectures that replace manual operational toil with deterministic reliability.
My systems are built for:
- Predictability: End-to-end type safety (Pydantic/TypeScript) and binary acceptance tests.
- Governance: Strict "Human-in-the-Loop" (HITL) gates, RBAC, and audit trails.
- Observability: If it isn't traced in Langfuse or Temporal UI, it doesn't exist.
Autonomous Accounts Payable agent replacing manual invoice processing end-to-end.
The Problem: Finance teams drown in manual invoice reconciliation: 15–30 min per invoice, a 5–10% error rate, and a 3–7 day approval bottleneck. The Solution: A "Trust Battery" architecture with Azure OCR + a LangGraph state machine that autonomously approves low-risk invoices and escalates anomalies.
- Architecture:
PDF Upload → Azure Document Intelligence (OCR) → LangGraph AP Workflow (11 nodes) → Trust Battery Decision → QuickBooks MCP + HubSpot MCP Sync → Immutable Audit Ledger
- Key Innovation: Trust Battery Logic. A 4-level vendor trust ladder (PROBATION → STANDARD → CORE → STRATEGIC) with dynamic auto-approval thresholds scaling from $500 to $50k. Bank detail changes and PO mismatches auto-route to HITL review tasks.
- Production Hardening: Idempotent MCP tool calls via Request-Id headers, SHA-256 cryptographic audit receipts, SOC 2 data minimization (store hashes, not PDFs), Azure Key Vault secret management, and an L1/L2/L3 LLM cache (90% call reduction).
- Metrics: 99% OCR accuracy | 60–80% auto-approval rate | 97% cost reduction ($15–30 → $0.50/invoice) | 83 tests passing | $0/month for 12 months (Azure free tier)
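The Trust Battery decision and the audit receipts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production LangGraph node. The names Trust, Invoice, decide() and audit_receipt() are hypothetical. Only the $500 and $50k endpoints come from the description above, so the STANDARD and CORE limits are placeholders. Chaining each receipt to the previous hash is also an assumption about how the immutable ledger could be enforced.

```python
import hashlib
import json
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Trust(Enum):
    # The four Trust Battery levels, lowest to highest.
    PROBATION = 1
    STANDARD = 2
    CORE = 3
    STRATEGIC = 4


# Hypothetical ceilings: only the $500 and $50k endpoints come from the
# description; the STANDARD and CORE values are placeholders.
AUTO_APPROVE_LIMIT = {
    Trust.PROBATION: Decimal("500"),
    Trust.STANDARD: Decimal("5000"),
    Trust.CORE: Decimal("20000"),
    Trust.STRATEGIC: Decimal("50000"),
}


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    vendor_id: str
    amount: Decimal
    bank_details_changed: bool
    po_matches: bool


def decide(invoice: Invoice, trust: Trust) -> str:
    """Return "AUTO_APPROVE" or "HITL_REVIEW" for a single invoice."""
    # Hard escalation rules run first and ignore the vendor's trust level.
    if invoice.bank_details_changed or not invoice.po_matches:
        return "HITL_REVIEW"
    # Otherwise auto-approve only amounts within the level's ceiling.
    if invoice.amount <= AUTO_APPROVE_LIMIT[trust]:
        return "AUTO_APPROVE"
    return "HITL_REVIEW"


def audit_receipt(prev_hash: str, invoice: Invoice, decision: str) -> str:
    """SHA-256 receipt over the decision, chained to the previous entry."""
    # Only identifiers, the amount, and the decision are hashed; no PDF bytes.
    payload = json.dumps(
        {
            "prev": prev_hash,
            "invoice_id": invoice.invoice_id,
            "vendor_id": invoice.vendor_id,
            "amount": str(invoice.amount),
            "decision": decision,
        },
        sort_keys=True,  # stable key order so the same entry always hashes the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Usage: two ledger entries, the second chained to the first.
genesis = "0" * 64
small = Invoice("INV-1", "V-42", Decimal("450.00"), False, True)
first = audit_receipt(genesis, small, decide(small, Trust.PROBATION))
risky = Invoice("INV-2", "V-42", Decimal("120.00"), True, True)
second = audit_receipt(first, risky, decide(risky, Trust.STRATEGIC))
```

The hard rules (bank detail changes, PO mismatches) run before the threshold check. As a result, even a STRATEGIC vendor cannot auto-approve a payment to new bank details. Hashing a canonical JSON encoding keeps receipts reproducible, so an auditor can recompute the chain from stored fields without ever holding the PDFs.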
Governance-first internal operations system for Seed-to-Series A startups, replacing back-office fragmentation with 13 specialized AI employees.
The Problem: Early-stage startups drown in operational chaos, juggling 15 disconnected tools. Founders lose 15–20 hours/week to back-office tasks, delaying product roadmaps by ~3 months per year. The Solution: A virtual office with a Chief of Staff orchestrating 13 specialized AI employees across 6 desks (Finance, People, Legal, Intelligence, IT, Admin). Anything requiring human judgment is fully prepared in advance and presented for a decision that takes about 30 seconds.
- Architecture:
Telegram Bot → Tier 1: Chief of Staff Agent → Tier 2: 6 Desks (13 Virtual Employees) → Tier 0: BusinessOS (Go + Temporal + Graphiti) → Tier 3: Data Layer (Qdrant + Neo4j)
- Key Innovation: The Self-Correcting Memory System. Sarthi learns company-specific context over time: Agent acts → Founder confirms → Memory updated (Qdrant + Neo4j) → Future items auto-categorized, with context drift detection.
- Production Hardening: Strict HITL (Human-in-the-Loop) gates enforced by Temporal, deterministic state management, and an explicit scope boundary (no external-facing work such as RevOps or Customer Success).
- Metrics: $0/month infrastructure cost for MVP | Replaces ₹2L–₹3.75L/month in fractional admin costs | 20x–50x ROI | 125 tests passing (targeting 189 for v4.2.0)
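The confirm-then-remember loop can be approximated with an in-memory store. This is a hedged sketch that swaps Qdrant and Neo4j for a plain dictionary. CorrectionMemory, its thresholds, and the majority-vote drift rule are illustrative assumptions rather than Sarthi's actual logic. The assumed thresholds are three confirmations before auto-categorizing, a five-item drift window, and 60% disagreement.

```python
from collections import Counter, defaultdict
from typing import Optional


class CorrectionMemory:
    """Hypothetical in-memory stand-in for the Qdrant + Neo4j memory layer."""

    def __init__(self, min_confirmations: int = 3, drift_window: int = 5,
                 drift_ratio: float = 0.6) -> None:
        # Founder-confirmed categories per counterparty, oldest first.
        self.history: dict[str, list[str]] = defaultdict(list)
        self.min_confirmations = min_confirmations
        self.drift_window = drift_window
        self.drift_ratio = drift_ratio

    def confirm(self, counterparty: str, category: str) -> None:
        """Record a category the founder confirmed or corrected."""
        self.history[counterparty].append(category)

    def drifted(self, counterparty: str) -> bool:
        """True when recent confirmations mostly disagree with the older majority."""
        items = self.history[counterparty]
        if len(items) < 2 * self.drift_window:
            return False  # not enough history to compare old vs. recent
        older, recent = items[:-self.drift_window], items[-self.drift_window:]
        baseline = Counter(older).most_common(1)[0][0]
        disagree = sum(1 for c in recent if c != baseline)
        return disagree / self.drift_window >= self.drift_ratio

    def suggest(self, counterparty: str) -> Optional[str]:
        """Category to auto-apply, or None to route the item to the founder."""
        items = self.history[counterparty]
        if len(items) < self.min_confirmations or self.drifted(counterparty):
            return None
        return Counter(items).most_common(1)[0][0]
```

When drift is detected, suggest() returns None. That sends the item back through the founder-confirmation gate instead of silently applying a stale category. Once the new category dominates the older window, auto-categorization resumes on its own.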
Production-grade AI-native e-commerce CX platform where the agent IS the interface: zero page navigation, zero forms, all conversation.
The Problem: Chatbots are dumb text boxes that can't "do" anything; users still navigate pages, fill forms, and wait for human support agents. The Solution: A Generative UI agent that renders dynamic React components (ProductGrid, CartCanvas, OrderTimeline, ActionConfirm) directly inside the chat stream, powered by a LangGraph supervisor routing 14 intent types.
- Architecture:
Next.js 15 GenUI Canvas → Hono + Bun (GraphQL Yoga / MCP endpoints) → FastAPI + LangGraph (ShopperAgent / SupportAgent) → PostgreSQL 16 + pgvector (Hybrid FTS + Vector Search) → Azure AI Foundry (gpt-4o-mini)
- Key Innovation: Agent-First Commerce. Every user action is a conversation turn, handled by a LangGraph supervisor with typed state, Redis checkpointing, a circuit breaker for resilience, and Human-in-the-Loop approval for critical actions (checkout, refunds). Responses are scored with RAGAS + LLM-as-Judge via Langfuse.
- Observability: 100% of agent turns traced in Langfuse with per-span latency (classify, tools, generate), faithfulness scores, and correlation IDs on every tool call.
- Metrics: 307 tests passing (100% pass rate) | P95 agent turn latency < 500ms | Task completion target > 95% | Cart recovery > 15% vs 10% industry avg | Merchant time saved > 2hr/day
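A circuit breaker around tool calls is a standard resilience pattern. Below is a minimal, single-threaded Python sketch of a generic three-state breaker (closed, open, half-open), not the platform's own implementation. The CircuitBreaker class, its failure_threshold and reset_timeout defaults, and the injectable clock are assumptions for illustration. Where the real breaker sits (the Hono layer or the FastAPI service) is not stated above.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"        # calls flow normally
    OPEN = "open"            # calls fail fast until the cooldown expires
    HALF_OPEN = "half_open"  # one trial call decides whether to close again


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the tool while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: allow a single trial call.
            self.state = State.HALF_OPEN
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Success: reset the failure count and close the circuit.
        self.failures = 0
        self.state = State.CLOSED
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

Failing fast while the circuit is open lets the supervisor reply with a fallback instead of stalling a turn on a dead dependency. That matters for a P95 latency budget like the one above. A production version would also need locking so that only one trial call runs in the half-open state.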


