AI Platform + LLM Infrastructure
• Shipped 3 production LLM agents (context-aware chat, account research agent, SDR coaching agent) using RAG, tools, safety layers, and session/state management. All 3 are now core product differentiators highlighted in 2025 sales cycles.
• Own the company-wide LLM proxy (LiteLLM) and observability tool (Langfuse), including authentication, cost tracking, and alerting that caught and prevented multiple 5-figure cost spikes.
• Implemented an agent evaluation framework (LLM-as-judge, synthetic-data regression tests, and golden-dataset testing), now a required step for major agent updates.
ML Ops
• Owned end-to-end ML operations: Delta Lake + Spark data platform, BERT/tree-model training pipelines, MLflow registry, feature store patterns, and batch/real-time inference on Kubernetes.
• Automated dataset creation, model retraining, drift monitoring, and deployment workflows using Spark, Databricks, MLflow, and Kubernetes.
Distributed and Event-Driven Systems
• Scaled Kafka pipelines to millions of emails/day, powering sentiment analysis, signature extraction, and out-of-office (OOO) detection.
• Designed high-throughput, event-driven services processing tens of millions of events/day for real-time deal + buyer models.
• Enforced strict multi-tenancy across all data + ML systems.
Cross-team & Org Leadership
• Bridged AI research, data engineering, backend, and platform teams to unblock cross-team initiatives.
• Drove architecture for AI features used by the entire company.
• Mentored engineers on Spark, distributed systems, ML workflows, and agents.