Charlotte, North Carolina, United States
• Designed and shipped Spotify's centralized safety backend service for human-agent interactions, serving ML models via vLLM with high throughput and low latency across Spotify's agentic experiences
• Built data pipelines processing Spotify's full podcast and audiobook catalog, integrating ML inference as a core pipeline step for automated safety classification at production scale
• Designed and built the multi-modal video inference pipeline supporting Spotify's video podcast catalog end-to-end, from ingestion through model serving to downstream consumers
• Optimized model hosting infrastructure for cost, throughput, and latency by applying batching, GPU utilization tuning, and serving-layer improvements to vLLM-based deployments
• Owned backend service design, API contracts, observability, and reliability for AI-powered systems serving production traffic
• Partnered with ML, Trust & Safety, and Platform teams to ship reliable AI inference systems integrated into Spotify's broader service architecture
• Tech: Java, Python, vLLM, model serving, distributed systems, data pipelines, GCP, Docker, Kubernetes, microservices