• Built production LLM pipeline transforming therapy transcripts into clinical documentation through multi-stage generation (entity extraction, contextual retrieval, structured synthesis), reducing clinician documentation time by 90%.
• Engineered RAG system combining vector search and property graphs to ground LLM outputs in patient history, assessments, and EHR data, reducing hallucinations in safety-critical healthcare applications.
• Designed multi-provider inference infrastructure with intelligent routing across OpenAI, Fireworks, and Together AI, implementing streaming, dynamic fallbacks, and cost-aware distribution for thousands of daily requests.
• Developed prompt engineering framework with self-evaluation loops where models iteratively refine outputs against clinical accuracy and compliance criteria, reducing human review time by 60%.
• Built ML API service for NLP preprocessing (transcript segmentation, NER, sentiment analysis, behavioral markers) feeding downstream clinical analytics and knowledge graph pipelines.
• Implemented structured generation with Zod/Pydantic validators to constrain LLM outputs to strict clinical schemas, ensuring type safety and seamless EHR integration.
• Created observability infrastructure tracking token usage, latency, and quality metrics across prompt templates, enabling data-driven optimization and A/B testing of generation strategies.
• Architected asynchronous processing with Redis-backed queues for compute-intensive AI workloads, and secured HIPAA-compliant API endpoints with JWT authentication.