• Architected and operated 6 production Ray clusters (detections, headcount, occupancy, care, room motion,
visualizer) across multi-AZ AWS, supporting 100+ concurrent worker nodes and autoscaling from zero to 1,000
workers via Ray Autoscaler. Deployed Ray Serve for low-latency HTTP inference, running continuously in
production for 2+ years.
• Designed a distributed stream-processing framework on Ray’s actor model that ingests NATS message
streams, applies priority-based queuing, and dynamically schedules distributed processors, automatically
scaling concurrent workers to handle data from thousands of devices.
• Built versioned Ray cluster deployment pipelines with templated configs, automated job submission, and
rollback support. Integrated full-stack observability using Prometheus, Ray’s metrics API, Grafana, and
CloudWatch with custom counters, gauges, and latency histograms.
• Developed models for detection, tracking, motion detection, and pose estimation, and deployed them
end-to-end across ~5,000 sensors.
• Achieved 92% accuracy in person detection and counting on low-resolution thermal data using OpenCV.