Bellevue, Washington, United States
• Architected core services for Truveta’s privacy-preserving Person Matching platform, enabling patient identity resolution across 30+ healthcare organizations and processing 200M+ records/day using distributed PySpark pipelines orchestrated via Netflix Conductor.
• Built the ML-enhanced identity matching pipeline, integrating deterministic token matching with BERT-based embeddings and a distributed clustering workflow (split/merge resolution and stable ID assignment) to accurately link billions of patient records while significantly improving recall over rule-based approaches.
• Founded and open-sourced OpenToken, a cryptographically secure tokenization framework enabling deterministic PII-based record linkage across organizations without exposing raw identifiers, leveraging AES-256 encryption and SHA-256 hashing.
• Led cross-team development of automated partner onboarding workflows, reducing healthcare data ingestion and provisioning time from >1 week to <5 hours and accelerating expansion of the Truveta data network.
• Designed distributed microservices for high-volume identity processing, building horizontally scalable Spark and Java services that support sub-second lookup latency across billions of identity records.
• Established production reliability and release infrastructure, implementing observability (StatsD metrics, structured logging, tracing) and CI/CD pipelines in Azure DevOps with containerized deployments (Docker/Kubernetes) and FDA-compliant audit logging.