Built a real-time streaming engine on Kubernetes with Python, Scala and Apache Kafka for internal Data Scientists, enabling real-time scoring across a variety of models.
Designed and wrote a Python SDK abstracting data streams and ingress/egress patterns, enabling users to focus solely on business logic and model training/scoring.
Created an automated onboarding pipeline for Data Scientists to register and deploy models, reducing overhead for both Data Scientists and Software Engineers.
Worked with Data Scientists on feature exploration, model training, ETL development and model monitoring using Python, Spark, Apache Airflow, Kubernetes, EMR and Datadog.
Created Docker images enabling distributed Spark on Kubernetes, significantly reducing AWS EC2 and EMR costs across departments.
Led a team of interns who deployed MLflow into multiple Kubernetes environments and validated model training and the model registry, enabling Data Scientists to adopt MLflow.
Designed and developed multiple batch ETL jobs for a variety of Machine Learning use cases, including fraud prevention and affiliate marketing optimization.
Mentored numerous new team members on onboarding and software design.