Cambridge, Massachusetts, United States
• Led team of 5 to build and deploy 2 end-to-end CI/CD ML pipelines for a new client: from ingesting live data from 4+ text sources via Kafka to automating financial risk classification
• Collected requirements from the client and organized them into discrete tasks for delegation
• Independently performed EDA on text, free-text, and numerical input data from 2 of these sources and designed transformations such as fuzzy matching and keyword search
• Built, optimized, and validated 3 of the 8 decision tree classifiers through several iterations
• Composed weekly reports on the health of the production system and deployed optimizations
• Supported the team’s completion of the rest of the pipeline: integrating with vendors through Kafka, performing EDA, and designing and building a 50+ table SQL database, 5 classifiers, and an automated testing framework
• Provided technical advising and implemented a versioning system and code reviews
• Supported the creation of a question vs. answer classifier for a 2nd client, including tokenization, TF-IDF, and a decision tree classifier, achieving >96% accuracy