Experience
2022 — Now
San Francisco Bay Area
As the first ML Engineer, I co-founded the modern machine learning effort at Coinbase and have served as the team's tech lead for the last three years, on a 40+ person team. I had a hand in building, designing, or advising every ML model currently running in production at Coinbase, as well as in setting the team's technical and strategic direction. Highlights:
* Landed double-digit increases in site-wide net revenue and engagement across the Feed, Growth & Notifications, and user spend limits; achieved strategic independence from 3rd-party vendors and a massive reduction in fraud for the ATO & Risk domain; and improved support agent response time.
* Created AutoML-like lego blocks for tabular data and for sequence & text data (using Transformers, the T in ChatGPT, built from scratch with deep learning), plus support for blending these models. Machine learning engineers and data scientists across the company use these lego blocks to deliver impact.
* Set the long-term technical direction for the team, owned large parts of the interview process, published a blog post about our stack, contributed to a KDD presentation, and ran a knowledge-sharing group and boot camp.
* Proposed a Responsible AI framework built around model cards, inspired by academic work, which is now applied to new models.
* Working on the first ML + Crypto use cases.
2018 — 2022
San Francisco
2016 — 2018
San Francisco Bay Area
Worked on applying NLP to Slack chat logs: expert finding, topic modeling, and text classification. Improved search quality. Later on, built full-stack features such as web search and external note sharing.
Machine learning
===============
Expert finding:
• TF-IDF-based model for recommending experts given a query string
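The TF-IDF ranking above can be sketched with the standard library alone; the per-user documents and the query below are illustrative stand-ins for real Slack message histories, not the production data:

```python
import math
from collections import Counter

# Toy stand-in: each "document" is one user's message history
docs = {
    "alice": "search ranking elasticsearch query relevance".split(),
    "bob": "design mockup layout color typography".split(),
    "carol": "deploy build release server infrastructure".split(),
}

def tf_idf_scores(query, docs):
    """Score each user's document against the query with TF-IDF."""
    n_docs = len(docs)
    # Document frequency: in how many users' histories does each term appear?
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    # Inverse document frequency: rarer terms weigh more
    idf = {t: math.log(n_docs / df[t]) for t in df}

    scores = {}
    for user, words in docs.items():
        tf = Counter(words)
        scores[user] = sum(
            tf[t] / len(words) * idf.get(t, 0.0) for t in query.split()
        )
    return scores

scores = tf_idf_scores("elasticsearch ranking", docs)
best = max(scores, key=scores.get)  # "alice" for this toy corpus
```

The highest-scoring user is then surfaced as the recommended expert for the query.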
Topic modeling:
• Pipeline for cleaning text data: stripping stop words and first names, keeping only nouns
• Langid-based classifier for keeping only English-speaking teams
• Clustered the cleaned-up text messages into 50 clusters using LDA
• Visualized the results using pyLDAvis
Text classification:
• Collected a labeled dataset for classifying messages into design/non-design
• Built an additional filtering step for design messages using a vocabulary drawn from design-related Wikipedia pages
• Used fastText as an initial baseline (both with pretrained GloVe word vectors and with training the word vectors from scratch), reaching an F1 score of 0.83
• Reproduced a colleague's bidirectional LSTM classifier using Keras, reaching an F1 score of 0.86
Technologies used: Jupyter, pandas, fastText (text classification), gensim (LDA), Keras (bidirectional LSTM)
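For reference, the F1 scores quoted above are the harmonic mean of precision and recall; a minimal computation from raw confusion counts (the counts below are made up for illustration):

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for a design/non-design classifier
score = f1_score(tp=83, fp=17, fn=17)  # precision = recall = 0.83, so F1 = 0.83
```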
Search quality
==============
• Collected gold dataset of queries and expected positives for those queries
• Built search quality eval tool
• Iterated on the Elasticsearch ranking model
• Improved top-3 recall from an initial 30% to 90% on that gold dataset
• Held a follow-up call with an outside Elasticsearch expert consultant
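The core of the eval tool above can be sketched as follows; the shape of the gold set (query mapped to expected document IDs) and the fake search function standing in for Elasticsearch are assumptions for illustration:

```python
def top3_recall(gold, search_fn):
    """Fraction of gold queries whose expected doc appears in the top 3 results."""
    hits = 0
    for query, expected_ids in gold.items():
        top3 = search_fn(query)[:3]
        if any(doc_id in top3 for doc_id in expected_ids):
            hits += 1
    return hits / len(gold)

# Toy gold set and a fake search function standing in for Elasticsearch
gold = {"quarterly report": ["doc1"], "launch plan": ["doc9"]}
results = {"quarterly report": ["doc3", "doc1"], "launch plan": ["doc2"]}
recall = top3_recall(gold, results.__getitem__)  # 1 of 2 queries hit: 0.5
```

Re-running this metric after each change to the ranking configuration is what made the 30% to 90% iteration loop measurable.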
Full-stack
========
• Web search
• External note sharing (via public link)
• Invite flow
• Note Reactions (similar to the Facebook ones)
Technologies used: Python, React, CSS, HTML
2015 — 2016
Bucharest
* Led the build-out of the big data system responsible for making sense of terabytes of diverse information
* Worked on building a one-box search system over the various data collected by Bitdefender, using Cassandra, Spark, and Django.
* The system's aim was twofold: to let analysts easily search through terabytes of data and to automatically generate forensic leads for them to investigate.
2011 — 2013
College Park
* TA for Algorithms & Data Structures, Complexity Theory, and Randomized Algorithms
* Held weekly office hours explaining difficult course material, at both the undergraduate and graduate level
* Responsibilities also included grading homework
Education
University of Maryland
Research Master
2011 — 2013
University of Bucharest
Master's
2009 — 2011
University of Bucharest
Bachelor
2006 — 2009