Experienced Graduate Teaching Assistant with a demonstrated history of working in the higher education industry. Skilled in ICEM CFD, Numerical Simulation, Microsoft Word, Computer-Aided Design (CAD), ParaView, Python, MATLAB, and Machine Learning.
Experience
2022 — Now
San Francisco Bay Area
Core contributor to OneTrust's market-leading Data Discovery classification engine, building large-scale systems that power enterprise data discovery and compliance workflows.
Led the implementation of Classification V2 using Java and Rust, significantly improving entity detection accuracy, throughput, and system robustness in production.
Designed and productionized high-accuracy classification models through deep research, large-scale experimentation, and continuous validation.
Selected to join OneTrust Labs, a CIO-sponsored AI Innovation team, to deliver the company's first LLM-powered AI agents for PIA and TPRM products.
Played a key role in upgrading and extending the Assessment Auto-Completion RAG pipeline, substantially improving accuracy, reducing manual effort, and expanding the range and complexity of questions the AI agents can reliably answer.
Evaluated and prototyped advanced retrieval and reasoning approaches, including early Graph-based RAG designs, to inform long-term AI platform strategy.
Currently transitioning to the AI Platform team, contributing to the design of a foundational AI agent platform intended to support scalable, secure, and extensible AI-driven products across the company.
Named inventor on two pending patents related to data discovery and classification technologies:
US20240070319A1 - Methods and Systems for Detecting Entities Using Custom Classification Techniques
US20250061154A1 - Generating probabilistic data structures for lookup tables in computer memory for multi-token searching
2022 — 2022
San Francisco Bay Area
Work with petabyte-scale data to discover meaningful terms
Maintain libraries and microservices for the Data Discovery core engine
Design and implement new features that extend product functionality
Maintain and develop high-performance NLP models and services
Continuously improve the accuracy and speed of the product
Contribute to ongoing product support to keep customers satisfied
Onboard and mentor new team members
2021 — 2022
San Francisco Bay Area
Maintain and enhance the infrastructure of the file ingestion system on AWS.
Design and develop API endpoint monitors that raise alarms on API failures at runtime.
Enhance and automate code testing strategies on Jenkins.
Design and develop new data pipelines with AWS, DynamoDB, Databricks Spark, and related tools.
Design and develop RESTful APIs.
Refactor code to reduce coupling and improve readability and reusability.
2020 — 2021
Sandy Springs, Georgia, United States
• Worked on Data Discovery (Python project) as part of the scanner team:
1. Developed and maintained data source connector microservices with Flask.
2. Optimized the scanner to handle large-scale workloads using coroutines, multithreading, and multiprocessing.
3. Researched, designed, and developed OCR support.
4. Designed a load balancer for the Kafka payload distributor to optimize Kafka performance.
• Worked on the Integration team: seeded and managed workflows for internal and external data sources.
• Served as a troubleshooter for product support, diagnosing technical issues with Kafka, RESTful APIs, databases, and related systems.
• Worked on the new Data Discovery project (Integris acquisition) as part of the classification core team:
1. Developed, optimized, and maintained the Classification Core library in Java so that terabytes of data can be classified in a short time.
2. Worked with the data science team to develop and deploy ML and deterministic models, in both Java and Python, to extend classification coverage.
3. Developed and maintained RESTful APIs for multiple microservices, using both Spring Boot (Java) and Flask (Python), to support data discovery operations.
4. Designed data models and data pipelines for new features.
5. Provided product support as a technical expert, occasionally acting as a data analyst.
6. Onboarded new team members.
2019 — 2020
Atlanta
• Optimization methods: linear regression and logistic regression using gradient descent and Newton’s Method
• Parametric approaches: EM algorithm, GMM, HMM, linear model and generalized linear models, model selection and cross validation
• Nonparametric approaches: PCA, Splines and approximation of functions, Bootstrap, Monte Carlo methods
Education
Georgia Institute of Technology
Master's degree
2018 — 2020
Georgia Institute of Technology
Master's degree
2016 — 2018
Georgia Institute of Technology
Bachelor's degree
2011 — 2015