Experience
2024 — Now
New York, United States
Equities Engineering
2018 — 2023
San Francisco, California, United States
• Built and iterated on an online anti-scraping machine learning model and integrated this auto-retrainable model with Airbnb’s online trust evaluation system and its tiered intervention methods, providing robust and trustworthy anti-bot protection on Airbnb platforms; achieved 99% precision and 80% recall, reduced listing scraping from 99% to 1% and overall bot traffic from 140M/day to 30M/day without negative booking impact, and saved $2.5M in infrastructure costs
• Developed a content moderation proxy service with a built-in neural network model to filter user-generated text content, including spam, host recruitment, and insults, achieving 1.00 precision and 0.81 recall for the binary classifier and 0.72 precision and 0.81 recall for the multi-class classifier, and saving $220,000 per year in third-party service costs
• Collaborated on a team of 12 on the highly challenging "China Data Localization" program to close the compliance gap with China's Cybersecurity Law and Chinese Internet Content Provider obligations and ensure Airbnb's business continuity in China, localizing storage by replicating Aurora/Vitess databases from the US region to the CN region and building the cloud infrastructure to localize EC2/Kubernetes services in China
• Designed and developed a performant real-time log ingestion pipeline (8k QPS) integrated with Kibana using AWS S3, Lambda, Kinesis, Elasticsearch, and Logstash; built monitors and alerts to safeguard the system’s availability and efficiency, and wrote a diagnosis runbook
• Built a scalable streaming pipeline using Kinesis and Spark Streaming to ingest data into Hive; designed the data model and applied offline analysis to warehouse data to identify inconsistencies, fill gaps, and help solve problems such as A/B testing experiment assignment imbalance and user drop-off
2016 — 2018
Greater Boston Area
• Developed an efficient distributed data ingestion system built on HBase and Kafka to replicate data, using the DB2 locking mechanism for replication concurrency control, and wrote unit tests based on HBase Mini-Cluster and Derby as well as smoke tests in Ruby
• Built a user-friendly, large-scale enterprise cloud web application by designing database schemas and indexes for HBase and DB2 to support efficient health record write and search functions, and developed an FVT platform in Java
• Developed a real-time data pipeline by designing the end-to-end data flow using a server-side data streaming API, Kafka, Spark Streaming, and IBM Cloud Object Storage
• Optimized and automated a cloud service deployment procedure in IBM UrbanCode Deploy (UCD), reducing testers’ manual operation time from two hours to ten minutes
• Led an innovative healthcare project, a Fast Healthcare Interoperability Resources (FHIR) data pipeline in Java that gathers, serializes, and curates healthcare information; designed the data transformation models, developed and tested the pipeline, and prepared deployment documentation
• Collaborated with the team to launch IBM Watson Health’s first cloud platform, IBM® Watson™ Platform for Health, developing a near-real-time data transformation for the ETL data pipeline in Java
• Coordinated and led design meetings, working sessions, code reviews, and knowledge-sharing sessions with teams in NYC, Canada, Japan, and Brazil; ran the daily scrum as scrum master
2015 — 2015
Greater New York City Area
• Course: CSOR 4231 Analysis of Algorithms I, taught by Professor Clifford Stein.
Education
Columbia University
Master's Degree
2014 — 2015
Xi'an Jiaotong University
Bachelor's Degree
2010 — 2014
The high school affiliated to Xi'an Jiaotong University