I will graduate in May 2020. I am interested in building robust large-scale distributed systems and low-level systems (database/kernel/hypervisor/compiler).

Experience

Databricks

Software Engineer

2020 — 2023

Mountain View, California, United States

Designed and implemented a static analysis pipeline for nightly and release builds that tracks breaking API changes and dependency upgrades, checks for symbol conflicts, and generates dependency lists for release notes

Built a new pipeline to synchronize the internal Spark fork with the company's monorepo, reducing the latency to integrate new code changes from days to less than 3 hours. Improved monitoring by adding dashboards and alerts for out-of-sync conditions

Developed and deployed the pipeline for updating aarch64 images, enabling the product on ARM-based instances

Coordinated the major dependency upgrade from Hadoop 2 to Hadoop 3 during the Databricks Runtime 9 to 10 major release

Contributed several user experience improvements to Spark History Server, a debugging tool for Spark jobs

Contributed to several optimization passes in the query compiler, such as common subexpression elimination

Removed all old log4j dependencies and replaced them with the reload4j project, helping mitigate log4j vulnerabilities

Facebook

Performance and Capacity Engineer, Intern

2019 — 2019

Menlo Park, CA

Built a new pipeline to annotate network traffic metrics with service and hardware information, and boosted query speed by 2x by forcing Presto to use a broadcast join in the physical plan

Designed and developed a Python tool that searches for adjustments to service capacity placement across data centers that reduce cross-datacenter network traffic. Using the tool, proposed several changes, each of which can reduce cross-datacenter network traffic by 12% (several Tb/s)

Face++

Software Development Intern

2018 — 2018

Beijing City, China

Designed and implemented a distributed storage system in Go for large-scale computer vision datasets

Improved the service monitoring system by adding more detailed metrics on usage patterns to inform a better scheduling policy

Microsoft

Software Development Engineer Intern

2017 — 2017

Shanghai Suburb, China

Boosted the VSCode Arduino extension’s build speed by 8.25x with an incremental build feature, reducing average build time from 1.5 minutes to 10 seconds

Provided insights into VSCode user activity data using machine learning clustering algorithms

Education

Carnegie Mellon University

Master of Science - MS

Shanghai Jiao Tong University

Bachelor of Engineering - BE