Senior Software Engineer focused on infrastructure and backend systems. Experience designing and operating reliability-critical platforms. Currently exploring AI infrastructure and model-serving systems, with an emphasis on reliability, observability, and cost-efficient scaling.
Experience
2024 — Now
Daemon Job Framework: Continuous Execution Infrastructure (2024 - 2025)
* Designed and built a daemon job framework for long-running and streaming workloads, boosting throughput by 70%+, improving reliability, and reducing service overhead with in-memory caching across job cycles.
* Implemented auto-version upgrades with zero-downtime rollouts and geo-redundancy with safe staged deployments.
* Delivered telemetry dashboards with latency, reliability, and timeline tracing metrics for deep operational visibility.
Surgical Failover: Data Pipeline for Failover “Doctor” Engine (2025 - now)
* Built a streaming topology pipeline using the daemon job framework to continuously ingest Azure and SharePoint data, serialize with Protobuf, and publish to blob storage.
* Designed and implemented Doctor, a central decision engine that continuously ingests topology and telemetry from blob storage, snapshots the data for algorithm execution, and outputs automated failover decisions.
* Optimized throughput with a producer–consumer model and sliding-window caching, processing 1M+ blobs / 30 min within ≤2 GB steady-state memory.
2021 — 2024
Directory Partition Migration & Failover Automation (2023 - 2024)
* Led the migration from an Active Directory–based solution to a SQL-based directory architecture, driving cross-team collaboration between the SharePoint Directory and Disaster Recovery teams, and designing schemas, APIs, and CRUD workflows for directory partitions and replicas.
* Architected automated partition failover with a traffic-light gating system and pre-failover validation, ensuring seamless recovery without data loss across 400+ production farms.
* Developed monitoring and alerting for long-running or failed failover operations, significantly improving failover reliability, recovery time, and operational performance.
2020 — 2021
Washington
2018 — 2020
Austin
Employment Related Service for Contractor Jobs
• Worked on full-stack development of a web application built with Django, MySQL, and ReactJS that lets candidates apply for matched jobs and lets recruiting agencies manage requisition activities
• Implemented automatic user-interface refresh by adding setInterval calls and React lifecycle methods to the relevant containers and components
• Extended the project with a sourcing React app for creating, editing, and filtering job requisitions and sending invitations to potential candidates, reducing the time to receive user applications by at least 60%
• Set up a Node.js cron job that creates and refreshes Elasticsearch indices from database query results, and wrote search queries with aggregations to score and rank the replying candidates
• Delivered a phone-call transcription feature that lets users review contact history, using Twilio dual-channel recordings and customized IBM Watson Speech to Text models
Find Me a Meeting
• Built an API-design-first web application whose API supports creating events on a shared calendar, fetching free/busy availability, generating unique meeting URLs, and modifying user preferences
• Overrode the default login and logout views to enable recruiters to sign in via Google and Azure OAuth flows
• Utilized token-based authentication, session-based authentication and customized permission classes to enable role-based access control and object-level access control
• Integrated Datadog and Sentry monitoring to track unexpected errors and performance
2017 — 2018
New York, New York
Project: Making Evidence Appraisal Available and Computable
1. Scraped and extracted fields such as pages, authors, and edits from the XML-formatted edit history of wikijournalclub, using its API, regular expressions, and the beautifulsoup4 package in Python
2. Identified which edits are minor versus substantive based on features such as flags and size change
3. Determined which substantive comments are appraisals of studies or other information/assertions using text classifiers
Education
Columbia University
Master of Science (MS)
2016 — 2018
Sun Yat-sen University
Bachelor's degree
2012 — 2016