New York City Metropolitan Area
Architected and developed, from the ground up, the Hadoop big data infrastructure used by various trading desks and internal engineering teams. Headed the big data project, improving the platform to achieve multi-tenancy, security, scalability, and high availability. As the big data platform product owner, my primary responsibilities included:
**Launched and Built Big Data Infrastructure On-Premises and in the Cloud**
Pioneered an infrastructure-as-code workflow and developed Ansible scripts to automatically deploy the Hortonworks Data Platform (HDP) Hadoop distribution in on-premises environments.
Led HDP SaltStack script development, streamlining deployment of the complex Hadoop ecosystem, including HDFS, YARN, HBase, Hive, Spark, and Kafka, to cloud environments such as Microsoft Azure.
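A minimal sketch of how such an infrastructure-as-code rollout can be driven from Python; the playbook name, inventory layout, and version variable are hypothetical illustrations, not the actual Ansible/SaltStack scripts:

```python
"""Sketch of a deployment wrapper, assuming a hypothetical hdp_deploy.yml
playbook and per-environment inventory files."""
import subprocess
import sys

# Hypothetical inventory files, one per environment.
INVENTORIES = {
    "dev": "inventories/dev/hosts.ini",
    "prod": "inventories/prod/hosts.ini",
}

def deploy_hdp(environment: str, hdp_version: str) -> None:
    """Run the (hypothetical) HDP playbook against one environment."""
    cmd = [
        "ansible-playbook",
        "-i", INVENTORIES[environment],
        "hdp_deploy.yml",  # hypothetical playbook name
        "--extra-vars", f"hdp_version={hdp_version}",
    ]
    # check=True aborts the rollout if any play fails.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    deploy_hdp(sys.argv[1], sys.argv[2])  # e.g. python deploy.py dev 3.1.5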
**Developed and Orchestrated Scalable ETL Pipelines**
Established an Airflow cluster with a Celery and RabbitMQ backend from scratch, increasing capacity for running concurrent ETL tasks and improving visibility into data management.
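A minimal sketch of the kind of DAG such a cluster runs; the DAG name, schedule, and task commands are hypothetical placeholders:

```python
"""Sketch of an ETL DAG on a CeleryExecutor-backed Airflow cluster."""
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# With CeleryExecutor, airflow.cfg points the broker at RabbitMQ, e.g.:
#   executor = CeleryExecutor
#   broker_url = amqp://airflow:***@rabbitmq-host:5672/
with DAG(
    dag_id="eod_market_data_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 22 * * 1-5",        # weekdays after market close
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'pull raw files'",  # placeholder command
    )
    load = BashOperator(
        task_id="load",
        bash_command="echo 'load into Hive'",  # placeholder command
    )
    extract >> load  # tasks fan out to Celery workers via RabbitMQ
```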
Implemented a Spark-based end-of-day market data ingestion pipeline consuming from Kafka, more than doubling data processing speed.
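A minimal PySpark sketch of the Kafka-to-HDFS ingestion step; the brokers, topic, and output path are hypothetical:

```python
"""Sketch of a batch end-of-day ingestion job reading from Kafka."""
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("eod-market-data-ingest")  # hypothetical app name
    .getOrCreate()
)

# Batch-read the day's records from Kafka (requires the spark-sql-kafka package).
raw = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical
    .option("subscribe", "eod_market_data")                          # hypothetical topic
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; decode before landing on HDFS as Parquet.
(
    raw.select(col("value").cast("string").alias("record"))
    .write.mode("overwrite")
    .parquet("hdfs:///data/market/eod/")  # hypothetical output path
)
```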
**Administered Production and Development Big Data Platforms**
Led the Hadoop administration group in platform maintenance, upgrades, and disaster recovery. Advised over 50 developers across New York, Toronto, and London on onboarding big data applications.
Devised a code-based configuration provisioning methodology integrated with the CI/CD pipeline, improving the reliability of managing over 200 configuration parameters in the Hadoop ecosystem across all production and development environments.
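A minimal sketch of the CI validation idea behind code-based config provisioning; the file layout and required-parameter list are hypothetical (the parameter names themselves are standard Hadoop/YARN/Hive settings):

```python
"""Sketch of a CI gate that validates per-environment config files."""
import sys

import yaml  # PyYAML

# Hypothetical required parameters and their expected types.
REQUIRED = {
    "yarn.nodemanager.resource.memory-mb": int,
    "hive.exec.dynamic.partition": bool,
    "dfs.replication": int,
}

def validate(path: str) -> list[str]:
    """Return a list of violations for one environment's config file."""
    with open(path) as fh:
        config = yaml.safe_load(fh)
    errors = []
    for key, expected in REQUIRED.items():
        if key not in config:
            errors.append(f"{path}: missing {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{path}: {key} should be {expected.__name__}")
    return errors

if __name__ == "__main__":
    # CI passes every environment's file, e.g. conf/prod.yml conf/dev.yml
    problems = [e for path in sys.argv[1:] for e in validate(path)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```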
Enhanced the platform monitoring mechanism by integrating the Datadog logging system and launched a real-time dashboard that surfaces all infrastructure metrics in one centralized place, alerting the support team immediately when any infrastructure component behaves abnormally.
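A minimal sketch of feeding such a dashboard via DogStatsD; the metric name, tags, and the health value itself are hypothetical:

```python
"""Sketch of shipping a platform health metric to Datadog."""
from datadog import statsd  # talks to a local Datadog agent over DogStatsD

def report_hdfs_capacity(used_pct: float) -> None:
    """Send one gauge point; a Datadog monitor on this metric can alert
    the support team when the value crosses a threshold."""
    statsd.gauge(
        "bigdata.hdfs.capacity_used_pct",  # hypothetical metric name
        used_pct,
        tags=["env:prod", "cluster:hdp"],  # hypothetical tags
    )

report_hdfs_capacity(82.5)
```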