• Designed and implemented the Perception Off-Board Data Preparation Pipeline in Python and Spark, processing over 20,000 perception data items and providing the modeling team with reliable, quality-checked data
• Developed the Geo-Split data splitting algorithm to split data by geographic location, eliminating overfitting and increasing perception accuracy by 1% to 3% depending on the model
• Deployed pipeline on AWS EC2 and S3 using Docker and Spinnaker
• Analyzed and visualized data using Pandas and Bokeh, generating over 40 metrics across multiple dimensions on the quality of collected data and improving data label density by 27%
• Integrated the pipeline with the VerCD continuous delivery system to feed data and reports to the modeling team