Boston, Massachusetts, United States
• Led architecture of data platform on Google Cloud Platform (GCP), enabling scalable, automated processing of petabyte-scale data lake and accelerating deep learning (DL) development and data analytics workflows by up to 80%.
• Developed data collection features for on-board robotics platform in C++ and ROS 2, reducing CPU consumption by 79% and storage requirements by 67% while increasing reliability and automation.
• Implemented horizontally scalable Kubernetes infrastructure for processing unstructured data (images, video, and ROS bags), built from containerized data transforms operating on Cloud Storage data lake and orchestrated with Dagster.
• Developed data search system consisting of BigQuery data warehouse, Postgres database, Node.js (Express) RESTful API, and Python client library, enabling search and retrieval of data and machine learning (ML) annotations.
• Developed image-content-based and ML-based data pipelines to prioritize unlabeled data for annotation, optimizing metrics such as scene object density (85% increase) to improve model training within constrained annotation budget.
• Organized and managed cloud infrastructure by setting up Terraform for resource management, establishing team Git workflow, standing up internal Python package repository, and creating documentation for end users.