Redwood City, California, United States
• Developed robust end-to-end batch and real-time streaming pipelines in Python, SQL, Scala, Spark, Databricks, and Snowflake, orchestrated with Airflow and processing tens of terabytes of data daily
• Optimized Docker images for lean, efficient layers, orchestrated the containers with Kubernetes, and integrated builds with CircleCI
• Fine-tuned SQL queries, Spark jobs, and cluster configurations to improve performance and reduce overall costs
• Implemented reliable metadata synchronization and data transfer between Snowflake and Databricks
• Migrated from an AWS-only stack to Databricks on AWS, from Maven to Gradle, from Ansible to Terraform, and from Zeppelin to Databricks notebooks; integrated Delta Lake
• Developed Python and SQL tools for tracking table-level data lineage and schema evolution, which were key to simplifying pipelines and the data model