Software Engineer with seven years of experience in Data Engineering and DevOps for internal engineering platforms, recently focused on frontend development with React. Experienced in managing AWS infrastructure with Terraform and building CI/CD pipelines that improve developer experience.
Experience
2024 — Now
2023 — 2024
Data Team - Internal Data Platform
• Built and deployed services for an internal data platform using open-source components: Argo Workflows for ETL job orchestration, Trino for querying across data stores, Hive Metastore for managing Iceberg tables, Apache Ranger for data access permissions, Superset for BI visualizations, and DataHub for metadata discovery.
• Initiated Proof of Concept (POC) deployments using official or open-source third-party Helm charts on the existing EKS cluster; managed AWS dependencies (IAM role permissions, S3 buckets) with Terraform.
• Partnered with the DevOps team to harden deployments, implementing Okta SSO for authentication and Vault for secrets management.
• Monitored applications with Prometheus and Grafana, tracking uptime, identifying long-running and resource-intensive queries, and debugging error logs.
Redesign Platform Core Team - Frontend Development
• Developed frontend pages for the Redesign Health Platform Portal in React, building reusable components where appropriate.
• Used Storybook to develop pages rapidly in isolation, independent of API availability.
• Collaborated with a teammate to integrate the API contract into unit tests, validating mocked data against the contract.
• Identified a gap in unit test coverage and bootstrapped the test suite, raising coverage from near 0% to 45%.
• Created Jira tasks aligned with product requirements and Figma design screens.
• Implemented a devcontainer setup with documentation to streamline onboarding for both the data and core teams.
Role ended due to a staff reduction.
2022 — 2023
2021 — 2022
Data Platform Lead - Clinical Trials
• Ingested a daily full feed of clinical trial data and recorded changes over time.
• Expanded the process to incorporate data from multiple countries and establish linkages across sources.
• Orchestrated and managed workflows using Argo Workflows.
• Wrote PySpark jobs for reading/writing Apache Hudi tables and Python jobs, ensuring comprehensive test coverage.
• Delivered normalized, cleaned data to internal data consumers via shared message contracts, publishing to a queue for consumers to pull from.
• Implemented Continuous Integration (CI) with unit and integration tests on each commit, Dockerized build jobs, and CircleCI integration. Managed AWS infrastructure with Terraform in coordination with the DevOps team.
• Led agile grooming and planning sessions to strategize sprints and deliverables. Maintained alignment with technical manager and product owner priorities.
• Mentored a team of 4 engineers who started with minimal Python and no PySpark experience, conducting regular code reviews.
• Maintained up-to-date documentation for the evolving team and project.
2017 — 2021
Greater New York City Area
Chemical Substance Normalization
• Collaborated with Subject Matter Experts (SMEs) to build and refine a rule-based approach for enriching substance metadata and linking related substances.
• Utilized AWS Step Functions for end-to-end task sequencing and error handling. Created jobs in Lambda and AWS Glue with PySpark for each data processing step.
• Created Kibana dashboards for data sharing and quality analysis before data releases. Created Elasticsearch index schemas, indexed data, and provided query structure for application team requirements.
• Established linkage for 21 million unique substances to original content sources, available for consumption by application teams in both RDS database and Elasticsearch.
Geolocation Tagging Engine
• Developed a service to identify and return the top locations mentioned in a document; the existing area-overlap-count technique was too slow and produced lower-quality results than commercial software.
• Reduced total runtime per document from 2 hours to under 20 seconds.
• Achieved 65% improvement in top location result quality.
• Implemented the area-overlap-count algorithm using open-source Python libraries (GeoPandas, Shapely).
• Replaced the dictionary-matching component for locations with spaCy's PhraseMatcher, speeding up matching and simplifying setup.
• Implemented unit and integration tests where there were none before.
• Integrated the service into a CI/CD pipeline using GitHub, Docker, Jenkins, and Terraform for scaling and deployment.
Education
The Cooper Union for the Advancement of Science and Art
Bachelor of Engineering (B.E.)
2013 — 2017
Thomas Jefferson High School for Science and Technology
High School
2009 — 2013