New York City Metropolitan Area
Architected and developed, from the ground up, the Hadoop big data infrastructure used by various trading desks and internal engineering teams. Headed the big data project, improving the platform to achieve multi-tenancy, security, scalability, and high availability. As the big data platform product owner, my primary responsibilities included:
**Launched and Built Big Data Infrastructure On-Premises and in the Cloud**
Pioneered an infrastructure-as-code workflow and developed Ansible scripts to automatically deploy the Hortonworks Data Platform (HDP) Hadoop distribution in on-premises environments.
Led HDP SaltStack script development, streamlining deployment of the complex Hadoop ecosystem, including HDFS, YARN, HBase, Hive, Spark, and Kafka, to cloud environments such as Microsoft Azure.
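A minimal sketch of how such an infrastructure-as-code rollout can be driven from Python; the playbook name, inventory layout, and version variable are hypothetical illustrations, not the actual Ansible/SaltStack scripts:

```python
"""Sketch of a deployment wrapper, assuming a hypothetical hdp_deploy.yml
playbook and per-environment inventory files."""
import subprocess
import sys

# Hypothetical inventory files, one per environment.
INVENTORIES = {
    "dev": "inventories/dev/hosts.ini",
    "prod": "inventories/prod/hosts.ini",
}

def deploy_hdp(environment: str, hdp_version: str) -> None:
    """Run the (hypothetical) HDP playbook against one environment."""
    cmd = [
        "ansible-playbook",
        "-i", INVENTORIES[environment],
        "hdp_deploy.yml",  # hypothetical playbook name
        "--extra-vars", f"hdp_version={hdp_version}",
    ]
    # check=True aborts the rollout if any play fails.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    deploy_hdp(sys.argv[1], sys.argv[2])  # e.g. python deploy.py dev 3.1.5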
**Developed and Orchestrated Scalable ETL Pipelines**
Established an Airflow cluster with a Celery and RabbitMQ backend from scratch, increasing capacity for running concurrent ETL tasks and improving visibility into data management.
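A minimal sketch of the kind of DAG such a cluster runs; the DAG name, schedule, and task commands are hypothetical placeholders:

```python
"""Sketch of an ETL DAG on a CeleryExecutor-backed Airflow cluster."""
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# With CeleryExecutor, airflow.cfg points the broker at RabbitMQ, e.g.:
#   executor = CeleryExecutor
#   broker_url = amqp://airflow:***@rabbitmq-host:5672/
with DAG(
    dag_id="eod_market_data_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 22 * * 1-5",        # weekdays after market close
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'pull raw files'",  # placeholder command
    )
    load = BashOperator(
        task_id="load",
        bash_command="echo 'load into Hive'",  # placeholder command
    )
    extract >> load  # tasks fan out to Celery workers via RabbitMQ
```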
Implemented a Spark-based end-of-day market data ingestion pipeline consuming from Kafka, more than doubling data processing speed.
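A minimal PySpark sketch of the Kafka-to-HDFS ingestion step; the brokers, topic, and output path are hypothetical:

```python
"""Sketch of a batch end-of-day ingestion job reading from Kafka."""
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("eod-market-data-ingest")  # hypothetical app name
    .getOrCreate()
)

# Batch-read the day's records from Kafka (requires the spark-sql-kafka package).
raw = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # hypothetical
    .option("subscribe", "eod_market_data")                          # hypothetical topic
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; decode before landing on HDFS as Parquet.
(
    raw.select(col("value").cast("string").alias("record"))
    .write.mode("overwrite")
    .parquet("hdfs:///data/market/eod/")  # hypothetical output path
)
```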
**Administered Production and Development Big Data Platforms**
Led the Hadoop administration group in platform maintenance, upgrades, and disaster recovery. Advised over 50 developers across New York, Toronto, and London on onboarding big data applications.
Devised a code-based configuration provisioning methodology integrated with the CI/CD pipeline, improving the reliability of managing over 200 configuration parameters in the Hadoop ecosystem across all production and development environments.
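A minimal sketch of the CI validation idea behind code-based config provisioning; the file layout and required-parameter list are hypothetical (the parameter names themselves are standard Hadoop/YARN/Hive settings):

```python
"""Sketch of a CI gate that validates per-environment config files."""
import sys

import yaml  # PyYAML

# Hypothetical required parameters and their expected types.
REQUIRED = {
    "yarn.nodemanager.resource.memory-mb": int,
    "hive.exec.dynamic.partition": bool,
    "dfs.replication": int,
}

def validate(path: str) -> list[str]:
    """Return a list of violations for one environment's config file."""
    with open(path) as fh:
        config = yaml.safe_load(fh)
    errors = []
    for key, expected in REQUIRED.items():
        if key not in config:
            errors.append(f"{path}: missing {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{path}: {key} should be {expected.__name__}")
    return errors

if __name__ == "__main__":
    # CI passes every environment's file, e.g. conf/prod.yml conf/dev.yml
    problems = [e for path in sys.argv[1:] for e in validate(path)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```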
Enhanced the platform monitoring mechanism by integrating the Datadog logging system and launched a real-time dashboard that surfaces all infrastructure metrics in one centralized place, alerting the support team immediately when any infrastructure component behaves abnormally.
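A minimal sketch of feeding such a dashboard via DogStatsD; the metric name, tags, and the health value itself are hypothetical:

```python
"""Sketch of shipping a platform health metric to Datadog."""
from datadog import statsd  # talks to a local Datadog agent over DogStatsD

def report_hdfs_capacity(used_pct: float) -> None:
    """Send one gauge point; a Datadog monitor on this metric can alert
    the support team when the value crosses a threshold."""
    statsd.gauge(
        "bigdata.hdfs.capacity_used_pct",  # hypothetical metric name
        used_pct,
        tags=["env:prod", "cluster:hdp"],  # hypothetical tags
    )

report_hdfs_capacity(82.5)
```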