Experience
2026 — Now
United States
2022 — 2026
New York City Metropolitan Area
Architecting robust real-time and batch pipelines and massive-scale data solutions for Netflix's Open Connect team, with a focus on optimizing video streaming, streaming algorithms, CDN-level TCP connections, experimentation, content steering, and content placement.
• Leading the development of highly scalable, low-latency, multi-region Apache Flink pipelines, processing billions of CDN events daily at a peak throughput of 1 million events per second, establishing an internal benchmark for Flink jobs.
• Designed petabyte-scale Spark pipelines for CDN metrics, achieving significant efficiency improvements by fine-tuning Apache Iceberg using strategies like merge-on-read, clustering, and partitioning for optimized query performance.
• Built Netflix's first large-scale Spark Streaming pipelines, unifying batch and streaming modes in a single codebase. Implemented auto-recovery mechanisms to ensure resiliency and 99.999% uptime, realizing a true Kappa architecture.
• Spearheaded the entire log ingestion flow and cloud game server-side data pipelines, integrating technologies like Protobuf, schema registries, log collection frameworks, data mesh, Spark Streaming, Elasticsearch, and Druid at scale.
• Developed high-performance JavaScript dashboards backed by Apache Druid, achieving sub-millisecond query latencies on a cluster ingesting 50 billion rows per hour. Evaluated trade-offs between groupBy and TopN queries, factoring in cardinality for optimal performance.
• Implemented Kappa and Lambda architectures to seamlessly balance real-time and batch processing, maximizing code reuse while carefully weighing trade-offs across streaming tech stacks such as Spark Streaming, Flink, and Mantis.
Technologies Used: Scala, Apache Flink, Kafka, Avro, Google Protobuf, Apache Iceberg, Druid, Trino, Spark, Spark Streaming
2021 — 2022
New York City Metropolitan Area
Built petabyte-scale, distributed, and highly resilient data solutions to power the Merchant team's data needs.
• Initiated and led a cross-team project (15+ teams, from engineering to support) to make menu data accessible for analytical and near-real-time use cases by leveraging a Lambda data processing architecture.
• Architected and built a real-time Flink service that consumes data from multiple Kafka topics via a streaming join, then hydrates, processes, and sinks the data back to Kafka for ingestion into Apache Pinot and Snowflake. Optimized the service to process ~5 billion events per day with end-to-end latency under ~2 minutes, scaling it across 100+ distributed Kubernetes pods.
• Wrote the Spark job that joins and ingests menu data for all merchants globally, forming the batch layer of the menu Lambda architecture and ingesting 40 billion rows per day. Optimized this job extensively, reducing its cost by roughly 5x.
Technologies Used: Flink, Cassandra, AWS, Databricks, Kubernetes (EKS), Spark, Jenkins, Terraform, Java, Snowflake, Elasticsearch
2017 — 2021
New York City Metropolitan Area
Built real-time and offline solutions to make Twitter's revenue data accessible and reliable, leveraging some of the largest-scale distributed data processing technologies in the world.
• Built a scalable data ingestion platform for all revenue teams that runs managed batch jobs to move large datasets (~500 TB/hr) from on-premises to the cloud. The platform uses Dataflow with Apache Beam for data transformation (Thrift to Avro), BigQuery for data warehousing, Kafka for messaging when partitions become available on-premises, and Apache Airflow for orchestration.
• Worked on building a petabyte-scale distributed data warehouse for the revenue org by leveraging BigQuery.
• Worked on large-scale batch processing pipelines to flatten and move ~150 TB of data per hour from on-premises to the cloud.
• Built various large-scale tools and libraries, including a BigQuery onboarding tool, an HDFS-to-BigQuery data migration tool, BigQuery-to-Druid data migration tools, and various serialization frameworks (e.g., Thrift, Avro).
• Maintained one of the largest Druid (scalable distributed analytics platform) clusters in the world (~6,000 nodes). Maintenance involved handling scalability and performance issues and ingesting various aggregated datasets at scale.
• Leveraged our internal batch computation framework (Scalding) and our workflow management platform to assist other teams in building out their data pipelines.
Technologies: BigQuery, Beam, Dataflow, Airflow, GCP, Scalding, MapReduce, Scala, Batch Processing, Real-Time Streaming, Data Engineering, Hadoop, Druid, Kafka
2016 — 2017
New York City Metropolitan Area
• Architected and created the database used by the global audit team; analyzed and interpreted data feed sources to migrate data into a MS SQL Server database using daily automated import processes.
• Developed automated ETL (Extract-Transform-Load) processes to move data between the GRC system and an Oracle database, reducing load time by 60%.
• Wrote data load and transformation scripts in SQL Server Integration Services (SSIS) to cleanse and standardize data in accordance with business rules, reducing manual data cleanup time by 80%.
• Created clustered and non-clustered indexes on tables for faster searching and retrieval of data.
• Developed queries to research, analyze, and troubleshoot data and to create business reports; worked with the development team to investigate and correct bugs and deficiencies.
• Generated ad hoc reports using complex queries to support auditors globally with budget tracking, issue tracking, and analytical insights, contributing to a 10% budget reduction.
• Developed executive reports and dashboards in QlikView to visualize data and measure business performance in real time, reducing manual report creation time by 50%.
• Wrote and analyzed trending reports, drawing extrapolations from findings with the project manager and the team (SSRS, Excel, T-SQL).
Environment: T-SQL, ETL Development, SQL Server, SSIS, MS Access, MS Excel, QlikView
Education
New Jersey Institute of Technology
Master's degree
2015 — 2017
PES University
Bachelor of Engineering (B.E.)
2010 — 2014