I'm interested in analyzing and handling data. Python is my first choice of programming language, though I've worked with several others as well. I have a good understanding of highly available, parallel, and distributed system designs. I'm an avid tech enthusiast.
Experience
2021 — Now
Beavercreek, Ohio, United States
Develop and maintain highly scalable and available APIs for audiobook metadata delivery to 26 library and retail partners.
Accountable for onboarding new partners, including setting up ingestion systems to efficiently process data from content providers and deliver it to partner platforms.
Build and maintain ETL (Extract, Transform, Load) processes to provide accurate and up-to-date audiobook financial data.
Participate in tech health initiatives to ensure adherence to best practices in system deployment and maintenance.
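The ETL work described above can be illustrated with a minimal, self-contained sketch. The record fields, the flat royalty rate, and the in-memory "warehouse" are all hypothetical stand-ins, not the actual schema or pipeline:

```python
# Hypothetical ETL sketch for audiobook sales data; field names and the
# flat ROYALTY_RATE are illustrative assumptions, not the real schema.
from collections import defaultdict

ROYALTY_RATE = 0.25  # assumed flat rate, for illustration only

def extract(raw_rows):
    """Extract: parse raw CSV-like rows into dicts."""
    for row in raw_rows:
        isbn, units, price = row.split(",")
        yield {"isbn": isbn, "units": int(units), "price": float(price)}

def transform(records):
    """Transform: aggregate gross revenue and royalties per title."""
    totals = defaultdict(lambda: {"units": 0, "revenue": 0.0})
    for r in records:
        t = totals[r["isbn"]]
        t["units"] += r["units"]
        t["revenue"] += r["units"] * r["price"]
    return {isbn: {**t, "royalty": round(t["revenue"] * ROYALTY_RATE, 2)}
            for isbn, t in totals.items()}

def load(totals, sink):
    """Load: write aggregated rows to a destination (here, a plain dict)."""
    sink.update(totals)
    return sink

raw = ["9781234567890,3,9.99", "9781234567890,1,9.99", "9780987654321,2,14.99"]
warehouse = load(transform(extract(raw)), {})
```

In a production pipeline each stage would read from and write to real stores (databases, S3, partner feeds); the three-function split is just the standard extract/transform/load decomposition.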
2019 — 2020
Dayton Metropolitan Area
Vandalism detection in Wikipedia using user edit history
Used Python packages such as requests and BeautifulSoup to scrape Wikipedia edit histories.
Performed text parsing, pre-processing, and feature extraction with Apache Spark on 8 TB of text.
Applied vectorization techniques such as TF-IDF, Word2Vec (Gensim), and Sentence2Vec (Universal Sentence Encoder).
Ingested the processed data into Neo4j.
Built an author interaction graph in Python using NetworkX.
Worked with state-of-the-art graph representation learning algorithms such as GraphSAGE and Node2Vec.
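Of the vectorization techniques above, TF-IDF is easy to sketch in pure Python. In the project itself this was done at scale with Spark and Gensim, so the snippet below is illustrative only; the toy documents are invented:

```python
# Pure-Python TF-IDF sketch; the real project used Spark/Gensim at scale.
# tf = term count / doc length, idf = log(n_docs / docs containing term).
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} vector per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

# Toy edit-comment tokens, purely for illustration
docs = [["edit", "revert", "vandal"],
        ["edit", "discuss"],
        ["vandal", "revert", "revert"]]
vecs = tf_idf(docs)
```

Terms that occur in fewer documents (here "discuss") get a higher idf, so they weigh more than common terms like "edit" — the property that makes TF-IDF useful as a feature for classifiers such as a vandalism detector.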
2019 — 2020
Fairborn, Ohio, United States
Worked to improve the academic, social, and professional lives of graduate students, consistent with the goals of the University.
2019 — 2020
Fairborn, Ohio, United States
Designed labs for the "Data Science with Python" course.
Demonstrated Python visualization and data-wrangling packages such as Matplotlib, Pandas, and NumPy.
Taught undergraduate students version-control systems such as Git.
2015 — 2018
Bengaluru Area, India
Built a data ingestion pipeline in Python, using Apache Airflow's scheduler and the Genie orchestrator to run Hadoop jobs that process time-series data from the Historian time-series database.
Developed distributed machine-learning models in Python using Spark's MLlib, scikit-learn, and TensorFlow.
Leveraged AWS Kinesis Firehose and Glue ETL services to extract data from General Electric Power's proprietary time-series database, transform it to each analytic's input specification, and load it into AWS RDS/Aurora/S3.
Used the AWS Python SDK to continuously monitor jobs and remotely start, stop, or short-circuit stages of the data pipeline.
Helped create a data lake by extracting ServiceNow data from various sources into HDFS.
Developed AWS Lambda functions to perform data-integrity checks and transformations on S3 datasets.
Ingested data collected from IoT devices into Amazon Timestream and built interactive dashboards with Amazon QuickSight.
Developed shell scripts to ingest data from sources such as flat files, CSVs, PostgreSQL, and MongoDB into HDFS data lakes.
Migrated analytics from Apache Storm to Apache Flink to analyze streaming time-series data from IoT edge devices.
Built a CI/CD pipeline with Jenkins to build, test, and deploy Spark analytics on AWS EC2/EMR instances.
Modularized the data pipeline into sub-components, with a Docker container for each service.
Developed data-warehousing scripts to migrate historical data from data stores to Amazon S3 Glacier.
Developed utility scripts in Scala and Python to receive real-time data from Apache Kafka and store the streams in HDFS and S3.
Developed scripts to deploy the data pipelines to AWS and Pivotal Cloud Foundry.
Developed highly concurrent and multi-threaded applications using Java and Spring Boot.
Developed interactive near real-time dashboards using Angular, React, and Spring Boot.
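One piece of the work above that sketches well is a Lambda-style data-integrity check. The snippet below mirrors the (event, context) handler shape of AWS Lambda but runs standalone; the record layout, required fields, and JSON-lines payload are all assumptions for illustration:

```python
# Hedged, library-free sketch of a Lambda-style data-integrity check.
# The handler signature mirrors AWS Lambda's (event, context); the record
# fields and JSON-lines payload format are illustrative assumptions.
import json

REQUIRED_FIELDS = {"device_id", "timestamp", "value"}

def check_record(record):
    """Return a list of integrity problems in one record (empty = OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "value" in record and not isinstance(record["value"], (int, float)):
        problems.append("value is not numeric")
    return problems

def handler(event, context=None):
    """Lambda-style entry point: validate each line of a JSON-lines payload."""
    bad = {}
    total = 0
    for i, line in enumerate(event["body"].splitlines()):
        total += 1
        problems = check_record(json.loads(line))
        if problems:
            bad[i] = problems
    return {"total": total, "invalid": bad}

# Toy payload: first record is complete, second is missing "value"
event = {"body": '{"device_id": "a1", "timestamp": 1, "value": 3.5}\n'
                 '{"device_id": "a2", "timestamp": 2}'}
report = handler(event)
```

In a real deployment the event would carry an S3 object reference rather than the payload itself, and failed records would typically be routed to a dead-letter queue instead of returned inline.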
Education
Wright State University
Master of Science - MS, Computer Science
2018 — 2020