New York City Metropolitan Area
• AWS Expertise: Managed CloudWatch, CloudTrail, S3 (versioning for sensitive data), and DynamoDB (Lambda integration); automated snapshots with Python; developed Redshift-based data warehousing.
• Data Warehousing and ETL Automation: Designed and implemented data warehouses on AWS Redshift and Azure SQL Data Warehouse, optimizing high-performance storage solutions. Created and configured Apache Airflow DAGs for automated ETL workflows, reducing data ingestion time by xx% with PySpark and Scala.
• Spark & Big Data: Developed PySpark scripts for data transformations, RDD operations, SQL queries, Spark Streaming, and advanced data analysis; optimized Hive with bucketing, partitioning, and the ORC file format.
• Kubernetes: Maintained production-grade Kubernetes clusters with regular patches and upgrades to ensure stability and scalability.
• Data Quality & ML: Performed data profiling and quality checks (Hive, Hadoop, Spark); created UDFs in MapReduce, Pig, and Hive; implemented machine learning algorithms in PySpark for data-driven insights.