I’m a Data and Software Engineer with 4+ years of experience building scalable data pipelines, backend systems, and cloud solutions across AWS, Azure, and GCP. I work with Python, PySpark, Kafka, SQL, Java/Spring Boot, and GenAI tools to deliver reliable, high-impact products.
Experience
2024 — Now
Dallas, Texas, United States
• Built a real-time ingestion pipeline using AWS Kinesis and Kafka to process 500,000+ feedback events per day with low latency and seamless transformation into structured datasets.
• Designed dynamic ETL pipelines with PySpark and advanced SQL (window functions, CTEs, recursive queries) to handle complex time-series aggregations and significantly reduce processing time.
• Implemented large-scale batch processing on Hadoop using HDFS and Hive, applying Python multiprocessing for parallelization and achieving a 30% reduction in execution time.
• Integrated GPT-4 and RAG pipelines to summarize customer feedback and classify contextual insights, improving sentiment accuracy and reducing hallucinations through retrieval and reranking strategies.
• Built secure, scalable data storage and retrieval solutions using AWS S3, Glue, Athena, and Redshift Spectrum, optimizing schemas and query performance through effective partitioning.
• Orchestrated fault-tolerant workflows with Temporal using activity retries, backoff strategies, and durable execution to eliminate manual restarts and reduce pipeline failures.
• Developed optimized dashboards using advanced SQL techniques like materialized views, partitioned tables, and pivot queries, improving data availability and analytics performance.
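The time-series aggregations described above (window functions over event data) can be sketched with a minimal, self-contained example. The `feedback_events` schema and the 3-day rolling average are illustrative, not the production pipeline; sqlite3 stands in for the warehouse engine purely so the query is runnable.

```python
import sqlite3

# Hypothetical feedback_events table standing in for the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback_events (day INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO feedback_events VALUES (?, ?)",
    [(1, 4.0), (2, 2.0), (3, 3.0), (4, 5.0)],
)

# Window function: 3-day rolling average of feedback scores,
# the same shape of aggregation used in the ETL bullets above.
rows = conn.execute("""
    SELECT day,
           AVG(score) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM feedback_events
    ORDER BY day
""").fetchall()
print(rows)
```

The same `AVG(...) OVER (ORDER BY ... ROWS BETWEEN ...)` pattern carries over to Spark SQL and Redshift with larger frames and partition keys.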
2022 — 2022
India
• Designed and deployed a distributed data storage system on HDFS with Hive-based partitioning to manage 500+ GB of daily financial data, improving scalability and reducing storage latency.
• Orchestrated automated ETL pipelines in Azure Data Factory for seamless movement of financial and HR datasets, and used Azure Databricks to build dimensional models and transformations that enhanced reporting accuracy for people analytics and HR leadership.
• Built real-time ingestion pipelines with Kafka to process high-volume transactional events, boosting throughput and scalability by 50% and enabling real-time insights for downstream applications.
• Optimized Hive storage and advanced SQL querying, integrating with Azure Synapse Analytics to accelerate financial data retrieval and improve analytics performance.
• Developed PySpark-based datasets for financial reporting, creating structured views that highlighted key trends and automated daily reporting workflows, reducing manual preparation time.
• Automated anomaly detection and metadata tagging by integrating GenAI APIs into pipelines, incorporating prompt engineering, versioning, and evaluation hooks with human-in-the-loop validation to ensure high data quality and monitoring reliability.
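The GenAI tagging-with-validation flow above can be sketched as follows. `tag_record` is a stub standing in for the real GenAI API call, and the threshold, prompt version string, and anomaly rule are all illustrative assumptions; the point is the shape of the pipeline: versioned tagging plus an evaluation hook that routes low-confidence results to human review.

```python
PROMPT_VERSION = "v1"  # prompts are versioned alongside pipeline code


def tag_record(record: dict) -> dict:
    """Stub for a GenAI API call: tags a record and reports confidence.

    The out-of-range rule and confidence values are placeholders for
    the model's actual response.
    """
    anomalous = abs(record["amount"]) > 10_000  # illustrative rule
    return {
        **record,
        "tags": ["anomaly"] if anomalous else ["ok"],
        "prompt_version": PROMPT_VERSION,
        "confidence": 0.55 if anomalous else 0.95,
    }


def run_pipeline(records, review_threshold=0.7):
    """Evaluation hook: low-confidence tags go to a human review queue."""
    accepted, review_queue = [], []
    for rec in map(tag_record, records):
        target = accepted if rec["confidence"] >= review_threshold else review_queue
        target.append(rec)
    return accepted, review_queue


accepted, queue = run_pipeline(
    [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 50_000.0}]
)
```

Here the human-in-the-loop step is simply a second queue; in production that queue would feed a review UI, and reviewer decisions would feed the evaluation metrics back into prompt iteration.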
2020 — 2021
India
• Developed Spring Boot-based Java applications using REST APIs and microservices architecture, enabling secure and scalable integration with enterprise systems.
• Built full-stack solutions by integrating backend services (Java, .NET, SQL) with frontend components (HTML, CSS, JavaScript) to deliver interactive dashboards and reporting tools for business stakeholders.
• Resolved 200+ critical incidents by debugging Java applications and optimizing SQL queries on Oracle databases, ensuring high system uptime and data integrity.
• Automated ETL and data integration workflows using Java, SSIS, and SQL Server, reducing manual intervention and improving the reliability of migration processes.
• Implemented CI/CD pipelines and version control using GitHub, Jenkins, Visual Studio, and Octopus, reducing release cycle times and improving deployment success rates.
• Strengthened application security by applying Java security best practices, role-based access control (RBAC), and encryption across backend systems.
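The RBAC pattern above is language-agnostic; a compact Python sketch (rather than the Spring Security configuration actually used) shows the core idea. The role map, permission names, and `purge_report` function are hypothetical.

```python
from functools import wraps

# Illustrative role-to-permission map; real systems load this from config.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "analyst": {"read"},
}


def requires(permission):
    """Decorator enforcing that the calling user's role grants `permission`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(user, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(user["role"], set()):
                raise PermissionError(f"{user['name']} lacks '{permission}'")
            return fn(user, *args, **kwargs)
        return wrapper
    return decorator


@requires("delete")
def purge_report(user, report_id):
    # Hypothetical privileged operation guarded by the RBAC check.
    return f"report {report_id} deleted by {user['name']}"
```

In the Spring Boot services this same check is expressed declaratively (e.g. method-level authorization on roles) rather than with a hand-rolled decorator.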
Education
University of North Texas
Master's degree
SRK Institute of Technology