Production Engineer with 4+ years of experience working on critical enterprise and payment processing systems, ensuring high availability, reliability, and operational stability in ITIL-driven environments.
Experience
2024 — Now
Texas, United States
Worked on high-volume payment processing platforms handling critical financial transactions, ensuring system availability, data
integrity, and a seamless user experience across a distributed microservices architecture.
• Owned end-to-end production stability, leading incident triaging, root cause analysis (RCA), and resolution of P1/P2 issues while
adhering to ITIL-based incident and change management processes.
• Performed deep log analysis across application logs, Unix system logs, and Oracle database logs, identifying root causes of failures
and implementing long-term fixes to prevent recurrence.
• Engineered automation solutions using Python and Unix shell scripting (cron, bash) to eliminate manual interventions, improving
operational efficiency and reducing repetitive workload by 30%.
• Built and maintained monitoring dashboards using tools such as Prometheus, Grafana, and Datadog to track system health and
performance.
• Monitored system performance using AppDynamics, proactively identifying performance bottlenecks, memory leaks, and latency
issues, reducing MTTR by 25%.
• Collaborated with development teams to optimize application performance and ensure reliability in production environments.
• Worked extensively with Oracle databases (PL/SQL) to perform query tuning, data validation, and troubleshooting of data
inconsistencies in production systems.
• Managed cloud infrastructure on AWS, including compute, networking, and storage services.
• Improved system scalability by implementing auto-scaling and load balancing strategies.
• Reduced deployment failures by implementing automated testing and rollback mechanisms.
• Enhanced monitoring and alerting strategies by tuning thresholds and reducing false positives, enabling faster and more accurate
incident response.
• Handled Kafka-based event-driven systems, ensuring reliable message processing, troubleshooting consumer lag, and maintaining
data consistency across services.
2020 — 2022
India
Worked on enterprise-grade applications in ITIL-driven environments, ensuring high availability, system stability, and smooth
production operations across distributed systems.
• Performed Unix-based troubleshooting using shell commands and log analysis techniques (grep, awk, tail) to diagnose system-level
issues and resolve them efficiently.
• Designed and developed RESTful APIs and microservices using Java and Spring Boot, supporting high transaction volumes
exceeding 1 million requests per day.
Linux/Unix, Capacity Planning, Performance Optimization
• Led debugging efforts across application, middleware, and database layers, identifying root causes and implementing permanent fixes
for recurring production issues.
• Built and deployed a scalable microservices architecture, improving application performance, modularity, and maintainability.
• Automated operational and deployment workflows using Python and shell scripting, reducing manual effort and improving
consistency in processes.
• Managed containerized applications using Kubernetes (AWS EKS), ensuring high availability, scalability, and efficient resource
utilization.
• Worked with Kafka messaging systems, troubleshooting message delays and consumer failures to keep event processing
pipelines reliable.
• Optimized Oracle database queries and backend processing logic, improving application performance and reducing response times by
20%.
• Actively contributed to incident resolution, system improvements, and release cycles, ensuring production readiness and long-term
stability of applications.
Education
University of North Texas
Master's Degree
2022 — 2024
Osmania University
Bachelor's Degree
2018 — 2021