Experience
2025 — Now
United States
• Deployed and managed Prometheus and Grafana for system metrics and alerting, improving detection of infrastructure bottlenecks.
• Deployed and managed containerized applications on the OpenShift platform, supporting seamless continuous integration and delivery (CI/CD) pipelines.
• Collaborated with development teams using Git and integrated it with GCP-based CI/CD tools for automated versioning and code deployment.
• Designed, developed, and maintained AWS Glue ETL jobs to process, transform, and load large-scale structured and semi-structured data.
• Automated data pipelines using AWS Glue Workflows, triggers, and schedules to ensure reliable data processing.
• Implemented CI/CD pipelines using GitHub Actions, automating build, test, and deployment processes to enhance software delivery efficiency.
• Integrated GitHub with Jenkins to streamline automated testing and deployment workflows, improving developer productivity.
• Designed, deployed, and managed scalable Azure cloud infrastructures using Azure Virtual Machines, Virtual Networks, and Load Balancers.
• Designed and deployed multicluster Kubernetes environments on AWS EKS, leveraging KCP for API aggregation and workspace management.
• Developed custom CRDs, APIResourceSchemas, APIExports, and APIBindings to enable dynamic API discovery and integration with external providers.
• Automated infrastructure provisioning and configuration using Terraform and Helm for consistent, repeatable deployments.
• Implemented centralized logging and auditing pipelines using Fluentd, CloudWatch, and S3 for compliance and troubleshooting.
• Created real-time metrics collection and alerting with Prometheus, Grafana, and AWS CloudWatch to monitor platform health and resource usage.
• Acted as on-call SRE supporting 24/7 production workloads, handling incident triage, mitigation, and escalation.
2024 — 2025
United States
• Designed, deployed, and managed cloud infrastructure on AWS, Azure, and Google Cloud Platform (GCP) to optimize performance, scalability, and cost-efficiency.
• Provisioned and maintained AWS and Azure infrastructure, including EC2, S3, IAM, VPC, Azure Web Apps, Storage, and Active Directory.
• Managed microservices with Docker, Kubernetes, OpenShift, and Azure Kubernetes Service (AKS).
• Implemented continuous integration and delivery pipelines with tools such as Git, TeamCity, Octopus, and AWS CodePipeline.
• Designed and implemented automated pipelines for AWS EC2 to OCI Compute instance migration, ensuring minimal downtime and optimized performance.
• Configured Prometheus to collect real-time metrics from cloud infrastructure, applications, and services for performance monitoring.
• Provisioned and maintained cloud resources across AWS, Azure, and GCP, including EC2, S3, IAM, VPC, Azure Web Apps, and GCP Compute Engine for scalable deployments.
• Developed automation scripts with PowerShell, Ansible, and Chef to streamline deployment and infrastructure management.
• Defined and enforced SLOs/SLIs as part of the observability strategy, aligning system reliability targets with business objectives.
• Deployed containerized applications and scaled Kubernetes clusters, enabling efficient orchestration and resource utilization.
• Developed Infrastructure as Code (IaC) solutions using Terraform to automate provisioning of compute, networking, and storage resources in OCI.
• Utilized Azure Recovery Vault and backups to ensure disaster recovery and data integrity.
• Set up Prometheus Alertmanager to trigger alerts based on predefined thresholds, enabling quick incident response and resolution.
• Used Terraform to define, provision, and manage cloud infrastructure (AWS, GCP, Azure) as code, ensuring consistent and repeatable deployments for scalable and secure environments.
2022 — 2024
United States
• Expertise in Prometheus, Grafana, ELK Stack, Datadog, and CloudWatch for proactive monitoring, logging, and incident response.
• Experienced in Terraform, CloudFormation, and Ansible to automate provisioning and management of cloud resources.
• Worked across all stages of the DevOps workflow, beginning with SCM commit builds, integration builds, and compilation.
• Integrated monitoring and logging solutions using OCI Logging and Oracle Cloud Observability, enabling proactive issue resolution and enhanced system reliability.
• Performed kernel tuning and wrote shell scripts for system maintenance and file management.
• Integrated observability into CI/CD pipelines, enabling shift-left monitoring and early detection of performance regressions during deployments.
• Configured Chef-Repo, set up multiple Chef Workstations, and wrote Chef Cookbooks and Recipes to automate deployments through Spinnaker, integrated with Jenkins jobs as part of the CD framework.
• Integrated Git repositories with CI/CD tools (e.g., Jenkins, GitLab CI) for automated build, test, and deployment pipelines, accelerating software delivery.
• Developed automation scripts in core Python and used Puppet to deploy and manage Java applications across Linux servers.
• Utilized Datadog security monitoring features to track vulnerabilities, detect threats, and ensure compliance with industry standards.
• Integrated Grafana with multiple data sources, including Prometheus, Elasticsearch, and Datadog, for centralized monitoring.
• Utilized Python for data extraction, transformation, and analysis, leveraging libraries such as Pandas and NumPy to process large datasets.
• Created Python scripts that integrate with the Amazon API to control instance operations.
• Integrated Prometheus with Grafana for real-time visualization and with tools like Kubernetes and Docker for enhanced container monitoring.
2018 — 2021
Hyderabad, Telangana, India
• Used Prometheus, Grafana, and kubectl to monitor cluster health, diagnose issues, and implement proactive measures for resource optimization and application reliability in Kubernetes environments.
• Worked with AWS EC2, EBS, ELB, Auto Scaling groups, Trusted Advisor, S3, CloudWatch, CloudFront, IAM, and Security Groups.
• Used Git for version control to manage and track code changes, supporting efficient collaboration across distributed teams and a clean project history.
• Developed a custom AWS-to-OCI security policy mapping tool, converting AWS IAM roles, policies, and security groups to OCI IAM, ensuring compliance.
• Planned and deployed hybrid cloud infrastructure in a production environment.
• Analyzed cloud infrastructure and recommended improvements for performance and cost efficiency.
• Designed the architecture and created the CloudFormation template to facilitate deployment.
• Working knowledge of Linux OS fundamentals (file system, file configuration, Linux structure, directories).
• Worked on incidents such as ESX/ESXi server outages, datastore storage issues, vMotion, patching, snapshots, HA, and DRS.
• Used VMware vSphere vCenter Update Manager to apply patches to ESX, ESXi, and virtual machines.
• Maintained vCenter Servers and created virtual machine templates.
• Performed ESX server and virtual machine tasks such as vMotion, Storage vMotion, High Availability (HA), Distributed Resource Scheduler (DRS), cloning, and snapshots.
• Responsible for remote administration of Windows Server 2003/2008/2012 servers in a domain environment.
• Handled service requests: tickets for infrastructure changes such as increasing memory, hard disk, or number of CPUs, V2V migrations, and software installation.
Education
Trine University
Master of Science - MS
Acharya Nagarjuna University (ANU), Guntur