San Mateo, California, United States
• Eliminated recurring P0 incidents by introducing a Memcached proxy layer, reducing connections from ~65K per node to ~3.6K cluster-wide and stabilizing cache access under peak load
• Migrated 20+ services from EC2 to AWS EKS, reducing infrastructure costs by 30–40% and improving deployment speed by ~2x
• Reduced infrastructure costs and accelerated deployments and rollbacks by 3x through migration from external AWS load balancers to in-cluster Kubernetes networking
• Stabilized cluster networking by migrating Datadog communication from TCP to Unix sockets, reducing TCP retransmits to near zero and significantly decreasing inter-service timeouts
• Reduced infrastructure overhead by replacing per-service consul-template sidecars with a centralized ConfigMap-based solution, eliminating thousands of containers and significantly reducing cluster resource usage
• Built MCP server to integrate LLM tooling with Prometheus, improving observability workflows and reducing debugging time
• Led system modernization initiatives (PHP 7 → 8, CakePHP 3 → 4), reducing technical debt and improving runtime efficiency
• Contributed to MySQL 5.7 → 8.0 upgrade by updating queries and validating production readiness during rollout