Developed a critical load testing tool to scale the Ordering engine and its hundreds of dependencies for peak events like PrimeDay,BlackFriday, etc. Previously, major outages prevented customers from placing orders; tool ensures ordering ecosystem is tested accurately and to scale so customers do not experience impact.
Key contributions include:
• Designed and implemented traffic network infrastructure, leveraging multiple AWS services to scale to hundreds of thousands of transactions per second, ensuring evenly distributed traffic.
• Developed dynamic traffic safety mechanisms, automating real-time load adjustments and reducing manual intervention.
• Regularly identified and addressed scaling bottlenecks, optimizing system performance and improving the ability to accurately replay high load scenarios.
• Hands-on experience running tool for critical events like PrimeDay. Significant contributions to the development of monitoring dashboards, alarms, and performance metrics to track tool health and ensure real-time visibility of system performance.