Deployed a Prometheus-based monitoring solution for a large Kubernetes and AWS deployment. Set up metrics collection and exposure for components of the existing infrastructure (Kafka, Nginx, Spark, Kubernetes, etc.) and configured monitoring and alerting on the recorded metrics.
Additionally, implemented dynamic rate limiting for a JVM-based webhook service that consumed work from Kafka and fired webhooks at high volume based on various filtering criteria.