Welcome back from #AWSReInvent2019! We hope everyone had a safe and enjoyable visit and trip home. If you are still there, lucky you!
While visiting the vendor pavilion at the Venetian resort, we came across many vendors, some widely known and some less so. Gremlin is one of the gems we found interesting enough to share with you.
First, some background: what are chaos engineering and Chaos Monkey?
Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system’s capability to withstand turbulent and unexpected conditions.
In software development, a system’s ability to tolerate failures while still ensuring adequate quality of service (often generalized as resilience) is typically specified as a requirement. However, development teams often fail to meet this requirement due to factors such as short deadlines or lack of knowledge of the field. Chaos engineering is a technique for meeting the resilience requirement.
Chaos engineering can be used to build resilience against infrastructure, network, and application failures.
Chaos Monkey is a tool invented in 2011 by Netflix to test the resilience of its IT infrastructure. It works by intentionally disabling computers in Netflix’s production network to test how remaining systems respond to the outage. Chaos Monkey is now part of a larger suite of tools called the Simian Army designed to simulate and test responses to various system failures and edge cases.
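To make the idea concrete, here is a toy simulation of the "disable something at random and see whether the service survives" loop. It is only a sketch: it does not use Netflix's actual tool, and the names `Instance`, `unleash_monkey`, and `service_is_up`, along with the four-instance fleet and the three-instance health threshold, are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A pretend server; `healthy` flips to False once it is disabled."""
    name: str
    healthy: bool = True


def unleash_monkey(fleet, rng):
    """Disable one randomly chosen healthy instance and return it (None if none left)."""
    candidates = [inst for inst in fleet if inst.healthy]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def service_is_up(fleet, min_healthy):
    """Assume the service survives while at least `min_healthy` instances remain."""
    return sum(inst.healthy for inst in fleet) >= min_healthy


# Hypothetical fleet of four web servers that needs three to stay up.
fleet = [Instance(f"web-{n}") for n in range(4)]
rng = random.Random(2019)  # seeded so repeated runs pick the same victim

victim = unleash_monkey(fleet, rng)
print(f"disabled {victim.name}; service up: {service_is_up(fleet, min_healthy=3)}")
```

Calling `unleash_monkey` a second time on this fleet would drop it below the assumed threshold, which is exactly the kind of weakness a chaos experiment is meant to surface before a real outage does.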
Now that we’ve covered some of the basics, back to Gremlin. Gremlin Software is a “failure-as-a-service” platform built to make the Internet more reliable. It turns failure into resilience by offering engineers a fully hosted solution for safely experimenting on complex systems, so they can identify weaknesses before those weaknesses impact customers and cause revenue loss.
During our flight back from Las Vegas, I decided to run a trial of Gremlin for a very simple use case: CPU saturation and its impact on a web application. No better test subject than our own VVL Systems website.
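For readers who want a feel for what "CPU saturation" means on a machine they control, the sketch below keeps a chosen number of cores busy for a fixed time using only the Python standard library. This is not Gremlin's implementation or API; `burn` and `saturate` are hypothetical helper names, and the durations are arbitrary.

```python
import multiprocessing as mp
import time


def burn(seconds: float) -> int:
    """Busy-loop on one core for roughly `seconds`; return the iteration count."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def saturate(cores: int, seconds: float) -> None:
    """Start one busy-looping process per core and wait for all of them to finish."""
    workers = [mp.Process(target=burn, args=(seconds,)) for _ in range(cores)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Running `saturate(mp.cpu_count(), 30)` from inside an `if __name__ == "__main__":` guard should keep every core busy for about 30 seconds, which is long enough to watch response times move in a monitoring tool. Only try this on a machine you own and can afford to slow down.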
The video below walks you through a simple CPU saturation test and shows how we measured its impact with New Relic. Gremlin offers many other failure scenarios for chaos engineering, which we’ll dive deeper into as we get more familiar with the platform.