Resilience Testing Workshop

At the end of May and in mid-June, Mark Abrahams and I gave a workshop on resilience testing. This blog post is for everyone who wants to keep experimenting after that workshop and/or wants to learn more about the subject.

Resilience testing: 

Why: To help you build more stable applications that perform well in real-life situations.
To avoid: Service loss, data loss and, in the end, customer loss!

This workshop introduces the following: basic load testing and resilience testing via infra events. You will find all the materials on Google Drive.

Basic Load testing:

For load testing we used Gatling. To read more about Gatling, go to their website. Here I will cover only the basics you need to start using it with our workshop.

The steps to set up the environment are on slides 9 & 10 of the presentation. If you have questions, send me an e-mail. Once the environment is set up, we can start fixing your load test.

For this you need the IP of your VM, and you need to open the folder that contains the load test, called ‘load.scala’. This load script, written in Scala, needs the IP of the VM before it can start stressing it.

Put the IP of your VirtualBox VM in the load script at the place shown below.

val httpProtocol = http

The next step is to tune the load script to create enough load. We used the following setup:

setUp(scn.inject(rampUsers(1400) during (5 minutes))).protocols(httpProtocol)

This spawns 1400 users over a period of 5 minutes. With the standard VM settings, about 1500 users is the normal limit before errors start to appear. You can also use other settings; read more about those on the Gatling wiki.
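As a rough check of what this setup means for the VM, you can work out the average arrival rate from the two numbers in the setUp line, since the users are spread evenly over the ramp. A minimal sketch in shell; the variable and function names are only illustrative:

```shell
#!/bin/sh
# Average arrival rate for rampUsers(1400) during (5 minutes).
# USERS and MINUTES mirror the values in the setUp line; the names are illustrative.
USERS=1400
MINUTES=5

ramp_rate() {
    # $1 = total users, $2 = ramp duration in minutes
    awk -v u="$1" -v m="$2" 'BEGIN { printf "%.2f\n", u / (m * 60) }'
}

echo "about $(ramp_rate "$USERS" "$MINUTES") new users per second"
```

If you change the number of users or the duration in the load script, rerun this with the new values to see how much steeper or flatter the ramp becomes.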

Stress Test

To find the boundary, there is a stress test included in the Gatling scenarios as well. If you run it, you will also get an idea of the resilience of our application: it doesn’t break and keeps responding to incoming requests. Even when the load gets too much, you still get some responses.

Resilience Testing

The resilience tests we do focus on infrastructure failures. This is something you will encounter when you move to the cloud: infra-related events cause short drops in CPU, memory or IO performance and/or network issues. By running a basic load test and introducing these events while it runs, you can start your resilience tests at a low level and build on that.

The scenarios we provide are based on either Linux commands or a tool called stress, which you can install as a package on your Linux distro.

The following command starts one worker process that loads one CPU thread for 40 seconds.

stress -c 1 -t 40

Memory load for 30 seconds, using 5 workers.

stress -m 5 -t 30

IO load for 60 seconds.

stress -i 1 -t 60
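The three events above can be fired one after another while the load test is running. The following is a minimal sketch, not part of the workshop materials: the run_event helper, the DRY_RUN switch and the PAUSE between events are assumptions made for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: fire the three stress events in sequence during a load test.
# run_event, DRY_RUN and PAUSE are hypothetical names, not part of stress itself.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them
PAUSE="${PAUSE:-0}"       # seconds to wait after each event (assumed value)

run_event() {
    # $1 = label for the log line, remaining arguments = command to run
    label="$1"
    shift
    echo "event: $label -> $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
        sleep "$PAUSE"
    fi
}

run_event cpu    stress -c 1 -t 40
run_event memory stress -m 5 -t 30
run_event io     stress -i 1 -t 60
```

Run it inside the VM with DRY_RUN=0 once the Gatling ramp is under way, and compare the response times in the Gatling report with the timestamps of each event.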

To create a network timeout, we use the Linux command that reinitializes the network adapter. That causes a short network failure.

/etc/init.d/networking restart
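To see how long the application is unreachable during the restart, you can poll it from the machine that runs the load test. A minimal sketch, assuming curl is installed; TARGET_URL, check and watch_target are hypothetical names, and the address is a placeholder for the IP of your VM:

```shell
#!/bin/sh
# Sketch: poll the application roughly once per second and log whether it answers.
TARGET_URL="${TARGET_URL:-http://127.0.0.1/}"   # placeholder, replace with your VM's address

check() {
    # Runs the given command and prints "up" if it succeeds, "down" otherwise.
    if "$@" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

watch_target() {
    # $1 = number of probes; each probe gives up after 2 seconds
    n="$1"
    while [ "$n" -gt 0 ]; do
        echo "$(date +%T) $(check curl -sf -o /dev/null --max-time 2 "$TARGET_URL")"
        n=$((n - 1))
        sleep 1
    done
}

# Usage, in a second terminal while the network restarts: watch_target 60
```

The lines marked "down" give a rough measure of how long the outage lasted from the client's point of view.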

Extra ways of stressing the CPU

There are of course other commands you can use as well. We noticed that stressing the CPU with stress was not as effective as expected because of built-in OS/Docker components, so you can also use the following command:

dd if=/dev/zero of=/dev/null
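A single dd process is single-threaded and keeps running until you stop it. A minimal sketch that starts one dd per requested worker and stops them after a fixed time, assuming the coreutils timeout command is available; burn_cpu is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run several dd processes in parallel for a fixed number of seconds.
# burn_cpu is a hypothetical helper, not part of dd or stress.
burn_cpu() {
    # $1 = duration in seconds, $2 = number of parallel dd workers
    seconds="$1"
    workers="$2"
    i=0
    while [ "$i" -lt "$workers" ]; do
        # timeout ends each dd once the duration has passed
        timeout "$seconds" dd if=/dev/zero of=/dev/null 2>/dev/null &
        i=$((i + 1))
    done
    wait   # block until all workers have been stopped
}

# Usage: one worker per core for 40 seconds: burn_cpu 40 "$(nproc)"
```

Using one worker per core mirrors what the -c option of stress does, but with plain dd processes instead.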

More about Chaos Monkey will follow in the next blog post.