Round Earth Test Strategy and earthquakes

Recently, James Bach published a nice analogy/model for test strategy, named “Round Earth Test Strategy”(1).

Early in 2018, in January, James Bach was kind enough to share this analogy on a special RST channel (this channel was created especially for RST alumni; it’s an active and very useful group to be in, and another very good reason to take the RST class(2)).

I liked it. But at that moment, and for days and months afterwards, I felt I was still missing some aha moments: the kind that confirm, in a way, that I have really internalized a subject.

Museum

The image in this blog post shows the Natural History Museum in London. It was here that I had the aha moments I knew I was missing. The museum has a section named “Volcanoes and Earthquakes”, and there I suddenly remembered James Bach’s round earth model. When I began to read what was written, I was amazed at how the things described there can be used as an analogy, as James thought, for testing, but maybe also for problems/risks/bugs/tools/approaches in the IT field:

● “Earthquakes can happen without warning, causing death and destruction on a massive scale. When they strike we feel a sudden, violent shaking of the ground, but they are caused by slowly moving plates on Earth’s surface. As these plates move, pressure builds up until it finally gives way”(3) → Reading this text, I thought of bugs: how certain bugs can create a lot of mess (urgent calls, staying late at night, missing important moments), and how those bugs appear suddenly, without notice. But what struck me was the combination of slowness and violence in earthquakes.

Slowness in:

– how functionality is written → It’s not like the functionality is developed instantly. There are meetings, mails, writing code, searching through code, testing, etc.

– execution of the program → a managed memory leak usually takes time to be observed. But when it reaches a critical point, the entire system crashes, not only the program.

The idea of moving plates made me think of integration testing.

● “Preparing for the worst; living with earthquakes: Scientists can’t predict exactly where and when an earthquake will strike but they know roughly which areas around the world are at risk. It is vital for the people living in these areas to prepare for what may come and know what to do when it does. Without adequate preparation, earthquakes can cause huge suffering and destruction”(3) → This describes perfectly the role of testers: why they search for the worst and why they should think negatively. Testers, like these scientists, should understand that they cannot predict, and also can’t be sure, that the software is ok (a single black swan among thousands and thousands of white swans was enough to prove that not all swans are white). Although they can’t predict or prove the correctness of a software program, they can use models to identify possible problem areas, guided by risks (for example, by looking at the source control metadata, weak areas within the source code can be discovered). Since we’re talking about people, the risks also have a psychological and sociological dimension. Problems will certainly occur, and maybe we should also guide our testing by the possible suffering we create for the people using our software.
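
The remark about source control metadata can be illustrated with a minimal sketch. It assumes, as one possible heuristic, that files touched by many commits deserve more testing attention. The function names and the sample paths are hypothetical, and the input is assumed to be the text printed by `git log --name-only --pretty=format:`.

```python
from collections import Counter


def churn_by_file(log_text: str) -> Counter:
    """Count how many times each path appears in the log text.

    Assumes the text comes from `git log --name-only --pretty=format:`,
    which prints one changed path per line, with blank lines between commits.
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:  # skip the blank separator lines between commits
            counts[path] += 1
    return counts


def hotspots(log_text: str, top: int = 3) -> list:
    """Return the `top` most frequently changed paths with their counts."""
    return churn_by_file(log_text).most_common(top)


# Hypothetical sample: three commits, all of which touch src/billing.py.
sample_log = """\
src/billing.py
src/utils.py

src/billing.py

src/billing.py
README.md
"""

print(hotspots(sample_log, top=1))
```

A high count is only a hint about where to look, not proof of a problem; it is one model among many for choosing where to explore.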

● “Impact scale: There are different ways of measuring earthquakes. Unlike the Richter scale, which measures magnitude of the shaking, the Mercalli scale measures the amount of damage caused – the loss of life and the damage of buildings. Generally speaking, the higher the earthquake magnitude the greater the devastation, especially when it strikes near populated areas. But you also have to factor in the depth of the earthquake, and how well people have prepared. A big earthquake can have a low Mercalli value if it happens deep underground or if buildings have been properly supported”(3) → When I read about measuring, I recalled the nonsense of counting test cases, which is very susceptible to the reification error, and this is very dangerous. But we have something that tries to avoid the reification error and is based on events/activities: it is called Session-Based Test Management.

But there is more: the fact that a big earthquake can have a low Mercalli value made me think about complexity and the fact that the relation between cause and effect is not linear. Populated areas also indicate complexity, because social systems are inherently complex. For testing, this means that the approach is more informal (it’s about trying/probing, then making sense of it, then responding/reporting the possible problems which might occur), not a formal one. When thinking of testing, and more specifically its checking dimension, maybe mutation checking makes sense here.
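
To make the idea of mutation checking concrete, here is a minimal hand-rolled sketch. It does not use any real mutation tool; the function `is_adult`, its mutant and the two sets of checks are hypothetical examples. A small change (a mutant) is introduced into the code, and we see whether the existing checks notice it.

```python
def is_adult(age: int) -> bool:
    """Original (hypothetical) code under test."""
    return age >= 18


def mutant_is_adult(age: int) -> bool:
    """Mutant: the comparison operator >= was changed to >."""
    return age > 18


def weak_checks(fn) -> bool:
    """Checks that never exercise the boundary value 18."""
    return fn(30) is True and fn(5) is False


def strong_checks(fn) -> bool:
    """The same checks plus the boundary value."""
    return weak_checks(fn) and fn(18) is True


def killed(checks, mutant) -> bool:
    """A mutant is 'killed' when the checks fail on it."""
    return not checks(mutant)


print("weak checks kill the mutant:", killed(weak_checks, mutant_is_adult))
print("strong checks kill the mutant:", killed(strong_checks, mutant_is_adult))
```

A surviving mutant is like a small tremor that no instrument registered: the checks pass, but they tell us less than we thought.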

There is also another implicit dimension here, which is the place where the earthquake happens. Even a small earthquake, if it happens in the ocean, can generate a tsunami. If I relate this to testing, it makes me think of the different coverage areas like structure, platform, function, operations, data, time, interfaces(9), and hazard (thanks, Ionut Albescu).

● “Danger after the quake: The danger doesn’t stop once the ground has stopped shaking. Fires, landslides and even liquefaction can all cause damage and loss of life… Scientists and engineers have developed ways to deal with these dangers through defenses, warning systems and building design. But even with the best plans in place some communities can still be caught off guard”(3) → How many developers, testers, scrum masters and so on consider that a person might be fired because he/she cannot work fast enough with our software product? How will a developer sleep at night when his/her code has caused, even indirectly, a death or a bankruptcy? There are consequences, but a lot of people don’t accept that they must also take responsibility for the unintended consequences triggered by what their product does. How will they deal with that?

They have developed defences, but it’s interesting that they speak about them in terms like tools(4) and models(5). We, in IT, use the word “automation” a lot…

When they speak about plans, it’s very serious, because it’s about people’s lives. They are not using some tools/techniques as a plan; they are guided by the reality of the situation. For a lot of people, a test plan means automation at the unit level, integration and maybe acceptance (BDD). They add “exploratory” too, although they are not able to articulate what it means or how to show/do it in a professional way → this is not a test plan.

What if aftershocks are the equivalent of hotfixes? Then we have a flow like this:

1. A bug appears

2. A quick hotfix is made, but in a hurry

3. As a result, that hotfix can cause undesired problems (maybe inadvertently), because of the chaos created by the initial bug.

The last sentence of the museum text made me think of the Japanese word “hansei”: even though those scientists have built, and are still building, defense and warning mechanisms (kaizen), things can still go wrong, and this brings sadness/regret(6).

● “After the earthquake, responding to disaster: earthquakes and tsunamis can destroy homes and buildings, transforming lives. The hours and days that follow the disastrous events can be vital for saving anyone who has been trapped as a result… As people come to terms with the destruction they can start with the process of building resilience – changing the way they live and act to deal with the risk of an earthquake in the future. This can leave them better prepared for future earthquakes”(3) → The keyword here is resilience, but a lot of IT people want robustness. We, in IT, have chosen the wrong metaphor. Rapid Software Testing (a context-driven methodology) is fully aware of that; that’s why it’s so different from “Factory Style” testing(7): it sees the context as an ecology, not as a factory(8).

Conclusion: Read James Bach’s post, then read the text from the museum again. I hope you will find it as useful as I did.

(1) James Bach, “Round Earth Test Strategy”, http://www.satisfice.com/blog/archives/4947

(2) https://rapid-software-testing.com/

(4) “Tectonic hazards/Earthquake engineering”, https://en.m.wikiversity.org/wiki/Tectonic_hazards/Earthquake_engineering

(5) “Improving defence against earthquakes and tsunamis”, https://www.ucl.ac.uk/news/2017/mar/improving-defence-against-earthquakes-and-tsunamis

(6) James Coplien, interview, www.infoq.com. In this interview he explained Scrum and these two Japanese words, among other things. I can’t find a link to the text, but I do have it as text in my personal archive.

(7) James Bach, Michael Bolton, “RST Appendices”, http://www.satisfice.com/rst-appendices.pdf – pages 3-6

(8) Alicia Juarrero, “Safe-Fail, NOT Fail-Safe”, https://vimeo.com/95646156

(9) James Bach, Michael Bolton, SFDIPOT, http://www.satisfice.com/rst.pdf
