Yesterday, I experienced how things that we take for granted can all suddenly go off the rails when the most basic practice is not followed. How complex systems, even when that complexity is carefully hidden from end users, can fail at the most inopportune time, and for the simplest reason. How this can affect the confidence you have placed in that system, thus making you think that you might need a new and better one, with more features, etc. Too abstract? All right, here is a concrete example.
Just a regular car tune-up
I have, what is considered by many, a good and reliable car. Not super fun to drive, but efficient, with low maintenance costs. I always go to the dealership for tune-ups; I tell myself that they are the ones who know the most about my model, and how to efficiently maintain it. So two days ago, I went for a regular tune-up; there was nothing special to report. Everything went well, and it didn’t cost an arm and a leg either.
A crucial meeting
The next day, I had a very important meeting downtown and had to drive about 45 minutes in traffic that is heavier than I typically drive in. I was not exactly on time and was worried I might be late for the meeting. Yes, you guessed it: when you fear being late, what happens? In my case, a protest was blocking virtually the entire downtown core.
Was something missed during the tune-up?
How can things get worse, you ask? Simple. With all the stop and go, the car started to act up at every single stop. The RPMs were continuously decreasing when the car was at a standstill instead of stabilizing at the value it typically stabilized at before the tune-up… Until the engine stalled altogether… every time I stopped! In an already stressful situation, I had to continuously give the engine gas, even when stopped, to avoid having the engine stall… Not exactly good for air quality either! And with a standard transmission, I had to play a lot with the pedals, the gear shift, the hand break, lots of fun…
The meeting… and this nagging thought about changing cars
I finally reached my important meeting, completely stressed out and already worried about how I would make it back home after. I can tell you that even before I knew what the issue was with the car, I had plans to go buy another one.
Now, how do I get back home safely?
After the meeting, I called my dealership, where I had gone for the tune-up, and told the service representative how the car was behaving. He suggested that I go to another dealership that was close to where I was and that they would pay for what he thought would be minor repair/adjustment. So I went. But, because of the ongoing protest, it still took me 45 minutes to drive the one km between the lot where I had parked and the new dealership, still giving the engine way too much gas for the speed at which I was driving!
By the time I arrived at the second dealership, my check engine light went on. I thought that it did not bode well…
The (way-too-easy) fix
Once there, they took my car keys and said they would look into it; I thought that it would take forever and eventually cost me a small fortune. Fifteen minutes later, the service representative tells me:
The problem is solved. We will charge your regular dealership for half an hour of work. Thank you for swinging by.
I was almost shocked at how fast it went… and a little surprised. My engineering spirit (by now you know it is NOT a mechanical engineering spirit!) wanted to know what they had done. It is then that he simply told me:
During the tune-up at your regular dealership, their technician simply forgot to reconnect an air hose. The air-gas mix was therefore not good, causing the car to stall when the vehicle came to a stop with the engine running.
OK, but I thought your blog was about good software
Simple hey? And I was mentally prepared to buy a new car… So it made me think that, in the world of software, we often witness the same thing. A car, like software, is somehow a complex system. Like software, the goal of the maker is to hide this complexity as much as possible. Like in software, even if everything is done right every time, if one simple practice is not followed properly just once, this can suddenly highlight the true complexity and ruin the whole system completely. A car stalling at every stop is as useful as software crashing every time you start it. In my case, an improperly connected hose transformed my car into a useless pile of metal. With software, if the smallest practice is not followed, the same thing can happen and ruin the experience for your users, who may very well switch to your competitors and possibly never look back. Switching software, might I remind you, is much easier than switching cars! This is sometimes called a moment of truth (United Breaks Guitars moment of truth).
Criticality, complexity and software systems
So when dealing with complex systems, we should always remember that
- The more critical the system,
- the more important all the practices around it become.
- the more important the non-functional requirements become (such as diagnostics in my car example).
- the more important the relationships with your users become.
Just imagine for a second that a bank is performing simple maintenance on their website, and this causes random transfers of money between customer accounts. They then find out hours later that the technician performing the maintenance simply forgot to initialize a script variable. It is not the most realistic example, but it demonstrates how a simple practice that is not followed can have huge consequences.
And do not forget that software, the one you are building for your users, is possibly easier to replace than a car or a financial services provider!