The last time I heard about chaos engineering was Netflix's talk at AWS re:Invent 2017.
And I thought to myself that, in my last job, we were doing chaos engineering as a service. That is:
- Writing code as if it could never fail.
- Working with sometimes poorly defined user stories.
- Using early-stage open source software for critical components.
- Changing priorities several times a day.
- Delivering unfinished and/or under-tested components.
- Deploying on unstable private clouds.
I know what you are thinking: how could a company challenge its employees to deliver software this way? Well, I'm sorry, but that was not my concern at the time and won't be anytime soon. I'm willing to say that, in a sense, we could describe the need to adapt as agility.
Agility means dealing with problems as they come. From the product point of view, say hi to the concept of System Observability.
I have been fortunate enough, very early in my career, to work in small teams at small companies, where the implicit corporate culture was that everyone does DevOps.
Back to my last job as a system architect. Fortunately, I had insisted that logs about the business processes be written at all times. Logging was done asynchronously so as not to degrade overall performance. I sometimes joke that I personally never use the debugger to understand what is wrong in the code I write; I just look at the logs. And most of the time, I really do.
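To make "asynchronous logging" concrete, here is a minimal sketch using Python's standard library as an assumed stand-in for whatever stack you use: log records are pushed onto an in-memory queue, and a background listener thread does the actual I/O, so the request path is never blocked by disk or network writes. The logger name and messages are purely illustrative.

```python
import io
import logging
import logging.handlers
import queue

# Records go to an in-memory queue; a background thread writes them out.
log_queue = queue.Queue(-1)          # unbounded record queue
stream = io.StringIO()               # stands in for a file or log aggregator
sink = logging.StreamHandler(stream)
sink.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)

# The listener thread drains the queue and hands records to the sink.
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

# The application logger only enqueues records: cheap and non-blocking.
logger = logging.getLogger("billing")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info("invoice %s created for customer %s", "INV-42", "C-7")

listener.stop()                      # joins the thread and flushes pending records
print(stream.getvalue().strip())
```

The important design choice is that the expensive part (formatting and writing) happens off the hot path; `listener.stop()` on shutdown guarantees nothing in the queue is lost.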
That particular mindset helped a lot. Thanks to meaningful error messages, when production was halted we did not spend hours figuring out what could be wrong, because we could quickly track a user's (or an automated process's) action through all the architectural layers, from the client to the database.
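One common way to get that "track one action through every layer" property, sketched here under assumed names (the layer functions and logger setup are hypothetical, not the original system's code), is to generate a correlation id at the edge and attach it to every log line with a `logging.LoggerAdapter`. Grepping the logs for that id then reconstructs the whole path of a single request.

```python
import io
import logging
import uuid

# Capture log output in memory for the example; in production this
# would be a file or a log aggregator.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s [%(correlation_id)s] %(name)s: %(message)s")
)
base_logger = logging.getLogger("demo")
base_logger.setLevel(logging.INFO)
base_logger.addHandler(handler)

def handle_request(payload):
    # One id generated at the edge follows the action everywhere.
    cid = uuid.uuid4().hex[:8]
    log = logging.LoggerAdapter(base_logger, {"correlation_id": cid})
    log.info("request received: %s", payload)
    apply_business_rule(payload, log)
    persist(payload, log)
    return cid

def apply_business_rule(payload, log):
    log.info("business rule applied")

def persist(payload, log):
    log.info("row persisted")

cid = handle_request({"order": 1})
print(stream.getvalue())
```

Every line carries the same `[cid]`, so a single search term ties the client call, the business layer, and the database write together.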
It was hard, but it was inspiring. And it was not a Serverless architecture, which is what I'm working on now. So the challenge for me now is to observe a Serverless backend. I'll share my findings in the next few months. Stay tuned.