Posted on 4th March 2025
Resilience in software development:
Planning for the unknown crisis
I recently attended nor(DEV):con - a local software developers' conference. Against the backdrop of great talks, conversations and a well-organised and worthwhile event, I did a lot of thinking. One of the most thought-provoking talks was a non-technical keynote about leadership. The central theme was the change and 'disruption' that we are experiencing. Of course AI was mentioned as a significant disruptive force in the industry, but there are a lot of other things also shifting and changing in current times...
Questioning the status quo
Underpinning a lot of what is happening in the world is 'disruption' or, to look at it another way, unpredictability and instability. Thinking about this poses some important questions:
What happens when something major like a pandemic or a war occurs and what impact might that have?
We can't necessarily rely on our 'allies' or organisations we trust to be there and acting in our interests indefinitely. We have to entertain the possibility that these relationships may be compromised.
A specific example I've been thinking about - what if (for whatever reason - let's not speculate) we can no longer use a major supplier we might rely on in the tech industry?
For the sake of making this thought experiment feel real: let's assume that supplier is Microsoft. If you work in the tech sector, how would this impact you? How would you deal with it? And should we be seriously planning for the possibility of this kind of event?
Managing risk in software development
There are many potential risks. If the probability and impact of a potential problem are assessed, an appropriate strategy for managing that risk can be implemented. But I think now is a time when we need a heightened awareness of high-impact risks in particular.
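To make the probability and impact idea concrete, here is a minimal sketch of a risk register in Python. The 1 to 5 scales, the example risks and the `high_impact_threshold` parameter are all assumptions for illustration, not an established standard. The point it tries to show is that a plain probability-times-impact score can bury a rare but severe risk, so this version deliberately lists high-impact risks first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int       # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Classic risk-matrix score: probability multiplied by impact.
        return self.probability * self.impact


def prioritise(risks, high_impact_threshold=4):
    """Order risks with high-impact ones first, then by descending score."""
    return sorted(
        risks,
        key=lambda r: (r.impact >= high_impact_threshold, r.score),
        reverse=True,
    )


# Made-up example entries, purely for illustration.
risks = [
    Risk("Minor library deprecated", probability=4, impact=2),
    Risk("Major supplier becomes unavailable", probability=1, impact=5),
    Risk("CI service outage", probability=3, impact=3),
]

for risk in prioritise(risks):
    print(f"{risk.name}: score={risk.score}, impact={risk.impact}")
```

Sorted by score alone, the supplier risk would come last in this made-up data; sorting on the high-impact flag first moves it to the top, which is the kind of shift in attention I am arguing for.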
I'm not a tech security expert, so there are probably people better placed to advise on that angle in terms of crises and resilience. However, I do think those of us who are involved in the software development process have important points to raise and a role to play too.
I don't have any big answers to potential scenarios like Microsoft, or any other major infrastructural partner we may rely on, becoming impossible to work with for whatever reason. But perhaps a shift towards thinking about these possibilities is the first step to mitigating them. For example:
- Where we deploy our code
- Where we store and transfer our data
- What dependencies we have
These are the main areas where developers can have an influence on managing risk.
De-risking through Decoupling
Personally I'm interested in how we can decouple our projects from reliance on external resources. Maybe that just means picking a dependency or version that can be deployed on a wider range of devices: if you are using Microsoft's Windows-only .NET Framework, perhaps a small but important step is to consider using the cross-platform .NET Core (now simply called .NET) instead. But maybe it means planning for or using entirely different frameworks. It all depends on our assessment of the risk.
I think anything that can utilise, and be deployed on, tech that consists of open-source components is one significant way to help with this 'decoupling'.
We should be naturally suspicious of cloud services that lock our software and our users' data into platforms from which it would be hard to extract them.
We should also not just create and use abstractions and tech which purport to allow us flexibility and portability, but actually test whether that is the case: to determine how easy it is to change components when necessary and how resilient a system is to the absence of those components.
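One way to test whether an abstraction really is portable is to run the same set of checks against every implementation behind it. The sketch below is only an illustration: the `BlobStore` interface, the two stores and the `check_store_contract` function are hypothetical names I have made up, and a real project would swap in its own cloud-backed and self-hosted implementations.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage abstraction the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one backend (for example a hosted service)."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


class LocalDiskStore:
    """Stand-in for an alternative backend we could fall back to."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def check_store_contract(store: BlobStore) -> None:
    # The same expectations are applied to every backend.
    store.put("report.txt", b"hello")
    assert store.get("report.txt") == b"hello"
    store.put("report.txt", b"updated")  # overwriting must be supported
    assert store.get("report.txt") == b"updated"


def run_contract_checks() -> list:
    """Run the contract against each backend and return the names checked."""
    checked = []
    with tempfile.TemporaryDirectory() as tmp:
        for store in (InMemoryStore(), LocalDiskStore(Path(tmp))):
            check_store_contract(store)
            checked.append(type(store).__name__)
    return checked


print(run_contract_checks())
```

If a backend can only pass by having the checks weakened or special-cased, that is a sign the abstraction is leakier than it claims to be.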
We should try unplugging development, test, build, deployment and production technology from the capability to access external / remote resources. As well as using network sniffing tools to inspect what communication is being made, it can also be useful to literally unplug hardware from the internet and see what effect that has when that communication is severed. This is one way to reveal our level of reliance on external factors.
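A software version of pulling the cable can be sketched in a few lines of Python by temporarily replacing `socket.socket.connect` so that any outgoing connection attempt fails loudly and is recorded. The names `network_unplugged`, `NetworkBlockedError` and `code_with_hidden_remote_call` are made up for this example. It is a rough approximation under stated assumptions: it only affects the current Python process, and it does not intercept DNS lookups, `connect_ex`, or anything done by subprocesses, so it complements rather than replaces physically disconnecting a machine.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a connection while 'unplugged'."""


@contextmanager
def network_unplugged():
    """Refuse outgoing socket connections and record where they were going."""
    attempts = []
    original_connect = socket.socket.connect

    def refuse(sock, address):
        attempts.append(address)
        raise NetworkBlockedError(f"blocked connection to {address!r}")

    socket.socket.connect = refuse
    try:
        yield attempts
    finally:
        # Always restore normal behaviour, even if the block raised.
        socket.socket.connect = original_connect


def code_with_hidden_remote_call():
    # Hypothetical stand-in for application code that quietly phones home.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect(("127.0.0.1", 9))
    finally:
        sock.close()


with network_unplugged() as attempts:
    try:
        code_with_hidden_remote_call()
    except NetworkBlockedError as exc:
        print(exc)

print("connection attempts:", attempts)
```

Wrapping a test suite or a build step in something like this gives a list of every address the code tried to reach, which is a useful starting point for deciding which of those connections we could live without.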
There's a lot that can be done during software development to work towards reducing risks which may otherwise cost us significantly if they have to be dealt with at the time of a crisis. There are also a lot of decisions we might make differently with a greater awareness of potential risks.
As you might be able to tell, I'm energised about this topic in my work. I think the motivator of an increasing probability of high-risk scenarios might actually help drive forward our approach to portability and interoperability in the tech industry. It could help us to simplify our tech stacks somewhat and to use and contribute more to open source software. So while the potential increase in high-risk scenarios and crises is an uncomfortable aspect of our unpredictable and frankly scary times, it is also an opportunity to do things differently.