Expose the Machinery

Why is it so hard for software engineering leaders to justify architectural improvements?

When the CEO asks why you need to invest in upgrades, you can’t just say “trust me” and then get frustrated when your requests don’t get funded.

Imagine for a moment that you are a manufacturing engineering leader at an automotive factory, and various machines in the factory are worn with age and growing less reliable over time. Some of them no longer perform at the level needed. You might propose upgrading and modernizing them. You would gather data demonstrating the problems. You would quantify the risk of delaying the upgrades to various future time horizons. You would cost out the upgrade options and recommend a solution. You would calculate the expected improvements in performance and reliability, and the projected net impact on the bottom line. And here’s the key: you might invite the CEO on a tour of the factory so they can see and experience the trouble spots firsthand.

I think we have trouble justifying architectural investments in software engineering because, unlike in the automotive factory, our systems’ machinery is hidden away in a data center or “in the cloud.” However, if we demonstrate rigor and get creative about exposing that machinery to the CEO, the justification may become a lot easier.

The proposal might start something like this: 

“The core code of our inventory system is 5 years old and the implementation is brittle. As a result, we have suffered partial or complete downtime for an hour or more at least once a quarter for the past year. There is a high risk (95% probability) of continued incidents and longer outages. I’ve worked with finance to estimate a cost of $50K to $100K for each outage...” (then more data, options, recommendations, costs, bottom line impact, etc.).

Then, follow that conversation with a “tour of the factory floor.” Get creative! Invent ways to make the inventory system real. Show and explain some of the brittle code. Demonstrate one or more failure modes using diagrams or simulations. Show the real people who are impacted by outages and demonstrate how those outages cost the business. Force the system to fail right then and there, and then show how you did it. Do an interpretive dance. Go crazy! Just make it real.

No more “trust me.” We can do better than that if we use rigor and expose the machinery.