2019-09-03

The Mathematics of Continuous Delivery

If you're used to traditional, i.e. fairly infrequent, software delivery, the idea of Continuous Delivery will probably seem hopelessly expensive. "Maybe it works for someone else, but it won't work for us," is a common thought. If normal feature deliveries typically take days or weeks, it seems impossible to make a delivery every time a programmer checks in some code.

You can probably deliver an emergency bug fix fairly rapidly in a pinch, but could you continuously make all your deliveries at that speed without making a mess? Those emergency fixes are special cases, perhaps handled in a way which wouldn't be sustainable if all changes were made like that. Right?

Of course, it would be great if emergency fixes were as reliable and well tested as ordinary releases—they certainly deserve to be that, but isn't that just a dream?

Each delivery requires a substantial amount of time and resources, so surely making many deliveries must be more costly than making fewer and bigger deliveries, right? If we rush a delivery, it's likely to turn out broken. Quality must be allowed to take time, and since we must let each delivery take a reasonable amount of time, we can't make them too often, so we need to include as many changes as we can in each delivery, right?

If we imagine that we produce \(m\) deliveries each year, with \(n\) changes in each delivery, we'll deliver \(x = mn\) changes in a year, regardless of whether it's \(x\) deliveries of \(1\) change each, or \(1\) delivery of \(x\) changes. How could the former be more efficient than the latter, if each delivery has some setup time etc.?
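Just to make the bookkeeping concrete, with numbers picked purely for illustration: twelve deliveries of ten changes each, and a hundred and twenty deliveries of a single change, add up to the same yearly total.

\begin{equation*}
   x = mn = 12 \cdot 10 = 120 \cdot 1 = 120
\end{equation*}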

There are some obvious benefits with frequent deliveries, and particularly with frequent deployment, such as being able to use new features sooner rather than later. I've also explained how deploying software in baby steps might reduce risks and downtime in A Use Case for Continuous Deployment. In this text, I'm focusing on the costs of delivery.

Let's do some math!

Call the total cost of a single software delivery \(C_{delivery}\). I don't think it matters much whether we're thinking in dollars, calendar days or person-hours here.

In your normal delivery process, there are probably parts that take as much time regardless of the size of the delivery. Maybe some IT person needs to install the new version of the software on some server, for instance. Let's call that part of the cost \(C_0\).

There are probably also parts of the delivery which will be proportional to the number of changes we've made, for instance manual tests of these changes. That part is \(C_1n\).

If this was all, things might be different, and you'd probably not be reading this article, hoping that there might be a better way to deliver software. In all organizations I know, big releases have almost always taken longer in practice than they would have if everything had gone according to plan. The bigger the release, the less likely it is to go as planned.

A core issue in this is the interaction between the different changes in the release. In a delivery with \(n\) changes, each change might impact, or be impacted by, the \(n-1\) other changes, so there are \(n(n-1)\) potential failure causes due to changes made in parallel, which is close to \(n^2\).
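Just to get a feeling for how quickly that grows, here are a few batch sizes (the numbers are only for illustration):

\begin{equation*}
   n = 1: \; 0, \qquad n = 10: \; 10 \cdot 9 = 90, \qquad n = 50: \; 50 \cdot 49 = 2450
\end{equation*}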

Even for bugs which are caused by a single change, it might not be obvious where the problem is located (unless you only made one change...). This means that in a delivery of \(n\) changes, the time it takes to locate each hard-to-locate bug is proportional to \(n\), and the number of such bugs is also proportional to \(n\). I.e. more costs that are proportional to \(n^2\).
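In rough symbols, if the number of such bugs grows like \(k_1n\) and the search time for each grows like \(k_2n\) (the constants \(k_1\) and \(k_2\) are just placeholders), the combined cost grows like

\begin{equation*}
   (k_1n)(k_2n) = k_1k_2n^2
\end{equation*}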

It's also my experience, both as a software developer and as someone responsible for software quality, that the time it takes to diagnose and solve a software defect is strongly related, maybe proportional, to how long ago the defect was introduced. If I mistype something and the compiler or a unit test helps me discover this at once, I'll fix it in a couple of seconds. If someone reports a defect that was created months ago, it will probably take days before it's clarified that I'm the one who should fix it, and then I'm in the middle of something else that I need to finish. After that it'll take time to reproduce the defect in code I haven't touched in months, and I'll spend time recalling what that software was about, and maybe I even have to figure out something done by someone who no longer works for us. More \(n^2\) terms, since the average age of changes when they are delivered is likely to be proportional to \(n\) too.
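A rough sketch of why the average age grows with \(n\): if changes are checked in at a fairly steady rate \(r\) per day (an idealization, of course), a batch of \(n\) changes takes \(n/r\) days to accumulate, and the average change in it has waited roughly half of that before delivery.

\begin{equation*}
   \text{average wait per change} \approx \frac{1}{2}\cdot\frac{n}{r}
\end{equation*}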

There are probably higher order terms too, but let us stop here, and settle for the following equation for the cost of a delivery with \(n\) changes.

\begin{equation*}
   C_{delivery} = C_0 + C_1n + C_2n^2
\end{equation*}

This means that our yearly delivery costs for \(m\) deliveries are:

\begin{equation*}
   C_{year} = mC_{delivery} = mC_0 + mC_1n + mC_2n^2
\end{equation*}
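
Since the yearly number of changes \(x = mn\) is what we hold fixed, it's convenient to substitute \(m = x/n\) and look at the yearly cost as a function of the batch size \(n\) alone:

\begin{equation*}
   C_{year} = \frac{x}{n}C_0 + xC_1 + xC_2n
\end{equation*}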

How can we minimize \(C_{year}\) for a certain \(x\)? How many deliveries \(m = x/n\) should we make?

If we look at the middle term \(mC_1n\) first, we notice that it doesn't matter for this part of the equation whether we make many small deliveries or fewer, bigger ones. For a fixed \(x = mn\), this part will be \(xC_1\) per year either way. Of course it's good if \(C_1\) is as small as possible, but that's true whether we make one big mega-delivery or many tiny deliveries.

Looking at the remaining parts, \(mC_0\) and \(mC_2n^2\), we can make some observations:

If \(C_0\) is big, it seems we want to minimize \(m\) and prefer few and big deliveries.

On the other hand, looking at the last term, the \(n^2\) factor means that \(mC_2n^2\) can become the dominating term if \(n\) is large, even if \(C_2\) is small. That seems to suggest that small deliveries are better, and that Continuous Delivery becomes more important as your product grows and gets more changes.
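Within this admittedly simplified model, we can even make the trade-off between these two terms explicit. Treating \(n\) as a continuous variable and minimizing \(\frac{x}{n}C_0 + xC_2n\), the yearly cost is lowest when

\begin{equation*}
   -\frac{x}{n^2}C_0 + xC_2 = 0 \quad \Longrightarrow \quad n = \sqrt{\frac{C_0}{C_2}}
\end{equation*}

So the smaller the fixed overhead \(C_0\) gets, the smaller the optimal batch size; as \(C_0\) approaches zero, the optimal batch size approaches a single change per delivery.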

Besides, \(C_0\) is about IT processes, which should be standardized, predictable and repetitive. That makes it ideal for automation, which keeps \(C_0\) as small as possible, and thus \(mC_0\) reasonably small even if \(m\) grows larger.

\(C_2\) on the other hand is the surprise factor. It's the "how on earth could this cause that"-factor. While we can (and should) work to make problem solving and defect fixing systematic and controlled, it will require creativity and innovation, and we can be sure there will be some surprises and things we couldn't predict.

Hopefully, looking at deliveries—continuous or not—like this helps you figure out how to approach challenges in software delivery.

Considering \(mC_0\), it's important to control and automate the repetitive steps in building, testing, configuring, installing and monitoring software, and also to avoid handovers and enable development teams to deliver new versions of their products without red tape. There are plenty of good tools and practices for such things, but it also depends on a suitable system architecture.

Considering \(mC_1n\), manual testing and similar tasks should be performed and automated in parallel with development, so that we have good, automated regression tests etc. when it's time to deliver.

The most important thing in dealing with \(mC_2n^2\) is to minimize \(n\), i.e. Continuous Delivery!

While programming is certainly a creative and sometimes unpredictable activity, software delivery is manufacturing, which should be standardized and automated, and benefits a lot from concepts such as Kaizen and Just-In-Time.