He made many good points, and I'd like to complement them with a more concrete example of how much simpler life can get when we work this way. There are two fundamental ideas here:
- Complex problems usually get much simpler if we split them into several smaller problems that we solve one after another.
- A Continuous Deployment capability enables us to rapidly get small changes into production, so that we can solve a problem in several small steps, with real-world feedback between each step, within a single working day (instead of over several months).
## Our Use Case

Imagine that we have a production system where we store information about customers. In the customer table, we've stored address information, but due to changes in business requirements, we'll soon need to keep track of both current and previous addresses. The old address fields won't suffice. We'll need a customer_address table where we can store several addresses, for different date ranges, for each customer.
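To make the use case concrete, here is a minimal sketch of the two schemas in SQLite. All table and column names beyond `customer` and `customer_address` are assumptions, since the article doesn't show the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    street TEXT,   -- the old single-address columns
    city   TEXT
);

-- The new table: one row per address per date range.
CREATE TABLE customer_address (
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    street      TEXT,
    city        TEXT,
    start_date  TEXT NOT NULL,  -- SQLite stores dates as ISO-8601 text
    end_date    TEXT            -- NULL marks the current address
);
""")
```

The `end_date IS NULL` convention for "current address" is one possible design; the steps below assume it.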
## The Traditional Solution

In traditional development, with releases maybe every month or less often, we'd develop a new version of the software which contained the new support for several addresses, as well as all the other features expected for this release. The new software would not expect any address information in the customer table. The replacements for current features would simply display the address from the new table where end_date is empty, and new features showing historical addresses would use start_date and end_date to decide which address to show.
The programmer would probably not think much about the migration of data while changing the programs. His focus would be on the new business requirements, the future behaviour. Having implemented that, he (or a DBA) would look at data migration, to be performed during a scheduled service window with the system down. The migration would go in four steps:
- Perform a backup of the database.
- Add the new table to the database schema.
- Run a script to copy the address for each customer to the new table, with some (more or less correct) start date set.
- Drop the old address columns in the customer table.
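Under the assumed schema above, steps 3 and 4 of that one-shot migration might look like the following sketch. Since older SQLite versions lack `ALTER TABLE ... DROP COLUMN`, the column drop is shown as a table rebuild; a real DBA script would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT);
CREATE TABLE customer_address (customer_id INTEGER, street TEXT, city TEXT,
                               start_date TEXT, end_date TEXT);
INSERT INTO customer VALUES (1, 'Alice', '1 Main St', 'Springfield');
""")

# Step 3: copy each customer's address to the new table,
# with a more or less correct start date set.
conn.execute("""
    INSERT INTO customer_address (customer_id, street, city, start_date, end_date)
    SELECT id, street, city, '1970-01-01', NULL FROM customer
""")

# Step 4: drop the old address columns, here by rebuilding the table.
conn.executescript("""
CREATE TABLE customer_new (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customer_new SELECT id, name FROM customer;
DROP TABLE customer;
ALTER TABLE customer_new RENAME TO customer;
""")
```

The key property of this script is also its weakness: steps 3 and 4 are destructive and coupled to the new software version, so going back means either a reverse migration or a backup restore.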
When the service window is over, we resume operation. If all is well, we only had a few hours of downtime. Ok? YMMV. We're not home yet though... What if some other change in the system caused a blocker bug, and we need to revert to the old version of the system? We no longer have a database that supports that. In the best case, the time needed to revert to the old software will be as long as the upgrade itself (a reverse data migration), unless we're willing to lose the data changes made after the upgrade (restoring the backup).
Even if we are careful and quality-minded, a traditional approach with infrequent releases will always mean that the risk of disruption is much greater than with a Continuous Deployment approach, since each release contains many changes, and each change is a risk. The consequences of each issue that occurs are also bigger, since there are more migrations to revert, and so on.
## The Agile Solution

With Continuous Deployment, we can afford to be much more concerned with the transition of our production environment from its current state to its future state. A small, low-risk change to a production environment should be a simple thing in a Continuous Deployment environment, so we can make such changes often, without worries.
Our first step would simply be to create the new table. That is a tiny change with minimal impact on the running system.
Our second step is to change our software so that the code which adds, updates or removes address information deals with the new table in addition to the old table. Reads still come only from the old table. This means that we duplicate data for a while. In the long run we don't want that, but in the short run, we put this in production and verify that we get exactly the content we expect in the new table. Nothing relies on the new data yet, and we haven't made any backwards-incompatible changes. We can easily run the new code in parallel with the old code in the production environment, and reverting to the previous software version in case of trouble can be done with minimal impact on production.
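This dual-write step can be sketched as follows, reusing the assumed schema from before. The function name and the single-transaction wrapping are illustrative; the real system would do this inside its data-access layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT);
CREATE TABLE customer_address (customer_id INTEGER, street TEXT, city TEXT,
                               start_date TEXT, end_date TEXT);
INSERT INTO customer VALUES (1, 'Alice', '1 Main St', 'Springfield');
""")

def update_address(conn, customer_id, street, city, today):
    """Write the address to the old columns AND the new table, atomically."""
    with conn:  # one transaction: both writes succeed, or neither does
        conn.execute("UPDATE customer SET street = ?, city = ? WHERE id = ?",
                     (street, city, customer_id))
        # Close out the current address row in the new table, if one exists.
        conn.execute("""UPDATE customer_address SET end_date = ?
                        WHERE customer_id = ? AND end_date IS NULL""",
                     (today, customer_id))
        conn.execute("""INSERT INTO customer_address
                            (customer_id, street, city, start_date, end_date)
                        VALUES (?, ?, ?, ?, NULL)""",
                     (customer_id, street, city, today))

update_address(conn, 1, "2 Oak Ave", "Shelbyville", "2024-06-01")
```

Note that all read paths are untouched: the old columns remain authoritative, which is what makes rolling back this version nearly free.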
Once we're convinced that creating and updating addresses in the new table works as expected, we launch a script which goes through all customer records and adds each old address to the new table. Yet another low-impact, low-risk operation.
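A sketch of that backfill script, under the same assumed schema. One detail worth making explicit: since the dual-writing code has already created rows for recently updated customers, the script should be idempotent and skip them, so it can also be re-run safely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, street TEXT, city TEXT);
CREATE TABLE customer_address (customer_id INTEGER, street TEXT, city TEXT,
                               start_date TEXT, end_date TEXT);
INSERT INTO customer VALUES (1, 'Alice', '1 Main St', 'Springfield');
INSERT INTO customer VALUES (2, 'Bob', '9 Elm St', 'Ogdenville');
-- Bob's address was already written by the new dual-writing code:
INSERT INTO customer_address VALUES (2, '9 Elm St', 'Ogdenville', '2024-06-01', NULL);
""")

with conn:
    # Copy only customers that have no row in the new table yet,
    # with some (more or less correct) start date set.
    conn.execute("""
        INSERT INTO customer_address (customer_id, street, city, start_date, end_date)
        SELECT c.id, c.street, c.city, '1970-01-01', NULL
        FROM customer c
        WHERE NOT EXISTS (SELECT 1 FROM customer_address a
                          WHERE a.customer_id = c.id)
    """)
```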
Once we have complete data in the new address table, we deploy a version of the software which reads address data from the new table. We can verify that everything works just as expected, and if there are problems, we can always revert to the previous software version. The address information in the old table is still there. We can also monitor that no code still reads address information from the old location.
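The new read path might look like this sketch, using the `end_date IS NULL` convention assumed earlier. The `address_on` helper is hypothetical, but it shows the new capability the whole exercise was about: historical lookups by date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_address (customer_id INTEGER, street TEXT, city TEXT,
                               start_date TEXT, end_date TEXT);
INSERT INTO customer_address VALUES (1, '1 Main St', 'Springfield', '2020-01-01', '2024-06-01');
INSERT INTO customer_address VALUES (1, '2 Oak Ave', 'Shelbyville', '2024-06-01', NULL);
""")

def current_address(conn, customer_id):
    """The current address is the row with a NULL end_date."""
    return conn.execute("""SELECT street, city FROM customer_address
                           WHERE customer_id = ? AND end_date IS NULL""",
                        (customer_id,)).fetchone()

def address_on(conn, customer_id, date):
    """Historical lookup: the address valid on a given date (half-open ranges)."""
    return conn.execute("""SELECT street, city FROM customer_address
                           WHERE customer_id = ? AND start_date <= ?
                             AND (end_date IS NULL OR end_date > ?)""",
                        (customer_id, date, date)).fetchone()
```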
Now it's time to remove the functionality from the software which updates the address fields in the old customer table. We put this in production and make sure that it works as expected.
Finally, when we're convinced that everything works as expected, we drop the address columns from the customer table. By now, we're convinced that they are no longer used by any software. If there happens to be some legacy software we can't change which expects to find the address in the customer table, we can provide that with a database view.
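Such a compatibility view reconstructs the old "one address per customer" shape from the new table. The view name is an assumption (a view cannot share the name of the customer table itself, so the legacy software would need to be pointed at it, or the base table renamed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);  -- address columns dropped
CREATE TABLE customer_address (customer_id INTEGER, street TEXT, city TEXT,
                               start_date TEXT, end_date TEXT);
INSERT INTO customer VALUES (1, 'Alice');
INSERT INTO customer_address VALUES (1, '1 Main St', 'Springfield', '2024-06-01', NULL);

-- Legacy readers see one current address per customer, as before.
CREATE VIEW customer_legacy AS
SELECT c.id, c.name, a.street, a.city
FROM customer c
LEFT JOIN customer_address a
  ON a.customer_id = c.id AND a.end_date IS NULL;
""")
```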
## Conclusions

Instead of one software upgrade with a significant risk that something would go wrong, and a possibly severe impact if it did, we've performed six quick, simple, very safe, low-impact changes to the system. We've been able to observe how each change worked, and we've had the ability to adapt to any discoveries we made at any step in the process.
In the nominal case, we have a service window of a few hours in the traditional approach, and no impact on production at all in the agile approach.
If we only want to perform a production release on a monthly basis, but want confidence and impact similar to the agile approach, we'd have to release a preparatory version now (with other, non-related features and/or bug-fixes in it), get the new behaviour into production in the next release, a month from now (if all went as expected with the preparatory release), and get rid of the data redundancy a little more than two months from now. Isn't it better to be finished today?