A primer on zero downtime deploys, atomic deployments, blue-green deployments, and how to minimize downtime during database migrations.
Zero downtime deployment is a deployment method where your website or application is never down or in an unstable state during the deployment process. To achieve this, the web server doesn’t start serving the changed code until the entire deployment process is complete.
In this article I’m going to assume you’re just using a single web server and not in a situation where you need a load balanced configuration. Zero downtime deployments are even better in those situations but the vast majority of websites run off a single web server instance.
Zero downtime is a type of deployment. There are limitless ways to make it happen: off-the-shelf tools, homespun scripts, or complicated pipelines via something like GitLab.
One of the most popular zero downtime deployment methods is the atomic deployment.
Atomic deployments are a style of code deployment that symlink the most recent version of the code so it’s available to the web server to serve.
The directory structure of an atomically deployed site looks like something like this:
```
current -> releases/20190504172417/
deploy-cache/
releases/
    20190504161640/
    20190504164421/
    20190504170431/
    20190504172417/
```
The `current` directory is actually a symlink to the most recent release directory inside of `releases`. In this setup, we would point our web server to `current/web` and then it would always serve the latest version of the code.
Inside of the timestamped directories is a complete version of the website or web application code.
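The whole flow can be sketched as a small shell script. This is a toy illustration, not any particular tool’s implementation; the deploy root and file contents here are stand-ins.

```shell
#!/bin/sh
# Sketch of an atomic deploy. DEPLOY_ROOT is a stand-in for a real
# path like /var/www/site; the "code" is a single dummy file.
set -e

DEPLOY_ROOT="$(mktemp -d)"              # stand-in for /var/www/site
RELEASE="$(date +%Y%m%d%H%M%S)"         # timestamped release name
RELEASE_DIR="$DEPLOY_ROOT/releases/$RELEASE"

# 1. Save the complete code base into the new release directory first.
mkdir -p "$RELEASE_DIR/web"
echo "new code" > "$RELEASE_DIR/web/index.php"

# 2. Only once everything is in place, repoint "current" at the new
#    release. ln -sfn replaces the symlink in one command, so the
#    web server never serves a half-deployed release.
ln -sfn "$RELEASE_DIR" "$DEPLOY_ROOT/current"

echo "current -> $(readlink "$DEPLOY_ROOT/current")"
```

Until that final `ln` runs, the web server is still serving the previous release untouched.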
The setup has two important implications.
First, none of the new code is available to be executed until all of the code is completely deployed. The entire code base is first saved in the timestamped directory inside of `releases` (and with some tools first saved to a `deploy-cache` directory).
Second, if there is a problem with the deployed code it’s very fast and simple to “roll back” the deployment by re-linking the `current` symlink to the previous release (by date in the `releases` directory). It doesn’t require a re-deploy or anything time consuming. The code is already there, waiting.
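A rollback, then, is nothing more than relinking. A minimal sketch, with the directory layout and release names faked for illustration:

```shell
#!/bin/sh
# Rollback sketch: point "current" back at the previous release.
set -e

DEPLOY_ROOT="$(mktemp -d)"              # stand-in for the deploy root

# Fake two timestamped releases, with the newer one currently live.
mkdir -p "$DEPLOY_ROOT/releases/20190504170431" \
         "$DEPLOY_ROOT/releases/20190504172417"
ln -sfn "$DEPLOY_ROOT/releases/20190504172417" "$DEPLOY_ROOT/current"

# Timestamped names sort chronologically, so "previous" is the
# second-to-last entry in releases/.
PREVIOUS="$(ls -1 "$DEPLOY_ROOT/releases" | sort | tail -n 2 | head -n 1)"

# The rollback itself is one symlink swap. No re-deploy needed;
# the old code is already sitting there.
ln -sfn "$DEPLOY_ROOT/releases/$PREVIOUS" "$DEPLOY_ROOT/current"
```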
Craft CMS relies completely on Composer and the `vendor` directory that Composer builds to hold the application code and all of its dependencies (including all plugins or modules you have installed).
If you’re doing a software update to Craft or plugins, it can take several seconds to download the updates into the `vendor` directory. During this time the site code is unstable and likely incomplete. If visitors were allowed to visit the site during the update process it would likely cause errors.
Therefore, zero downtime deployments via atomic deployments allow Composer to download and save the updated dependencies completely before the new version of the software is served by the web server.
Sometimes your deployments will just be code updates, like revised templates, but frequently there will be a database migration or some other database update that happens along with the code update. Even a change to the Craft project config file will trigger a database update when Craft detects it.
To work around this, we want to run the project config sync or update migrations immediately after all of the code is deployed and right after the new release is linked as `current`.
To do this we’d script our deployment process (deploying Craft with Envoyer makes this very easy) to run `craft project-config/sync` and `craft migrate/all` to run all Craft and plugin migrations.
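Put together, a post-deploy hook might look something like the fragment below. The path is an assumption for illustration; the two commands are Craft’s own console commands.

```shell
# Hypothetical post-deploy hook: runs immediately after "current"
# is relinked, so the database catches up with the new code as
# quickly as possible. Adjust the path to your deploy root.
cd /var/www/site/current

# Apply pending project config changes from the deployed files.
php craft project-config/sync

# Run all pending Craft and plugin migrations.
php craft migrate/all
```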
(You could also visit the control panel URL for the website to kick off the migration.)
But you can probably see the gap in our zero downtime deployment. There’s going to be a very small window of time between the code deployment being live and when the database migration is complete. Craft CMS upgrade migrations are usually pretty quick but there is still a chance someone could access your site while the migration is in-progress.
How can we work around this?
One work-around is an old fashioned after-hours, scheduled deployment. Just the other day I had to do a Craft and plugin update on a site. I tested it locally, on a dev server, on the staging server, and everything went swimmingly. However, I still scheduled a deployment to production during off hours (around 8pm at night) in order to minimize the impact on visitors.
There’s another style of deployment called blue-green. It is intended to remove any downtime, even during updates that require a database migration.
In this deployment setup there are two identical production environments. If you have two load-balanced web servers and a database server, you need twice that.
How does it work? Martin Fowler puts it succinctly:
As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment — the blue one is now idle.
Blue-green deployment also gives you a rapid way to rollback — if anything goes wrong you switch the router back to your blue environment.
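The “switch the router” step can be as simple as an atomic pointer swap. Here’s a toy sketch, where a state file stands in for a real router or load-balancer configuration:

```shell
#!/bin/sh
# Toy blue-green switch: a state file records which environment is
# live. A real setup would flip a load balancer or router instead.
set -e

STATE_DIR="$(mktemp -d)"
echo "blue" > "$STATE_DIR/live"         # blue starts out live

switch_to() {
    # Write-then-rename so the "router" never sees a partial write.
    printf '%s\n' "$1" > "$STATE_DIR/live.tmp"
    mv "$STATE_DIR/live.tmp" "$STATE_DIR/live"
}

switch_to green                         # green goes live
# If anything goes wrong: switch_to blue  (instant rollback)
```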
But what about database changes from user data (like here on CraftQuest where new rows of data are being created every few seconds) that happened in blue while green was being prepared to go live?
There’s still the issue of dealing with missed transactions while the green environment was live, but depending on your design you may be able to feed transactions to both environments in such a way as to keep the blue environment as a backup when the green is live. Or you may be able to put the application in read-only mode before cut-over, run it for a while in read-only mode, and then switch it to read-write mode. That may be enough to flush out many outstanding issues.
There will always be issues to work around, but blue-green might be a solid solution if you need truly zero downtime for your website.
My first brush with atomic deployment was years ago using Capistrano to deploy Ruby on Rails apps (and then later PHP apps). But now a bevy of tools offer support for atomic deployments.
Here are a few (but a quick web search would likely turn up more):
Zero downtime deployments shouldn’t be a deployment type; they should be the deployment type. Unless we’re doing a major overhaul that requires an extended downtime period, all of our deployments should be of the zero-downtime variety.
But we might think we’re doing zero downtime deployments when we really aren’t. Let’s think back to the old days when we would just upload a bunch of updated files via SFTP. Sure, it might only take a minute or two, but there was that in-between time when `myImportantFunctions.php` was updated while another file (probably `logInUser.php`) wasn’t yet updated but still called functions in `myImportantFunctions.php` that had changed or no longer existed. You can see the problem here.
But it’s not just something as archaic as an SFTP upload. This can also be the case with those clever deployment setups that just use Git, pulling the `master` branch onto the server as a way to deploy changed code. While Git is very fast, it still isn’t updating all files at once, and there could easily be code issues if someone is using your web application or website at the same time.
That’s where the concept of zero downtime deployments comes in. The website is always available and all code changes are made available at the same time.