Andrew Welch

Andrew Welch · Insights · #backups #devops #craftcms

Making the web better one site at a time, with a focus on performance, usability & SEO


Mitigating Disaster via Website Backups

A solid backup strategy is an insurance policy for your clients, and can make you the hero if disaster strikes

Tidal Wave Disaster

Building a website takes a ton of work; nobody knows this better than the web developers and designers who build the website. Clients at least understand the amount of work involved when they pay their invoice, and yet more often than not their investment isn’t protected with a proper backup strategy.

This would be like building a house, but not bothering to get homeowners insurance.

While many hosting facilities and VPS services offer “snapshot” backups, what they do is essentially make a backup disk image of your server.


Linode “snapshot” backups

While this is useful if you want to restore the entire server to an arbitrary point in time, it’s pretty heavy-handed if all you need to do is restore a file that the client deleted (and you’ll lose anything else they’ve changed in the interim).

I view backups like this as good to have in an emergency, and useful if you want to do a backup before doing a major server upgrade, in case something goes awry. But they are not what I’m looking for most of the time; indeed, snapshot backups are not recommended as a way to reliably back up dynamically changing data such as a MySQL database.

So let’s see if we can’t come up with more practical backup strategies

Implementing a proper backup strategy is a service that you can provide to your clients, because it has real value. So let’s have a look at how we can do it.

The first thing we need to do is logically separate the content that we create from the content that the client creates. Content that we create, such as the design, HTML, CSS, JavaScript, templates, and graphics, should all be checked into a git repository.

This allows you to collaborate with other developers, gives you an intrinsically versioned backup of the website structure, and allows you to easily deploy the website. Whether you use GitHub.com private repos, Beanstalkapp.com, BitBucket.org, GitLab.com, or your own private Git server for your git repos, it doesn’t really matter. Just start using them.

This all goes into one box, and we store that box in a git repo, so it’s already backed up. If you’re not doing it already, the time is now to get on board with using git repos. It’s getting to the point where it’s a standard part of web development.

Backup Boxes

Then the content that the client creates, in terms of the data they enter in the backend, images they upload, and so on, goes into another box. The Database & Asset Syncing Between Environments in Craft CMS article talks about this separation, and we’re going to leverage it here as well, again with the help of Craft-Scripts.

This box of client uploaded content is the part that we have to develop a backup strategy for.

Enter Craft-Scripts

Before we get into the nitty gritty of backups, let’s talk a little bit about the tools we’re going to use to make it happen.

Craft-Scripts are shell scripts to manage database backups, asset backups, file permissions, asset syncing, cache clearing, and database syncing between Craft CMS environments. In reality, they will work with just about any CMS out there, but we’ll focus on their use with Craft CMS here.

You may already be familiar with Craft-Scripts, if you use them for Hardening Craft CMS Permissions or Database & Asset Syncing Between Environments in Craft CMS. They also have handy scripts for doing backups.

In a nutshell, the way Craft-Scripts works is you copy the scripts folder into each Craft CMS project’s git repo, and then set up a .env.sh (which is never checked into git via .gitignore) on each environment where the project lives, such as live production, staging, and local dev. For more on multiple environments, check out the Multi-Environment Config for Craft CMS article.

Then you can use the same scripts in each environment, and they will know things like how to access the database, where the assets are, etc. based on the settings in the local .env.sh.
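To make that idea a bit more concrete, here’s a minimal sketch of how a script could pick up its per-environment settings by sourcing a .env.sh from its own directory. The load_env helper name and its error message are hypothetical and only for illustration; the real Craft-Scripts may well do this differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Craft-Scripts): load the settings from the
# .env.sh that lives in the directory passed as $1.
load_env() {
    env_file="${1%/}/.env.sh"
    if [ ! -f "$env_file" ]; then
        echo "Missing $env_file; copy the example file and fill it in" >&2
        return 1
    fi
    # Source the file so its variables become available to the caller
    . "$env_file"
}

# Typical use from a script that lives in the scripts directory:
#   load_env "$(dirname "$0")" || exit 1
#   echo "Backing up ${LOCAL_DB_NAME} to ${LOCAL_BACKUPS_PATH}"
```

Because the settings are sourced rather than hard-coded, the same script can be checked into git once and behave correctly on local dev, staging, and live production.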

The Craft-Scripts documentation covers setting up the .env.sh in detail, so we won’t go into that here. However, I think real-world examples can be helpful, so here’s the full .env.sh that I use on my local dev environment for this very website:

# Craft Scripts Environment
#
# Local environmental config for nystudio107 Craft scripts
#
# @author    nystudio107
# @copyright Copyright (c) 2017 nystudio107
# @link      https://nystudio107.com/
# @package   craft-scripts
# @since     1.1.0
# @license   MIT
#
# This file should be renamed to '.env.sh' and it should reside in the
# `scripts` directory.  Add '.env.sh' to your .gitignore.

# -- GLOBAL settings --

# What to prefix the database table names with
GLOBAL_DB_TABLE_PREFIX="craft_"

# The path of the `craft` folder, relative to the root path; paths should always have a trailing /
GLOBAL_CRAFT_PATH="craft/"

# The maximum age of backups in days; backups older than this will be automatically removed
GLOBAL_DB_BACKUPS_MAX_AGE=90

# -- LOCAL settings --

# Local path constants; paths should always have a trailing /
LOCAL_ROOT_PATH="/home/vagrant/sites/nystudio107/"
LOCAL_ASSETS_PATH=${LOCAL_ROOT_PATH}"public/img/"

# Local user & group that should own the Craft CMS install
LOCAL_CHOWN_USER="vagrant"
LOCAL_CHOWN_GROUP="vagrant"

# Local directories relative to LOCAL_ROOT_PATH that should be writeable by the $CHOWN_GROUP
LOCAL_WRITEABLE_DIRS=(
                "${GLOBAL_CRAFT_PATH}storage"
                "public/img"
                )

# Local asset directories relative to LOCAL_ASSETS_PATH that should be synched with remote assets
LOCAL_ASSETS_DIRS=(
                "blog"
                "clients"
                "users"
                )

# Craft-specific file directories relative to LOCAL_CRAFT_FILES_PATH that should be synched with remote files
LOCAL_CRAFT_FILE_DIRS=(
                "rebrand"
                "userphotos"
                )

# Absolute paths to directories to back up, in addition to `LOCAL_ASSETS_DIRS` and `LOCAL_CRAFT_FILE_DIRS`
LOCAL_DIRS_TO_BACKUP=(
                "/home/forge/wiki.nystudio107.com"
                )

# Local database constants
LOCAL_DB_NAME="nystudio"
LOCAL_DB_PASSWORD="secret"
LOCAL_DB_USER="homestead"
LOCAL_DB_HOST="localhost"
LOCAL_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above LOCAL_DB_* constants; otherwise leave this blank
LOCAL_DB_LOGIN_PATH="localdev"

# The `mysql` and `mysqldump` commands to run locally
LOCAL_MYSQL_CMD="mysql"
LOCAL_MYSQLDUMP_CMD="mysqldump"

# Local backups path; paths should always have a trailing /
LOCAL_BACKUPS_PATH="/home/vagrant/backups/"

# -- REMOTE settings --

# Remote ssh credentials, user@domain.com and Remote SSH Port
REMOTE_SSH_LOGIN="forge@nystudio107.com"
REMOTE_SSH_PORT="22"

# Remote path constants; paths should always have a trailing /
REMOTE_ROOT_PATH="/home/forge/nystudio107.com/"
REMOTE_ASSETS_PATH=${REMOTE_ROOT_PATH}"public/img/"

# Remote database constants
REMOTE_DB_NAME="nystudio"
REMOTE_DB_PASSWORD="XXX"
REMOTE_DB_USER="nystudio"
REMOTE_DB_HOST="localhost"
REMOTE_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above REMOTE_DB_* constants; otherwise leave this blank
REMOTE_DB_LOGIN_PATH=""

# The `mysql` and `mysqldump` commands to run remotely
REMOTE_MYSQL_CMD="mysql"
REMOTE_MYSQLDUMP_CMD="mysqldump"

# Remote backups path; paths should always have a trailing /
REMOTE_BACKUPS_PATH="/home/forge/backups/"

# Remote Amazon S3 bucket name
REMOTE_S3_BUCKET="backups.nystudio107"

The only thing I’ve changed is that I’ve XXX’d out my REMOTE_DB_PASSWORD; everything else is exactly how I use it. Don’t worry about understanding what all of the settings are now; I’m presenting it here just to give you a feel for what it looks like fully configured.
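One setting that is worth a quick illustration is the login-path option mentioned in the comments above: if it is set, it stands in for the individual database constants. Here’s a rough sketch of how a script might choose between the two. The db_creds helper is hypothetical, and it assumes none of the values contain whitespace.

```shell
#!/bin/sh
# Hypothetical helper (not part of Craft-Scripts): print the credential
# arguments to hand to mysql or mysqldump.
# Usage: db_creds LOGIN_PATH USER PASSWORD HOST PORT
# Assumes the values contain no whitespace, since the caller word-splits them.
db_creds() {
    if [ -n "$1" ]; then
        # A login path set up via mysql_config_editor replaces the other values
        printf '%s\n' "--login-path=$1"
    else
        printf '%s\n' "--user=$2 --password=$3 --host=$4 --port=$5"
    fi
}

# Example (left unquoted on purpose so the arguments are split):
#   mysqldump $(db_creds "$LOCAL_DB_LOGIN_PATH" "$LOCAL_DB_USER" \
#       "$LOCAL_DB_PASSWORD" "$LOCAL_DB_HOST" "$LOCAL_DB_PORT") "$LOCAL_DB_NAME"
```

The upside of the login-path route is that the password never has to appear in the .env.sh or on the command line.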

Now that the intro to Craft-Scripts is out of the way, let’s deal with some disasters!

Backups for Disasters Big and Small

When we talk about disaster recovery, we have to realize that disasters come in different shapes and sizes, and prepare for likely scenarios. By far the most common “disaster” is that the client has somehow lost data due to deleting the wrong entry, or deleting an asset by mistake.

Disaster Recovery

In cases like this, what we really want are local backups that are easy to access on the server, and thus easy to restore. We want to ensure that the content that the client creates in the form of database entries and uploaded assets is tucked away safely, awaiting the inevitable human error.

Local Database Backups

So our first step is making sure that we keep daily backups of the database, for the times when client error causes data loss. For this, we’ll use the backup_db.sh script.

When this script is executed, it will make a local copy of the database, excluding cache tables we don’t want, neatly compressed and time-stamped, and save it in the directory you specify in LOCAL_BACKUPS_PATH.

It will also rotate the backups, in that it will delete any backups that are more than GLOBAL_DB_BACKUPS_MAX_AGE days old. This way, you’ll never have to worry about running out of disk space due to backups gone wild.

I’ve found that in general, problems are usually noticed within 30 days or so of them happening, but I’m paranoid, so I keep these local database backups around for 90 days. What you should set it to depends on your use-case, and how often you do the backups.

Here’s an example output after running backup_db.sh:

vagrant@homestead ~/sites/nystudio107/scripts (develop) $ ./backup_db.sh
*** Backed up local database to /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz
*** 2 old database backups removed; details logged to /tmp/nystudio-db-backups.log

The numbers at the end of the backup archive are a timestamp in the format of YYYYMMDD-HHMMSS.
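If you’re curious what a dump-and-rotate routine like this boils down to, here’s a minimal sketch. The backup_filename and prune_old_backups helpers are hypothetical names, not the actual Craft-Scripts code, and the sketch leaves out the cache-table exclusion and the logging that the real script does.

```shell
#!/bin/sh
# Hypothetical helper: build a time-stamped name for database $1, such as
# nystudio-db-backup-20170320-022335.sql.gz
backup_filename() {
    printf '%s-db-backup-%s.sql.gz\n' "$1" "$(date '+%Y%m%d-%H%M%S')"
}

# Hypothetical helper: delete the *.sql.gz files in directory $1 that were
# last modified more than $2 days ago, printing each path that is removed.
prune_old_backups() {
    find "$1" -type f -name '*.sql.gz' -mtime +"$2" -print -exec rm -f {} \;
}

# Putting it together (not run here, since it needs a database and credentials):
#   mysqldump --login-path=localdev nystudio | gzip \
#       > "/home/vagrant/backups/nystudio/db/$(backup_filename nystudio)"
#   prune_old_backups /home/vagrant/backups/nystudio/db 90
```

Putting the timestamp in the filename means the backups sort chronologically in a directory listing, which is handy when you’re hunting for the last good copy.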

Local Asset Backups

So great, we have the client’s database locally backed up. Next we need to back up their assets, the files that they upload into the CMS. To do this, we’ll use the backup_assets.sh script.

This script uses rsync to efficiently back up all of the asset directories specified in LOCAL_ASSETS_DIRS to the directory specified in LOCAL_BACKUPS_PATH. A sub-directory LOCAL_DB_NAME/assets inside the LOCAL_BACKUPS_PATH directory is used for the asset backups.

backup_assets.sh will also back up the Craft userphotos and rebrand directories from craft/storage by default. The directories it will back up are specified in LOCAL_CRAFT_FILE_DIRS.

Because rsync is used, the files are effectively mirrored into a separate local directory, so only files that have actually changed are backed up. This makes the backups very quick, and because the files are stored uncompressed, you have quick and easy access to restore that wonderful image of a fluffy poodle that the client deleted.

If a file is deleted from one of the LOCAL_ASSETS_DIRS directories, it doesn’t get deleted from the LOCAL_BACKUPS_PATH, so you can easily find the file to rescue it.

Here’s example output from backup_assets.sh:

vagrant@homestead ~/sites/nystudio107/scripts (develop) $ ./backup_assets.sh
sending incremental file list
blog/
blog/backup-boxes.png
         21,175 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=144/152)
blog/_desktop/
blog/_desktop/backups-are-not-sexy.jpg
        294,064 100%   25.49MB/s    0:00:00 (xfr#2, to-chk=29/152)
blog/_desktop/tidal-wave-disaster.jpg
        320,383 100%   12.73MB/s    0:00:00 (xfr#3, to-chk=6/152)
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/blog
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/clients
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/users
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/craft/storage/rebrand
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/craft/storage/userphotos
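The “mirror, but never delete” behavior described above comes down to how rsync is invoked. Here’s a minimal sketch; the mirror_dir helper is hypothetical, and the exact flags that Craft-Scripts passes to rsync are an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helper: mirror directory $1 into backup directory $2.
#   -a  archive mode (recursive, preserves times and permissions)
#   -F  honor per-directory .rsync-filter files
# No --delete flag is passed, so files that are removed from the source
# stay in the backup.
mirror_dir() {
    mkdir -p "$2" && rsync -a -F "${1%/}" "$2"
}

# Example:
#   mirror_dir /home/vagrant/sites/nystudio107/public/img/blog \
#       /home/vagrant/backups/nystudio/assets
```

The trailing slash is stripped from the source so that rsync copies the directory itself (blog/) into the backup directory, rather than spilling its contents there.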

Because rsync is used for these backups, you can put a .rsync-filter in any directory to define files/folders to ignore. More info…

For example, if you don’t want any Craft image transforms backed up, your .rsync-filter file in each assets directory might look like this:

# This file allows you to add filter rules to rsync, one per line, preceded by either
# `-` or `exclude` and then a pattern to exclude, or `+` or `include` and then a pattern
# to include. More info: http://askubuntu.com/questions/291322/how-to-exclude-files-in-rsync
- _*
- _*/**

If you have arbitrary directories that you want backed up that exist outside of your project directory, you can use the backup_dirs.sh script.

This script uses rsync to efficiently back up all of the directories specified in LOCAL_DIRS_TO_BACKUP to the directory specified in LOCAL_BACKUPS_PATH. A sub-directory LOCAL_DB_NAME/files inside the LOCAL_BACKUPS_PATH directory is used for the directory backups.

Because rsync is used for these backups, you can put a .rsync-filter in any directory to define files/folders to ignore. More info…

For example, if you have a wiki with data/cache and data/tmp directories that you don’t want backed up, your .rsync-filter file in the wiki directory might look like this:

# This file allows you to add filter rules to rsync, one per line, preceded by either
# `-` or `exclude` and then a pattern to exclude, or `+` or `include` and then a pattern
# to include. More info: http://askubuntu.com/questions/291322/how-to-exclude-files-in-rsync
- public/data/cache
- public/data/tmp

Backups of Backups Offsite

Fantastic, we’ve got all of the website structure we created backed up in git, and we have local database backups and local asset backups. We’re covered for the most common scenarios where data has been lost in one way or another.

But what about when something goes truly wrong, and our server isn’t accessible?

What we need is some inception: backups of backups

Inception

While it’s great to have local backups—and they are by far the most useful in practice—we also want to have offsite backups that can be used if the proverbial sh*t hits the fan.

For this, we’ll use the pull_backups.sh script which pulls down all of the backups from the REMOTE_BACKUPS_PATH on a remote server to the LOCAL_BACKUPS_PATH on the computer it’s run from.

This pulls down all of the database and asset backups we’ve made on our remote server via the backup_db.sh and backup_assets.sh scripts, and it does so via rsync so it’s very efficient in pulling down only the files that have changed.

This effectively gives us an offsite mirror of all of our local backups that we can easily access should the need arise. This offsite backup can be to a local computer, or it can be to another VPS that you spin up, as described in the How Agencies & Freelancers Should Do Web Hosting article.

Assuming you have set up ssh keys, you won’t even have to enter your password for the remote server. Here’s what the output of pull_backups.sh looks like:

vagrant@homestead /htdocs/nystudio107/scripts (develop) $ ./pull_backups.sh
receiving incremental file list
nystudio/db/
nystudio/db/nystudio-db-backup-20170317-000432.sql.gz
        435,059 100%    2.46MB/s    0:00:00 (xfr#154, to-chk=5/180)
nystudio/db/nystudio-db-backup-20170317-133213.sql.gz
        436,133 100%    1.65MB/s    0:00:00 (xfr#155, to-chk=4/180)
nystudio/db/nystudio-db-backup-20170318-183601.sql.gz
        436,381 100%    1.25MB/s    0:00:00 (xfr#156, to-chk=3/180)
nystudio/db/nystudio-db-backup-20170319-000001.sql.gz
        436,533 100%    1.01MB/s    0:00:00 (xfr#157, to-chk=2/180)
nystudio/db/nystudio-db-backup-20170319-002746.sql.gz
        436,821 100%  863.53kB/s    0:00:00 (xfr#158, to-chk=1/180)
nystudio/db/nystudio-db-backup-20170319-132355.sql.gz
        436,839 100%  743.21kB/s    0:00:00 (xfr#159, to-chk=0/180)
*** Synced backups from /home/forge/backups/nystudio
vagrant@homestead /htdocs/nystudio107/scripts (develop) $
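For the curious, a pull like this is essentially rsync running over ssh. Here’s a minimal sketch under that assumption. The pull_backups function and the DRY_RUN switch are hypothetical, and the flags are my guess rather than a copy of what pull_backups.sh actually runs.

```shell
#!/bin/sh
# Hypothetical sketch: pull the remote backups directory down over ssh.
# Usage: pull_backups SSH_LOGIN SSH_PORT REMOTE_BACKUPS_PATH LOCAL_BACKUPS_PATH
# Set DRY_RUN=1 to print the rsync command instead of running it.
pull_backups() {
    # -a archive mode, -z compress in transit, -e use ssh on the given port
    set -- rsync -a -z -e "ssh -p $2" "$1:$3" "$4"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example:
#   pull_backups forge@nystudio107.com 22 /home/forge/backups/ /home/vagrant/backups/
```

The trailing slash on the remote path matters: it tells rsync to copy the contents of the remote backups directory into the local one, rather than nesting a second backups directory inside it.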

If you’d like to sync your backups to an Amazon S3 bucket, Craft-Scripts have you covered there, too.

The sync_backups_to_s3.sh script syncs the backups from LOCAL_BACKUPS_PATH to the Amazon S3 bucket specified in REMOTE_S3_BUCKET.

This script assumes that you have already installed awscli and have configured it with your credentials. Here’s what the output of sync_backups_to_s3.sh looks like:

forge@nys-production /htdocs/nystudio107.com/scripts (master) $ ./sync_backups_to_s3.sh
upload: ../../backups/nystudio/db/nystudio-db-backup-20170322-000001.sql.gz to s3://backups.nystudio107/nystudio/db/nystudio-db-backup-20170322-000001.sql.gz
*** Synced backups to backups.nystudio107

It’s recommended that you set up a separate user with access to only S3, and set up a private S3 bucket for your backups.
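If you want to see roughly what an S3 sync involves, here’s a minimal sketch built on the aws s3 sync command from awscli. The sync_to_s3 wrapper and the AWS_CMD override are hypothetical; they just follow the same “configurable command” idea as LOCAL_MYSQL_CMD in the .env.sh.

```shell
#!/bin/sh
# Hypothetical wrapper: sync local directory $1 to the S3 bucket named $2.
# Any further arguments (for example --dryrun) are passed through to aws.
# AWS_CMD can be overridden if the aws binary lives somewhere unusual.
sync_to_s3() {
    src="$1"
    bucket="$2"
    shift 2
    "${AWS_CMD:-aws}" s3 sync "$src" "s3://${bucket}" "$@"
}

# Example (preview only; nothing is uploaded with --dryrun):
#   sync_to_s3 /home/forge/backups/ backups.nystudio107 --dryrun
```

Running it with --dryrun first is a cheap way to confirm that the credentials and the bucket name are right before anything is actually uploaded.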

Automatic Script Execution

If you want to run any of these scripts automatically on a set schedule, here’s how to do it. We’ll use the backup_db.sh script as an example, but the same applies to any of the scripts.

If you’re using Forge you can set the backup_db.sh script to run nightly (or whatever interval you want) via the Scheduler.


Forge scheduled backups

If you’re using ServerPilot.io or are managing the server yourself, just set the backup_db.sh script to run via cron at whatever interval you desire.

Craft-Scripts includes a crontab-helper.txt that you can add to your crontab to make configuring it easier. Remember to use full, absolute paths to the scripts when running them via cron, as cron does not have access to your environment paths, e.g.:

    /home/forge/nystudio107.com/scripts/backup_db.sh
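As an illustration, here’s a small sketch that builds a nightly crontab entry and refuses anything that isn’t an absolute path. The nightly_cron_line helper and the midnight schedule are assumptions for the example; this is not the contents of crontab-helper.txt.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry that runs script $1 every night
# at midnight. The five schedule fields are:
#   minute hour day-of-month month day-of-week
nightly_cron_line() {
    case "$1" in
        /*) printf '0 0 * * * %s\n' "$1" ;;
        *)  echo "cron needs an absolute path: $1" >&2
            return 1 ;;
    esac
}

# To append the entry to the current user's crontab:
#   ( crontab -l 2>/dev/null
#     nightly_cron_line /home/forge/nystudio107.com/scripts/backup_db.sh
#   ) | crontab -
```

Listing the existing crontab first and piping the combined result back in means any entries you already have are preserved.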

There we go, set and forget automated backups.

Becoming a Digital Nomad

The other fantastic benefit of implementing a backup system like this is that you effectively become a digital nomad. If you’ve set up your website via a provisioning service like Laravel Forge or ServerPilot.io as described in the How Agencies & Freelancers Should Do Web Hosting article, you’re no longer tethered to any particular hosting arrangement.

Digital Nomad

You can quickly spin up a new server, deploy your website to it by linking it to your git repo, pull your assets down to it, pull your database down to it, and away you go!

This kind of freedom is a wonderful thing

It makes what used to be a scary, fraught process of moving to a new server a piece of cake! Gone are the days when you’re dreading a server migration, or you don’t update or enhance your server out of fear that you’ll break something.

Disaster Recovery Drills

The final thing that I strongly recommend is running disaster recovery drills. Use this newfound freedom as a digital nomad to actually put your backups to the test.

Spin up a new VPS, and try restoring a website from scratch.

Practice Drill

There’s no better way to gain confidence in your disaster recovery plan than to practice doing it. It sure beats sacrificing chickens and praying when you’re under the gun and facing an actual disaster.

To help you with this, Craft-Scripts comes with the restore_db.sh script. You pass in a path to the database dump, and it will restore it to the local database (after backing up the local database first). You can pass in a path to either a .sql database dump or a .gz compressed database dump; either works.

Here’s the example output of the restore_db.sh script:

vagrant@homestead /htdocs/nystudio107/scripts (develop) $ ./restore_db.sh /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz
*** Backed up local database to /tmp/nystudio-db-backup-20170321.sql.gz
*** Restored local database from /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz
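Handling both plain and compressed dumps is simpler than it sounds. Here’s a minimal sketch of the idea; the stream_dump helper is hypothetical, and it leaves out the safety backup of the local database that restore_db.sh makes first.

```shell
#!/bin/sh
# Hypothetical helper: write the SQL in dump file $1 to stdout, decompressing
# it on the fly when the filename ends in .gz.
stream_dump() {
    case "$1" in
        *.gz) gunzip -c "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Restoring is then a pipe into the mysql client (not run here, since it needs
# a database and credentials):
#   stream_dump /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz \
#       | mysql --login-path=localdev nystudio
```

Streaming the dump through a pipe means the compressed backup never has to be unpacked to disk before it is restored.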

If all this seems like a lot of work, just consider it practice. Craft-Scripts does a lot of the heavy lifting for you. The first time you do it, it’ll take a bit of time to get familiar with how it all works, but after that you’ll gain the confidence that comes with experience.

And you’ll also gain a very useful—and billable—skill set in your repertoire.
