Andrew Welch

Andrew Welch · Insights · #craftcms #frontend #devops

Making the web better one site at a time, with a focus on performance, usability & SEO

· 5 min read ·

Database & Asset Syncing Between Environments in Craft CMS

Syncing the database & assets between environments is an important workflow, but it’s a pain. Here’s some help

Workflow

Once you have developed and deployed a Craft CMS site, you then have the issue of ongoing maintenance, and often ongoing development. This requires that you come up with some kind of a system for syncing the database & assets between the live production server, and your local dev environment (and often a staging environment for client approval of changes, too).

Synchronizing databases is not new; techniques such as master/slave database replication have been around for years. Tools such as MySQL Workbench allow you to synchronize MySQL schema changes as well. However for all but the most elaborate website projects, these techniques are typically overkill.

And while Pixel & Tonic has announced Content Migrations for Craft 3, this is more of a framework for the future than anything that will allow you to adroitly move database schema changes between environments today.

So we need to come up with something that is practical, in terms of being relatively easy to set up and utilize on a daily basis.

The techniques presented here are not by any means the be-all and end-all of database & asset synching methodology. There are many ways to do it, and what I present here is far from perfect. However, it works pretty well for small to mid-size projects, and is a reasonable middle-ground between comprehensiveness and usability. It also doesn’t require a ton of setup work.

But first, let’s talk about code flow.

Code Flow

This is typically how a project should flow in terms of how we develop and deploy the websites we work on. The website itself exists in a git repo somewhere, whether you use GitHub.com private reposBeanstalkapp.comBitBucket.orgGitLab.com, or your own private Git server for your git repos, it doesn’t really matter. Just start using them.

This git repo serves both as a method to store your work in a collaborative, versioned manner, and also as a flow control device. All of the code you work on flows in and out of git. 

Note that this is a one-way street: we develop and test in local dev, then we push our changes to the git repo, and then we pull them down to live production and/or staging. While we may pull changes from other developers into our local dev environment, changes are never coming from live production or staging.

Updates to Craft CMS itself, as well as any plugins you use should always be done only in local dev, and then pushed to git after they’ve been tested. You can ensure this happens by setting  'allowAutoUpdates' => false in your craft/config/general.php file, so there’s no Update button for people to push. For more information, check out the Multi-Environment Config for Craft CMS article.

This helps us avoid heart-wrenching moments where an update causes our live production server to error, because we’ve tested it in local dev already. Worst-case, because the changes have all flowed through git, we can always roll back to a previous commit to get the website back up and running with alacrity.

This is much easier on your heart than having an “oh shit” moment, scrambling to find your most recent backups whilst you field your client’s panicked phone calls.

While a tutorial on git is beyond the scope of this article, you should have a similar one-way street paradigm when it comes to managing your various branches in git. Changes should never, ever be pushed directly to master. You should work in development or feature branches, and then merge master with them after you’ve tested the changes.

Come up with a system that works for you when it comes to branches in git, and stick to it adamantly. This will prevent you from creating a git graph that looks like Guitar Hero, and will save you many, many hours of frustration.

Exceptional developers have the best-practices habits they do not out of discipline, but out of pain. The pain of having done it the wrong way, and having had to deal with the fallout.

Link Enter: The Client

So this is all well and good, but what do we do now that we have a live website where the client is making constant changes to the database willy nilly, and uploads a new picture of their cat every day? Since we can’t expect the client to deal with database merges, the onus falls upon us.

Here’s what I suggest you do:

​Designate one server, either live production or staging, as the “gold master” from which all other environments pull from. All database schema changes are made there, and all new asset uploads happen there as well.

Whichever you choose, the client should be locked out of the other environment, to prevent them from accidentally making changes in the wrong place. A great way to do this is using the Maintenance plugin, which allows you to lock all non-admin users from making changes in the backend, and present them with a nice customizable message as well.

Another nice thing you can do for your clients (and yourself) is install the Environment Label plugin, which will clearly let you know what environment you’re working with at any given time.

So here’s what the database & asset flow might look like:

Db Assets Flow

In this case, we made our live production database the “gold master”, so we sync down to staging and local dev from it. This works well for the typical scenario where we develop a website, hand it off to the client, they use it for some months, and only occasionally ask us to make some changes/additions to it. “Sporadic enhancement”, let’s call it.

If instead you have a project where there is continuous development that’s being done on the website, often it’s better to designate staging as the “gold master”, and then you sync staging down to live production when changes are tested and approved. Both you and the client make changes in staging only, and the client needs to be made aware that changes they make will not be on live production immediately.

Tangent: The reason we’re using a script to sync assets is because we should be excluding all client-uploaded images & other files from the git repo via .gitignore. Failing to do so can result in all manner of hassles. What happens is the files will initially exist on live production when the client uploads them, then you download or sync them to local dev, and check them into git.

The problem crops up when you try to deploy back to live production, because the files already exist there, and git doesn’t want to overwrite existing untracked files. So it throws an error during deployment.

The golden rule is: if you created it, check it into git. If your client created it, exclude it from git, and sync it instead.

So okay, great. How?

Link Craft-Scripts to the Rescue!

Those of you who read my Hardening Craft CMS Permissions article are already familiar with using craft-scripts to secure Craft CMS install permissions. Well, we’re going to be using them again, but this time to sync the database & assets between environments. 

To use them, you’ll need to do the following:

  1. Download or clone the craft-scripts git repo
  2. Copy the scripts folder into the root directory of your Craft CMS project
  3. Duplicate the example.env.sh file, and rename it to .env.sh
  4. Add .env.sh to your .gitignore file
  5. Then open up the .env.sh file into your favorite editor, and replace REPLACE_ME with the appropriate settings.

All settings that are prefaced with LOCAL_ refer to the local environment where the script will be run, not your local dev environment.

The settings should all be fairly self-explanatory, but here’s my .env.sh for my local nystudio107.dev so you can see a working example:

# Craft Scripts Environment
#
# Local environmental config for nystudio107 Craft scripts
#
# @author    nystudio107
# @copyright Copyright (c) 2017 nystudio107
# @link      https://nystudio107.com/
# @package   craft-scripts
# @since     1.1.0
# @license   MIT
#
# This file should be renamed to '.env.sh' and it should reside in the
# `scripts` directory.  Add '.env.sh' to your .gitignore.

# -- GLOBAL settings --

# What to prefix the database table names with
GLOBAL_DB_TABLE_PREFIX="craft_"

# The path of the `craft` folder, relative to the root path; paths should always have a trailing /
GLOBAL_CRAFT_PATH="craft/"

# The maximum age of backups in days; backups older than this will be automatically removed
GLOBAL_DB_BACKUPS_MAX_AGE=90

# -- LOCAL settings --

# Local path constants; paths should always have a trailing /
LOCAL_ROOT_PATH="/home/vagrant/sites/nystudio107/"
LOCAL_ASSETS_PATH=${LOCAL_ROOT_PATH}"public/img/"

# Local user & group that should own the Craft CMS install
LOCAL_CHOWN_USER="vagrant"
LOCAL_CHOWN_GROUP="vagrant"

# Local directories relative to LOCAL_ROOT_PATH that should be writeable by the $CHOWN_GROUP
LOCAL_WRITEABLE_DIRS=(
                "${GLOBAL_CRAFT_PATH}storage"
                "public/img"
                )

# Local asset directories relative to LOCAL_ASSETS_PATH that should be synched with remote assets
LOCAL_ASSETS_DIRS=(
                "blog"
                "clients"
                "users"
                )

# Craft-specific file directories relative to LOCAL_CRAFT_FILES_PATH that should be synched with remote files
LOCAL_CRAFT_FILE_DIRS=(
                "rebrand"
                "userphotos"
                )

# Absolute paths to directories to back up, in addition to `LOCAL_ASSETS_DIRS` and `LOCAL_CRAFT_FILE_DIRS`
LOCAL_DIRS_TO_BACKUP=(
                "/home/forge/wiki.nystudio107.com"
                )

# Local database constants
LOCAL_DB_NAME="nystudio"
LOCAL_DB_PASSWORD="secret"
LOCAL_DB_USER="homestead"
LOCAL_DB_HOST="localhost"
LOCAL_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above LOCAL_DB_* constants; otherwise leave this blank
LOCAL_DB_LOGIN_PATH="localdev"

# The `mysql` and `mysqldump` commands to run locally
LOCAL_MYSQL_CMD="mysql"
LOCAL_MYSQLDUMP_CMD="mysqldump"

# Local backups path; paths should always have a trailing /
LOCAL_BACKUPS_PATH="/home/vagrant/backups/"

# -- REMOTE settings --

# Remote ssh credentials, user@domain.com and Remote SSH Port
REMOTE_SSH_LOGIN="forge@nystudio107.com"
REMOTE_SSH_PORT="22"

# Remote path constants; paths should always have a trailing /
REMOTE_ROOT_PATH="/home/forge/nystudio107.com/"
REMOTE_ASSETS_PATH=${REMOTE_ROOT_PATH}"public/img/"

# Remote database constants
REMOTE_DB_NAME="nystudio"
REMOTE_DB_PASSWORD="XXX"
REMOTE_DB_USER="nystudio"
REMOTE_DB_HOST="localhost"
REMOTE_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above REMOTE_DB_* constants; otherwise leave this blank
REMOTE_DB_LOGIN_PATH=""

# The `mysql` and `mysqldump` commands to run remotely
REMOTE_MYSQL_CMD="mysql"
REMOTE_MYSQLDUMP_CMD="mysqldump"

# Remote backups path; paths should always have a trailing /
REMOTE_BACKUPS_PATH="/home/forge/backups/"

Other than the X’d out password, this is the exact .env.sh that I use for my local dev environment.

You might wonder why all of this is in a .env.sh file, rather than in the script itself. The reason is so that the same scripts can be used in multiple environments such as local dev, staging, and live production without modification. We just create a .env.sh file in each environment, and keep it out of our git repo via .gitignore.

I upload any new images I need for my website on the production server via Craft’s Assets interface. Then to sync my local dev assets from the live production server, I simply cd to the scripts directory, and type:

./pull_assets.sh

Assuming you have set up ssh keys, you won’t even have to enter your password for the remote server. The output might look something like this:

vagrant@homestead /htdocs/nystudio107/scripts (master) $ ./pull_assets.sh
receiving incremental file list
blog/
blog/security.jpg
      1,285,927 100%    3.94MB/s    0:00:00 (xfr#1, to-chk=13/25)
blog/unix_permissions.png
         24,273 100%   67.73kB/s    0:00:00 (xfr#2, to-chk=10/25)
blog/unix_permissions_numbers.png
         28,358 100%   78.45kB/s    0:00:00 (xfr#3, to-chk=9/25)
blog/_desktop/
blog/_desktop/997321-code.jpg
        125,628 100%  343.65kB/s    0:00:00 (xfr#4, to-chk=7/25)
blog/_desktop/cache-bg.jpg
        136,138 100%  367.26kB/s    0:00:00 (xfr#5, to-chk=6/25)
blog/_desktop/image_optimzation.jpg
         66,463 100%  176.85kB/s    0:00:00 (xfr#6, to-chk=5/25)
blog/_desktop/lightning-trees.jpg
         92,460 100%  242.72kB/s    0:00:00 (xfr#7, to-chk=4/25)
blog/_desktop/rainforest-environment.jpg
        216,071 100%  559.70kB/s    0:00:00 (xfr#8, to-chk=3/25)
blog/_desktop/security.jpg
        165,875 100%  419.66kB/s    0:00:00 (xfr#9, to-chk=2/25)
blog/_desktop/112/twig_code.jpg
         90,046 100%  224.90kB/s    0:00:00 (xfr#10, to-chk=0/25)
*** Synced assets from blog
receiving incremental file list
*** Synced assets from users

For the curious, here’s what the script itself looks like. It’s pretty simple, really:

#!/bin/bash

# Pull Assets
#
# Pull remote assets down from a remote to local
#
# @author    nystudio107
# @copyright Copyright (c) 2017 nystudio107
# @link      https://nystudio107.com/
# @package   craft-scripts
# @since     1.1.0
# @license   MIT

# Get the directory of the currently executing script
DIR="$(dirname "${BASH_SOURCE[0]}")"

# Include files
INCLUDE_FILES=(
            "common/defaults.sh"
            ".env.sh"
            "common/common_env.sh"
            )
for INCLUDE_FILE in "${INCLUDE_FILES[@]}"
do
    if [ -f "${DIR}/${INCLUDE_FILE}" ]
    then
        source "${DIR}/${INCLUDE_FILE}"
    else
        echo 'File "${DIR}/${INCLUDE_FILE}" is missing, aborting.'
        exit 1
    fi
done

# Make sure the local assets directory exists
if [[ ! -d "${LOCAL_ASSETS_PATH}" ]] ; then
    echo "Creating asset directory ${LOCAL_ASSETS_PATH}"
    mkdir -p "${LOCAL_ASSETS_PATH}"
fi

# Pull down the asset dir files via rsync
for DIR in "${LOCAL_ASSETS_DIRS[@]}"
do
    rsync -a -z -e "ssh -p ${REMOTE_SSH_PORT}" "${REMOTE_SSH_LOGIN}:${REMOTE_ASSETS_PATH}${DIR}" "${LOCAL_ASSETS_PATH}" --progress
    echo "*** Synced assets from ${REMOTE_ASSETS_PATH}${DIR}"
done

# Make sure the Craft files directory exists
if [[ ! -d "${LOCAL_CRAFT_FILES_PATH}" ]] ; then
    echo "Creating Craft files directory ${LOCAL_CRAFT_FILES_PATH}"
    mkdir -p "${LOCAL_CRAFT_FILES_PATH}"
fi

# Pull down the Craft-specific dir files via rsync
for DIR in "${LOCAL_CRAFT_FILE_DIRS[@]}"
do
    rsync -a -z -e "ssh -p ${REMOTE_SSH_PORT}" "${REMOTE_SSH_LOGIN}:${REMOTE_CRAFT_FILES_PATH}${DIR}" "${LOCAL_CRAFT_FILES_PATH}" --progress
    echo "*** Synced assets from ${REMOTE_CRAFT_FILES_PATH}${DIR}"
done

# Normal exit
exit 0

It simply reads in the settings from the .env.sh file, and then uses the rsync command to copy down any asset directories that you’ve designated. rsync is pretty smart, it will only sync files that have changed, so it rarely takes long to do.

As a bonus, it will also pull down your rebrand and userphotos directories from craft/storage (which shouldn’t be checked into git).

Similarly, I make any database schema changes needed for the website on the live production server via Craft’s AdminCP interface. Then to sync my local dev database from the live production server, I simply cd to the scripts directory, and type:

./pull_db.sh

Assuming you have set up ssh keys, you won’t even have to enter your password for the remote server. The output might look something like this:

vagrant@homestead /htdocs/nystudio107/scripts (master) $ ./pull_db.sh
nystudio-db-dump-20161209.sql                                                               100% 2917KB   2.9MB/s   00:01
mysqldump: [Warning] Using a password on the command line interface can be insecure.
*** Backed up local database to /tmp/nystudio-db-backup-20161209.sql
mysql: [Warning] Using a password on the command line interface can be insecure.
*** Restored local database from /tmp/nystudio-db-dump-20161209.sql

Again for the curious, here’s what the script looks like:

#!/bin/bash

# Pull Database
#
# Pull remote database down from a remote and restore it to to local
#
# @author    nystudio107
# @copyright Copyright (c) 2017 nystudio107
# @link      https://nystudio107.com/
# @package   craft-scripts
# @since     1.1.0
# @license   MIT

# Get the directory of the currently executing script
DIR="$(dirname "${BASH_SOURCE[0]}")"

# Include files
INCLUDE_FILES=(
            "common/defaults.sh"
            ".env.sh"
            "common/common_env.sh"
            "common/common_db.sh"
            )
for INCLUDE_FILE in "${INCLUDE_FILES[@]}"
do
    if [ -f "${DIR}/${INCLUDE_FILE}" ]
    then
        source "${DIR}/${INCLUDE_FILE}"
    else
        echo 'File "${DIR}/${INCLUDE_FILE}" is missing, aborting.'
        exit 1
    fi
done

# Temporary db dump path (remote & local)
TMP_DB_PATH="/tmp/${REMOTE_DB_NAME}-db-dump-$(date '+%Y%m%d').sql"
BACKUP_DB_PATH="/tmp/${LOCAL_DB_NAME}-db-backup-$(date '+%Y%m%d').sql"

# Get the remote db dump
ssh $REMOTE_SSH_LOGIN -p $REMOTE_SSH_PORT "$REMOTE_MYSQLDUMP_CMD $REMOTE_DB_CREDS $MYSQLDUMP_SCHEMA_ARGS > '$TMP_DB_PATH' ; $REMOTE_MYSQLDUMP_CMD $REMOTE_DB_CREDS $MYSQLDUMP_DATA_ARGS >> '$TMP_DB_PATH' ; gzip -f '$TMP_DB_PATH'"
scp -P $REMOTE_SSH_PORT -- $REMOTE_SSH_LOGIN:"${TMP_DB_PATH}.gz" "${TMP_DB_PATH}.gz"

# Backup the local db
$LOCAL_MYSQLDUMP_CMD $LOCAL_DB_CREDS $MYSQLDUMP_SCHEMA_ARGS > "$BACKUP_DB_PATH"
$LOCAL_MYSQLDUMP_CMD $LOCAL_DB_CREDS $MYSQLDUMP_DATA_ARGS >> "$BACKUP_DB_PATH"
gzip -f "$BACKUP_DB_PATH"
echo "*** Backed up local database to ${BACKUP_DB_PATH}.gz"

# Restore the local db from the remote db dump
zcat "${TMP_DB_PATH}.gz" | $LOCAL_MYSQL_CMD $LOCAL_DB_CREDS
echo "*** Restored local database from ${TMP_DB_PATH}.gz"

# Normal exit
exit 0

What it does is it ssh’s into your remote server, then it dumps the database to /tmp using mysqldump, then it uses scp to copy the database dump down, backs up the local database to /tmp and finally it restores the database in the local environment by piping it into mysql. The database dumps are all timestamped as well.

If you’re using mysql 5.6 or later, you’ll note the warning from mysql (this is not an issue if you’re using MariaDB):

mysql: [Warning] Using a password on the command line interface can be insecure.

What the craft-scripts is doing isn’t any less secure than if you typed it on the command line yourself; everything sent over the wire is always encrypted via ssh. However, you can set up login-path to store your credentials in an encrypted file as per the Passwordless authentication using mysql_config_editor with MySQL 5.6 article.

If you set LOCAL_DB_LOGIN_PATH or REMOTE_DB_LOGIN_PATH it will use --login-path= for your db credentials on the respective environments instead of sending them in via the command line.

For example, for my local dev setup:

    mysql_config_editor set --login-path=localdev --user=homestead --host=localhost --port=3306 --password

…and then enter the password for that user. And then in the .env.sh I set it to:

    LOCAL_DB_LOGIN_PATH="localdev"

…and it will use my stored, encrypted credentials instead of passing them in via command line. You can also set this up on your remote server, and then set it via REMOTE_DB_LOGIN_PATH

Link Sync Away!

Note that since the craft-scripts use a .env.sh for all of their configuration, you can use the same scripts to sync from any environment to any environment.

Just set up a .env.sh on local dev, and you can sync to local dev from staging or live production. Set up a .env.sh on live production, and you can use these same scripts, the same way, to sync to live production from staging.

That’s it… now kick back and sync in style.

${ category } · ${ blog.postDate }

${ blog.title }

#${ tag }