Articles tagged with linux tag

PostgreSQL Backup and Recovery Orchestration: systemd Automation

Posts in this series have described the basic automation of PostgreSQL backup/recovery strategy. The process itself consists of different periodic tasks that shouldn’t be executed manually. There are essentially two tools dedicated to periodic task running in Linux: cron and systemd.

Cron used to be my first choice of automation in Linux, as it’s very easy to use. On the other hand, it’s quite messy (running crontab -e under different users to find out which one has the job defined) and a bit difficult to test - many times I ran into a situation when underlying bash script executed just fine, while cron job kept failing for reason unknown.

My own cron experience together with a few words from a workmate brought me into the arms of systemd, which is a Linux system and service manager. It’s capable of running periodic tasks just like cron, yet making it more transparent.

Important bits

Understanding the whole systemd is way out of scope of a poor GIS guy, yet I managed to tame three important parts of the ecosystem:

services
timers
targets

Services

Service is a configuration saved inside “.service” file specifying what you want systemd to do. Following code shows how you can tell systemd to vacuum your database once in a while.

[Unit]
Description=CR vacuumdb
OnFailure=unit-status-mail@%n.service unit-status-slack@%n.service
Wants=cr-sunday.timer

[Service]
User=postgres
Group=postgres
Type=simple
ExecStart=/bin/bash /usr/local/sbin/pgsql-vacuumdb.sh --port %i

[Install]
WantedBy=cr-sunday.target

Unit files come with several handy features. First of all, they are orchestrated with systemctl. Second, any service configuration file containing @ in its filename might be symlinked/copied and run for different instances. Third, notice OnFailure directive in the code above. If anything goes wrong, systemd might serve as a postman delivering the bad news. I set up both e-mail and Slack notifications and they’ve been working like a charm ever since.

On top of that, I find systemd orchestration much easier to test and maintain compared to cron.

With the above code saved in /lib/systemd/system/[email protected], you can copy the file to /lib/systemd/system/[email protected], /lib/systemd/system/[email protected] etc. If you look at ExecStart part of the service file, you’ll notice %i being used at the end - a placeholder replaced with the string between @ and .service in the filename.

This systemd service file is no more than a simple wrapper around the following bash code. We run three different database clusters on one machine and this approach makes their maintenance pretty comfortable.

#!/bin/bash
#
# @author: Michal Zimmermann &lt;[email protected]&gt;
# Vacuums the whole database cluster running on a given port.

while [[ $# &gt; 0 ]]
do
    key="$1"

    case $key in
        -p|--port)
            PORT="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --port|-p [port_number]"
            exit 1
            ;;
    esac
    shift
done

if [[ -z "$PORT" ]]
then
    echo "Port not provided!"
    $0 *
    exit 2
fi

/usr/bin/vacuumdb -U postgres -p $PORT --all --full --analyze

What you get so far is the possibility to run systemctl start pgsql-vacuumdb@5432 instead of calling the underlying bash code manually. Not much, really. That’s where timers come to the party.

Timers

Timer files ends with “.timer” and are responsible for running services on given time. The code below, coming from /lib/systemd/system/cr-sunday.timer file runs the pgsql-vacuumdb service every Sunday at 3:45 am.

[Unit]
Description=CR Sunday timer

[Timer]
OnCalendar=Sun *-*-* 03:45
Persistent=yes
Unit=cr-sunday.target

[Install]
WantedBy=multi-user.target

Targets

Target files end with “.target” and are used to group units in general. In our case, the target file for vacuumdb service is as simple as the following code.

[Unit]
Description=CR Sunday target
StopWhenUnneeded=yes

Targets might be called by other targets. Running systemctl start cr-sunday.target would eventually lead to running all the services wanted by that target.

As I already mentioned, I find systemd services easy to code and test. If any of them should fail, you’d find a message in syslog or via systemctl status pgsql-vacuumdb.

PostgreSQL Backup and Recovery Orchestration: Bash Automation

There is a bunch of periodic database-related tasks in a life of PostgreSQL administrator. Some should be done daily, others weekly, others can wait for a whole month. Many of them are essential for your database health. Forget to run such a task or screw up the run accidentally, and you’ll be snowed under with fixing your database.

Those tasks are easily done with bash, which is the first step to full automation. Following tasks are perfect candidates to be implemented as bash scripts:

full backups (both creation and removal)
WAL backups (both creation and removal)
vacuum
pgBadger log analysis (both creation and removal)
log maintenance (if you don’t use log rotate)

Full backup creation is just one example of how powerful bash can be.

#!/bin/bash
#
# @author: Michal Zimmermann &lt;[email protected]&gt;
# Creates base backup.

CUR_DIR=$(dirname "$0")
if [[ ! -f ${CUR_DIR}/pgsql-common.sh ]]
then
    echo "pgsql-common.sh not found!"
    exit 1
fi

source "${CUR_DIR}/pgsql-common.sh"
source "$CONFIG"

if [[ -d ${CR_BASE_BACKUP_DIR}/${CR_LABEL} ]]
then
    echo "${CR_BASE_BACKUP_DIR}/${CR_LABEL} already exists and is not empty!"
    exit 2
fi

pg_basebackup \
    --pgdata=${CR_BASE_BACKUP_DIR}/${CR_LABEL} \
    --format=plain \
    --write-recovery-conf \
    --wal-method=stream \
    --label=${CR_LABEL} \
    --checkpoint=fast \
    --progress \
    --verbose

if [[ $? -gt 0 ]]
then
    rm -rf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}
    echo "pg_basebackup on ${CR_LABEL} failed!"
    exit 3
fi

tar -czf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}.tar.gz ${CR_BASE_BACKUP_DIR}/${CR_LABEL} &amp;&amp; rm -rf ${CR_BASE_BACKUP_DIR}/${CR_LABEL}

As you probably noticed, a pgsql-common.sh file is sourced at the beginning of the script. This script in turn just loads the proper config file that provides variables to other, devops, scripts. As you might need those variables in several of your scripts, it is a good idea to put their load to a separate file.

#!/bin/bash
#
# @author: Michal Zimmermann &lt;[email protected]&gt;
# Sourced in pgsql-*.sh scripts.

while [[ $# &gt; 0 ]]
do
    key="$1"

    case $key in
        -c|--config)
            CONFIG="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --config|-c [config_file]"
            exit 1
            ;;
    esac
    shift
done
# /Input parameters

if [[ -z "$CONFIG" ]]
then
    echo "Config file is not set! See the script usage below."
    $0 *
    exit 2
fi

if [[ ! -f "$CONFIG" ]]
then
    echo "$CONFIG not found!"
    exit 3
fi

A config file might remain as simple as this:

# Base backup location
export CR_BASE_BACKUP_DIR="/mnt/backup/symap/base/"
# WAL backup location
export CR_WAL_BACKUP_DIR="/mnt/backup/symap/wal"
# PostgreSQL WAL location
export CR_PG_XLOG_DIR="/var/lib/postgresql/10/symap/pg_wal"
export CR_PG_LOG_DIR="/var/lib/postgresql/10/symap/pg_log"
# Base backup pattern (set to YYYYMMDD)
export CR_LABEL=symap_$(date +%Y%m%d)
export PGPORT=5432

Another, likely the simplest, example is a vacuumdb task:

#!/bin/bash
#
# @author: Michal Zimmermann &lt;[email protected]&gt;
# Vacuums the whole database cluster running on a given port.

while [[ $# &gt; 0 ]]
do
    key="$1"

    case $key in
        -p|--port)
            PORT="$2"
            shift
            ;;
        *)
            echo "Usage: `basename $0` --port|-p [port_number]"
            exit 1
            ;;
    esac
    shift
done

if [[ -z "$PORT" ]]
then
    echo "Port not provided!"
    $0 *
    exit 2
fi

/usr/bin/vacuumdb -U postgres -p $PORT --all --full --analyze

Tips

Always test your bash scripts before production deployment. Even a single line of code might lead to a very different, possibly unexpected, outcome.
Try to stay as defensive as possible. Imagine a variable did not get sourced properly. Is it going to blow your database? Trust me, I know what I am talking about (see the tweet below).

that feeling when you blog about keeping your #postgresql data safe and few days later you accidentally bzip all postgres-owned files WHILE the database is running with untested #bash script. What a #tuesday!
— Michal Zimmermann (@zimmicz) 27. února 2018

Pitfalls

You do not want to run your bash scripts by hand. You probably do not want them to be run by cron. You want to run them with systemd. More on this next time.

PostgreSQL Backup and Recovery Orchestration: Recovery

PostgreSQL continuous backups are very powerful, if you know how to use them for recovery. There’s nothing else to do to be sure about that other than actually trying it. Personally, I see recovery as a single process with two possibly different outcomes:

you’re recovering to the same state your cluster is/was in (because of a hardware failure, provider switch, …) - it’s more of a data migration, but you need your backup anyway
you’re doing a point-in-time-recovery (someone dropped the wrong table, data got corrupted, …)

Both scenarios follow the same steps and differ slighty at the end.

Stop the PostgreSQL cluster.
Copy the current PGDATA_DIR somewhere safe, just in case you screw up.
Replace the PGDATA_DIR with the full backup. If you start the cluster right away, it will boot to the last full backup state (in my case, missing a week of WAL segments tops).

General recovery

In this case, you’re trying to recover as far as possible. With previous steps done succesfully, the next follow:

Copy all archived WAL segments created after the last full backup to PGDATA_DIR/pg_xlog. These can be found with find -newer command run against the corresponding .backup file in your wal-archive/u/p directory.
If your full backup strategy includes recovery.conf file creation, you cane safely move it or remove it.
Start the database cluster again. It is going to boot to the last working state.

If you’re about to migrate your data, you might be better off with simple pg_dump, pg_dumpall and pg_restore commands rather than using full backup/WAL segments combination.

Point-in-time-recovery

PostgreSQL’s PITR can help you restore your accidentally deleted/corrupted data. After the first three steps mentioned above, you should follow with these:

Copy all archived segments created after the last full backup somewhere the PostgreSQL user can read them (/your-wal-recovery-folder/ for example).
Set up the recovery.conf file properly. If you know something nasty happened at 2018-01-29 08:00:00, try to recover right to that point (or to any other, as described in the documentation).

restore_command = 'cp /your-wal-recovery-folder/%f "%p"'
recovery_target_time = '2018-01-29 08:00:00'

Start the database cluster again. It is going to boot to the last full backup and then play all the WAL segments until the recovery target. Depending on how many WAL segments are about to be used, this might take a while.

Pitfalls

You don’t want to find yourself in the middle of the biggest database failure of the century just to find out your backups don’t work, and even if they did, you would have no idea how to use them. Or, even worse, there are no backups at all, because your backup strategy has been failing silently without a single notice for several months.

Tips

Try to recover from your backups once in a while.

I forget things and make mistakes. We all do. That’s why I built an ensemble that takes care of our database automatically. Nothing fancy, just a bunch of good old Bash scripts managed with systemd rathern than cron. Next time, I’d like to show you the code and walk you through our current setup.

PostgreSQL Backup and Recovery Orchestration: WAL Archiving

Just a very few of my day-to-day work tasks can be accomplished without PostgreSQL. For years I’ve been a (power) user of this wonderful relational database, knowing almost nothing about how its internals really work. Faced with the need to build a backup and recovery strategy, I’ve recently read up a lot on this topic.

As I don’t find it very odd for a GIS person to be given such an extraordinary task (nobody wants to lose the priceless spatial data, right?), I hope this series might shed light on how to prepare and manage the backup/recovery process to those, who are up to such a task. I won’t be discussing backup strategies based on pg_backup tool, as those don’t offer neither continuous archivation, nor point-in-time-recovery (PITR) - those two features disqualifies it as CleverMaps production backup strategy.

That leaves us with taking periodic base backups combined with continuous WAL archivation, as described below.

Taking base backups

Archived WAL segments are worthless without a base backup they can be run on. It’s crucial to have consistent, periodic base backups to keep your data safe.

pg_basebackup takes base backup of PostgreSQL cluster. Nothing fancy. Gzipping the output folder once the backup is done is definitely a good idea.

pg_basebackup \
    --pgdata=/mnt/backup/base/backup_number \
    --format=plain \
    --write-recovery-conf \
    --xlog-method=stream \
    --label=${CR_LABEL} \
    --checkpoint=fast \
    --progress \
    --verbose

In our current environment, we take a base backup of each of our clusters once a week.

WAL archiving configuration

To properly set WAL archiving, several postgresql.conf settings has to be adjusted:

wal_level = replica
archive_mode = on
archive_command = test ! -f /backup/wal/%f && cp %p /backup/wal/%f

Setting wal_level to replica writes enough information for WAL archiving. Turning on archive_mode will run archive_command each time a WAL segment is completed. archive_command might be anything from simple cp to rsync or aws s3 cp commands. It is absolutely critical that the command returns non-zero exit code in case of failure (including when a file with the same name already exists in your backup folder).

That’s it, after reloading PostgreSQL service, new WAL files should be copied to /backup/wal directory. The PostgreSQL process user (postgres usually) has to be able to write to the location.

Pitfalls

If archive_command fails, WAL segment remains on your database drive. If it keeps failing long enough, you’ll run out of space and the database will crash.
If the backup location fills up, the above-mentioned happens as well.
If you lose or corrupt any of the archived WAL segments, you won’t be able to pass through. That’s why you want to be sure that your archive_command actually does what you think it does.

Tips

It might be a real PITA (fiddling around WAL segments included) to start a crashed database cluster with no space left. Keeping a dummy file in your pg_xlog location might save you a lot of trouble. Create one with following command. If you run out of space, remove this file and you get 300 MB for free. Don’t forget to recreate it after you start the cluster.

dd if=/dev/zero of=/path_to_your_database_cluster/pg_xlog/DO_NOT_MOVE_THIS_FILE bs=1MB count=300

There’s no need to keep archived WAL segments forever. They’re only needed until you take another base backup. Again, deleting WAL segments manually (or using find ! -newer previous_base_backup.tar.gz) might lead to accidental corruption of your backups. It’s much safer to use pg_archivecleanup pointed to your WAL backup folder, referencing the last sucessful full backup. Below is the script we use to keep our WAL backup folder of reasonable size, keeping the last three full backups.

# Find base_backup files not older than 3 weeks
# Sort by date
# Use the oldest one as a reference
OLDEST_BASE_BACKUP=$(basename $(find ${CR_WAL_BACKUP_DIR}/u/p/ -type f -iname "*.backup" -mtime -21 -print0 | \
xargs -0 ls -t | \
tail -n 1))

# Find all subfolders
# Except the u/p backup subfolder
# Execute pg_archivecleanup for each of the subfolders
find $CR_WAL_BACKUP_DIR \
    -type d \
    -not -path "${CR_WAL_BACKUP_DIR}u*" \
    -exec pg_archivecleanup -d {} $OLDEST_BASE_BACKUP \;

Functional backups are crucial part of a solid backup/recovery system. They’re still just one half of that system, though. If not tested thoroughly, they’re even less than that. More on testing backups and recovering from failures next time.

Installing PostGIS 2.2 with SFCGAL on Ubuntu-based OS

I’ve seen a bunch of questions on GIS StackExchange recently related to SFCGAL extension for PostGIS 2.2. Great news are it can be installed with one simple query CREATE EXTENSION postgis_sfcgal. Not so great news are you have to compile it from source for Ubuntu-based OS (14.04) as recent versions of required packages are not available in the repositories.

I tested my solution on elementary OS 0.3.1 based on Ubuntu 14.04. And it works! It installs PostgreSQL 9.4 from repositories together with GDAL and GEOS and some other libs PostGIS depends on. PostGIS itself, CGAL, Boost, MPFR and GMP are built from source.

Here comes the code (commented where needed).

sudo -i
echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main" | tee -a /etc/apt/sources.list
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
apt-get update
apt-get install -y postgresql-9.4 \
    postgresql-client-9.4 \
    postgresql-contrib-9.4 \
    libpq-dev \
    postgresql-server-dev-9.4 \
    build-essential \
    libgeos-c1 \
    libgdal-dev \
    libproj-dev \
    libjson0-dev \
    libxml2-dev \
    libxml2-utils \
    xsltproc \
    docbook-xsl \
    docbook-mathml \
    cmake \
    gcc \
    m4 \
    icu-devtools

exit # leave root otherwise postgis will choke

cd /tmp
touch download.txt
cat &lt;&lt;EOT &gt;&gt; download.txt
https://gmplib.org/download/gmp/gmp-6.0.0a.tar.bz2
https://github.com/Oslandia/SFCGAL/archive/v1.2.0.tar.gz
http://www.mpfr.org/mpfr-current/mpfr-3.1.3.tar.gz
http://downloads.sourceforge.net/project/boost/boost/1.59.0/boost_1_59_0.tar.gz
https://github.com/CGAL/cgal/archive/releases/CGAL-4.6.3.tar.gz
http://download.osgeo.org/postgis/source/postgis-2.2.0.tar.gz

EOT

cat download.txt | xargs -n 1 -P 8 wget # make wget a little bit faster

tar xjf gmp-6.0.0a.tar.bz2
tar xzf mpfr-3.1.3.tar.gz
tar xzf v1.2.0.tar.gz
tar xzf boost_1_59_0.tar.gz
tar xzf CGAL-4.6.3.tar.gz
tar xzf postgis-2.2.0.tar.gz

CORES=$(nproc)

if [[ $CORES &gt; 1 ]]; then
    CORES=$(expr $CORES - 1) # be nice to your PC
fi

cd gmp-6.0.0
./configure &amp;&amp; make -j $CORES &amp;&amp; sudo make -j $CORES install

cd ..
cd mpfr-3.1.3
./configure &amp;&amp; make -j $CORES &amp;&amp; sudo make -j $CORES install

cd ..
cd boost_1_59_0
./bootstrap.sh --prefix=/usr/local --with-libraries=all &amp;&amp; sudo ./b2 install # there might be some warnings along the way, don't panic
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/boost.conf
sudo ldconfig

cd ..
cd cgal-releases-CGAL-4.6.3
cmake . &amp;&amp; make -j $CORES &amp;&amp; sudo make -j $CORES install

cd ..
cd SFCGAL-1.2.0/
cmake . &amp;&amp; make -j $CORES &amp;&amp; sudo make -j $CORES install

cd ..
cd postgis-2.2.0
./configure \
    --with-geosconfig=/usr/bin/geos-config \
    --with-xml2config=/usr/bin/xml2-config \
    --with-projdir=/usr/share/proj \
    --with-libiconv=/usr/bin \
    --with-jsondir=/usr/include/json \
    --with-gdalconfig=/usr/bin/gdal-config \
    --with-raster \
    --with-topology \
    --with-sfcgal=/usr/local/bin/sfcgal-config &amp;&amp; \
make &amp;&amp; make cheatsheets &amp;&amp; sudo make install # deliberately one CPU only

sudo -u postgres psql
sudo -u postgres createdb spatial_template
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis;"
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis_topology;"
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis_sfcgal;"
sudo -u postgres psql -d spatial_template -c "SELECT postgis_full_version();"