Michal ZimmermannPieces of knowledge from the world of GIS.

PostgreSQL Development History Revealed with PostgreSQL

I spend a lot of time reading PostgreSQL docs. It occurred to me just a few weeks ago that those versioned manuals are great opportunity to get an insight into PostgreSQL development history. Using PostgreSQL, of course.

TOP 5 functions with the most verbose docs in each version

SELECT
    version,
    string_agg(func, ' | ' ORDER BY letter_count DESC)
FROM (
    SELECT
        version,
        func,
        letter_count,
        row_number() OVER (PARTITION BY version ORDER BY letter_count DESC)
    FROM postgresql_development.data
) a
WHERE row_number <= 10
GROUP BY version
ORDER BY version DESC

Seems like a huge comeback for CREATE TABLE.

VERSION 1st 2nd 3rd 4th 5th
10.0 CREATE TABLE ALTER TABLE REVOKE GRANT SELECT
9.6 REVOKE ALTER TABLE GRANT CREATE TABLE SELECT
9.5 REVOKE ALTER TABLE GRANT CREATE TABLE SELECT
9.4 REVOKE GRANT ALTER TABLE CREATE TABLE SELECT
9.3 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.2 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.1 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
9.0 REVOKE GRANT CREATE TABLE ALTER TABLE ALTER DEFAULT PRIVILEGES
8.4 REVOKE GRANT CREATE TABLE ALTER TABLE SELECT
8.3 REVOKE CREATE TABLE GRANT ALTER TABLE COMMENT
8.2 REVOKE CREATE TABLE GRANT ALTER TABLE SELECT
8.1 REVOKE CREATE TABLE GRANT ALTER TABLE SELECT
8 CREATE TABLE REVOKE GRANT SELECT ALTER TABLE
7.4 CREATE TABLE REVOKE ALTER TABLE GRANT SELECT
7.3 CREATE TABLE SELECT ALTER TABLE REVOKE GRANT
7.2 CREATE TABLE SELECT INTO SELECT ALTER TABLE CREATE TYPE
7.1 CREATE TABLE SELECT INTO SELECT CREATE TYPE ALTER TABLE
7.0 SELECT SELECT INTO CREATE TYPE CREATE TABLE COMMENT

Number of functions available in each version

SELECT
    version,
    count(func),
    sum(letter_count)
FROM postgresql_development.data
GROUP BY version ORDER BY version;

The most verbose docs in each version

SELECT DISTINCT ON (version)
    version,
    func,
    letter_count
FROM postgresql_development.data
ORDER BY version, letter_count DESC;

Poor REVOKE, the defeated champion.

VERSION FUNCTION LETTER COUNT
10 CREATE TABLE 3142
9.6 REVOKE 2856
9.5 REVOKE 2856
9.4 REVOKE 2856
9.3 REVOKE 2856
9.2 REVOKE 2856
9.1 REVOKE 2508
9 REVOKE 2502
8.4 REVOKE 2105
8.3 REVOKE 1485
8.2 REVOKE 1527
8.1 REVOKE 1312
8 CREATE TABLE 1251
7.4 CREATE TABLE 1075
7.3 CREATE TABLE 929
7.2 CREATE TABLE 929
7.1 CREATE TABLE 871
7 SELECT 450

CREATE TABLE docs evolution

SELECT
    version,
    letter_count
FROM postgresql_development.data
WHERE func = 'CREATE TABLE'
ORDER BY func, version;

Something’s going on in an upcoming 10.0 version.

All the data was obtained with the following Python script and processed inside the PostgreSQL database. Plots done with Bokeh, though I probably wouldn’t use it again, the docs site is absurdly sluggish and the info is just all over the place.

PostGIS as a Mapbox Vector Tiles generator

PostGIS 2.4.0 was released recently bringing the possibilities to generate Mapbox Vector Tiles without any third party tools. I got a shot at it with Node.js and docker. Even if it’s not as straightforward as solely using ST_AsMVT, it still looks pretty great.

Docker container

There are no Ubuntu or Debian based PostGIS 2.4.0 packages as far as I know. As installation from source (especially considering GIS software) is always a bit risky, I prefer using Docker to stay away from trouble. The image is based on Ubuntu 17.04, has PostgreSQL 9.6 and PostGIS 2.4.0 installed. It exposes port 5432 to the host, so you can access the database from the outside the container.

FROM ubuntu:17.04
RUN apt update
RUN apt install -y wget less systemd
RUN touch /etc/apt/sources.list.d/pgdg.list
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ zesty-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN apt update
RUN apt -y install postgresql-9.6 postgresql-server-dev-9.6

USER postgres
RUN /usr/lib/postgresql/9.6/bin/pg_ctl -D /var/lib/postgresql/9.6/main -l /tmp/logfile start

USER root
RUN echo "host all  all    0.0.0.0/0  trust" >> /etc/postgresql/9.6/main/pg_hba.conf && \
    echo "listen_addresses='*'" >> /etc/postgresql/9.6/main/postgresql.conf


EXPOSE 5432
RUN apt install -y netcat build-essential libxml2 libxml2-dev libgeos-3.5.1 libgdal-dev gdal-bin libgdal20 libgeos-dev libprotobuf-c1 libprotobuf-c-dev libprotobuf-dev protobuf-compiler protobuf-c-compiler
RUN wget http://download.osgeo.org/postgis/source/postgis-2.4.0alpha.tar.gz
RUN tar -xvzf postgis-2.4.0alpha.tar.gz
RUN cd postgis-2.4.0alpha && ./configure && make && make install

USER postgres
RUN service postgresql start && psql -c "CREATE EXTENSION postgis"

USER root
COPY start.postgis.sh /start.postgis.sh
RUN chmod 0755 /start.postgis.sh

CMD ["/start.postgis.sh"]

start.postgis.sh file starts the database server and keeps it running forever.

#!/bin/bash

DATADIR="/var/lib/postgresql/9.6/main"
CONF="/etc/postgresql/9.6/main/postgresql.conf"
POSTGRES="/usr/lib/postgresql/9.6/bin/postgres"

su postgres sh -c "$POSTGRES -D $DATADIR -c config_file=$CONF" &
until nc -z localhost 5432;
do
    echo ...
    sleep 5
done
sleep 5 # just for sure
su - postgres -c "psql -c \"CREATE EXTENSION IF NOT EXISTS postgis\""
echo database up and running

wait $!

Data

I got a cadastre area dataset of the Czech Republic for testing, which contains ~ 13,000 polygons. The geometries should come in Web Mercator a.k.a. EPSG:3857 to work with MVT.

Vector tiles

I got a bit confused by the docs of ST_AsMVT and ST_AsMVTGeom. Especially the latter one took me a few hours to get it right. What is essential (I guess) about Mapbox Vector Tiles is that you have to abstract from the real world coordinates and start thinking inside the tile coordinates. What PostGIS does with ST_AsMVTGeom (and what any other MVT implemenation should do for you) is that it takes real world coordinates and put them inside a tile.

To make this work, you need to know every bounding box of every tile on every zoom level in a Web Mercator projection. Or you can use TileBBox procedure by Mapbox, if you wish.

The SQL query itself is pretty simple (this comes from an express route I’ll be discussing shortly).

SELECT ST_AsMVT('cadastre', 4096, 'geom', q)
FROM (
    SELECT
        code,
        name,
        ST_AsMVTGeom(
            geom,
            TileBBox(${req.params.z}, ${req.params.x}, ${req.params.y}, 3857),
            4096,
            0,
            false
        ) geom
    FROM cadastre_area
    WHERE ST_Intersects(geom, (SELECT ST_Transform(ST_MakeEnvelope($1, $2, $3, $4, $5), 3857)))
) q

When filled with proper arguments instead of placeholders, it returns a bytea.

\x1aa5dbd0070a047465737412e216120400000101180322d7160987913f8db38e01aa59160e2a010412012a0624060e001410420a1a00203b0a3914190e15085912010a0f0c0f06370804080a0e0e0234090e0

This can be consumed by a Leaflet map using Leaflet.VectorGrid plugin. To keep it short, the frontend code actually boils down to three lines of code.

var url = 'http://localhost:3000/mvt/{x}/{y}/{z}';
var cadastre = L.vectorGrid.protobuf(url);
map.addLayer(cadastre);

The server MVP is available as a GitHub gist.

Fighting Raster GeoPackage with GDAL

As I’m still running Ubuntu 16.04 based Linux Mint, I have no access to GDAL 2.x repositories (except for ubuntugis, that I really don’t like to use). Provided with a GeoPackage raster file recently, I had to find a way to load it into QGIS, somehow. The solution is simple: Docker with gdal_translate.

Preparing the Docker container

I like using Docker for experiments that might leave the OS in an unexpected state (which is exactly what happens to me with ubuntugis repository whenever I use it. That’s why I don’t anymore.). A very simple Dockerfile keeps the troubles away from you.

FROM ubuntu:17.04
RUN apt update
RUN apt install -y gdal-bin

cd into the folder and build the image with docker build -t gdal .. Once ready, summon the daemon, run the container, mount the GeoPackage file to the container directory and you’re ready to rock.

docker run -v /path/to/geopackage:/home/ -it gdal

Raster GeoPackage to GeoTiff translation

With the container running, the raster GeoPackage to GeoTiff translation can be done easily with gdal_translate. Note I chose to cut the source file into tiles, because the gdal_translate was choking about the resulting size.

#!/bin/bash
SIZE=10000
ULX=-630000
ULY=-1135450
LRX=-560000
LRY=-1172479
COUNTER_X=0
COUNTER_Y=0

while [[ $ULX -lt $LRX ]]
do
    while [[ $ULY -gt $LRY ]]
    do
        echo $ULX, $(($ULX+$SIZE)), $ULY, $(($ULY-$SIZE))

        gdal_translate \
            -co TILED=YES \
            -co COMPRESS=DEFLATE \
            -co TFW=YES \
            -co NUM_THREADS=ALL_CPUS \
            -a_nodata 0 \
            -of GTiff \
            -projwin $ULX, $ULY, $(($ULX+$SIZE)), $(($ULY-$SIZE)) \
            -projwin_srs EPSG:5514 \
            data/detected.gpkg data/detected_${COUNTER_X}_${COUNTER_Y}.tiff

        ULY=$(($ULY-$SIZE))
        COUNTER_Y=$((COUNTER_Y+1))
    done
    ULX=$(($ULX+$SIZE))
    ULY=-1135450
    COUNTER_X=$((COUNTER_X+1))
done

Final Touch: Raster to Vector

After the GeoTiff is written to hard drive, inotifywait can be used to generate overviews. And with ease of calling gdal_polygonize.py on each of GeoTiffs…vector layer, at you service.

Mapping North America with QGIS: Tips and Tricks

Recently I’ve bought a book called Maps by Aleksandra Mizielinska and Daniel Mizielinski to my nephew. The book’s absolutely wonderful and made me want to try crafting a map with similar looks. I don’t do maps much at CleverMaps, so this was a great opportunity to find out what new features became available during the last months of QGIS development.

Result

A map of North America in scale of 1:22,000,000 featuring the biggest lakes, rivers, mountain ranges and basic administrative units for the North American countries. I aimed for visually appealing overview map rather than perfectly correct topographic one.

Data

I used my beloved Natural Earth dataset for both cultural (boundaries, cities) and physical (rivers, lakes) map features. Different scales came to play for different map layers as they seemed a bit too/few simplified for the given scale.

Fonts

I usually use built-in system fonts (Ubuntu Condensed or such), but this kind of map needed a more handwritten looking, sort of childish font. After searching dafont.com I chose PreCursive by RaseOne Full Time Artists and KG Primary Penmanship by Kimberly Geswein.

Symbols

The mountain point symbol was one of the two custom symbols used on the map. It comes from BSGStudio. The ocean wave symbol was made by myself.

QGIS effects

I’ve used several techniques I find interesting enough to be listed here.

Coastlines

For a long time I’ve considered coastlines a field for cartographic invention. They can be emphasized by shading or 3D effects. I chose the set of four parallel coastlines subtly disappearing into the sea, hopefully invoking the feeling of waves coming to the shore.

It’s done by dissolving all the features and buffering them again and again.

Buffered labels

Buffered labels are usually hard to get right, because they fill so much space if the buffer color’s not corresponding to its surroundings. But choosing the proper color can be a real struggle at times.

On this map, almost all the labels are buffered with the color of its surroundings, which makes them more legible, yet not too expressive. This is possible thanks to QGIS expression based properties that let you define unique styling to different map features.

Where it isn’t possible (e.g. Bahamas or Honduras) to choose just one buffer color, the label is not buffered at all (or the semi-transparent white buffer is used).

Note the Rocky Mountains label is split on the borders of the U.S.A. and Canada and its both parts match the background color.

Tapered rivers

Rivers are tapered based on the Natural Earth’s width attribute value for each river segment.

Labels in separate layers

I’m used to put labels into separate layers in more complicated map compositions, especially when you need to draw label along path for areal features (such as countries or states).

It becomes a bit harder to keep the features in sync with the labels though. I’d like to use only one layer for all the map layers in the future, as I feel that’s the way to go for the best labeling.

Labels wrapped on character

Some labels just can’t fit the feature they belong to and QGIS lets you deal with this by wrapping labels on a special character, \ in my case.

Layer blending mode

The mechanics behind layer blending modes are still a mystery to me, but they can add that little extra to a map very easily. Thanks to the Overlay blending mode, the Rocky Mountains may remain very subtle on different kinds of background.

Wifileaks Wi-Fi Networks Dataviz

Wifileaks is a project by Jakub Čížek aimed to map the Czech wi-fi networks with Android/iOS app. The data gathered by people using the app is available to download and features ~ 90,000,000 records, each representing the position of the cellphone when connecting to the network. Just about perfect to craft some maps!

Using PostgreSQL cstore_fdw

I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that columnar store should be a bit space-friendlier than the old fashioned relational database. Thus, I installed the cstore_fdw by Citus Data in just few steps.

sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone [email protected]:citusdata/cstore_fdw.git
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install

# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'

This makes another FDW available to you inside the PostgreSQL. The actual foreign server has to be created before loading the data into a foreign table.

cat <<END | psql -qAt --no-psqlrc
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE SCHEMA data_cstore;
    CREATE FOREIGN TABLE data_cstore.wifi (
        id integer,
        mac text,
        ssid text,
        signal_strength numeric,
        security integer,
        lat numeric,
        lon numeric,
        alt numeric,
        unixtime bigint,
        filename text
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');
END

The foreign table is 3× smaller than it’s standard counterpart. However, this comes with some costs:

To overcome these shortcomings I used COPY statement to spit out the slightly modified table and immediately loaded it back in.

cat <<END | psql -qAt --no-psqlrc
COPY (
    SELECT
        row_number() OVER (),
        mac,
        ssid,
        signal_strength,
        security,
        split_part(filename, '_', 2)::integer,
        to_timestamp(unixtime),
        ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
    FROM data_cstore.wifi
    WHERE lon BETWEEN 0 AND 20
        AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';'
    DROP SCHEMA IF EXISTS data_cstore CASCADE;

DROP SCHEMA data_cstore;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    userid integer,
    unixtime timestamp without time zone,
    geom geometry(POINTZ, 32633)
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END

Putting the networks on the map

As mentioned, each row of data represents the cellphone’s location when connecting to a wi-fi network. To get real wi-fi transmitter position, I calculated the average of location of each cellphone ever connected (although the signal strength should be taken into account here as well).

CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
SELECT
    row_number() OVER () id,
    mac,
    ST_SetSRID(ST_MakePoint(x, y), 32633) geom
FROM (
    SELECT
        mac,
        AVG(ST_X(geom)) x,
        AVG(ST_Y(geom)) y
    FROM data_cstore.wifi_loc
    GROUP BY 1
) a;