Michal ZimmermannPieces of knowledge from the world of GIS.

Wifileaks Wi-Fi Networks Dataviz

Wifileaks is a project by Jakub Čížek aimed to map the Czech wi-fi networks with Android/iOS app. The data gathered by people using the app is available to download and features ~ 90,000,000 records, each representing the position of the cellphone when connecting to the network. Just about perfect to craft some maps!

Using PostgreSQL cstore_fdw

I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that columnar store should be a bit space-friendlier than the old fashioned relational database. Thus, I installed the cstore_fdw by Citus Data in just few steps.

sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone [email protected]:citusdata/cstore_fdw.git
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install

# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'

This makes another FDW available to you inside the PostgreSQL. The actual foreign server has to be created before loading the data into a foreign table.

cat <<END | psql -qAt --no-psqlrc
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE SCHEMA data_cstore;
    CREATE FOREIGN TABLE data_cstore.wifi (
        id integer,
        mac text,
        ssid text,
        signal_strength numeric,
        security integer,
        lat numeric,
        lon numeric,
        alt numeric,
        unixtime bigint,
        filename text
    SERVER cstore_server
    OPTIONS (compression 'pglz');

The foreign table is 3× smaller than it’s standard counterpart. However, this comes with some costs:

To overcome these shortcomings I used COPY statement to spit out the slightly modified table and immediately loaded it back in.

cat <<END | psql -qAt --no-psqlrc
        row_number() OVER (),
        split_part(filename, '_', 2)::integer,
        ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
    FROM data_cstore.wifi
    WHERE lon BETWEEN 0 AND 20
        AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';'

DROP SCHEMA data_cstore;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    userid integer,
    unixtime timestamp without time zone,
    geom geometry(POINTZ, 32633)
SERVER cstore_server
OPTIONS (compression 'pglz');

Putting the networks on the map

As mentioned, each row of data represents the cellphone’s location when connecting to a wi-fi network. To get real wi-fi transmitter position, I calculated the average of location of each cellphone ever connected (although the signal strength should be taken into account here as well).

CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
    row_number() OVER () id,
    ST_SetSRID(ST_MakePoint(x, y), 32633) geom
        AVG(ST_X(geom)) x,
        AVG(ST_Y(geom)) y
    FROM data_cstore.wifi_loc
    GROUP BY 1
) a;

Routing with GRASS GIS: Catchment Area Calculation

I got my hands on pgRouting in the last post and I’m about to do the same with GRASS GIS in this one.

GRASS GIS stores the topology for the native vector format by default, which makes it easy to use for the network analysis. All the commands associated with the network analysis can be found in the v.net family. The ones I’m going to discuss in this post are v.net itself, v.net.path, .v.net.alloc and v.net.iso, respectively.


I’m going to use the roads data from the previous post together with some random points used as catchment areas centers.

# create the new GRASS GIS location
grass -text -c ./osm/czech

# import the roads
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=cze.roads output=roads -eo  --overwrite

# import the random points
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=temp.points output=points -eo --overwrite

I got six different points and the pretty dense road network. Note none of the points is connected to the existing network.

You have to have routable network to do the actual routing (the worst sentence ever written). To do so, let’s:

Note I’m using the 500m as the max distance in which to connect the points to the network.

v.net input=roads points=points operation=connect threshold=500 output=network
v.net input=network output=network_noded operation=nodes

Finding the shortest path between two points

Once the network is routable, it is easy to find the shortest path between points number 1 and 4 and store it in the new map.

echo "1 1 4" | v.net.path input=network_noded output=path_1_4

The algorithm doesn’t take bridges, tunnels and oneways into account, it’s capable of doing so though.

Distributing the subnets for nearest centers

v.net.alloc input=network_noded output=network_alloc center_cats=1-6 node_layer=2

v.net.alloc module takes the given centers and distributes the network so each of its parts belongs to exactly one center - the nearest one (speaking the distance, time units, …).

Creating catchment areas

v.net.iso input=network_noded output=network_iso center_cats=1-6 costs=1000,3000,5000

v.net.iso splits net by cost isolines. Again, the costs might be specified as lengths, time units, ….

Two different ways lead to the actual catchment area creation. First, you extract nodes from the roads with their values, turn them into the raster grid and either extract contours or polygonize the raster. I find the last step suboptimal and would love to find another way of polygonizing the results.

Note when extracting contours the interval has to be set to the reasonable number depending on the nodes values.


Routing with pgRouting: Catchment Area Calculation

For a long time I’ve wanted to play with pgRouting and that time has finally come. Among many other routing functions there is one that caught my eye, called pgr_drivingdistance. As the documentation says, it returns the driving distance from a start node using Dijkstra algorithm. The aforementioned distance doesn’t need to be defined in Euclidean space (the real distance between two points), it might be calculated in units of time, slopeness etc. How to get it going?


OSM will do as it always does. There is a tool called osm2pgrouting to help you load the data, the pure GDAL seems to be a better way to me though. Importing the downloaded data is trivial.

ogr2ogr -f "PostgreSQL" PG:"dbname=pgrouting active_schema=cze" \
    -s_srs EPSG:4326 \
    -t_srs EPSG:5514 \
    roads.shp \
    -nln roads \
    -lco GEOMETRY_NAME=the_geom \
    -lco FID=id \
    -gt 65000 \
    -clipsrc 16.538 49.147 16.699 49.240

To route the network, it has to be properly noded. Although pgRouting comes with built-in pgr_nodenetwork, it didn’t seem to work very well. To node the network, use PostGIS ST_Node. Note this doesn’t consider bridges and tunnels.

CREATE TABLE cze.roads_noded AS
    (ST_Dump(geom)).geom the_geom
        ST_Node(geom) geom
    FROM (
        SELECT ST_Union(the_geom) geom
        FROM cze.roads
    ) a
) b;

After noding the network, all the information about speed limits and oneways is lost. If needed, it can be brought back with following:

CREATE INDEX ON cze.roads_noded USING gist(the_geom);
ALTER TABLE cze.roads_noded ADD COLUMN maxspeed integer;

UPDATE cze.roads_noded
SET maxspeed = a.maxspeed
    FROM cze.roads_noded rn
    JOIN cze.roads r ON (ST_Intersects(rn.the_geom, r.the_geom))
    ORDER BY rn.id, ST_Length(ST_Intersection(rn.the_geom, r.the_geom)) DESC
) a
WHERE cze.roads_noded.id = a.id;

With everything set, the topology can be built.

ALTER TABLE cze.roads_noded ADD COLUMN source integer;
ALTER TABLE cze.roads_noded ADD COLUMN target integer;
SELECT pgr_createTopology('cze.roads_noded', 1);

This function creates the cze.roads_noded_vertices_pgr that contains all the extracted nodes from the network.

As already mentioned, measures other than length can be used as a distance, I chose the time to get to a given node on foot.

ALTER TABLE cze.roads_noded ADD COLUMN cost_minutes integer;
UPDATE cze.roads_noded
SET cost_minutes = (ST_Length(the_geom) / 83.0)::integer; -- it takes average person one minute to walk 83 meters

UPDATE cze.roads_noded
SET cost_minutes = 1
WHERE cost_minutes = 0;


Now the interesting part. All the routing functions are built on what’s called inner queries that are expected to return a certain data structure with no geometry included. As I want to see the results in QGIS immediately, I had to use a simple anonymous PL/pgSQL block that writes polygonal catchment areas to a table (consider it a proof of concept, not the final solution).

FROM cze.roads_noded_vertices_pgr ver
    SELECT *
    FROM pgr_drivingDistance(
        'SELECT id, source, target, cost_minutes as cost, cost_minutes as reverse_cost FROM cze.roads_noded',
)dist ON ver.id = dist.node;

DO $$
    c integer;
    CREATE TABLE tmp (
        agg_cost integer,
        geom geometry(MULTIPOLYGON, 5514)

    -- order by the biggest area so the polygons are not hidden beneath the bigger ones
    FOR c IN SELECT agg_cost FROM cze.temp GROUP BY agg_cost HAVING COUNT(1) > 3 ORDER BY 1 DESC LOOP
        RAISE INFO '%', c;
        INSERT INTO tmp (agg_cost, geom)
                        ST_X(temp.the_geom)::float AS x,
                        ST_Y(temp.the_geom)::float AS y
                FROM cze.temp
                WHERE agg_cost = ' || c
            ), 5514));

Using pgr_pointsAsPolygon renders resulting nodes accessible in 10-minute walk in polygons, but weird looking ones. Not bad, could be better though.

How about seeing only nodes instead of polygons?

    ST_PointN(geom, i)
        ST_ExteriorRing((ST_Dump(geom)).geom) geom,
        generate_series(0,ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
    FROM tmp
) a;

Looks good, could be better though.

How about creating concave hulls from the extracted nodes?

    ST_ConcaveHull(ST_Union(geom)) geom
        ST_PointN(geom, i) geom
    FROM (
            ST_ExteriorRing((ST_Dump(geom)).geom) geom,
            generate_series(0,ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
        FROM tmp
    ) a
) b
GROUP BY agg_cost
ORDER BY agg_cost DESC;

This one looks the best I guess.


Exploiting Prague Open Data without API

Speaking the Czech Republic, Prague is an undoubted leader in open data publishing. However, there is no public API to explore/search existing datasets.

I wanted to download the ESRI Shapefile of the city urban plan that is divided into more than a hundred files (a file representing a cadastral area).

This becomes a piece of cake with Opera Developer tools and a bit of JavaScript code

let links = document.getElementsByClassName('open-data-icon-rastr open-data-link tooltipstered')

for (let link of links) {
    if (link.href.indexOf('SHP') === -1) { continue;}console.log(link.href)

With the list saved to a file called list.txt, wget --input-file=list.txt will download the data. Followed by for f in *.zip; do unzip $f -d ${f%%.zip}; done, each archive will be extracted in the directory called by its name.

Once done and assuming that the files are named consistently across the folders, ogr2ogr will merge all of them into a single GeoPackage file, resulting in just four files. Not bad considered I began with more than a hundred × 4.

ogr2ogr -f "GPKG" pvp_fvu_p.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_fvu_p.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*fvu_p.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_fvu_p.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_fvu_popis_z_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_fvu_popis_z_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*fvu_popis_z_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_fvu_popis_z_a.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_pp_pl_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_pp_pl_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*pp_pl_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_pp_pl_a.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_pp_s_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_pp_s_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*pp_s_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_pp_s_a.gpkg '{}' \;

A boring task that would take me hours five years ago transformed into simple, yet fun, piece of work done in no more than half an hour.

Upgrading PostgreSQL 9.5 to PostgreSQL 9.6 with PostGIS

Thanks to pg_upgrade tool the PostgreSQL upgrade on Ubuntu is pretty straightforward. Different PostGIS versions might cause troubles though. This post covers PostgreSQL 9.5, PostGIS 2.2 to PostgreSQL 9.6, PostGIS 2.3 migration.

First of all, install the PostgreSQL 9.6 with PostGIS 2.3.

apt install postgresql-9.6 postgresql-9.6-postgis-2.3

Mind that newly installed database cluster runs on port 5433.

If you run pg_upgrade at this stage, it will fail with the following error.

could not load library "$libdir/postgis_topology-2.2":
ERROR:  could not access file "$libdir/postgis_topology-2.2": No such file or directory

pg_upgrade can’t run the upgrade because PostGIS versions don’t match. Install the PostGIS 2.3 for PostgreSQL 9.5 and update extensions in all your databases.

apt install postgresql-9.5-postgis-2.3


With both clusters using the same PostGIS version, the upgrade can begin. First, stop them with

service postgresql stop

Then, run the actual pg_upgrade command as postgres user. Make sure the pg_hba.conf file is set to allow local connections.

/usr/lib/postgresql/9.6/bin/pg_upgrade \
-b /usr/lib/postgresql/9.5/bin/ \
-B /usr/lib/postgresql/9.6/bin/ \
-d /var/lib/postgresql/9.5/main \
-D /var/lib/postgresql/9.6/main \
-o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' \
-O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'

The following result means the upgrade was smooth.

Performing Consistency Checks
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for roles starting with 'pg_'                      ok
Creating dump of global objects                             ok
Creating dump of database schemas
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting files from new pg_clog                             ok
Copying old pg_clog to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
Copying user relation files
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:

Running this script will delete the old cluster's data files:

The old cluster can be removed and the new one switched back to port 5432. Run /usr/lib/postgresql/9.6/bin/vacuumdb -p 5433 --all --analyze-in-stages to collect statistics.