Michal Zimmermann Pieces of knowledge from the world of GIS.

Mapping North America with QGIS: Tips and Tricks

Written on May 21, 2017 and marked as qgis, visualization | GIS

I recently bought a book called Maps by Aleksandra Mizielinska and Daniel Mizielinski for my nephew. The book’s absolutely wonderful and made me want to try crafting a map with a similar look. I don’t do maps much at CleverMaps, so this was a great opportunity to find out what new features became available during the last months of QGIS development.

Result

A map of North America at a scale of 1:22,000,000 featuring the biggest lakes, rivers, mountain ranges and basic administrative units of the North American countries. I aimed for a visually appealing overview map rather than a perfectly correct topographic one.

Data

I used my beloved Natural Earth dataset for both cultural (boundaries, cities) and physical (rivers, lakes) map features. Different scales came into play for different map layers, as some seemed either too simplified or not simplified enough for the given map scale.

Fonts

I usually use built-in system fonts (Ubuntu Condensed or such), but this kind of map needed a more handwritten looking, sort of childish font. After searching dafont.com I chose PreCursive by RaseOne Full Time Artists and KG Primary Penmanship by Kimberly Geswein.

Symbols

The mountain point symbol was one of the two custom symbols used on the map. It comes from BSGStudio. I made the ocean wave symbol myself.

QGIS effects

I’ve used several techniques I find interesting enough to be listed here.

Coastlines

For a long time I’ve considered coastlines a field for cartographic invention. They can be emphasized by shading or 3D effects. I chose a set of four parallel coastlines subtly disappearing into the sea, hopefully evoking the feeling of waves coming to the shore.

It’s done by dissolving all the features and buffering them again and again.

Buffered labels

Buffered labels are usually hard to get right, because they take up so much space when the buffer color doesn’t match its surroundings. Choosing the proper color can be a real struggle at times.

On this map, almost all the labels are buffered with the color of their surroundings, which makes them more legible, yet not too expressive. This is possible thanks to QGIS expression-based properties, which let you style individual map features differently.

Where it isn’t possible (e.g. Bahamas or Honduras) to choose just one buffer color, the label is not buffered at all (or a semi-transparent white buffer is used).

Note the Rocky Mountains label is split at the border between the U.S.A. and Canada, and both of its parts match the background color.

Tapered rivers

Rivers are tapered based on the Natural Earth’s width attribute value for each river segment.

Labels in separate layers

I’m used to putting labels into separate layers in more complicated map compositions, especially when I need to draw a label along a path for areal features (such as countries or states).

It becomes a bit harder to keep the features in sync with the labels though. I’d like to use only one layer for all the map layers in the future, as I feel that’s the way to go for the best labeling.

Labels wrapped on character

Some labels just can’t fit the feature they belong to and QGIS lets you deal with this by wrapping labels on a special character, \ in my case.

Layer blending mode

The mechanics behind layer blending modes are still a mystery to me, but they can add that little extra to a map very easily. Thanks to the Overlay blending mode, the Rocky Mountains may remain very subtle on different kinds of background.

Wifileaks Wi-Fi Networks Dataviz

Written on May 2, 2017 and marked as qgis, visualization | GIS

Wifileaks is a project by Jakub Čížek that aims to map the Czech Wi-Fi networks with an Android/iOS app. The data gathered by people using the app is available for download and features ~ 90,000,000 records, each representing the position of the cellphone when connecting to the network. Just about perfect to craft some maps!

Using PostgreSQL cstore_fdw

I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that a columnar store should be a bit more space-friendly than the old-fashioned row-based storage. Thus, I installed cstore_fdw by Citus Data in just a few steps.

sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone git@github.com:citusdata/cstore_fdw.git
cd cstore_fdw
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install

# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'

This makes another FDW available to you inside PostgreSQL. The extension and the actual foreign server have to be created before loading the data into a foreign table.

cat <<END | psql -qAt --no-psqlrc
    CREATE EXTENSION IF NOT EXISTS cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE SCHEMA data_cstore;
    CREATE FOREIGN TABLE data_cstore.wifi (
        id integer,
        mac text,
        ssid text,
        signal_strength numeric,
        security integer,
        lat numeric,
        lon numeric,
        alt numeric,
        unixtime bigint,
        filename text
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');
END

The foreign table is 3× smaller than its standard counterpart. However, this comes with some costs:

  • neither UPDATE nor DELETE can be used
  • no CREATE INDEX
  • no SERIAL

To overcome these shortcomings I used the COPY statement to spit out a slightly modified table and immediately loaded it back in.

cat <<END | psql -qAt --no-psqlrc
COPY (
    SELECT
        row_number() OVER (),
        mac,
        ssid,
        signal_strength,
        security,
        split_part(filename, '_', 2)::integer,
        to_timestamp(unixtime),
        ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
    FROM data_cstore.wifi
    WHERE lon BETWEEN 0 AND 20
        AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';';

DROP SCHEMA IF EXISTS data_cstore CASCADE;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    userid integer,
    unixtime timestamp without time zone,
    geom geometry(POINTZ, 32633)
)
SERVER cstore_server
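OPTIONS (compression 'pglz');

-- load the modified data back in
COPY data_cstore.wifi FROM '/tmp/wifileaks.db' WITH CSV DELIMITER ';';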
END

Putting the networks on the map

As mentioned, each row of data represents the cellphone’s location when connecting to a Wi-Fi network. To estimate the real Wi-Fi transmitter position, I calculated the average location of all the cellphones that ever connected to it (although the signal strength should be taken into account here as well).

CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
SELECT
    row_number() OVER () id,
    mac,
    ST_SetSRID(ST_MakePoint(x, y), 32633) geom
FROM (
    SELECT
        mac,
        AVG(ST_X(geom)) x,
        AVG(ST_Y(geom)) y
    FROM data_cstore.wifi
    GROUP BY 1
) a;
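
Taking the signal strength into account could look roughly like the sketch below. It is only an illustration, not what the query above does: it assumes the signal strength is stored in dBm and converts it to linear power, so that readings taken close to the transmitter pull the average towards themselves. The function name and the sample readings are made up.

```javascript
// Sketch: signal-weighted average of phone positions for one MAC address.
// Assumption: signal strength is in dBm (-40 is strong, -90 is weak).
function weightedCenter(readings) {
    let sumW = 0;
    let sumX = 0;
    let sumY = 0;
    for (const { x, y, dbm } of readings) {
        const w = Math.pow(10, dbm / 10); // dBm -> milliwatts
        sumW += w;
        sumX += w * x;
        sumY += w * y;
    }
    if (sumW === 0) {
        return null; // no readings, no position
    }
    return { x: sumX / sumW, y: sumY / sumW };
}

// hypothetical readings in projected coordinates (meters)
const center = weightedCenter([
    { x: 0, y: 0, dbm: -40 },
    { x: 100, y: 0, dbm: -70 },
]);
console.log(center);
```

The strong reading at x = 0 dominates, so the estimated position ends up very close to it instead of halfway between the two phones.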

Routing with GRASS GIS: Catchment Area Calculation

Written on Apr 20, 2017 and marked as grass | GIS

I got my hands on pgRouting in the last post and I’m about to do the same with GRASS GIS in this one.

GRASS GIS stores the topology for the native vector format by default, which makes it easy to use for network analysis. All the commands associated with network analysis can be found in the v.net family. The ones I’m going to discuss in this post are v.net itself, v.net.path, v.net.alloc and v.net.iso.

Data

I’m going to use the roads data from the previous post together with some random points used as catchment area centers.

# create the new GRASS GIS location
grass -text -c ./osm/czech

# import the roads
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=cze.roads output=roads -eo --overwrite

# import the random points
v.in.ogr input="PG:host=localhost dbname=pgrouting" layer=temp.points output=points -eo --overwrite

I got six different points and a pretty dense road network. Note that none of the points is connected to the existing network.

You have to have a routable network to do the actual routing (the worst sentence ever written). To get one, let’s:

  • connect the random points to the network
  • add nodes to ends and intersections of the roads

Note I’m using 500 m as the maximum distance within which to connect the points to the network.

v.net input=roads points=points operation=connect threshold=500 output=network
v.net input=network output=network_noded operation=nodes

Finding the shortest path between two points

Once the network is routable, it is easy to find the shortest path between points number 1 and 4 and store it in the new map.

echo "1 1 4" | v.net.path input=network_noded output=path_1_4

The algorithm doesn’t take bridges, tunnels and oneways into account, although it’s capable of doing so.

Distributing the subnets for nearest centers

v.net.alloc input=network_noded output=network_alloc center_cats=1-6 node_layer=2

The v.net.alloc module takes the given centers and distributes the network so each of its parts belongs to exactly one center - the nearest one (in terms of distance, time units, …).
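
The idea of nearest-center allocation can be sketched on a toy graph. The code below is a plain multi-source Dijkstra over nodes, written for illustration only; it is not how v.net.alloc is implemented, and the graph, node names and costs are made up.

```javascript
// Sketch: assign every node of a small undirected graph to its nearest center.
function allocate(edges, centers) {
    // adjacency list: node -> [{ to, cost }]
    const adj = new Map();
    const link = (a, b, cost) => {
        if (!adj.has(a)) adj.set(a, []);
        adj.get(a).push({ to: b, cost });
    };
    for (const [a, b, cost] of edges) {
        link(a, b, cost);
        link(b, a, cost);
    }

    const best = new Map(); // node -> { center, cost }
    const queue = centers.map((c) => ({ node: c, center: c, cost: 0 }));

    while (queue.length > 0) {
        // a linear scan for the cheapest entry keeps the sketch short
        let i = 0;
        for (let j = 1; j < queue.length; j++) {
            if (queue[j].cost < queue[i].cost) i = j;
        }
        const { node, center, cost } = queue.splice(i, 1)[0];
        if (best.has(node)) continue; // already settled by a nearer center
        best.set(node, { center, cost });
        for (const { to, cost: step } of adj.get(node) || []) {
            if (!best.has(to)) queue.push({ node: to, center, cost: cost + step });
        }
    }
    return best;
}

// hypothetical network: a - b - c - d - e with centers at both ends
const result = allocate(
    [["a", "b", 2], ["b", "c", 2], ["c", "d", 1], ["d", "e", 4]],
    ["a", "e"]
);
console.log(result);
```

Every node ends up with exactly one center and the cost of reaching it, which is the property the module guarantees for the parts of the network.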

Creating catchment areas

v.net.iso input=network_noded output=network_iso center_cats=1-6 costs=1000,3000,5000

v.net.iso splits the network by cost isolines. Again, the costs might be specified as lengths, time units, …
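
How a cost falls into a band between isolines can be sketched in a few lines. This is an assumption about the banding (first band below the first break, last band above the last break, a cost equal to a break goes to the higher band); the exact boundary handling in v.net.iso may differ.

```javascript
// Sketch: classify a cost into a band given sorted isoline breaks.
function isoBand(cost, breaks) {
    let band = 1;
    for (const b of breaks) {
        if (cost < b) break;
        band++;
    }
    return band;
}

// the breaks used in the command above
const breaks = [1000, 3000, 5000];
console.log([500, 1000, 4000, 6000].map((c) => isoBand(c, breaks)));
```

Three breaks give four bands, the last one collecting everything beyond the largest cost.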

Two different ways lead to the actual catchment areas. Both start the same: you extract nodes from the roads with their values and turn them into a raster grid. Then you either extract contours or polygonize the raster. I find the last step suboptimal and would love to find another way of polygonizing the results.

Note that when extracting contours, the interval has to be set to a reasonable number depending on the node values.

Remarks

  • Once you grasp the basics, GRASS GIS is real fun. Grasping the basics is pretty tough though.
  • Pedestrians usually don’t follow the road network.
  • Bridges and tunnels might be an issue.
  • Personally, I find GRASS GIS easier to use for the network analysis compared to pgRouting.

Routing with pgRouting: Catchment Area Calculation

Written on Apr 14, 2017 and marked as postgis, pgrouting | SQL

For a long time I’ve wanted to play with pgRouting and that time has finally come. Among many other routing functions there is one that caught my eye, called pgr_drivingDistance. As the documentation says, it returns the driving distance from a start node using the Dijkstra algorithm. The aforementioned distance doesn’t need to be defined in Euclidean space (the real distance between two points); it might be calculated in units of time, slope etc. How to get it going?

Data

OSM will do as it always does. There is a tool called osm2pgrouting to help you load the data, though plain GDAL seems to be a better way to me. Importing the downloaded data is trivial.

ogr2ogr -f "PostgreSQL" PG:"dbname=pgrouting active_schema=cze" \
    -s_srs EPSG:4326 \
    -t_srs EPSG:5514 \
    roads.shp \
    -nln roads \
    -lco GEOMETRY_NAME=the_geom \
    -lco FID=id \
    -gt 65000 \
    -nlt PROMOTE_TO_MULTI \
    -clipsrc 16.538 49.147 16.699 49.240

To route the network, it has to be properly noded. Although pgRouting comes with the built-in pgr_nodeNetwork function, it didn’t seem to work very well. To node the network, use PostGIS ST_Node. Note this doesn’t consider bridges and tunnels.

CREATE TABLE cze.roads_noded AS
SELECT
    (ST_Dump(geom)).geom the_geom
FROM (
    SELECT
        ST_Node(geom) geom
    FROM (
        SELECT ST_Union(the_geom) geom
        FROM cze.roads
    ) a
) b;

After noding the network, all the information about speed limits and oneways is lost. If needed, it can be brought back with the following:

CREATE INDEX ON cze.roads_noded USING gist(the_geom);
ALTER TABLE cze.roads_noded ADD COLUMN id SERIAL PRIMARY KEY;
ALTER TABLE cze.roads_noded ADD COLUMN maxspeed integer;

UPDATE cze.roads_noded
SET maxspeed = a.maxspeed
FROM (
    SELECT DISTINCT ON (rn.id)
        rn.id,
        r.maxspeed
    FROM cze.roads_noded rn
    JOIN cze.roads r ON (ST_Intersects(rn.the_geom, r.the_geom))
    ORDER BY rn.id, ST_Length(ST_Intersection(rn.the_geom, r.the_geom)) DESC
) a
WHERE cze.roads_noded.id = a.id;

With everything set, the topology can be built.

ALTER TABLE cze.roads_noded ADD COLUMN source integer;
ALTER TABLE cze.roads_noded ADD COLUMN target integer;
SELECT pgr_createTopology('cze.roads_noded', 1);

This function creates the cze.roads_noded_vertices_pgr table that contains all the nodes extracted from the network.

As already mentioned, measures other than length can be used as a distance. I chose the time it takes to get to a given node on foot.

ALTER TABLE cze.roads_noded ADD COLUMN cost_minutes integer;
UPDATE cze.roads_noded
SET cost_minutes = (ST_Length(the_geom) / 83.0)::integer; -- it takes average person one minute to walk 83 meters

UPDATE cze.roads_noded
SET cost_minutes = 1
WHERE cost_minutes = 0;
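
The arithmetic behind the two statements above, sketched outside the database: 83 meters per minute is roughly 5 km/h, the result is rounded to whole minutes and anything that rounds to zero is bumped to one minute. The function name is made up, and exact .5 ties may round differently in PostgreSQL than with Math.round.

```javascript
// Sketch of the walking-time cost: 83 meters per minute,
// rounded to whole minutes, never less than one minute.
const METERS_PER_MINUTE = 83;

function costMinutes(lengthMeters) {
    const minutes = Math.round(lengthMeters / METERS_PER_MINUTE);
    return Math.max(1, minutes);
}

console.log(costMinutes(20), costMinutes(415), costMinutes(1000));
```

Short segments would otherwise cost nothing, which would make a walk along many tiny segments look free.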

Routing

Now the interesting part. All the routing functions are built on what’s called inner queries that are expected to return a certain data structure with no geometry included. As I want to see the results in QGIS immediately, I had to use a simple anonymous PL/pgSQL block that writes polygonal catchment areas to a table (consider it a proof of concept, not the final solution).

DROP TABLE IF EXISTS cze.temp;
CREATE TABLE cze.temp AS
SELECT *
FROM cze.roads_noded_vertices_pgr ver
JOIN (
    SELECT *
    FROM pgr_drivingDistance(
        'SELECT id, source, target, cost_minutes as cost, cost_minutes as reverse_cost FROM cze.roads_noded',
        6686,
        10,
        true
    )
) dist ON ver.id = dist.node;

DO $$
DECLARE
    c integer;
BEGIN
    DROP TABLE IF EXISTS tmp;
    CREATE TABLE tmp (
        agg_cost integer,
        geom geometry(MULTIPOLYGON, 5514)
    );

    -- order by the biggest area so the polygons are not hidden beneath the bigger ones
    FOR c IN SELECT agg_cost FROM cze.temp GROUP BY agg_cost HAVING COUNT(1) > 3 ORDER BY 1 DESC LOOP
        RAISE INFO '%', c;
        INSERT INTO tmp (agg_cost, geom)
        SELECT
            c,
            ST_Multi(ST_SetSRID(pgr_pointsAsPolygon(
                'SELECT
                        temp.id::integer,
                        ST_X(temp.the_geom)::float AS x,
                        ST_Y(temp.the_geom)::float AS y
                FROM cze.temp
                WHERE agg_cost = ' || c
            ), 5514));
    END LOOP;
END$$;

pgr_pointsAsPolygon turns the nodes reachable within a 10-minute walk into polygons, though rather weird-looking ones. Not bad, could be better though.

How about seeing only nodes instead of polygons?

SELECT
    agg_cost,
    ST_PointN(geom, i)
FROM (
    SELECT
        agg_cost,
        ST_ExteriorRing((ST_Dump(geom)).geom) geom,
        -- ST_PointN is 1-based, so the series starts at 1
        generate_series(1, ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
    FROM tmp
) a;

Looks good, could be better though.

How about creating concave hulls from the extracted nodes?

SELECT
    agg_cost,
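    -- the second argument (target_percent) is required; 0.99 is an assumed value, tune it to taste
    ST_ConcaveHull(ST_Union(geom), 0.99) geom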
FROM (
    SELECT
        agg_cost,
        ST_PointN(geom, i) geom
    FROM (
        SELECT
            agg_cost,
            ST_ExteriorRing((ST_Dump(geom)).geom) geom,
            -- ST_PointN is 1-based, so the series starts at 1
            generate_series(1, ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
        FROM tmp
    ) a
) b
GROUP BY agg_cost
ORDER BY agg_cost DESC;

This one looks the best I guess.

Remarks

  • The documentation doesn’t help much.
  • I’d expect the existing functions to return data structures that are easier to use, actually.
  • LATERAL might be really handy with those inner queries, have to give it a shot in the future.
  • Pedestrians usually don’t follow the road network.
  • Bridges and tunnels might be an issue.

Exploiting Prague Open Data without API

Written on Apr 3, 2017 and marked as bash | data

Speaking of the Czech Republic, Prague is the undisputed leader in open data publishing. However, there is no public API to explore/search existing datasets.

I wanted to download the ESRI Shapefile of the city urban plan, which is divided into more than a hundred files (one file per cadastral area).

This becomes a piece of cake with Opera Developer tools and a bit of JavaScript code:

let links = document.getElementsByClassName('open-data-icon-rastr open-data-link tooltipstered')

for (let link of links) {
    // skip everything that is not a Shapefile download
    if (link.href.indexOf('SHP') === -1) {
        continue;
    }
    console.log(link.href)
}

With the list saved to a file called list.txt, wget --input-file=list.txt will download the data. Followed by for f in *.zip; do unzip "$f" -d "${f%.zip}"; done, each archive will be extracted into a directory named after it.

Once done and assuming that the files are named consistently across the folders, ogr2ogr will merge all of them into a single GeoPackage file per layer, resulting in just four files. Not bad considering I began with more than a hundred × 4.

ogr2ogr -f "GPKG" pvp_fvu_p.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_fvu_p.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*fvu_p.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_fvu_p.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_fvu_popis_z_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_fvu_popis_z_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*fvu_popis_z_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_fvu_popis_z_a.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_pp_pl_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_pp_pl_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*pp_pl_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_pp_pl_a.gpkg '{}' \;

ogr2ogr -f "GPKG" pvp_pp_s_a.gpkg ./PVP_fvu_p_Bechovice_SHP/PVP_pp_s_a.shp
find -type f -not -path './PVP_fvu_p_Bechovice_SHP*' -iname '*pp_s_a.shp' -exec ogr2ogr -update -append -f "GPKG" pvp_pp_s_a.gpkg '{}' \;

A boring task that would have taken me hours five years ago turned into a simple, yet fun, piece of work done in no more than half an hour.

Upgrading PostgreSQL 9.5 to PostgreSQL 9.6 with PostGIS

Written on Mar 1, 2017 and marked as sql, postgresql | SQL

Thanks to the pg_upgrade tool, the PostgreSQL upgrade on Ubuntu is pretty straightforward. Different PostGIS versions might cause trouble though. This post covers the migration from PostgreSQL 9.5 with PostGIS 2.2 to PostgreSQL 9.6 with PostGIS 2.3.

First of all, install PostgreSQL 9.6 with PostGIS 2.3.

apt install postgresql-9.6 postgresql-9.6-postgis-2.3

Mind that the newly installed database cluster runs on port 5433.

If you run pg_upgrade at this stage, it will fail with the following error.

could not load library "$libdir/postgis_topology-2.2":
ERROR:  could not access file "$libdir/postgis_topology-2.2": No such file or directory

pg_upgrade can’t run the upgrade because the PostGIS versions don’t match. Install PostGIS 2.3 for PostgreSQL 9.5 and update the extensions in all your databases.

apt install postgresql-9.5-postgis-2.3

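ALTER EXTENSION postgis UPDATE;
-- if postgis_topology is installed (the error above mentions it), update it as well
ALTER EXTENSION postgis_topology UPDATE;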

With both clusters using the same PostGIS version, the upgrade can begin. First, stop them with

service postgresql stop

Then, run the actual pg_upgrade command as the postgres user. Make sure the pg_hba.conf file is set to allow local connections.

/usr/lib/postgresql/9.6/bin/pg_upgrade \
-b /usr/lib/postgresql/9.5/bin/ \
-B /usr/lib/postgresql/9.6/bin/ \
-d /var/lib/postgresql/9.5/main \
-D /var/lib/postgresql/9.6/main \
-o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' \
-O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'

The following result means the upgrade was smooth.

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for roles starting with 'pg_'                      ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting files from new pg_clog                             ok
Copying old pg_clog to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
                                                            ok
Copying user relation files
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

The old cluster can be removed and the new one switched back to port 5432. Run /usr/lib/postgresql/9.6/bin/vacuumdb -p 5433 --all --analyze-in-stages to collect statistics.

Executing dynamic SQL query right away

Written on Feb 28, 2017 and marked as sql, postgresql | SQL

PostgreSQL 9.6 comes with a handy psql command called \gexec that sends the current query input buffer to the server and treats the result as a SQL statement to be executed (right, whatever). What that means is that instead of doing this

psql -qAt -c "SELECT 'DROP TABLE ' || table_name || ';' FROM information_schema.tables WHERE table_name LIKE '%to_be_dropped%'" | psql

you can do this

SELECT 'DROP TABLE ' || table_name FROM information_schema.tables WHERE table_name LIKE '%to_be_dropped%'\gexec

Brilliant.

Subdivide and Conquer: Effective Spatial Indexes in PostGIS

Written on Jan 10, 2017 and marked as sql, postgresql, postgis | SQL

Spatial indexes are an absolutely crucial part of any spatial database and - as I tend to say quite often - only a fool would try to query spatial data without building spatial indexes beforehand.

Spatial indexes are based on bounding box comparisons, which are generally very fast. Yet, there are situations when spatial indexes don’t help much (or they don’t help as much as they could, if you wish).

Bounding box comparisons are effective with lots of small bounding boxes rather than a few large ones. Why? See the picture above. The curved line (imagine it’s a pipeline for example) clearly demonstrates when the spatial index/bounding box comparison might fall short of what you’d expect.

Once the bounding box gets really big, it intersects so many other geometries’ bounding boxes that the whole comparison starts to slow down.
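
The effect can be illustrated with a few lines of code. The sketch below compares the bounding box area of a made-up diagonal line with the total bounding box area of its pieces. Splitting by vertex count is a simplification chosen for the example; it is not how ST_Subdivide actually cuts geometries.

```javascript
// Sketch: one big bounding box versus many small ones.
function bboxArea(points) {
    const xs = points.map((p) => p[0]);
    const ys = points.map((p) => p[1]);
    return (Math.max(...xs) - Math.min(...xs)) * (Math.max(...ys) - Math.min(...ys));
}

function splitLine(points, maxVertices) {
    if (maxVertices < 2) {
        throw new Error("a line part needs at least two vertices");
    }
    const parts = [];
    // consecutive parts share one vertex so the line stays connected
    for (let i = 0; i < points.length - 1; i += maxVertices - 1) {
        parts.push(points.slice(i, i + maxVertices));
    }
    return parts;
}

// a diagonal "pipeline" with 101 vertices from (0, 0) to (100, 100)
const line = Array.from({ length: 101 }, (_, i) => [i, i]);
const whole = bboxArea(line);
const parts = splitLine(line, 11);
const pieces = parts.reduce((sum, part) => sum + bboxArea(part), 0);
console.log(whole, parts.length, pieces);
```

The ten pieces together cover a tenth of the area the single bounding box does, so far fewer unrelated geometries pass the bounding box test.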

Luckily, PostGIS 2.2 introduced the ST_Subdivide function that can lend a helping hand here.

Until today, we delivered the parcel geometries into our real estate acquisition process system with the following query, which takes all the geometries from the req_geom table (pipelines, remember?) and intersects them with cadastral parcels. The second part of the query adds those parcels that haven’t been digitized and were created manually by one of my workmates.

INSERT INTO requested_parcels (uid, par_id)
SELECT
    reqs.uid,
    b.id par_id
 FROM
    running_requests reqs
 JOIN
    req_geom a ON (reqs.uid = a.uid)
 JOIN
    pargeo b ON (ST_Intersects(a.geom, b.geom))
 UNION
 SELECT
    reqs.uid,
    a.idpar::numeric
 FROM
    running_requests reqs
 JOIN
     req_man a ON (reqs.uid = a.uid);

It’s a perfectly standard query that intersects several request geometries with ~20M parcels, nothing really fancy. Except that it takes 25 minutes to finish. Why? Pipelines, remember?

Yet, the query below takes only 30 seconds to finish (that’s a huge time saver considering that the whole process used to take ~40 minutes)! Why? Because ST_Subdivide effectively cuts the req_geom geometries into pieces with at most 50 vertices each. Such small geometries are perfect input for the bounding box comparison. Remember to use DISTINCT with ST_Subdivide; you’d probably get duplicate parcel ids otherwise.

I also replaced the UNION with the WHERE NOT EXISTS expression, as it’s reasonable to assume that numeric ids comparison will be faster.

INSERT INTO requested_parcels (uid, par_id)
SELECT DISTINCT
    reqs.uid,
    b.id par_id
 FROM
    running_requests reqs
 JOIN
    (
        SELECT
            uid,
            ST_Subdivide(geom, 50) geom
        FROM
            req_geom
     ) a ON (reqs.uid = a.uid)
 JOIN
     pargeo b ON (ST_Intersects(a.geom, b.geom));

 INSERT INTO requested_parcels (uid, par_id)
 SELECT
     reqs.uid,
     a.idpar::numeric
 FROM
     running_requests reqs
 JOIN
     req_man a ON (reqs.uid = a.uid)
 WHERE NOT EXISTS (
     SELECT 1
     FROM requested_parcels pp
     WHERE pp.par_id = a.idpar
  );

Counting substring occurrences in PostgreSQL

Written on Dec 19, 2016 and marked as sql, postgresql | SQL

I needed to count occurrences of the / character today and found out no built-in function for that exists in PostgreSQL, so here’s my shot at it. Pretty simple, yet useful.

CREATE OR REPLACE FUNCTION how_many(IN text, IN varchar, OUT integer)
RETURNS integer
AS
$how_many$
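    -- divide by the substring length so substrings longer than one character are counted correctly
    SELECT (length($1) - length(replace($1, $2, ''))) / NULLIF(length($2), 0);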
$how_many$
LANGUAGE SQL
SECURITY DEFINER;

-- SELECT how_many('test', 't'); -- returns number 2
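
The same length-difference trick, sketched in JavaScript for comparison. Dividing by the needle length makes it work for substrings longer than one character; the function name is made up and an empty needle is simply treated as zero occurrences.

```javascript
// Sketch: count non-overlapping occurrences by comparing string lengths
// before and after removing the needle.
function howMany(haystack, needle) {
    if (needle.length === 0) {
        return 0;
    }
    const stripped = haystack.split(needle).join("");
    return (haystack.length - stripped.length) / needle.length;
}

console.log(howMany("test", "t"), howMany("a/b/c", "/"), howMany("abcabc", "bc"));
```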

Plotting the Czech Cadastre Land Use with d3: Data Viz (part IV)

Written on Nov 20, 2016 and marked as d3, javascript, svg | data

This post is the fourth part of the series summarizing the process of visualizing land use data with bash, PostgreSQL and d3.js. Read other parts:

  1. Plotting the Czech Cadastre Land Use with d3: Data Extraction (part I)
  2. Plotting the Czech Cadastre Land Use with d3: Data Transformation (part II)
  3. Plotting the Czech Cadastre Land Use with d3: Data Load (part III)

Data visualization

Those of you who’ve been following this series know all the data are set and ready to be used. The rest of you, shame on you by the way, can go through the above posts to catch up.

The result is available at https://www.zimmi.cz/kn-landuse-monitor and works like the gif below.

Features

  • land use data for 13,093 cadastral areas between 2015/01/01 and 2016/10/01
  • relative area and parcel count per land use type
  • similar cadastres based on land use relative area values
  • time series plots for various characteristics (including agricultural land area and parcel count)

Todos

  • time series chart titles onmouseover
  • barchart titles onmouseover
  • absolute values chart (?)
  • fetch API polyfill
  • Firefox seems to be broken

Technologies

I implemented the whole app with vanilla JavaScript. The app resided in the Monitor variable and had several modules that communicated with each other via custom events.

So far, so good. Once the app was production-ready, I stumbled upon vue.js, which is by miles the best JavaScript framework experience I’ve had so far. Rewriting the app was a matter of two days (thanks to this amazing setup - hot reload included).

Thus, the current version of the app is based on:

vue.js

Thanks to the easy-to-understand system of components, properties and methods, vue.js is really quick to learn. The app is now divided into several components (Search, Dashboard with child components for charts and similar cadastres list).

vuex

Vuex, probably inspired by Flux or Redux, is the “state management pattern + library”, the single source of truth for your apps. That’s pretty much it: there’s only one place in your app (called the store) where you go to put or get your data. Not necessarily every single piece of data, just those pieces used across several components. It plays really nicely with vue.js.
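
The single-store idea can be sketched in plain JavaScript. The code below is not the Vuex API and not the app’s actual store; the names only loosely mirror it, and the state shape and the cadastre code are made up for illustration.

```javascript
// Sketch of a single store: one state object, changed only through commit().
function createStore(state, mutations) {
    const listeners = [];
    return {
        getState: () => state,
        commit(type, payload) {
            mutations[type](state, payload); // the only place state changes
            listeners.forEach((fn) => fn(state));
        },
        subscribe: (fn) => listeners.push(fn),
    };
}

const store = createStore(
    { cadastre: null },
    {
        selectCadastre(state, code) {
            state.cadastre = code;
        },
    }
);

// any component can listen for changes instead of keeping its own copy
const seen = [];
store.subscribe((state) => seen.push(state.cadastre));
store.commit("selectCadastre", 600016);
console.log(store.getState(), seen);
```

Components that share a piece of data read it from the store and ask the store to change it, so there is never a second copy to keep in sync.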

D3.js

Tried it before, D3.js was really hard to grasp. And it still is, I guess. At the same time, it’s damn good at plotting the data. Yet, being a bit less low-level would be great.

Dexie

I hate writing servers for my pet projects. The server means no Github Pages. Thus, I decided to load the whole dataset with fetch API from the external JSON file. Loading the 13K objects × 30 properties × array with 8 items in each didn’t seem like the best idea ever, so… Here comes Dexie, an IndexedDB API wrapper that makes it easy on you (unlike the IndexedDB API itself, which doesn’t even let you find out whether the database you’re creating already exists. Seriously?).

Dexie loads the initial dataset into the IndexedDB storage and reads it every time the user comes back, without loading the JSON file again. On data change, the fresh file is loaded, the database flushed and the new data written. Behold; I hate the way it’s written.

Flex

Used flex for the first time, though I’m not sure I understand how it actually works. CSS feels more complicated every time I need it.

Bottom line: I use localStorage to keep track of the database existence.

Summary

Two pet projects completed in one month definitely means the winter is here! Looking forward to using more vue.js.