Michal ZimmermannPieces of knowledge from the world of GIS.

Articles in the automation category

Clip Raster With Vector Using GDAL

Recently I needed to clip several raster files with polygonal layer of municipalities. A solution to this task is pretty straightforward using GDAL and a bit of Bash and QGIS thrown in.

The necessary steps are:

  1. Put each polygon to a separate file. This can be done easily with Vector - Data Management Tools - Split Vector Layer in QGIS. The solution below assumes that each shapefile has the same basename as the raster file.
  2. These polygons are stored in the obce subfolder relative to the folder with rasters.
  3. An output folder exists that is used for… output, yes.
  4. Rasters are saved with output alpha band for nodata (-dstalpha flag).
  5. The script takes one argument - raster name.
  6. Profit!
#!/usr/bin/env bash

OBEC=$1
BASE=$(basename $OBEC _jpeg.tif)
echo $BASE
EXTENT=$(ogrinfo -so obce/${BASE}.shp $BASE | grep Extent \
| sed 's/Extent: //g' | sed 's/(//g' | sed 's/)//g' \
| sed 's/ - /, /g')
EXTENT=$(echo $EXTENT | awk -F ',' '{print $1 " " $4 " " $3 " " $2}')
gdal_translate -projwin $EXTENT -of GTiff $OBEC output/${BASE}.tif
gdalwarp -dstalpha -s_srs 'EPSG:5514' -t_srs 'EPSG:5514' \
    -co COMPRESS=JPEG \
    -co TILED=YES -\
    of GTiff \
    -cutline obce/${BASE}.shp \
    output/${BASE}.tif output/${BASE}.final.tif

Note that if gdalwarp doesn’t recognize an EPSG code (which is the case for my country national grid), you might pass it as a PROJ.4 string.

According to the point 5 in the above list, the script needs to be run in a loop:

for f in *_jpeg.tif;
    do the_script_above.sh $f
;done

GRASS: Big Buffers Made Easy

Recently I’ve written about struggling with fairly complex geometries in PostGIS. Lesson learned from the last time was to use more smaller geometries instead of several really huge. You can obtain the small ones from the big by cutting it with a grid.

A supervisor of a project I’ve been working on came up with a task that totally buried the previous process in a blink of an eye: Give me the buffer (diffed with original geometries) that is smoothed on the outer bounds so there are no edges shorter than 10 cm. I sighed. Then, I cursed. Then, I gave it a try with PostGIS. Achieving this goal involves these steps:

Two of those four are rather problematic with PostGIS: line smoothing and diffing the original geometries (I didn’t get to the last one, so it might be 3 of 4 as well).

Hello, I’m GRASS

I haven’t used GRASS for ages and even back then I didn’t get to know it much, but it saved the day for me this time.

grass -text path/to/mapset -c

v.in.ogr input="PG:host=localhost dbname=db user=postgres password=postgres" output=ilot_050 layer=ilot_2015_050 snap=-1 --overwrite
# turn the snapping off, I don't want the input changed in any way, even though it is not topologically valid
v.in.ogr input="PG:host=localhost dbname=db user=postgres password=postgres" output=lollipops_050 layer=lollipops.lollipops_2015_050_tmp snap=-1 --overwrite
v.in.ogr input="PG:host=localhost dbname=db user=postgres password=postgres" output=holes_050 layer=phase_3.holes_050 snap=-1 --overwrite
v.db.addcolumn map=ilot_050 columns="id_0 int"
v.db.update map=ilot_050 column=id_0 value=1

# dissolve didn't work without a column specified, dunno why
v.dissolve input=ilot_050 column=id_0 output=ilot_050_dissolve --overwrite
v.buffer input=ilot_050_dissolve output=ilot_050_buffer distance=20 --overwrite

# v.out and v.in routine used just because I didn't get the way attributes work in GRASS, would do it differently next time
v.out.ogr input=ilot_050_buffer output=ilot_050_buffer format=ESRI_Shapefile --overwrite
v.in.ogr input=ilot_050_buffer output=ilot_050_buffer snap=-1 --overwrite
v.overlay ainput=ilot_050_buffer binput=holes_050 operator=or output=combined_050_01 snap=-1 --overwrite

# tried v.patch to combine the three layers, it gave some strange results in the final overlay
v.overlay ainput=combined_050_01 binput=lollipops_050 operator=or output=combined_050_02 snap=-1 --overwrite
v.out.ogr input=combined_050_02 output=combined_050 format=ESRI_Shapefile --overwrite
v.in.ogr input=combined_050 output=combined_050_in snap=-1 --overwrite
v.db.addcolumn map=combined_050_in columns="id_1 int"
v.db.update map=combined_050_in column=id_1 value=1
v.dissolve input=combined_050_in column=id_1 output=combined_050_dissolve --overwrite

# get rid of < 10cm edges
v.generalize input=combined_050_dissolve output=combined_050_gen method=reduction threshold=0.1 --overwrite
v.out.ogr input=combined_050_gen output=combined_050_gen format=ESRI_Shapefile --overwrite
v.in.ogr input=combined_050_gen output=combined_050_gen snap=-1 --overwrite
v.overlay ainput=combined_050_gen binput=ilot_050 operator=not snap=1e-05 --overwrite output=ilot_050_diff
v.out.postgis input=ilot_050_diff output="PG:dbname=db user=postgres password=postgres" output_layer=onf3.buffer_050_diff options="GEOMETRY_NAME=wkb_geometry,SRID=2154" --overwrite
v.in.ogr input="PG:host=localhost dbname=ign user=postgres password=postgres" output=buffer_050 layer=onf3.buffer_050_diff snap=-1 --overwrite
v.in.ogr input="PG:host=localhost dbname=ign user=postgres password=postgres" output=grid layer=grid snap=-1 --overwrite
v.db.connect -d map=buffer_050

# instead of v.out and v.in routine
db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db'
v.db.addtable map=buffer_050
v.overlay ainput=buffer_050 binput=grid operator=and output=buffer_050_grid snap=-1 --overwrite
v.out.postgis input=buffer_050_grid output="PG:dbname=ign user=postgres password=postgres" output_layer=onf3.buffer_050_diff_grid options="GEOMETRY_NAME=wkb_geometry,SRID=2154" --overwrite

It is damn fast compared to PostGIS. It can be automated. It can be parametrized. It is robust. It is great!

Lesson learned

The more I work with big data, the more I get used to not seeing them. That’s kind of a twist after crafting maps at university.

WMS Download For Future Offline Use

Using WMS in real time might easily become pain in the ass due to low connection speed or slow server response. Downloading images beforehand seems to be a reasonable choice both to avoid any slowdowns and to improve user experience when working with WMS layers.

OWSLib is a great tool to help you get images from WMS server. Code and some comments follow.

import math
import os
import random
import time
from owslib.wms import WebMapService

BOTTOM_LEFT = (-679363,-1120688)
TOP_RIGHT   = (-565171,-1042703)
SRS_WIDTH   = math.fabs(-639084 - -638825) # tile width in units of crs => 259 m
SRS_HEIGHT  = math.fabs(-1070426 - -1070273) # tile height in units of crs => 153 m
PX_WIDTH    = 977
PX_HEIGHT   = 578

FORMAT      = 'image/png'
LAYERS      = ['KN', 'RST_PK']
SIZE        = (PX_WIDTH, PX_HEIGHT)
SRS         = 'EPSG:5514'
STYLES      = ['default', 'default']
TRANSPARENT = True

DIRECTORY = 'tiles/'
SLEEP     = random.randint(0,20) # seconds

dx = math.fabs(BOTTOM_LEFT[0] - TOP_RIGHT[0]) # area width in units of crs
dy = math.fabs(BOTTOM_LEFT[1] - TOP_RIGHT[1]) # area height in units of crs

cols = int(math.ceil(dx / SRS_WIDTH)) + 1
rows = int(math.ceil(dy / SRS_HEIGHT)) + 1

counter = 0

with open('world_file.pngw', 'r') as wld_template:
    tmpl = wld_template.read()

wms = WebMapService('http://services.cuzk.cz/wms/wms.asp', version='1.1.1')

for i in xrange(0, rows):
    if not os.path.exists(DIRECTORY + str(i)):
        os.mkdir(DIRECTORY + str(i))

    for j in xrange(0, cols):
        if os.path.exists(DIRECTORY + str(i) +'/kn_' + str(i) + '_' + str(j) + '.png'):
            counter += 1
            continue

        bbox = (
            i * SRS_WIDTH + BOTTOM_LEFT[0],
            j * SRS_HEIGHT + BOTTOM_LEFT[1],
            (i + 1) * SRS_WIDTH + BOTTOM_LEFT[0],
            (j + 1) * SRS_HEIGHT + BOTTOM_LEFT[1]
        )

        img = wms.getmap(
            layers=LAYERS,
            styles=STYLES,
            srs=SRS,
            bbox=bbox,
            size=SIZE,
            format=FORMAT,
            transparent=TRANSPARENT
        )

        with open(DIRECTORY + str(i) +'/kn_' + str(i) + '_' + str(j) + '.png', 'wb') as png:
            png.write(img.read())

        with open(DIRECTORY + str(i) + '/kn_' + str(i) + '_' + str(j) + '.pngw', 'w') as wld_file:
            wld_file.write(tmpl)
            wld_file.write('\n' + str(i * SRS_WIDTH + BOTTOM_LEFT[0]))
            wld_file.write('\n' + str((j+1) * SRS_HEIGHT + BOTTOM_LEFT[1]))

        counter += 1
        print str(counter), ' out of ', str(rows * cols)
        time.sleep(SLEEP)

First, always make sure you are not violating terms of use defined by service provider. If you are not, here are the necessary steps:

  1. Define your area of interest with bottom left and top right coordinates.
  2. Calculate width of single image both in pixels and units of CRS to get the rightsized image. Note that there may be image size restrictions defined by provider (2048 × 2048 px is usually the biggest you can get).
  3. Define template world file for referencing images. OWSLib doesn’t provide world files to saved images, these have to be created by you. I recommend to use a template file for creating real world files.
  4. Be nice! Don’t overload the service. I use time.sleep() for this.
  5. Profit.

The trouble with WMS is that you can’t set an arbitrary scale you want to obtain images in (e.g. 1:1 000). It’s fairly easy to get all values needed to imitate this behavior though.

Using QGIS you can:

  1. Get bounding box of area you’re interested in.
  2. Save current view as an image (together with the world file!) and use it as a specimen for your own world files.
  3. Derive image width (CRS, pixels) from the saved image, thus getting the same zoom level you were using in QGIS.

Code given is not bulletproof, it will fail on any network error. However, if you restart it after such a crash, it checks for existing files and starts with the first missing, so you don’t have to download all the tiles again.