Michal Zimmermann | Pieces of knowledge from the world of GIS.

Articles in the development category

Degrees To Decimal With Javascript Reworked

Two years ago I was pretty happy with this little piece of code for transforming degrees to decimal values. Yesterday, I found a neater way to do the same:

let deg = [50, 30, 0];

function degToDec(prev, cur, curIndex) {
    return prev + cur / Math.pow(60, curIndex);
}

deg.reduce(degToDec);
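// [50, 30, 0] reduces to 50 + 30/60 + 0/3600 = 50.5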

Once you have an input array, that’s pretty much it. Love JavaScript.

Do You Really Need Gulp? Or Grunt? Or Bower? Or What?

Disclaimer: I'm an enthusiastic developer, but I do not code for a living. I'm just an ordinary guy who keeps editing the wrong file, wondering why the heck the changes are not being applied.

TL;DR: I do think npm might be the answer.

Wonderful world of JavaScript DevOps

When I first started using JavaScript on the server side with node.js, I felt overwhelmed by the numerous options for automating tasks. There was npm taking care of backend dependencies. Then I built a frontend and found out about bower for handling frontend dependencies. Then it would be great to have some kind of minification/obfuscation/uglification/you-name-it task. And the build task. And the build:prod task. And how about an eslint task? And then I would end up spending hours doing nothing, just reading blogs about the tools used by others who do code for a living.

Intermezzo: I think my coding is slow. Definitely slower than yours. I’m getting better though.

Using the force

Looking back, I find it a bit stressful - how the heck do I choose the right tools? Where's Yoda to help me out? Anyway, the next tool to adopt after npm was bower. And I liked it, even though some packages were missing - but who cares as long as there is no better way, right? Except there is… I guess.

Automation was next in line to tackle, so I chose gulp without a bit of hesitation. It was the hype, bigger than grunt back then. I had even heard of yeoman, but to this day I don't know what it actually does. And I'm happy with that.

A short summary so far: npm handles the backend dependencies, bower the frontend ones, and gulp runs all the tasks.

So far, so good.

Is Bower going to die?

Then I stumbled upon this tweet and started panicking. Or rather started to feel cheated. It took me time to set all this up and now it’s useless? Or what?

Seeing it now, I’m glad I read this. And I really don’t know what happened to Bower, if anything at all.

Keeping it simple

So Bower's dying - what are you going to do about that? You'll use npm instead! And you'll have a single source of truth called package.json. You'll resolve all the dependencies with a single npm install command and feel like a king. We're down to two now - npm and gulp.
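Just to illustrate the point (leaflet here is only an example package, not necessarily anything I actually used): moving a former Bower dependency to npm is a one-liner, and a fresh clone of the project then needs nothing but npm install.

npm install --save leaflet   # recorded in package.json
npm install                  # restores everything on a fresh clone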

Gulp, Gulp everywhere!

Once you get rid of Bower, the next thing you notice is that your gulpfile.js has got off the leash. Mine grew to ~160 lines of code and became a nightmare to manage.

So you split it into task files and a config file. What a relief. But then you realize that half of your package.json dependencies start with gulp-. And you hate it.

Webpack for the win

For me, a non-developer, setting up webpack wasn't easy, and I didn't find the docs very helpful either. Reading the website for the first time, I didn't even understand what it should be used for. I got it working eventually. And I got rid of gulp, gulp-connect, gulp-less, gulp-nodemon, gulp-rename, gulp-replace, gulp-task-listing and gutil - and of the whole gulpfile.js. That was a big win for me.

But how do you run tasks?

Well…

npm run start-dev # which in turn calls the code below
npm run start-webpack & NODE_ENV=development nodemon server.js # where start-webpack does the following
node_modules/webpack-dev-server/bin/webpack-dev-server.js --quiet --inline --hot --watch

That's it. If I need to build the code, I run npm run build, which calls some other tasks from the scripts section of package.json.
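For illustration, the relevant part of package.json might look roughly like this - start-dev and start-webpack mirror the commands above, while the build entry is only a placeholder, since the actual build steps differ from project to project:

{
  "scripts": {
    "start-dev": "npm run start-webpack & NODE_ENV=development nodemon server.js",
    "start-webpack": "node_modules/webpack-dev-server/bin/webpack-dev-server.js --quiet --inline --hot --watch",
    "build": "NODE_ENV=production node_modules/webpack/bin/webpack.js"
  }
}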

That’s pretty much it. I don’t think it’s a silver bullet, but I feel like I finally found peace of mind for my future JavaScript development. At least for a month or so before some other guy comes to town.

Twitter REST API Data Mining on OpenShift (Part II)

Last time I described the setup of my OpenShift Twitter crawler and left it running and downloading data. It's been more than two months since I started and I've got an interesting amount of data. I also made a simple ETL process to load it into my local PostGIS database, which I'd like to cover in this post.

Extract data

Each day is written to a separate SQLite file with a name like tw_day_D_M_YYYY. Bash is used to pack and compress all the files before downloading them from OpenShift.

#!/bin/bash

ssh openshift << EOF
    cd app-root/data
    tar czf twitter.tar.gz *.db
EOF

scp openshift:/var/lib/openshift/55e487587628e1280b0000a9/app-root/data/twitter.tar.gz ./data
cd data &&
tar -xzf twitter.tar.gz &&
cd -

echo "Extract done"

Transform data

The transformation part operates on the downloaded files and merges them into one big CSV file. That's pretty straightforward thanks to a few sqlite3 flags and some sed and tail commands.

#!/bin/bash

rm -rf ./data/csv
mkdir ./data/csv

for db in ./data/*.db; do
    FILENAME=$(basename $db)
    DBNAME=${FILENAME%%.db}
    CSVNAME=$DBNAME.csv
    echo "$DBNAME to csv..."
    sqlite3 -header -csv $db "select * from $DBNAME;" > ./data/csv/$CSVNAME
done

cd ./data/csv
touch tweets.csv
echo $(sed -n 1p $(ls -d -1 *.csv | head -n 1)) > tweets.csv # get column names

for csv in tw_*.csv; do
    echo $csv
    tail -n +2 $csv >> tweets.csv # get all lines without the first one
done

Load data

In the last step, the data is loaded with the psql \copy command.

#!/bin/bash

export PG_USE_COPY=YES

DATABASE=mzi_dizertace
SCHEMA=dizertace
TABLE=tweets

psql $DATABASE << EOF
    DROP TABLE IF EXISTS $SCHEMA.$TABLE;
    CREATE UNLOGGED TABLE $SCHEMA.$TABLE (id text, author text, author_id text, tweet text, created_at text, lon float, lat float, lang text);
    \copy $SCHEMA.$TABLE FROM 'data/csv/tweets.csv' CSV HEADER DELIMITER ','
    ALTER TABLE $SCHEMA.$TABLE ADD COLUMN wkb_geometry geometry(POINT, 4326);
    UPDATE $SCHEMA.$TABLE SET wkb_geometry = ST_SetSRID(ST_MakePoint(lon, lat), 4326);
    CREATE INDEX ${TABLE}_geom_idx ON $SCHEMA.$TABLE USING gist(wkb_geometry);
    COMMIT;
EOF

First statistics

Some interesting charts and numbers follow.
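For example, counting tweets per language is a one-liner against the table created above (just an illustration of the kind of numbers involved, not the actual analysis):

psql mzi_dizertace -c "SELECT lang, count(*) AS tweets FROM dizertace.tweets GROUP BY lang ORDER BY tweets DESC;"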

Twitter REST API Data Mining on OpenShift (Part I)

More than a year ago I wrote about analyzing Twitter languages with the Streaming API. Back then I kept my laptop running for a week to download the data. Not a comfortable way to do it, especially if you decide to get more data. A year of laptop uptime doesn't sound like anything you want to be part of. OpenShift by Red Hat seems to be an almost perfect replacement. Almost.

OpenShift setup

I started with a Node.js application running on one small gear. Once it's running, you can easily git push the code to your OpenShift repo and log in via SSH. I quickly found that simply copy-pasting my local solution wasn't going to work, and fixed it with some minor tweaks. That's where the fun begins…

I based the downloader on Node.js a year ago, and I still don't quite get how that piece of software works. Frankly, I don't really care as long as it works.

Pitfalls

If your application doesn't generate any traffic, OpenShift turns it off. It wakes up once someone visits it again. I had no idea about that and spent some time trying to stop that behavior. Obviously, I could have scheduled a cron job on my laptop pinging it every now and then. Luckily, OpenShift can run cron jobs itself. All you need to do is embed a cron cartridge into the running application (and install a bunch of ruby dependencies beforehand).

rhc cartridge add cron-1.4 -a app-name

Then create the .openshift/cron/{hourly,daily,weekly,monthly} folders in the git repository and put a script running a simple curl command into one of them.

curl http://social-zimmi.rhcloud.com > /dev/null

Another problem was just around the corner. Once in a while, the app stopped writing data to the database without saying a word. What helped was restarting it - the only automatic way to do so being a git push command. Sadly, I haven’t found a way to restart the app from within itself; it probably can’t be done.

When you git push, the gear stops, builds, deploys and restarts the app. By using hot deployment you can minimize the downtime. Just put a hot_deploy file into the .openshift/markers folder.
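The marker is nothing more than an empty file committed to the repository:

touch .openshift/markers/hot_deploy
git add .openshift/markers/hot_deploy && git commit -m "Enable hot deployment"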

git commit --allow-empty -m "Restart gear" && git push

This solved the problem, until I realized that every restart deleted all the data collected so far. If your data are to stay safe and sound, save them in process.env.OPENSHIFT_DATA_DIR (which is app-root/data).
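You can check where that directory points from inside the gear; unlike the repo contents, which get rebuilt on every push, this directory survives deployments:

ssh openshift 'echo $OPENSHIFT_DATA_DIR'   # single quotes so the variable expands on the gear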

Anacron to the rescue

How do you push an empty commit once a day? With cron of course. Even better, anacron.

mkdir ~/.anacron
cd ~/.anacron
mkdir cron.daily cron.weekly cron.monthly spool etc

cat <<EOT > ~/.anacron/etc/anacrontab

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/$HOME/bin
HOME=$HOME
LOGNAME=$USER

1 5  daily-cron nice run-parts --report $HOME/.anacron/cron.daily
7 10 weekly-cron nice run-parts --report $HOME/.anacron/cron.weekly
@monthly 15 monthly-cron nice run-parts --report $HOME/.anacron/cron.monthly

EOT

cat <<EOT >> ~/.zprofile # I use zsh shell
rm -f $HOME/.anacron/anacron.log
/usr/sbin/anacron -t /home/zimmi/.anacron/etc/anacrontab -S /home/zimmi/.anacron/spool &> /home/zimmi/.anacron/anacron.log

EOT

Anacron is to a laptop what cron is to a 24/7 running server. It just runs automated jobs while the laptop is on. If it's off when a job is due, the job runs once the OS boots. Brilliant idea.

It runs the following code for me to keep the app writing data to the database.

#!/bin/bash

workdir='/home/zimmi/documents/zimmi/dizertace/social'
logfile=$workdir/restart-gear.log
date > $logfile

{
HOME=/home/zimmi
cd $workdir && \
git merge origin/master && \
git commit --allow-empty -m "Restart gear" && \
git push && \
echo "Success" ;
} >> $logfile 2>&1

UPDATE: Spent a long time debugging the “Permission denied (publickey).”-like errors. What seems to help is:

  1. Use id_rsa instead of any other SSH key
  2. Put a new entry into the ~/.ssh/config file (see the sketch below)

I don’t know which one did the magic though.
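A minimal sketch of such an entry - the host name and user are borrowed from the commands above, so the exact values will differ:

cat <<EOT >> ~/.ssh/config
Host openshift
    HostName social-zimmi.rhcloud.com
    User 55e487587628e1280b0000a9
    IdentityFile ~/.ssh/id_rsa
EOT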

I've been harvesting Twitter for a month now, getting about 10-15K tweets a day (I'm only interested in the Czech Republic). Roughly 1/6 to 1/5 of them come located with latitude and longitude. More on this next time.

Installing PostGIS 2.2 with SFCGAL on Ubuntu-based OS

I've seen a bunch of questions on GIS StackExchange recently related to the SFCGAL extension for PostGIS 2.2. The great news is that it can be enabled with a single CREATE EXTENSION postgis_sfcgal statement. The not-so-great news is that on Ubuntu-based OSes (14.04) you have to compile it from source, as recent enough versions of the required packages are not available in the repositories.

I tested my solution on elementary OS 0.3.1 based on Ubuntu 14.04. And it works! It installs PostgreSQL 9.4 from the repositories together with GDAL, GEOS and some other libs PostGIS depends on. PostGIS itself, SFCGAL, CGAL, Boost, MPFR and GMP are built from source.

Here comes the code (commented where needed).

sudo -i
echo "deb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main" | tee -a /etc/apt/sources.list
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
apt-get update
apt-get install -y postgresql-9.4 \
    postgresql-client-9.4 \
    postgresql-contrib-9.4 \
    libpq-dev \
    postgresql-server-dev-9.4 \
    build-essential \
    libgeos-c1 \
    libgdal-dev \
    libproj-dev \
    libjson0-dev \
    libxml2-dev \
    libxml2-utils \
    xsltproc \
    docbook-xsl \
    docbook-mathml \
    cmake \
    gcc \
    m4 \
    icu-devtools

exit # leave root otherwise postgis will choke

cd /tmp
touch download.txt
cat <<EOT >> download.txt
https://gmplib.org/download/gmp/gmp-6.0.0a.tar.bz2
https://github.com/Oslandia/SFCGAL/archive/v1.2.0.tar.gz
http://www.mpfr.org/mpfr-current/mpfr-3.1.3.tar.gz
http://downloads.sourceforge.net/project/boost/boost/1.59.0/boost_1_59_0.tar.gz
https://github.com/CGAL/cgal/archive/releases/CGAL-4.6.3.tar.gz
http://download.osgeo.org/postgis/source/postgis-2.2.0.tar.gz

EOT

cat download.txt | xargs -n 1 -P 8 wget # make wget a little bit faster

tar xjf gmp-6.0.0a.tar.bz2
tar xzf mpfr-3.1.3.tar.gz
tar xzf v1.2.0.tar.gz
tar xzf boost_1_59_0.tar.gz
tar xzf CGAL-4.6.3.tar.gz
tar xzf postgis-2.2.0.tar.gz

CORES=$(nproc)

if [[ $CORES -gt 1 ]]; then
    CORES=$(expr $CORES - 1) # be nice to your PC
fi

cd gmp-6.0.0
./configure && make -j $CORES && sudo make -j $CORES install

cd ..
cd mpfr-3.1.3
./configure && make -j $CORES && sudo make -j $CORES install

cd ..
cd boost_1_59_0
./bootstrap.sh --prefix=/usr/local --with-libraries=all && sudo ./b2 install # there might be some warnings along the way, don't panic
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/boost.conf
sudo ldconfig

cd ..
cd cgal-releases-CGAL-4.6.3
cmake . && make -j $CORES && sudo make -j $CORES install

cd ..
cd SFCGAL-1.2.0/
cmake . && make -j $CORES && sudo make -j $CORES install

cd ..
cd postgis-2.2.0
./configure \
    --with-geosconfig=/usr/bin/geos-config \
    --with-xml2config=/usr/bin/xml2-config \
    --with-projdir=/usr/share/proj \
    --with-libiconv=/usr/bin \
    --with-jsondir=/usr/include/json \
    --with-gdalconfig=/usr/bin/gdal-config \
    --with-raster \
    --with-topology \
    --with-sfcgal=/usr/local/bin/sfcgal-config && \
make && make cheatsheets && sudo make install # deliberately one CPU only

sudo -u postgres createdb spatial_template
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis;"
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis_topology;"
sudo -u postgres psql -d spatial_template -c "CREATE EXTENSION postgis_sfcgal;"
sudo -u postgres psql -d spatial_template -c "SELECT postgis_full_version();"
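If you want an explicit confirmation that SFCGAL made it in, asking for its version should do the trick (assuming the spatial_template database created above):

sudo -u postgres psql -d spatial_template -c "SELECT postgis_sfcgal_version();"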