Michal ZimmermannPieces of knowledge from the world of GIS.

Wifileaks Wi-Fi Networks Dataviz

Wifileaks is a project by Jakub Čížek aimed to map the Czech wi-fi networks with Android/iOS app. The data gathered by people using the app is available to download and features ~ 90,000,000 records, each representing the position of the cellphone when connecting to the network. Just about perfect to craft some maps!

Using PostgreSQL cstore_fdw

I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that columnar store should be a bit space-friendlier than the old fashioned relational database. Thus, I installed the cstore_fdw by Citus Data in just few steps.

sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone [email protected]:citusdata/cstore_fdw.git
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install

# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'

This makes another FDW available to you inside the PostgreSQL. The actual foreign server has to be created before loading the data into a foreign table.

cat <<END | psql -qAt --no-psqlrc
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE SCHEMA data_cstore;
    CREATE FOREIGN TABLE data_cstore.wifi (
        id integer,
        mac text,
        ssid text,
        signal_strength numeric,
        security integer,
        lat numeric,
        lon numeric,
        alt numeric,
        unixtime bigint,
        filename text
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');
END

The foreign table is 3× smaller than it’s standard counterpart. However, this comes with some costs:

To overcome these shortcomings I used COPY statement to spit out the slightly modified table and immediately loaded it back in.

cat <<END | psql -qAt --no-psqlrc
COPY (
    SELECT
        row_number() OVER (),
        mac,
        ssid,
        signal_strength,
        security,
        split_part(filename, '_', 2)::integer,
        to_timestamp(unixtime),
        ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
    FROM data_cstore.wifi
    WHERE lon BETWEEN 0 AND 20
        AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';'
    DROP SCHEMA IF EXISTS data_cstore CASCADE;

DROP SCHEMA data_cstore;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    userid integer,
    unixtime timestamp without time zone,
    geom geometry(POINTZ, 32633)
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END

Putting the networks on the map

As mentioned, each row of data represents the cellphone’s location when connecting to a wi-fi network. To get real wi-fi transmitter position, I calculated the average of location of each cellphone ever connected (although the signal strength should be taken into account here as well).

CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
SELECT
    row_number() OVER () id,
    mac,
    ST_SetSRID(ST_MakePoint(x, y), 32633) geom
FROM (
    SELECT
        mac,
        AVG(ST_X(geom)) x,
        AVG(ST_Y(geom)) y
    FROM data_cstore.wifi_loc
    GROUP BY 1
) a;