PostGIS Data Anonymization
Among all the sensitive spatial data being collected through cellphones and credit cards, our address of residency is probably the most delicate one. Can it be anonymized/pseudonymized/obscured before you share it with your business partners?
Imagine given a set of address points for each of your clients and the set of all address points in the country, you should adjust it in the following way:
- find the two nearest address points for each address point of your client
- find the center of these two and the client address point
- measure the distance of the computed center to each of three points and keep the maximum value
- make the biggest distance even bigger by adding 10 % of its value
- ceil the value
- output the new position and the ceiled distance
This shifts each address point by a dynamic distance, giving us at least three points within the given distance (one of them being the original address point).
SELECT tmp.code, ST_X(tmp.new_position) x, ST_Y(tmp.new_position) y, ceil(MAX(biggest_distance) + MAX(biggest_distance) * 0.1) round_distance FROM ( SELECT tmp.code, tmp.geom, ST_Centroid((ST_Union(two_closest_points, tmp.geom))) new_position, -- get distance to two closest points and the client address point ST_Centroid((ST_Union(two_closest_points, tmp.geom))) <-> (ST_DumpPoints(ST_Union(two_closest_points, tmp.geom))).geom biggest_distance FROM ( SELECT r1.code, r1.geom, ST_Union(neighbours.geom) two_closest_points FROM address_points r1, LATERAL ( -- keep two closest points to each client address point SELECT r2.code, r2.geom, r1.geom <-> r2.geom distance FROM address_points r2 WHERE r1.code <> r2.code ORDER BY r1.geom <-> r2.geom ASC LIMIT 2 ) neighbours GROUP BY r1.code, r1.geom ) tmp ) tmp GROUP BY tmp.code, tmp.geom, tmp.new_position;
You might want to use
LATERAL for tasks like this.