Product arrow right Stacks arrow right Analytical arrow right Geospatial

Geospatial

The Tembo Geospatial Stack is designed to bring spatial database capabilities to PostgreSQL.

No matter whether you need to handle spatial objects, location queries, or GIS (geographic information systems)-facing workloads in general, this stack brings out-of-the-box geospatial support to PostgreSQL for quick, easy, & reliable deployments.

Container image

This stack is built with the custom image geo-cnpg. Specific technical specifications for this container image can be found within the geo-cnpg Dockerfile.

Interested in looking through the code for other Stack-specific images? You can find them all within the official tembo-images repository.

Extensions

  • pg_stat_statements - pg_stat_statements is an additional supplied module that provides valuable metrics related to query performance.
  • fuzzystrmatch - fuzzystrmatch is implemented to help with string matching including approximate or “fuzzy” matches. It’s especially useful for tasks like deduplication or linking different data sets where string data may not be exactly the same.
  • postgis - postgis is the primary PostGIS extension used to add support for geographic objects to PostgreSQL. It allows for storing and querying data based on location.
  • postgis_raster - postgis_raster is an extension for handling raster data. It’s used for storing and analyzing grid-based data like satellite imagery or digital elevation models.
  • postgis_tiger_geocoder - postgis_tiger_geocoder provides geocoding and reverse geocoding functionality. It uses the TIGER (Topologically Integrated Geographic Encoding and Referencing) data from the US Census Bureau.
  • postgis_topology - postgis_topology focuses on topological data models and functions, allowing for more advanced spatial data analysis and consistency.
  • address_standardizer - address_standardizer helps in standardizing address data to make it consistent and easier to work with, especially for geocoding purposes.
  • address_standardizer_data_us - address_standardizer_data_us provides the necessary data for the address_standardizer extension that is specifically tailored for US addresses.
  • Extensions from Trunk can be installed on-demand.

Getting started

Ready to try out some examples? Let’s walk through a scenario that would involve loading geospatial data into Postgres.

Download GDAL library

If you haven’t already, please download the GDAL library which includes the tool ogr2ogr.

Download sample dataset

For the purposes of this demonstration, we will utilize the PostGIS-supplied New York City data bundle.

You can follow the PostGIS tutorial here or download this dataset directly from this link.

Once downloaded, move the .zip file to your target directory and unzip.

Setup

Once you’ve established a Tembo Geospatial Stack instance, you can copy the connection string from the UI and execute it in your terminal. Alternatively, you can fill in and run the following psql command:

psql 'postgresql://postgres:<your-password>@<your-host>:5432/postgres'

Define a database

While the default database is postgres, it may be useful to create a separate database for your geospatial workload.

This can be achieved by running CREATE DATABASE <your-database> once you’ve connected to Tembo. You can then switch the connected database by running \c <your-database>.

If you’d like to learn more, check out our guide on how to select a database in Postgres.

Load the data

Navigate to your local directory where the data is stored; the path will look similar to path/to/target/directory/postgis-workshop/data/.

For this simple exercise, we’ll focus exclusively on shapefiles, but bear in mind that they alone do not represent the entire dataset.

Consider the following files:

  • nyc_cencus_blocks.shp
  • nyc_homicides.shp
  • nyc_neighborhoods.shp
  • nyc_streets.shp
  • nyc_subway_stations.shp

PostGIS does a great job in their free workshop on how to use ogr2ogr with select flags to load data into Postgres.

Their explanation can be found here.

With the following command, we can load files (individually) into Postgres:

ogr2ogr \
  -nln nyc_streets \
  -nlt PROMOTE_TO_MULTI \
  -lco GEOMETRY_NAME=geom \
  -lco FID=gid \
  -lco PRECISION=NO \
  Pg:"dbname=<your-database> host=<your-host> user=postgres" \
  nyc_streets.shp

Loading files individually may take some time, so an alternative may be to create a script that iterates across target files and loads them from one executable.

PLEASE NOTE: The following is strictly for demonstration purposes and we strongly advise against hardcoding sensitive information in a script like the one below.

Start by creating a file:

touch <your-file-name>.sh

Then, with your preferred text editor or IDE (integrated development environment), introduce the following script to your file:

#!/bin/bash

DATABASE="dbname=<your-database> \
host=<your-host> \
user=postgres \
password=<your-password>"

for f in *.shp
do
  echo "Processing $f file..."
  TABLE_NAME=$(basename "$f" .shp)
  ogr2ogr \
    -nln "$TABLE_NAME" \
    -nlt PROMOTE_TO_MULTI \
    -lco GEOMETRY_NAME=geom \
    -lco FID=gid \
    -lco PRECISION=NO \
    Pg:"$DATABASE" \
    "$f"
done

Make the file executable by running the following:

chmod +x <your-file-name>.sh

You can then load the data by running the following command within the local, target file-containing directory:

./<your-file-name>.sh

Confirm successful data upload

If you’re not already connected, you can connect to your database by following the same instructions as laid out in the Setup section above.

Then, simply confirm you are in the correct database and run \t to list the current tables.

Sample queries

Wondering what kinds of queries you’re able to run at this point? Let’s view some examples.

Query 1

How many rows are there in the nyc_streets table?

SELECT COUNT(*) FROM nyc_streets;

Result:

 count
-------
 19091
(1 row)

Query 2

Which fields (columns) are represented in the nyc_streets table?

SELECT
    column_name,
    data_type
FROM
    information_schema.columns
WHERE
    table_name = 'nyc_streets';

Result:

 column_name |     data_type
-------------+-------------------
 gid         | integer
 id          | bigint
 geom        | USER-DEFINED
 name        | character varying
 oneway      | character varying
 type        | character varying
(6 rows)

Query 3

What are all the streets in the nyc_streets table that have the Ave suffix (while limiting results to 10 rows)?

SELECT name FROM nyc_streets WHERE name LIKE '%Ave%' LIMIT 10;

Result:

      name
----------------
 Avenue O
 Avenue Z
 Avenue Y
 Avenue N
 Carlton Ave
 Ryder Ave
 Willoughby Ave
 Gee Ave
 Myrtle Ave
 7th Ave
(10 rows)

Query 4

How many results are available for the specific street Lexington Avenue?

SELECT COUNT(*)
FROM nyc_streets
WHERE name = 'Lexington Ave';

Result:

 count
-------
     5
(1 row)

Query 5

What is the SRID (Spatial Reference Identifier) of the data in this table, and if possible, how can I change it?

SELECT DISTINCT ST_SRID(geom) FROM nyc_streets;

Result:

 st_srid
---------
    4326
(1 row)

Say the SRID was actually 26918. How can I change the SRID to 4326?

ALTER TABLE nyc_streets
ALTER COLUMN geom
TYPE geometry(Geometry, 4326)
USING ST_Transform(ST_SetSRID(geom, 26918), 4326);

Result:

ALTER TABLE

Next

chevron right arrow

Analytics

Analytical

postgresgeospatialanalytical