Anonymized dump of your Postgres data

Oct 24, 2023 3 min read
blog post hero image

Someone on X wanted to know how to get an anonymous dump of Postgres data, but doesn’t want to install an extension in their production DB. I want to show how you can start a local database, dump the production data there, then do an anonymized dump from that without too much hassle.

Getting set up

Dockerfile:

FROM quay.io/tembo/tembo-local:latest

# Install extensions from Trunk
RUN trunk install pgcrypto
RUN trunk install postgresql_anonymizer

# Setting samples to use for anonymization
RUN cd /var/lib/postgresql/data/tembo/extension/anon && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifiers_category.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_fr_FR.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_en_US.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/address.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/city.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/company.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/country.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/email.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/first_name.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/iban.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/last_name.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/postcode.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/siret.csv && \
  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv

Build and run it like this:

docker build -t example-local-image .
docker rm --force local-tembo || true
docker run -it --name local-tembo -p 5432:5432 --rm example-local-image

This guide covers how to quickly try out Postgres extensions locally in more detail.

Dump the data into your local DB

pg_dump 'your-connection-string-here' | psql 'postgres://postgres:postgres@localhost:5432'

Anonymize the local DB

Initialize the extension:

SET session_preload_libraries = 'anon';
LOAD 'anon';
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();

For example, I have a table called “extension_owners”, and I would like to anonymize the user_name column:

postgres=# select extension_id,user_name from extension_owners limit 1;
 extension_id |      user_name
--------------+---------------------
           26 | coredb-service-user
(1 row)

I configured anonymization on that column like this:

SECURITY LABEL FOR anon ON COLUMN extension_owners.user_name
  IS 'MASKED WITH FUNCTION anon.lorem_ipsum( words := 1 )';

There are a lot of other options for anonymizing data, and you can even write your own functions. More information in these docs.

warning

This next step replaces data in the local database.

Since we are working on a local copy of the data, we can just use this function to replace anonymized columns in-place.

SELECT anon.anonymize_database();

We can see now this column has been anonymized.

postgres=# select user_name from extension_owners limit 10;
 user_name
------------
 First
 They
 Cell
 Price
 Them
 Go
 Parent
 Republican
 With
 Between
(10 rows)

You can do further modification from here, for example masking and replacing additional columns, formatting columns, etc.

Done!

Now you have an anonymized database locally. From here, you can pg_dump to a file, or do something else!

If you think this kind of thing is cool, follow me on X (@sjmiller609) for more content. At Tembo, we are all about Postgres extensions. You can try out extensions on Tembo Cloud for free.