Someone on X wanted to know how to get an anonymous dump of Postgres data, but doesn’t want to install an extension in their production DB. I want to show how you can start a local database, dump the production data there, then do an anonymized dump from that without too much hassle.
Getting set up
Dockerfile:
FROM quay.io/tembo/tembo-local:latest
# Install extensions from Trunk
RUN trunk install pgcrypto
RUN trunk install postgresql_anonymizer
# Setting samples to use for anonymization
RUN cd /var/lib/postgresql/data/tembo/extension/anon && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifiers_category.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_fr_FR.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_en_US.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/address.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/city.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/company.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/country.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/email.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/first_name.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/iban.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/last_name.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/postcode.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/siret.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv
Build and run it like this:
docker build -t example-local-image .
docker rm --force local-tembo || true
docker run -it --name local-tembo -p 5432:5432 --rm example-local-image
This guide covers how to quickly try out Postgres extensions locally in more detail.
Dump the data into your local DB
pg_dump 'your-connection-string-here' | psql 'postgres://postgres:postgres@localhost:5432'
Anonymize the local DB
Initialize the extension:
SET session_preload_libraries = 'anon';
LOAD 'anon';
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();
For example, I have a table called “extension_owners”, and I would like to anonymize the user_name column:
postgres=# select extension_id,user_name from extension_owners limit 1;
extension_id | user_name
--------------+---------------------
26 | coredb-service-user
(1 row)
I configured anonymization on that column like this:
SECURITY LABEL FOR anon ON COLUMN extension_owners.user_name
IS 'MASKED WITH FUNCTION anon.lorem_ipsum( words := 1 )';
There are a lot of other options for anonymizing data, and you can even write your own functions. More information in these docs.
This next step replaces data in the local database.
Since we are working on a local copy of the data, we can just use this function to replace anonymized columns in-place.
SELECT anon.anonymize_database();
We can see now this column has been anonymized.
postgres=# select user_name from extension_owners limit 10;
user_name
------------
First
They
Cell
Price
Them
Go
Parent
Republican
With
Between
(10 rows)
You can do further modification from here, for example masking and replacing additional columns, formatting columns, etc.
Done!
Now you have an anonymized database locally. From here, you can pg_dump to a file, or do something else!
If you think this kind of thing is cool, follow me on X (@sjmiller609) for more content. At Tembo, we are all about Postgres extensions. You can try out extensions on Tembo Cloud for free.