
18 min read
Steven Miller

Two nice features of AWS S3 are version history and lifecycle policies. When objects are updated or deleted, the old object version remains in the bucket, but it's hidden. Old versions are eventually deleted by the lifecycle policy.

I would like something like that for my Postgres table data. We can use the temporal_tables extension for version history, and combine it with pg_partman to partition by time, automatically expiring old versions.

Data model

Let's say we have a table employees, and it looks like this:

       name       |  salary
------------------+----------
Bernard Marx | 10000.00
Lenina Crowne | 7000.00
Helmholtz Watson | 18500.00

We will add one more column to this table, sys_period, which is a time range. This time range represents since when this row has been the current version. The range is unbounded on the right side, because every row in the employees table is the present version.

       name       |  salary  |             sys_period
------------------+----------+------------------------------------
Helmholtz Watson | 18500.00 | ["2023-09-28 13:30:19.24318+00",)
Bernard Marx | 11600.00 | ["2023-09-28 13:33:58.735932+00",)
Lenina Crowne | 11601.00 | ["2023-09-28 13:33:58.738827+00",)

We will make a new table employees_history to store previous versions. This will have the same columns as the employees table, but all the rows have a sys_period that is bounded on both the left and the right sides. These ranges represent when this row was the current version. We will configure temporal_tables to automatically create these rows when anything changes in the employees table.

     name      |  salary  |                            sys_period
---------------+----------+-------------------------------------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 13:30:19.18544+00","2023-09-28 13:33:58.683279+00")
Bernard Marx | 11200.00 | ["2023-09-28 13:33:58.683279+00","2023-09-28 13:33:58.731332+00")
Bernard Marx | 11400.00 | ["2023-09-28 13:33:58.731332+00","2023-09-28 13:33:58.735932+00")
Lenina Crowne | 7000.00 | ["2023-09-28 13:30:19.239152+00","2023-09-28 13:33:58.738827+00")

To automatically delete old versions, we'll add one more column to the employees_history table, created_at. With the help of pg_partman, we will use this column to expire old versions once they are older than our retention configuration.

Getting set up

This guide covers how to quickly try out Postgres extensions locally. I've followed that guide to set up my environment with temporal_tables and pg_partman.

I have a Dockerfile, two SQL scripts, and a file with Postgres configurations.

.
├── Dockerfile
├── 0_startup.sql
├── 1_create_versioned_table.sql
└── custom.conf

Dockerfile: We use Trunk to install pg_partman and temporal_tables. Then, we copy the three other files into the image.

FROM quay.io/tembo/tembo-local:latest

RUN trunk install pg_partman
RUN trunk install temporal_tables

COPY 0_startup.sql $PGDATA/startup-scripts

COPY 1_create_versioned_table.sql $PGDATA/startup-scripts

COPY custom.conf $PGDATA/extra-configs

0_startup.sql: Enables temporal_tables and pg_partman when Postgres starts.

CREATE EXTENSION IF NOT EXISTS temporal_tables;
CREATE EXTENSION IF NOT EXISTS pg_partman;

1_create_versioned_table.sql: Creates a sample table, then enables version history on it.

-- Sample: an existing table we want to enable versioning on
CREATE TABLE employees
(
    name text NOT NULL PRIMARY KEY,
    department text,
    salary numeric(20, 2)
);

/*
Adding version history to the table,
first we need to add a time range to the existing table.
This represents "since when" has this row been current.
*/
ALTER TABLE employees ADD COLUMN sys_period tstzrange NOT NULL;

/*
Creating a time-partitioned version table
each row has the range the data was valid for,
and also the time this version was created.
*/
CREATE TABLE employees_history (
    LIKE employees INCLUDING DEFAULTS EXCLUDING INDEXES EXCLUDING CONSTRAINTS,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

-- Allow efficient querying of partition key and name
CREATE INDEX ON employees_history (created_at);

/*
Enable automatic partitioning with pg_partman, partitioning every 1 minute.

It's more realistic to partition daily or greater.
*/
SELECT create_parent('public.employees_history', 'created_at', 'native', '1 minute');

-- This connects employees table to employees_history
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period',
'employees_history',
true);

/*
Configure retention policy for employee history to keep old versions for 10 minutes.

It's more realistic to configure retention for 1 year.
*/
UPDATE part_config
SET retention = '10 minutes',
    retention_keep_table = false,
    retention_keep_index = false,
    infinite_time_partitions = true
WHERE parent_table = 'public.employees_history';

custom.conf: our additions to the Postgres configuration.

# Enable pg_partman background worker
shared_preload_libraries = 'pg_partman_bgw'

# How many seconds between pg_partman background worker runs
# It's more realistic to run every 3600 seconds, or longer
pg_partman_bgw.interval = 10

# Which database pg_partman should target
pg_partman_bgw.dbname = 'postgres'

# It's best practice to use limited permissions for the background worker
# pg_partman_bgw.role = 'limitedrole'

# This was helpful when I was working on getting the settings working
# log_min_messages = 'DEBUG1'

With those four files in place, we can run Postgres like this:

docker build -t example-local-image .
docker run -it -d --name local-tembo -p 5432:5432 --rm example-local-image

In a separate shell, I connect into the Postgres container.

psql postgres://postgres:postgres@localhost:5432

Basic demo of saving old versions

After we are set up, we have version history and retention policy configured on the employees table, but both the employees table and the employees_history table are empty.

SELECT * FROM employees;
 name | department | salary | sys_period
------+------------+--------+------------
(0 rows)
SELECT * FROM employees_history;
 name | department | salary | sys_period | created_at
------+------------+--------+------------+------------
(0 rows)

Adding data:

INSERT INTO employees (name, department, salary)
VALUES ('Bernard Marx', 'Hatchery and Conditioning Centre', 10000);

INSERT INTO employees (name, department, salary)
VALUES ('Lenina Crowne', 'Hatchery and Conditioning Centre', 7000);

INSERT INTO employees (name, department, salary)
VALUES ('Helmholtz Watson', 'College of Emotional Engineering', 18500);

Now, the employees table has some data, and employees_history is still empty.

SELECT name, salary, sys_period FROM employees;
       name       |   salary  |             sys_period
------------------+-----------+------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 20:23:14.840624+00",)
Lenina Crowne | 7000.00 | ["2023-09-28 20:23:14.911528+00",)
Helmholtz Watson | 18500.00 | ["2023-09-28 20:23:14.913555+00",)
(3 rows)
SELECT * FROM employees_history;
 name | department | salary | sys_period | created_at
------+------------+--------+------------+------------
(0 rows)

Modifying data:

UPDATE employees SET salary = 11200 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11400 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11600 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11601 WHERE name = 'Lenina Crowne';

Now, the employees_history table has past versions.

SELECT name, salary, sys_period FROM employees;
       name       |  salary  |             sys_period
------------------+----------+------------------------------------
Helmholtz Watson | 18500.00 | ["2023-09-28 20:23:14.913555+00",)
Bernard Marx | 11600.00 | ["2023-09-28 20:23:50.731597+00",)
Lenina Crowne | 11601.00 | ["2023-09-28 20:23:50.734214+00",)
(3 rows)
SELECT name, salary, sys_period FROM employees_history;
     name      |  salary  |                            sys_period
---------------+----------+-------------------------------------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 20:23:14.840624+00","2023-09-28 20:23:50.684293+00")
Bernard Marx | 11200.00 | ["2023-09-28 20:23:50.684293+00","2023-09-28 20:23:50.727283+00")
Bernard Marx | 11400.00 | ["2023-09-28 20:23:50.727283+00","2023-09-28 20:23:50.731597+00")
Lenina Crowne | 7000.00 | ["2023-09-28 20:23:14.911528+00","2023-09-28 20:23:50.734214+00")
(4 rows)

Looking up past versions

Let's say we want to look up Bernard's salary at a previous date. We can check the employees_history table to find the row where the time range matches our provided timestamp. However, this wouldn't find the correct salary if we provide a timestamp that is after the most recent update to Bernard's salary, since that row is in the employees table.

We can first create a view for this purpose. We only need to do this once, then we can query this view like a table going forward.

CREATE VIEW employee_history_view AS

SELECT name, department, salary, sys_period
FROM employees

UNION ALL

SELECT name, department, salary, sys_period
FROM employees_history;

Then, we can use this query to find Bernard's salary at any given date.

SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> TIMESTAMP WITH TIME ZONE '2023-09-28 20:23:30+00'
LIMIT 1;

@> is the containment operator; you might recognize it if you have used JSONB.
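
For instance, here is a minimal illustration of the operator against a literal range (the timestamps are arbitrary); the range contains the timestamp, so the expression evaluates to true:

SELECT tstzrange('2023-09-28 13:00:00+00', '2023-09-28 14:00:00+00')
       @> TIMESTAMP WITH TIME ZONE '2023-09-28 13:30:00+00';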

Comparing to the employees_history table shown above, it is returning the correct value.

  salary
----------
10000.00
(1 row)

It also works to look up the current salary:

SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> now()::TIMESTAMP WITH TIME ZONE
LIMIT 1;
  salary
----------
11600.00
(1 row)
SELECT salary FROM employees WHERE name = 'Bernard Marx';
  salary
----------
11600.00
(1 row)

If I try to query a salary from the future, it will return the current salary. If I try to query a salary from before Bernard first appears in the data, then I get an empty result.
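
For example, a timestamp chosen to predate all of the data (the exact value is arbitrary) returns no rows:

SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> TIMESTAMP WITH TIME ZONE '2023-01-01 00:00:00+00'
LIMIT 1;
 salary
--------
(0 rows)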

Partitioning

What is partitioning? Postgres documentation has detailed information on partitioning, but to summarize, partitioning is about splitting what is logically one large table into smaller tables. Typically, this is done for query performance. In our case, we are partitioning to expire old versions.
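
To illustrate what pg_partman automates for us, a child partition created with native range partitioning would look roughly like this (the partition name and bounds here are hypothetical):

CREATE TABLE employees_history_p2023_09_29_0030
PARTITION OF employees_history
FOR VALUES FROM ('2023-09-29 00:30:00+00') TO ('2023-09-29 00:31:00+00');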

Partitioning tables is something I’m familiar with from Tembo’s work in PGMQ, which is a queueing extension for Postgres.

Performance

Writes

We should expect write performance to be slower, since we are writing to two tables for every update.

I created a new table that does not have versioning enabled to compare write performance.

-- Create a table like employees
CREATE TABLE employees_write_test
AS TABLE employees
WITH NO DATA;

-- ...and insert one row
INSERT INTO employees_write_test (name, department, salary, sys_period)
VALUES ('Bernard Marx', 'Hatchery and Conditioning Centre', 11600.00, tstzrange(now(), null));

Then, I used EXPLAIN ANALYZE to compare the write performance. I ran the query a few times for each.

Without versioning:

EXPLAIN ANALYZE
UPDATE employees_write_test
SET salary = 11608 WHERE name = 'Bernard Marx';

Three samples:

 Planning Time: 1.654 ms
Execution Time: 1.540 ms

Planning Time: 0.760 ms
Execution Time: 0.707 ms

Planning Time: 1.707 ms
Execution Time: 2.079 ms

With versioning:

EXPLAIN ANALYZE
UPDATE employees
SET salary = 11610 WHERE name = 'Bernard Marx';

Three samples:

 Planning Time: 2.423 ms
Trigger versioning_trigger: time=2.430 calls=1
Execution Time: 4.783 ms

Planning Time: 2.311 ms
Trigger versioning_trigger: time=1.091 calls=1
Execution Time: 2.979 ms

Planning Time: 2.825 ms
Trigger versioning_trigger: time=1.711 calls=1
Execution Time: 5.686 ms

It's more than twice as slow for a single update. That's because we have to write two rows instead of one, there is more data to write (the time ranges), and there is some additional processing, for instance determining which range to put on each row. In the next section, I also compare how much time it takes to write 100,000 rows to each of these tables.

Reads

We created a view which is a union between employees and employees_history, then we query the view to find an employee's salary at a given time.

To generate some data, let's make a procedure to update a salary 100,000 times in a row. The example below uses PL/pgSQL. By default, PL/pgSQL functions run as a single transaction, which would only result in a single update to the employees_history table. For this reason, I am using a procedure with COMMIT so that each increment is a separate transaction; this way we also get 100,000 updates to the employees_history table. I had to explain that nuance to ChatGPT in order for this procedure to be produced properly.

-- Table name and employee name as inputs
CREATE OR REPLACE PROCEDURE increment_salary(p_name text, p_table_name text)
LANGUAGE plpgsql AS $$
DECLARE
    v_salary numeric(20,2);
    i integer;
    v_sql text;
BEGIN
    -- Dynamically construct the SQL to get the current salary
    v_sql := format('SELECT salary FROM %I WHERE name = $1', p_table_name);
    EXECUTE v_sql INTO v_salary USING p_name;

    -- Loop 100 thousand times
    FOR i IN 1..100000
    LOOP
        -- Increment the salary
        v_salary := v_salary + 1;

        -- Dynamically construct the SQL to update the salary
        v_sql := format('UPDATE %I SET salary = $2 WHERE name = $1', p_table_name);
        EXECUTE v_sql USING p_name, v_salary;

        COMMIT; -- Commit the transaction, triggering the versioning procedure
    END LOOP;
END
$$;

Run the procedure:

CALL increment_salary('Bernard Marx', 'employees');
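
The comparison run mentioned below uses the same procedure, pointed at the non-versioned table created earlier:

CALL increment_salary('Bernard Marx', 'employees_write_test');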

This took 55 seconds to run on my laptop. I also tried it on the table without versioning enabled, and in this case it took 38 seconds. I ran it a couple more times on the table with versioning enabled, so that the versions would be distributed across multiple partitions. Now we have an employees_history table that's populated with many rows for Bernard.

SELECT count(*) FROM employees_history WHERE name = 'Bernard Marx';
 count
--------
300000
(1 row)

Let's run the same type of query command we ran before, with EXPLAIN ANALYZE. I picked a timestamp that will not be found to ensure it's as slow as possible.

EXPLAIN ANALYZE
SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> TIMESTAMP WITH TIME ZONE '2023-09-28 15:28:25+00'
LIMIT 1;

Simplified query plan output:

Limit
-> Append
-> Bitmap Heap Scan on employees
Recheck Cond: (name = 'Bernard Marx'::text)
Filter: (sys_period @> '...')
Rows Removed by Filter: 1
Heap Blocks: exact=1
-> Bitmap Index Scan on employees_pkey
Index Cond: (name = 'Bernard Marx'::text)

... Empty partitions omitted ...

-> Seq Scan on employees_history_p2023_09_29_0030
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 31

-> Seq Scan on employees_history_p2023_09_29_0031
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 99969

... Empty partitions omitted ...

-> Seq Scan on employees_history_p2023_09_29_0035
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 97393

-> Seq Scan on employees_history_p2023_09_29_0036
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 102607

... Empty partitions omitted ...

Planning Time: 12.427 ms
Execution Time: 262.706 ms
(47 rows)

This query took 263 milliseconds. Notice that it needs to scan all partitions, because we are partitioning by created_at but filtering on sys_period. We can improve the speed with indexes.

If this was a real workload, I doubt that employees' salaries would be updated so frequently, or at least that's been the case in my personal experience. However, if it's a big company, then there could be a lot of employees. In that case, it would be best to add an index on the name (or more realistically, employee ID) in the employees_history table. Then, within each partition, the index would narrow the scan to only the rows for the employee being queried, and the few remaining rows, typically zero, one, or two, would be scanned to find the correct salary.
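
A sketch of what that could look like (this index isn't part of the setup above; a GiST index on sys_period is another option worth testing):

-- Narrow each partition's scan to the employee being queried
CREATE INDEX ON employees_history (name);
-- Optionally, to also support the range containment filter:
-- CREATE INDEX ON employees_history USING gist (sys_period);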

Expiring old versions

Earlier in this blog, we configured pg_partman to partition in 1 minute increments, to expire partitions once they are older than 10 minutes, and to run its background worker every 10 seconds. On each run, the pg_partman background worker drops any partition that has fallen outside the retention window.

With this query, I can check how many rows and the total data size in each partition.

-- This query was produced by ChatGPT 4 with the prompt:
-- "How can I check the number of rows in each partition of employees_history?"
SELECT
    child.relname AS partition_name,
    pg_total_relation_size(child.oid) AS total_size,
    pg_relation_size(child.oid) AS data_size,
    pg_stat_user_tables.n_live_tup AS row_count
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
LEFT JOIN pg_stat_user_tables ON child.oid = pg_stat_user_tables.relid
WHERE parent.relname = 'employees_history'
ORDER BY partition_name;

In order to check that old versions are being dropped, I ran the procedure that creates a lot of salary increments several times in a row.

Then, running the above query, I find an output like this:

           partition_name           | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2204 | 16384 | 0 | 0
employees_history_p2023_09_28_2205 | 16384 | 0 | 0
employees_history_p2023_09_28_2206 | 16384 | 0 | 0
employees_history_p2023_09_28_2207 | 16384 | 0 | 0
employees_history_p2023_09_28_2208 | 16384 | 0 | 0
employees_history_p2023_09_28_2209 | 16384 | 0 | 0
employees_history_p2023_09_28_2210 | 32768 | 8192 | 4
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13180928 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 868352 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
(16 rows)

In this output, we can see that we have 1 partition for every minute, a total of 15 time-based partitions plus the default partition. I have old versions expiring after 10 minutes. I thought it was interesting to note that pg_partman preemptively creates partitions for the future, in this case 5 minutes into the future.

If you refer to the original setup steps, I configured infinite_time_partitions = true, which means we will generate partitions even when we are not generating any data for them. I think this is the proper configuration since we also have a retention policy that will drop the old partitions. The usual concern of creating partitions indefinitely as time passes, even when no data is being generated, doesn't apply because the old partitions are being dropped.

To confirm data was being deleted, I sampled the above query over time. We can see the large batch of inserts becoming the oldest populated partitions as time passes, then falling outside of the retention policy and being deleted.


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2207 | 16384 | 0 | 0
employees_history_p2023_09_28_2208 | 16384 | 0 | 0
employees_history_p2023_09_28_2209 | 16384 | 0 | 0
employees_history_p2023_09_28_2210 | 32768 | 8192 | 4
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
(16 rows)


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
(16 rows)


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
--------
131733
(1 row)

partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
employees_history_p2023_09_28_2227 | 16384 | 0 | 0
employees_history_p2023_09_28_2228 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
-------
5856
(1 row)

partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
employees_history_p2023_09_28_2227 | 16384 | 0 | 0
employees_history_p2023_09_28_2228 | 16384 | 0 | 0
employees_history_p2023_09_28_2229 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
-------
0
(1 row)

Thanks!

If you got this far, thank you for reading this! I hope that you are inspired to try out extensions on your own and see what they can do. The next time you have some problem to solve with your data, consider that maybe it could just be handled by a Postgres extension.

If you want to try extensions without any local setup, you should try Tembo Cloud at cloud.tembo.io.

Just use Postgres!

13 min read
Steven Miller

Before working for a Postgres company, I had never used extensions.

Now, I'm part of a team working to fully automate turning on any extension. I didn't find great resources to explain the process of turning on an extension, and why it varies between different extensions.

I want to share my mental model for the different types of extensions, how to know what type of extension you're working with, and how to get it turned on.

Turn on an extension

What's traditionally involved:

  • Find the extension you want
  • Figure out how to build it
  • Sometimes, installation of dependencies (for example with apt-get or yum)
  • Sometimes, installation of other extensions (goto ‘figure out how to build it’)
  • Install your extension
  • Sometimes, load a library
  • Sometimes, provide extension-specific configurations
  • Sometimes, run the CREATE EXTENSION command

Building and installing extensions is well covered by other resources. In this blog, I want to focus on steps to get an extension up and running after it's installed, and how I believe that all extensions fit into four mostly-tidy categories.

Terminology

Extensions consist of SQL and / or libraries.

A library simply means compiled code, for example written in C or Rust.

SQL objects (let's just call them SQL) are additions to SQL, for example new functions and data types. These are often implemented by a library, but can also be implemented in other ways, for example using a procedural language like PL/pgSQL.

Hooks: A Postgres feature informally called hooks can be used to connect into Postgres' existing functionality. Hooks allow for overwriting default Postgres functionality, or calling back into an extension's code at the appropriate time. For example, one type of hook can modify Postgres start up behavior to launch a background worker, and a different type of hook can be used to redirect queries to a different table.

note

Sometimes extensions are instead referred to as 'modules', but I like to simply refer to everything as an 'extension'. Feel free to @ me on X to tell me why that's wrong (@sjmiller609).

Enter the matrix

A big part of what I have been working on is fully automating enabling any extension. In order to do that, we have to understand exactly how extensions vary. We can break it down into a 2x2 matrix by defining two boolean categories.

Requires LOAD true or false and requires CREATE EXTENSION true or false:

                      | Requires CREATE EXTENSION                               | Does not require CREATE EXTENSION
----------------------+---------------------------------------------------------+------------------------------------------------------------
Requires LOAD         | Extensions that use SQL and their libraries have hooks  | Extensions that do not use SQL, may or may not have hooks
Does not require LOAD | SQL-only extensions, and SQL + libraries without hooks  | Output plugins

By categorizing an extension into this 2x2 matrix, we can know how to turn it on.

LOAD

LOAD is a command that tells Postgres to load a library, meaning make the code accessible to Postgres by loading the compiled code on disk into memory. If a library has hooks, performing a load will activate the hooks.

Requires LOAD: true means you have to do one of the following steps to load a library:

  • LOAD: using the LOAD command directly loads a library for the current connection only
  • session_preload_libraries: configuration, specifies which libraries to LOAD for new connections
  • shared_preload_libraries: configuration, specifies which libraries to LOAD at server start, and therefore requires a restart
note

Even though code is loaded in other ways during CREATE EXTENSION, that does not count as requires LOAD: true under this definition; I mean that the user must do something other than CREATE EXTENSION to load in libraries. Also, we are conflating local_preload_libraries with session_preload_libraries to simplify things in this blog post.

For example, if you installed the extension auto explain, then you may have a library file called auto_explain.so in your library directory, which can be found with pg_config --pkglibdir. Libraries are not always named exactly the same as the extension.

$ trunk install auto_explain

Using pkglibdir: /var/lib/postgresql/data/tembo/15/lib

[+] auto_explain.so => /var/lib/postgresql/data/tembo/15/lib
$ ls $(pg_config --pkglibdir) | grep auto_explain
auto_explain.so

Auto explain can be loaded into your session with LOAD 'auto_explain';. The name in this command always matches exactly the name of the library file, minus the file type, in this example .so. With a couple of configurations, this extension will then automatically log the EXPLAIN ANALYZE output for long-running queries.

postgres=# LOAD 'auto_explain';
LOAD

However, the LOAD command is not typically used directly, and many extensions require that you do not load them in this way. Instead, the Postgres configuration shared_preload_libraries is typically used.

postgres=# LOAD 'pg_cron';
ERROR: pg_cron can only be loaded via shared_preload_libraries
HINT: Add pg_cron to the shared_preload_libraries configuration variable in postgresql.conf.

The best reason to use LOAD directly is for debugging. It can be nice to LOAD on-demand while troubleshooting.

info

What to do when an extension requires load:

Extensions with requires LOAD: true can always be configured in shared_preload_libraries, but this configuration requires a restart to take effect. Some extensions can be loaded without a restart using LOAD directly, but in this case it's usually better to use the session_preload_libraries configuration and reload the Postgres configuration with SELECT pg_reload_conf();. Run LOAD directly when you intentionally want to load a library for only the current connection.
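
As a sketch, loading auto_explain for all new connections without a restart could look like this (assumes superuser privileges and that auto_explain is installed):

ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
SELECT pg_reload_conf();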

CREATE EXTENSION

When you run CREATE EXTENSION, this basically just runs an extension's SQL script. The script will typically create new SQL objects such as functions, data types, operators and so on.

CREATE EXTENSION looks at the extension's control file, which is installed to the extension directory of sharedir.

$ trunk install pg_jsonschema;

Using pkglibdir: "/var/lib/postgresql/data/tembo/15/lib"
Using sharedir: "/var/lib/postgresql/data/tembo"

[+] pg_jsonschema.so => /var/lib/postgresql/data/tembo/15/lib
[+] extension/pg_jsonschema--0.1.4.sql => /var/lib/postgresql/data/tembo
[+] extension/pg_jsonschema.control => /var/lib/postgresql/data/tembo

sharedir can be located with pg_config --sharedir

$  ls $(pg_config --pkglibdir) | grep pg_jsonschema
pg_jsonschema.so

$ ls $(pg_config --sharedir)/extension | grep pg_jsonschema
pg_jsonschema--0.1.4.sql
pg_jsonschema.control

The information in a control file is used to determine what start up or upgrade scripts to run. We'll cover upgrades in-depth in a future blog, so let's focus on first-time enabling. For example, in the above installation output, we notice a file pg_jsonschema--0.1.4.sql. Postgres knows to run this because the name of the control file matches the name of the script suffixed by the default_version defined in the control file.

$ cat $(pg_config --sharedir)/extension/pg_jsonschema.control
comment = 'JSON schema validation on json and jsonb data types'
default_version = '0.1.4'
module_pathname = '$libdir/pg_jsonschema'
relocatable = false
superuser = true

When running CREATE EXTENSION, the extension name always matches exactly the name of a control file, less the .control file type.

postgres=# CREATE EXTENSION pg_jsonschema;
CREATE EXTENSION

I mentioned that a start up script creates new SQL, including new functions. For example in the case of pg_jsonschema, the start up script pg_jsonschema--0.1.4.sql includes the following SQL to create a new function called jsonb_matches_schema. Even though we have a library file, we don't need LOAD because CREATE FUNCTION is another way to load code from a file. This is an example of requires LOAD: false, requires CREATE EXTENSION: true.

From the CREATE FUNCTION ... AS 'obj_file' documentation:

obj_file is the name of the shared library file containing the compiled [code]

CREATE FUNCTION "jsonb_matches_schema"(
"schema" json,
"instance" jsonb
) RETURNS bool
IMMUTABLE STRICT
LANGUAGE c
AS 'MODULE_PATHNAME', 'jsonb_matches_schema_wrapper';
info

You can always know whether or not an extension requires CREATE EXTENSION by the presence of a control file in $(pg_config --sharedir)/extension

Hooks that require a restart

An extension is in the category requires CREATE EXTENSION: true and requires LOAD: true if the extension has libraries that use hooks which require a restart and it has a control file.

You will be able to identify this case when the extension's documentation mentions both CREATE EXTENSION and shared_preload_libraries. Sometimes an error message or hint is provided if you run CREATE EXTENSION before loading the library, or if you try to run LOAD directly, but you can't count on that.

For example, both pg_cron and pg_partman include background workers. These are examples of extensions using hooks in the start up process of Postgres. So, in both of these cases the user is expected to configure shared_preload_libraries to start the background worker, then run CREATE EXTENSION on a cluster where that background worker is already running, as sketched below.
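
As a sketch, the typical sequence for this category looks like the following, using pg_cron as the example:

-- 1. Add to postgresql.conf, then restart Postgres:
--    shared_preload_libraries = 'pg_cron'
-- 2. Then, in a session on the restarted cluster:
CREATE EXTENSION pg_cron;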

LOAD is needed when there isn't a control file

In the case of auto_explain, it uses hooks that do not require a restart. In this case, there is no control file and no extra SQL objects to be created. So LOAD is required simply because we have to load it into memory somehow. To demonstrate, it is technically possible to make a control file for auto_explain to allow for CREATE EXTENSION behavior instead of LOAD:

auto_explain.control:

comment = 'auto explain'
default_version = '0.0.1'
module_pathname = '$libdir/auto_explain'
relocatable = false
superuser = true

auto_explain--0.0.1.sql

LOAD 'auto_explain';
caution

In practice, do not use LOAD in an extension start up script to activate hooks. LOAD is only applicable for the current connection.

postgres=# CREATE EXTENSION auto_explain;
CREATE EXTENSION

postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
-----------------+---------+------------+------------------------------
auto_explain | 0.0.1 | public | auto explain
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET

After running the above, now my subsequent queries have their EXPLAIN ANALYZE logged.

So, if that could work, why not just have control files for all extensions?

Having a control file requires version upgrade handling.

When you have a control file, you also have to write upgrade scripts for every new version. In the case of pg_cron, we can find all these files in sharedir. When enabling version 1.5, it will run pg_cron--1.0.sql, then each migration script up to 1.5.

pg_cron--1.0--1.1.sql
pg_cron--1.0.sql
pg_cron--1.1--1.2.sql
pg_cron--1.2--1.3.sql
pg_cron--1.3--1.4.sql
pg_cron--1.4-1--1.5.sql
pg_cron--1.4--1.4-1.sql
pg_cron.control
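
If an older version of pg_cron were already enabled, the remaining scripts in that chain would be applied by an upgrade command like this (the version number is just an example):

ALTER EXTENSION pg_cron UPDATE TO '1.5';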

Since that's not really applicable to auto_explain, because it just logs output and there is nothing to migrate or handle between versions, it's cleaner to not have a control file. Upgrading auto_explain only involves replacing the library, then loading it again.

info

Upgrade logic is not applicable for extensions that do not require CREATE EXTENSION. These cases just involve re-loading a new version of the library.

You don't load hooks during CREATE EXTENSION

It made sense to me that hooks requiring a restart have to be configured in shared_preload_libraries. But for extensions that do not require a restart, it's not obvious why the hooks can't just be loaded during the CREATE EXTENSION start up script, like I just demonstrated is possible with auto_explain.

Even though it's technically possible to LOAD hooks during CREATE EXTENSION, it's a bad idea.

First of all, when using the LOAD command directly, it's only applicable to the current connection. So, in the above example with auto_explain, the queries are only logged in the connection where I ran CREATE EXTENSION. To apply to all connections without a restart, it would need to go into session_preload_libraries. It is technically possible to do that inside of CREATE EXTENSION by running ALTER SYSTEM SET session_preload_libraries followed by SELECT pg_reload_conf() in your start up script, but it is not a good approach for CREATE EXTENSION to automatically perform a configuration update. For one, it would confuse the user to change a config on the fly, and secondly, there is currently no mechanism to automatically merge multi-value, comma-separated configurations like session_preload_libraries.

info

The 2x2 matrix makes it easier to understand how to enable an extension.

Just ask yourself "do I need to run CREATE EXTENSION?" determined by presence of a control file, and "do I need to do a LOAD?" determined by any mention of LOAD, shared_preload_libraries, or session_preload_libraries in the extension's documentation or an error message.

In all cases of needing a LOAD, you can get away with setting it in shared_preload_libraries. You can optimize to avoid restarts in some cases.

Output plugins

There are some extensions, for example wal2json, that require neither CREATE EXTENSION nor LOAD. In all known cases so far, these are output plugins. I think it's more of a stretch to call these 'extensions', but since they provide additional functionality to Postgres, that counts in my book.

In the case of output plugins, the library is loaded when a replication slot is created:

From the PostgreSQL documentation:

SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
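
A sketch of the same thing with wal2json as the output plugin (assuming wal2json is installed and wal_level is set to logical):

SELECT * FROM pg_create_logical_replication_slot('wal2json_slot', 'wal2json');
SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL);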

Installing extensions

Some of the above examples use the free and open source Trunk project that Tembo created, which allows us to skip the build process. It also installs extension dependencies, and provides metadata about other dependencies. When I'm trying out extensions, I am starting from one of Tembo’s container images to handle the system dependencies installation.

I wrote this guide for trying out extensions locally. If you have any issues just reach out on our community Slack channel and we can help.

Perhaps it's not so bad after all...

Automate everything

We want to make it possible to automatically install and turn on any Postgres extension. For this reason, we are seeking to qualify all known extensions by these two dimensions: requires CREATE EXTENSION true or false, and requires LOAD true or false.

To enable the community, that metadata is being published on Trunk. On Tembo Cloud, we leverage that information to automatically enable extensions. Currently, we've got this working for over 150 extensions.

Dear experts, tell me how I'm wrong (seriously!)

I'm serious that I want you to tell me where this is incorrect! If you're a Postgres extensions expert, or maybe just know a thing or two about extensions that seems to conflict with something in this blog, please reach out on X @sjmiller609 and let me know. Even if it's just a minor correction or subjective information, I'd love to hear from you. I also want to hear if there is an easier mental model than this. I hope this blog can serve as a minimal yet comprehensive explanation of what it takes to get extensions turned on.

Another way to contribute is to click the "Edit this page" link below, and suggest changes. I will happily accept improvements to this blog.