Enter the matrix: the four types of Postgres extensions

Sep 14, 2023 12 min read
blog post hero image

Before working for a Postgres company, I had never used extensions.

Now, I’m part of a team working to fully automate turning on any extension. I didn’t find great resources to explain the process of turning on an extension, and why it varies between different extensions.

I want to share my mental model for the different types of extensions, how to know what type of extension you’re working with, and how to get it turned on.

Turn on an extension

one-does-not-simply

What’s traditionally involved:

  • Find the extension you want
  • Figure out how to build it
  • Sometimes, installation of dependencies (for example with apt-get or yum)
  • Sometimes, installation of other extensions (goto ‘figure out how to build it’)
  • Install your extension
  • Sometimes, load a library
  • Sometimes, provide extension-specific configurations
  • Sometimes, run the CREATE EXTENSION command

Building and installing extensions is well covered by other resources. In this blog, I want to focus on steps to get an extension up and running after it’s installed, and how I believe that all extensions fit into four mostly-tidy categories.

Terminology

Extensions consist of SQL and / or libraries.

A library simply means compiled code, for example written in C or Rust.

SQL objects, let’s just call it SQL, are extensions of SQL, for example new functions and data types. These are often implemented by a library, but can also be implemented in other ways, for example using a procedural language like PL/pgSQL.

Hooks: A Postgres feature informally called hooks can be used to connect into Postgres’ existing functionality. Hooks allow for overwriting default Postgres functionality, or calling back into an extension’s code at the appropriate time. For example, one type of hook can modify Postgres start up behavior to launch a background worker, and a different type of hook can be used to redirect queries to a different table.

info

Sometimes extensions are instead referred to as ‘modules’, but I like to simply refer to everything as an ‘extension’, but feel free to @ me on X to tell me why that’s wrong (@sjmiller609).

Enter the matrix

im-in-matrix

A big part of what I have been working on is fully automating enabling any extension. In order to do that, we have to understand exactly how extensions vary. We can break it down into a 2x2 matrix by defining two boolean categories.

Requires LOAD true or false and requires CREATE EXTENSION true or false:

Requires CREATE EXTENSIONDoes not require CREATE EXTENSION
Requires LOADExtensions that use SQL and their libraries have hooksExtensions that do not use SQL, may or may not have hooks
Does not require LOADSQL-only extensions, and SQL + libraries without hooksOutput plugins

By categorizing an extension into this 2x2 matrix, we can know how to turn it on.

LOAD

LOAD is a command that tells Postgres to load a library, meaning make the code accessible to Postgres by loading the compiled code on disk into memory. If a library has hooks, performing a load will activate the hooks.

Requires LOAD: true means you have to do one of the following steps to load a library:

  • LOAD: using the LOAD command directly loads a library for the current connection only
  • session_preload_libraries: configuration, specifies which libraries to LOAD for new connections
  • shared_preload_libraries: configuration, specifies which libraries to LOAD at server start, and therefore requires a restart
info

Even though code is loaded in other ways during CREATE EXTENSION, that is not requires LOAD: true under this definition. I mean that the user must do something other than CREATE EXTENSION to load in libraries. Also, we are conflating local_preload_libraries with session_preload_libraries to simplify things in this blog post.

For example, if you installed the extension auto explain, then you may have a library file called auto_explain.so in your library directory, which can be found with pg_config —pkglibdir. Libraries are not always named exactly the same as the extension.

$ trunk install auto_explain

Using pkglibdir: /var/lib/postgresql/data/tembo/15/lib

[+] auto_explain.so => /var/lib/postgresql/data/tembo/15/lib
$ ls $(pg_config --pkglibdir) | grep auto_explain
auto_explain.so

Auto explain can be loaded into your session like LOAD 'auto_explain';. This command will always match exactly the name of the library file, less the file type, in this example .so. With a couple of configurations, now this extension will automatically log the EXPLAIN ANALYZE output for long-running queries.

postgres=# LOAD 'auto_explain';
LOAD

However, the LOAD command is not typically used directly, and many extensions require you do not load them in this way. Instead, typically the Postgres configuration shared_preload_libraries is used instead.

postgres=# LOAD 'pg_cron';
ERROR:  pg_cron can only be loaded via shared_preload_libraries
HINT:  Add pg_cron to the shared_preload_libraries configuration variable in postgresql.conf.

The best reason to use LOAD directly is for debugging. It can be nice to LOAD on-demand while troubleshooting.

info

What to do when an extension requires load:

Extensions that requires LOAD: true can always be configured in shared_preload_libraries, but this configuration requires a restart to take effect. Some extensions can be loaded without a restart using LOAD directly, but in this case it’s usually better to use the session_preload_libraries configuration, and reload the Postgres configuration with SELECT pg_reload_conf();. You should run LOAD directly when you are intentionally loading for only the current connection.

CREATE EXTENSION

When you run CREATE EXTENSION, this basically just runs an extension’s SQL script. The script will typically create new SQL objects such as functions, data types, operators and so on.

CREATE EXTENSION looks at the extension’s control file, which is installed to the extension directory of sharedir.

$ trunk install pg_jsonschema;

Using pkglibdir: "/var/lib/postgresql/data/tembo/15/lib"
Using sharedir: "/var/lib/postgresql/data/tembo"

[+] pg_jsonschema.so => /var/lib/postgresql/data/tembo/15/lib
[+] extension/pg_jsonschema--0.1.4.sql => /var/lib/postgresql/data/tembo
[+] extension/pg_jsonschema.control => /var/lib/postgresql/data/tembo

sharedir can be located with pg_config --sharedir

$  ls $(pg_config --pkglibdir) | grep pg_jsonschema
pg_jsonschema.so

$ ls $(pg_config --sharedir)/extension | grep pg_jsonschema
pg_jsonschema--0.1.4.sql
pg_jsonschema.control

The information in a control file is used to determine what start up or upgrade scripts to run. We’ll cover upgrades in-depth in a future blog, so let’s focus on first-time enabling. For example, in the above installation output, we notice a file pg_jsonschema--0.1.4.sql. Postgres knows to run this because the name of the control file matches the name of the script suffixed by the default_version defined in the control file.

$ cat $(pg_config --sharedir)/extension/pg_jsonschema.control
comment = 'JSON schema validation on json and jsonb data types'
default_version = '0.1.4'
module_pathname = '$libdir/pg_jsonschema'
relocatable = false
superuser = true

When running CREATE EXTENSION, the extension name always matches exactly the name of a control file, less the .control file type.

postgres=# CREATE EXTENSION pg_jsonschema;
CREATE EXTENSION

I mentioned that a start up script creates new SQL, including new functions. For example in the case of pg_jsonschema, the start up script pg_jsonschema--0.1.4.sql includes the following SQL to create a new function called jsonb_matches_schema. Even though we have a library file, we don’t need LOAD because CREATE FUNCTION is another way to load code from a file. This is an example of requires LOAD: false, requires CREATE EXTENSION: true.

CREATE FUNCTION … AS ‘obj_file’ documentation

obj_file is the name of the shared library file containing the compiled [code]

CREATE FUNCTION "jsonb_matches_schema"(
        "schema" json,
        "instance" jsonb
) RETURNS bool
IMMUTABLE STRICT
LANGUAGE c
AS 'MODULE_PATHNAME', 'jsonb_matches_schema_wrapper';
info

You can always know whether or not an extension requires CREATE EXTENSION by the presence of a control file in $(pg_config --sharedir)/extension

Hooks that require a restart

An extension is in the category requires CREATE EXTENSION: true and requires LOAD: true if the extension has libraries that use hooks which require a restart and it has a control file.

You will be able to identify this is the case when the extension’s documentation mentions both CREATE EXTENSION and shared_preload_libraries. Sometimes an error message or hint is provided if you run CREATE EXTENSION before loaded the library, or if you try to run LOAD directly, but you can’t count on that.

For example, in the case of both pg_cron and pg_partman, there are a background workers. These are examples of extensions using hooks in the start up process of Postgres. So, in both of these cases the user is expected to configure shared_preload_libraries to start the background worker, then run CREATE EXTENSION on a cluster where that background worker is already running.

LOAD is needed when there isn’t a control file

In the case of auto_explain, it uses hooks that do not require a restart. In this case, there is no control file and no extra SQL objects to be created. So LOAD is required simply because we have to load it into memory somehow. To demonstrate, it is technically possible to make a control file for auto_explain to allow for CREATE EXTENSION behavior instead of LOAD:

auto_explain.control:

comment = 'auto explain'
default_version = '0.0.1'
module_pathname = '$libdir/auto_explain'
relocatable = false
superuser = true

auto_explain—0.0.1.sql

LOAD 'auto_explain';
warning

In practice, do not use LOAD in an extension start up script to activate hooks. LOAD is only applicable for the current connection.

postgres=# CREATE EXTENSION auto_explain;
CREATE EXTENSION

postgres=# \dx
                     List of installed extensions
      Name       | Version |   Schema   |         Description
-----------------+---------+------------+------------------------------
 auto_explain    | 0.0.1   | public     | auto explain
 plpgsql         | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET

After running the above, now my subsequent queries have their EXPLAIN ANALYZE logged.

So, if that could work, why not just have control files for all extensions?

Having a control file requires version upgrade handling.

When you have a control file, you also have to write upgrade scripts for every new version. In the case of pg_cron, we can find all these files in sharedir. When enabling version 1.5, it will run pg_cron--1.0.sql, then each migration script up to 1.5.

pg_cron--1.0--1.1.sql
pg_cron--1.0.sql
pg_cron--1.1--1.2.sql
pg_cron--1.2--1.3.sql
pg_cron--1.3--1.4.sql
pg_cron--1.4-1--1.5.sql
pg_cron--1.4--1.4-1.sql
pg_cron.control

Since that’s not really applicable on auto_explain, because it’s just logging outputs and there is nothing to migrate or handle between versions, it’s just cleaner to not have a control file. Upgrading auto_explain only involves replacing the library, then loading it again.

info

Upgrade logic is not applicable for extensions that do not require CREATE EXTENSION. These cases just involve re-loading a new version of the library.

You don’t load hooks during CREATE EXTENSION

It made sense to me for activating hooks that require a restart they have to be configured in shared_preload_libraries. But for extensions that do not require a restart, it’s not obvious why the hooks can’t just be loaded during the CREATE EXTENSION start up script like I just demonstrated is possible with auto_explain.

Even though it’s technically possible to LOAD hooks during CREATE EXTENSION, it’s a bad idea.

First of all, when using the LOAD command directly, it’s only applicable to the current connection. So, in the above example with auto explain, the queries are only logged in the connection where I ran CREATE EXTENSION. To apply to all connections without a restart, it would need to go into session_preload_libraries. It is technically possible to do that inside of CREATE EXTENSION by doing ALTER SYSTEM SET session_preload_libraries then SELECT pg_reload_conf() in your start up script, but it is not a good approach for CREATE EXTENSION to automatically perform a configuration update. First of all it would confuse a user to change a config on the fly, and secondly there is currently no concept to automatically merge multi-value, comma-separated configurations like session_preload_libraries.

info

The 2x2 matrix makes it easier to understand how to enable an extension.

Just ask yourself “do I need to run CREATE EXTENSION?” determined by presence of a control file, and “do I need to do a LOAD?” determined by any mention of LOAD, shared_preload_libraries, or session_preload_libraries in the extension’s documentation or an error message.

In all cases of needing a LOAD, you can get away with setting it in shared_preload_libraries. You can optimize to avoid restarts in some cases.

Output plugins

There are some extensions, for example wal2json that require neither CREATE EXTENSION or LOAD. In all known cases so far, these are output plugins. I think it’s more of a stretch to call these ‘extensions’, but since they provide additional functionality to Postgres, that counts in my book.

In the case of output plugins, the library is loaded when a replication slot is created:

Postgresql documentation

SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);

Installing extensions

Some of the above examples use the free and open source Trunk project that Tembo created, which allows us to skip the build process. It also installs extension dependencies, and provides metadata about other dependencies. When I’m trying out extensions, I am starting from one of Tembo’s container images to handle the system dependencies installation.

I wrote this guide for trying out extensions locally. If you have any issues just reach out on our community Slack channel and we can help.

Perhaps it’s not so bad after all…

files-in-computer

Automate everything

We want to make it possible to automatically install and turn on any Postgres extension. For this reason, we are seeking to qualify all known extensions by these two dimensions: requires CREATE EXTENSION true or false, and requires LOAD true or false.

To enable the community, that metadata is being published on Trunk. On Tembo Cloud, we leverage that information to automatically enable extensions. Currently, we’ve got this working for over 150 extensions.

Dear experts, tell me how I’m wrong (seriously!)

I’m serious that I want you to tell me where this is incorrect! If you’re a Postgres extensions expert, or maybe just know a thing or two about extensions that seems to conflict with something in this blog, please reach out on X @sjmiller609 and let me know. Even if it’s just minor correction or subjective information, I’d love to hear from you. I also want to hear if there is an easier mental model than this. I hope this blog can serve as a minimal yet comprehensive explanation of what it takes to get extensions turned on.

Another way to contribute is to click the “Edit this page” link below, and suggest changes. I will happily accept improvements to this blog.