Before working for a Postgres company, I had never used extensions.
Now, I'm part of a team working to fully automate turning on any extension. I didn't find great resources to explain the process of turning on an extension, and why it varies between different extensions.
I want to share my mental model for the different types of extensions, how to know what type of extension you're working with, and how to get it turned on.
Turn on an extension
What's traditionally involved:
- Find the extension you want
- Figure out how to build it
- Sometimes, installation of dependencies (for example with apt-get or yum)
- Sometimes, installation of other extensions (goto ‘figure out how to build it’)
- Install your extension
- Sometimes, load a library
- Sometimes, provide extension-specific configurations
- Sometimes, run the
Building and installing extensions is well covered by other resources. In this blog, I want to focus on steps to get an extension up and running after it's installed, and how I believe that all extensions fit into four mostly-tidy categories.
Extensions consist of SQL and / or libraries.
SQL objects, let's just call it SQL, are extensions of SQL, for example new functions and data types. These are often implemented by a library, but can also be implemented in other ways, for example using a procedural language like PL/pgSQL.
Hooks: A Postgres feature informally called hooks can be used to connect into Postgres' existing functionality. Hooks allow for overwriting default Postgres functionality, or calling back into an extension's code at the appropriate time. For example, one type of hook can modify Postgres start up behavior to launch a background worker, and a different type of hook can be used to redirect queries to a different table.
Sometimes extensions are instead referred to as 'modules', but I like to simply refer to everything as an 'extension', but feel free to @ me on X to tell me why that's wrong (@sjmiller609).
Enter the matrix
A big part of what I have been working on is fully automating enabling any extension. In order to do that, we have to understand exactly how extensions vary. We can break it down into a 2x2 matrix by defining two boolean categories.
|Requires ||Does not require |
|Requires ||Extensions that use SQL and their libraries have hooks||Extensions that do not use SQL, may or may not have hooks|
|Does not require ||SQL-only extensions, and SQL + libraries without hooks||Output plugins|
By categorizing an extension into this 2x2 matrix, we can know how to turn it on.
LOAD is a command that tells Postgres to load a library, meaning make the code accessible to Postgres by loading the compiled code on disk into memory. If a library has hooks, performing a load will activate the hooks.
LOAD: true means you have to do one of the following steps to load a library:
- LOAD: using the
LOADcommand directly loads a library for the current connection only
- session_preload_libraries: configuration, specifies which libraries to LOAD for new connections
- shared_preload_libraries: configuration, specifies which libraries to LOAD at server start, and therefore requires a restart
Even though code is loaded in other ways during
CREATE EXTENSION, that is not requires
LOAD: true under this definition. I mean that the user must do something other than
CREATE EXTENSION to load in libraries. Also, we are conflating local_preload_libraries with
session_preload_libraries to simplify things in this blog post.
For example, if you installed the extension auto explain, then you may have a library file called
auto_explain.so in your library directory, which can be found with pg_config --pkglibdir. Libraries are not always named exactly the same as the extension.
$ trunk install auto_explain
Using pkglibdir: /var/lib/postgresql/data/tembo/15/lib
[+] auto_explain.so => /var/lib/postgresql/data/tembo/15/lib
$ ls $(pg_config --pkglibdir) | grep auto_explain
Auto explain can be loaded into your session like
LOAD 'auto_explain';. This command will always match exactly the name of the library file, less the file type, in this example
.so. With a couple of configurations, now this extension will automatically log the EXPLAIN ANALYZE output for long-running queries.
postgres=# LOAD 'auto_explain';
LOAD command is not typically used directly, and many extensions require you do not load them in this way. Instead, typically the Postgres configuration shared_preload_libraries is used instead.
postgres=# LOAD 'pg_cron';
ERROR: pg_cron can only be loaded via shared_preload_libraries
HINT: Add pg_cron to the shared_preload_libraries configuration variable in postgresql.conf.
The best reason to use
LOAD directly is for debugging. It can be nice to
LOAD on-demand while troubleshooting.
What to do when an extension requires load:
Extensions that requires
LOAD: true can always be configured in
shared_preload_libraries, but this configuration requires a restart to take effect. Some extensions can be loaded without a restart using
LOAD directly, but in this case it's usually better to use the
session_preload_libraries configuration, and reload the Postgres configuration with
SELECT pg_reload_conf();. You should run
LOAD directly when you are intentionally loading for only the current connection.
When you run CREATE EXTENSION, this basically just runs an extension's SQL script. The script will typically create new SQL objects such as functions, data types, operators and so on.
CREATE EXTENSION looks at the extension's control file, which is installed to the
extension directory of sharedir.
$ trunk install pg_jsonschema;
Using pkglibdir: "/var/lib/postgresql/data/tembo/15/lib"
Using sharedir: "/var/lib/postgresql/data/tembo"
[+] pg_jsonschema.so => /var/lib/postgresql/data/tembo/15/lib
[+] extension/pg_jsonschema--0.1.4.sql => /var/lib/postgresql/data/tembo
[+] extension/pg_jsonschema.control => /var/lib/postgresql/data/tembo
sharedir can be located with
$ ls $(pg_config --pkglibdir) | grep pg_jsonschema
$ ls $(pg_config --sharedir)/extension | grep pg_jsonschema
The information in a control file is used to determine what start up or upgrade scripts to run. We'll cover upgrades in-depth in a future blog, so let's focus on first-time enabling. For example, in the above installation output, we notice a file
pg_jsonschema--0.1.4.sql. Postgres knows to run this because the name of the control file matches the name of the script suffixed by the
default_version defined in the control file.
$ cat $(pg_config --sharedir)/extension/pg_jsonschema.control
comment = 'JSON schema validation on json and jsonb data types'
default_version = '0.1.4'
module_pathname = '$libdir/pg_jsonschema'
relocatable = false
superuser = true
CREATE EXTENSION, the extension name always matches exactly the name of a control file, less the
.control file type.
postgres=# CREATE EXTENSION pg_jsonschema;
I mentioned that a start up script creates new SQL, including new functions. For example in the case of pg_jsonschema, the start up script
pg_jsonschema--0.1.4.sql includes the following SQL to create a new function called
jsonb_matches_schema. Even though we have a library file, we don't need
CREATE FUNCTION is another way to load code from a file. This is an example of requires
LOAD: false, requires
CREATE EXTENSION: true.
obj_file is the name of the shared library file containing the compiled [code]
CREATE FUNCTION "jsonb_matches_schema"(
) RETURNS bool
AS 'MODULE_PATHNAME', 'jsonb_matches_schema_wrapper';
You can always know whether or not an extension requires
CREATE EXTENSION by the presence of a control file in
Hooks that require a restart
An extension is in the category requires
CREATE EXTENSION: true and requires
LOAD: true if the extension has libraries that use hooks which require a restart and it has a control file.
You will be able to identify this is the case when the extension's documentation mentions both
CREATE EXTENSION and
shared_preload_libraries. Sometimes an error message or hint is provided if you run
CREATE EXTENSION before loaded the library, or if you try to run
LOAD directly, but you can't count on that.
For example, in the case of both
pg_partman, there are a background workers. These are examples of extensions using hooks in the start up process of Postgres. So, in both of these cases the user is expected to configure
shared_preload_libraries to start the background worker, then run
CREATE EXTENSION on a cluster where that background worker is already running.
LOAD is needed when there isn't a control file
In the case of auto_explain, it uses hooks that do not require a restart. In this case, there is no control file and no extra SQL objects to be created. So
LOAD is required simply because we have to load it into memory somehow. To demonstrate, it is technically possible to make a control file for auto_explain to allow for
CREATE EXTENSION behavior instead of
comment = 'auto explain'
default_version = '0.0.1'
module_pathname = '$libdir/auto_explain'
relocatable = false
superuser = true
In practice, do not use
LOAD in an extension start up script to activate hooks.
LOAD is only applicable for the current connection.
postgres=# CREATE EXTENSION auto_explain;
List of installed extensions
Name | Version | Schema | Description
auto_explain | 0.0.1 | public | auto explain
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgres=# SET auto_explain.log_min_duration = 0;
postgres=# SET auto_explain.log_analyze = true;
After running the above, now my subsequent queries have their
EXPLAIN ANALYZE logged.
So, if that could work, why not just have control files for all extensions?
Having a control file requires version upgrade handling.
When you have a control file, you also have to write upgrade scripts for every new version. In the case of pg_cron, we can find all these files in sharedir. When enabling version 1.5, it will run
pg_cron--1.0.sql, then each migration script up to 1.5.
Since that's not really applicable on auto_explain, because it's just logging outputs and there is nothing to migrate or handle between versions, it's just cleaner to not have a control file. Upgrading auto_explain only involves replacing the library, then loading it again.
Upgrade logic is not applicable for extensions that do not require
CREATE EXTENSION. These cases just involve re-loading a new version of the library.
You don't load hooks during CREATE EXTENSION
It made sense to me for activating hooks that require a restart they have to be configured in
shared_preload_libraries. But for extensions that do not require a restart, it's not obvious why the hooks can't just be loaded during the
CREATE EXTENSION start up script like I just demonstrated is possible with auto_explain.
Even though it's technically possible to
LOAD hooks during
CREATE EXTENSION, it's a bad idea.
First of all, when using the
LOAD command directly, it's only applicable to the current connection. So, in the above example with auto explain, the queries are only logged in the connection where I ran
CREATE EXTENSION. To apply to all connections without a restart, it would need to go into
session_preload_libraries. It is technically possible to do that inside of
CREATE EXTENSION by doing
ALTER SYSTEM SET session_preload_libraries then
SELECT pg_reload_conf() in your start up script, but it is not a good approach for
CREATE EXTENSION to automatically perform a configuration update. First of all it would confuse a user to change a config on the fly, and secondly there is currently no concept to automatically merge multi-value, comma-separated configurations like
The 2x2 matrix makes it easier to understand how to enable an extension.
Just ask yourself "do I need to run
CREATE EXTENSION?" determined by presence of a control file, and "do I need to do a
LOAD?" determined by any mention of
session_preload_libraries in the extension's documentation or an error message.
In all cases of needing a
LOAD, you can get away with setting it in
shared_preload_libraries. You can optimize to avoid restarts in some cases.
There are some extensions, for example wal2json that require neither
CREATE EXTENSION or
LOAD. In all known cases so far, these are output plugins. I think it's more of a stretch to call these 'extensions', but since they provide additional functionality to Postgres, that counts in my book.
In the case of output plugins, the library is loaded when a replication slot is created:
SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
Some of the above examples use the free and open source Trunk project that Tembo created, which allows us to skip the build process. It also installs extension dependencies, and provides metadata about other dependencies. When I'm trying out extensions, I am starting from one of Tembo’s container images to handle the system dependencies installation.
Perhaps it's not so bad after all...
We want to make it possible to automatically install and turn on any Postgres extension. For this reason, we are seeking to qualify all known extensions by these two dimensions: requires
CREATE EXTENSION true or false, and requires
LOAD true or false.
To enable the community, that metadata is being published on Trunk. On Tembo Cloud, we leverage that information to automatically enable extensions. Currently, we've got this working for over 150 extensions.
Dear experts, tell me how I'm wrong (seriously!)
I'm serious that I want you to tell me where this is incorrect! If you're a Postgres extensions expert, or maybe just know a thing or two about extensions that seems to conflict with something in this blog, please reach out on X @sjmiller609 and let me know. Even if it's just minor correction or subjective information, I'd love to hear from you. I also want to hear if there is an easier mental model than this. I hope this blog can serve as a minimal yet comprehensive explanation of what it takes to get extensions turned on.
Another way to contribute is to click the "Edit this page" link below, and suggest changes. I will happily accept improvements to this blog.