
7 posts tagged with "extensions"


· 11 min read
Binidxaba

An elephant representing pgvector

Image credit: Generated with Bing Image Creator

In databases, indexes are data structures that speed up finding specific values in a table column. The analogous task in vector databases consists of finding the (approximate) nearest-neighbors of a given vector. So, to accomplish this task fast, we can similarly create specialized vector indexes.

But, speeding up a query is not just about blindly creating an index. When deciding whether to create one or more indexes on a table, several factors need to be considered—for example, the size of the tables, whether the table is modified frequently, how the table is used in queries, and so on. Similar considerations apply to vector indexes.

In today's post, let us explore vector indexes and their tradeoffs in the context of Postgres and Pgvector. In particular, let us compare their build time, size, and speed, and, based on that, derive some guidelines to decide which one to choose for a given application.

Indexes in Pgvector

Pgvector is an open-source Postgres extension for similarity search. It allows for exact and approximate nearest-neighbor (ANN) search. In particular, for ANN it offers two types of indexes: IVFFlat and HNSW. Let us briefly discuss them.

IVFFlat

The IVFFlat (Inverted File with Flat Compression) index works by dividing the vectors in the table into multiple lists. The algorithm calculates a number of centroids and finds the clusters around those centroids. So, there is a list for each centroid, and the elements of these lists are the vectors that make up its corresponding cluster.

When searching for the K nearest vectors, instead of calculating the distance to all vectors, the search space is narrowed to only a subset of the lists, thus reducing the number of computations. Which lists are the candidates? The ones whose centroids are closest to the search vector.

IVFFlat

IVFFlat generates lists based on clusters.

So, we can infer that the effectiveness of the index depends on two parameters: the number/size of the lists and the number of lists that need to be examined during the search (aka probes).

In pgvector, these two parameters are selected in two distinct moments. First, the number of lists is chosen when creating the index, for example:

CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000)

Second, the number of lists to be explored is established during execution, e.g.:

SET ivfflat.probes = 32

The pgvector documentation suggests the following:

Choose an appropriate number of lists - a good place to start is rows / 1000 for up to 1M rows and sqrt(rows) for over 1M rows

When querying, specify an appropriate number of probes (higher is better for recall, lower is better for speed) - a good place to start is sqrt(lists)

So, imagine that we have a dataset of 1M vectors. With the parameters above, pgvector would generate 1,000 lists of approximately 1,000 vectors each. When executing a query, it would probe only ~32 of those lists and perform ~32,000 comparisons to find the closest neighbors to a search vector, which is only about 3.2% of the work of a full scan.
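As a rough illustration of those rules of thumb, the suggested values can be computed directly from the row count. This is just a sketch of the heuristic (not a pgvector feature), assuming the items table from the snippet above:

-- lists ~ rows / 1000 (for up to 1M rows), probes ~ sqrt(lists)
SELECT count(*)                                 AS row_count,
       GREATEST(count(*) / 1000, 1)             AS suggested_lists,
       ceil(sqrt(GREATEST(count(*) / 1000, 1))) AS suggested_probes
FROM items;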

Of course, you can choose different parameters to achieve the desired recall. More on that later in this post.

HNSW

The Hierarchical Navigable Small Worlds (HNSW) index creates a graph with multiple layers. The nodes in the graph represent vectors, and the links represent distances. Finding the ANN consists of traversing the graph and looking for the shorter distances.

We can think of these layers as different zoom levels of the graph. Zooming out, we see a few nodes (vectors) and links. But as we zoom in, we see more and more nodes and more and more links.

The traversal of the graph resembles a skip list in that if no more candidate nodes are found in the current layer of the graph, the search continues in the next layer (zoom in), where more links should exist.

HNSW

HNSW creates a graph with multiple layers

For HNSW, two tuning parameters are decided at creation time: the maximum number of connections per layer (m) and the size of the dynamic candidate list for constructing the graph (ef_construction):

CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)

A bigger value for m means that each node has more links in each layer of the graph. A larger m negatively affects query time, since more connections need to be checked, but it also improves recall.

ef_construction is also used during the build phase of the index: it is the size of the candidate list considered when inserting each element into the graph. Larger values produce a better-quality graph, and thus better recall, at the cost of longer build times.

The pgvector documentation does not make a particular recommendation for HNSW, but the defaults are m = 16 and ef_construction = 64.

Another parameter, ef_search, determines the size of the dynamic candidate list used during a search. In pgvector, the default is ef_search = 40, but it can be modified at query time as follows:

SET hnsw.ef_search = 100;

Impact of Approximation on Recall

In the previous sections, I mentioned a couple of times that the build parameters affect recall. What do we mean by that?

Recall measures how many of the retrieved neighbors are indeed among the true k nearest neighbors. For example, a recall of 1.0 means that all of the computed neighbors are really the closest ones, whereas a recall of 0.5 means that only half of them are. Recall is an important metric because it quantifies the approximation error and helps tune the index parameters.

IVFFlat and HNSW are approximate indexes that work with heuristics. That means that there could be errors in their search results.

Take IVFFlat as an example. When deciding which lists to scan, the decision is made based on the distance to each list's centroid. Depending on the data and the tuning parameters, the vector closest to the search vector could live in a list that was not selected for probing, thus reducing the accuracy of the result.

One way to mitigate this problem and boost recall is to increase the number of lists to probe. That, of course, incurs a performance penalty. So, improving recall is not free, and careful evaluation and tuning of the parameters is paramount.
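To make this concrete, here is a hedged sketch of how recall can be measured with pgvector itself. It relies on the fact that setting ivfflat.probes to the number of lists makes the search exact, which serves as ground truth; the items table (with an id column, an embedding column, and an IVFFlat index built with lists = 1000) and the 3-dimensional query vector are only illustrative:

-- recall@10 for a single query vector
SET ivfflat.probes = 32;
CREATE TEMP TABLE approx_results AS
SELECT id FROM items ORDER BY embedding <=> '[1,2,3]' LIMIT 10;

SET ivfflat.probes = 1000;  -- probes = lists: exact search
CREATE TEMP TABLE exact_results AS
SELECT id FROM items ORDER BY embedding <=> '[1,2,3]' LIMIT 10;

SELECT count(*) / 10.0 AS recall_at_10
FROM approx_results
JOIN exact_results USING (id);

Averaging this over many test queries is essentially what the benchmark in the next section reports as recall.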

IVFFlat vs HNSW in the pgvector arena

Now, let us examine the two types of indexes quantitatively. We will use the ANN benchmark, which we modified to have both algorithms available.

For pgvector, the benchmark creates a table with a vector column, taking the chosen dataset and inserting the items into the table. Then, it builds the selected type of index, and after that, it executes a bunch of queries. The dataset contains both train and test data. The benchmarking program uses the test data to evaluate the recall for each algorithm.

For this comparison, let us use a small dataset of around 1M vectors of 50 dimensions. The test set consists of 10K queries, and we want to obtain the 10 nearest neighbors.

The aspects that we want to evaluate are:

  • Build Time
  • Size
  • Recall
  • Speed

For these experiments, let us ask ourselves: How are the different parameters affected if I want a recall of X? In particular, let us set a recall of 0.998.

In pgvector, such recall is achieved with the following settings:

Index type | Parameters
-----------+------------------------------------------------
IVFFlat    | lists = 200, probes = 100
HNSW       | m = 24, ef_construction = 200, ef_search = 800

Build Time

For the chosen parameters, the IVFFlat index builds much more quickly (128 seconds) than HNSW (4,065 seconds). HNSW creation is almost 32X slower.

Build time

Size

In terms of index size, IVFFlat is again the winner. For a recall of 0.998, IVFFlat requires around 257MB, whereas HNSW requires about 729MB. HNSW requires 2.8X more space.

Index size

Speed

note

The benchmark uses one thread to execute the vector queries.

It is in speed where HNSW shines. With a recall of 0.998, HNSW can achieve a throughput of 40.5 QPS, whereas IVFFlat can only execute 2.6 QPS. HNSW is 15.5X better in this aspect.

Queries per second

Recall vs Index Updates

Another weakness of the IVFFlat index is that its recall is not resilient to index updates, because the centroids are never recalculated: the regions of the vector space stay fixed as vectors are added or removed. If, after a series of updates, the true centroids (were they recomputed) have drifted, the original partitioning becomes less effective, leading to worse recall.

In fact, in psql you will get the following messages if you attempt to create an IVFFlat index when there are only a few rows in the table:

postgres=# CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE TABLE

postgres=# INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
INSERT 0 2

postgres=# CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
NOTICE: ivfflat index created with little data
DETAIL: This will cause low recall.
HINT: Drop the index until the table has more data.
CREATE INDEX

postgres=# drop index items_embedding_idx;
DROP INDEX

The solution to this problem would be to recalculate the centroids and the lists... which effectively means rebuilding the index.
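In practice, that rebuild is a single statement. Here is a sketch reusing the index name from the example above; REINDEX ... CONCURRENTLY avoids blocking reads and writes, and recomputes the centroids from whatever data is in the table at that point:

-- Rebuild the IVFFlat index so its centroids reflect the current data
REINDEX INDEX CONCURRENTLY items_embedding_idx;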

HNSW doesn't show this warning:

postgres=# CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
CREATE INDEX

To exercise this behavior, I modified the benchmark to set up the database as follows:

  1. Insert 1/100 of the training dataset
  2. Build the index
  3. Insert the rest of the training dataset
  4. Execute the queries using the test dataset

For the chosen parameters, we can see that the recall is more sensitive to index updates for IVFFlat. For HNSW, the change is negligible.

IVFFlat recall

HNSW recall

Picking the right index for your use case

As a quick summary, these are the results that we obtained from our experiments:

                              | IVFFlat     | HNSW
------------------------------+-------------+------------
Build Time (in seconds)       | 128         | 4,065
Size (in MB)                  | 257         | 729
Speed (in QPS)                | 2.6         | 40.5
Change in Recall upon updates | Significant | Negligible

With the results above, we can then make the following recommendations:

  • If you care more about index size, then choose IVFFlat.
  • If you care more about index build time, then select IVFFlat.
  • If you care more about speed, then choose HNSW.
  • If you expect vectors to be added or modified, then select HNSW.

Let us see some examples.

Imagine a database of constellations for astronomy. Data updates would be infrequent (inverse of guideline #4), so IVFFlat would be a good candidate. The index would remain modest in size (guideline #1) and give good recall by tuning only two parameters.

Let's take another example: a facial recognition system. You'd likely want a fast response time (guideline #3) with good accuracy, and you may be OK with a larger index (inverse of guideline #1). So, HNSW would be the best choice.

An IoT sensor database whose readings keep changing (e.g., temperature, position, etc.) would also be a good candidate for HNSW (guideline #4), since IVFFlat would not handle the constant index changes well.

Finally, imagine a database of vector embeddings obtained from your company's knowledge base to power a chatbot. If the knowledge base rarely changes (inverse of guideline #4) and rebuilding the index is acceptable if recall ever degrades (guideline #2), you may choose IVFFlat.

Wrapping up…

In this post, we discussed the two types of indexes currently available in pgvector: IVFFlat and HNSW. We discussed their build parameters and how they affect recall.

With the help of a benchmark, we compared the indexes quantitatively in terms of build time, index size, QPS, and recall. We derived some general guidelines for choosing the appropriate index type based on our results.

I invite everyone to try out the benchmarking program, which can be found here, and the modified version, which is here.

What other elements should we consider when choosing a vector index? Let us know your thoughts at @tembo_io.

Appendix

The experiments in this post were carried out using a machine with the following characteristics:

CPU              | Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 8th Generation
Number of Cores  | 4 (8 threads)
Cache            | Level 1: 256KiB write-back, Level 2: 1MiB write-back, Level 3: 8MiB write-back
Memory           | 16 GB SODIMM DDR4 2400MHz
Disk             | 150GB SSD
Operating System | Ubuntu 22.04.3 LTS, Linux 5.15.0-79-generic (x86_64)
Postgres         | 14

· 3 min read
Steven Miller

Someone on X wanted to know how to get an anonymized dump of Postgres data without installing an extension in their production DB. I want to show how you can start a local database, dump the production data into it, and then produce an anonymized dump from that copy without too much hassle.

Getting set up

Dockerfile:

FROM quay.io/tembo/tembo-local:latest

# Install extensions from Trunk
RUN trunk install pgcrypto
RUN trunk install postgresql_anonymizer

# Setting samples to use for anonymization
RUN cd /var/lib/postgresql/data/tembo/extension/anon && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifiers_category.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_fr_FR.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/identifier_en_US.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/address.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/city.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/company.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/country.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/email.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/first_name.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/iban.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/last_name.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/postcode.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/siret.csv && \
wget https://gitlab.com/dalibo/postgresql_anonymizer/-/raw/master/data/lorem_ipsum.csv

Build and run it like this:

docker build -t example-local-image .
docker rm --force local-tembo || true
docker run -it --name local-tembo -p 5432:5432 --rm example-local-image

This guide covers how to quickly try out Postgres extensions locally in more detail.

Dump the data into your local DB

pg_dump 'your-connection-string-here' | psql 'postgres://postgres:postgres@localhost:5432'

Anonymize the local DB

Initialize the extension:

SET session_preload_libraries = 'anon';
LOAD 'anon';
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();

For example, I have a table called "extension_owners", and I would like to anonymize the user_name column:

postgres=# select extension_id,user_name from extension_owners limit 1;
extension_id | user_name
--------------+---------------------
26 | coredb-service-user
(1 row)

I configured anonymization on that column like this:

SECURITY LABEL FOR anon ON COLUMN extension_owners.user_name
IS 'MASKED WITH FUNCTION anon.lorem_ipsum( words := 1 )';

There are a lot of other options for anonymizing data, and you can even write your own functions. More information in these docs.
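For instance, a partial mask keeps a recognizable prefix and suffix and hides the middle. This is a sketch based on the anon.partial function from the postgresql_anonymizer docs, applied to the same column as above; double-check the signature against the version you installed:

-- Keep the first and last 2 characters of user_name, mask the middle
SECURITY LABEL FOR anon ON COLUMN extension_owners.user_name
IS 'MASKED WITH FUNCTION anon.partial(user_name, 2, $$*****$$, 2)';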

caution

This next step replaces data in the local database.

Since we are working on a local copy of the data, we can just use this function to replace anonymized columns in-place.

SELECT anon.anonymize_database();

We can see now this column has been anonymized.

postgres=# select user_name from extension_owners limit 10;
user_name
------------
First
They
Cell
Price
Them
Go
Parent
Republican
With
Between
(10 rows)

You can do further modification from here, for example masking and replacing additional columns, formatting columns, etc.

Done!

Now you have an anonymized database locally. From here, you can pg_dump to a file, or do something else!

If you think this kind of thing is cool, follow me on X (@sjmiller609) for more content. At Tembo, we are all about Postgres extensions. You can try out extensions on Tembo Cloud for free.

· 8 min read
Binidxaba

Language models are like the wizards of the digital world, conjuring up text that sounds eerily human. These marvels of artificial intelligence, such as GPT-3.5, are sophisticated algorithms that have been trained on vast swathes of text from the internet. They can understand context, generate coherent paragraphs, translate languages, and even assist in tasks like writing, chatbots, and more. Think of them as your trusty digital scribe, ready to assist with their textual sorcery whenever you summon them.

If you have used ChatGPT in the past, you probably were able to suspect that the previous paragraph was generated using it. And that's true. See the prompt here.

From the example above, you can witness the eloquence LLMs are capable of. Some people have been so shocked that they became convinced these models were sentient. However, in the end, they are nothing but a large, complex series of matrix and vector operations. These matrices and vectors have been trained to represent the semantic meaning of words.

In today's post, we will explore these meaning vectors and how they are related to Postgres. In particular, we are going to play with sentence transformers, vectors, and similarity search. All of that with the help of the pgvector Postgres extension.

Let’s go!

From words to vectors

Like we said, a vector can represent many things, for example, the position of a character in a 3D video game, the position of a pixel in your screen, the force applied to an object, a color in the RGB space, or even the meaning of a word…

Word embedding refers to the technique by which words can be represented as vectors. These days, the embeddings offered by OpenAI are very popular. However, other alternatives exist, like word2vec, GloVe, FastText, and ELMo.

Similarly, entire sentences can be represented as vectors using OpenAI embeddings or SentenceTransformers, for example.

These models can be accessed through libraries for different languages. The following Python snippet shows how to obtain the vector embeddings of three sentences using SentenceTransformer:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings.',
             'Pgvector is postgres extension for vector similarity search.',
             'Tembo will help you say goodby to database sprawl, and hello to Postgres.']

sentence_embeddings = model.encode(sentences)

for sentence, embedding in zip(sentences, sentence_embeddings):
    print("Sentence:", sentence)
    print("Embedding:", embedding)
    print("")
note

The code used in this blog post can be found in this gist.

The mind-blowing part is that words and sentences with a similar meaning will have similar vectors. This characteristic is the basis of a search technique called similarity search, where we simply find the nearest embedding vectors to find texts that are similar to our query.

Postgres meets Language Models

Models are great at generating content that seems credible, as shown earlier. However, you may have experienced cases where ChatGPT hallucinates answers or delivers out-of-date information. That's because LLMs are pre-trained using general data. And, because of that, creating a chatbot based only on the pre-trained data wouldn't be helpful for your customers, for instance.

The concept of RAG (Retrieval-Augmented Generation) acknowledges this limitation.

One way of overcoming these problems is to store your company's knowledge base in a database.... preferably in a vector database. You could then query related content and feed that content to the LLM of your preference.

RAG with pgvector

Specialized vector databases include Milvus, Qdrant, Weaviate, and Pinecone. However, you probably want to stick to your Postgres database.

Postgres is not in itself a vector database, but extensions can come to the rescue one more time... This time with pgvector.

Let's use it and explore how we would query related content from a Postgres database.

pgvector: Postgres as a vector database

pgvector is a Postgres extension that helps you work with vectors and store them in your Postgres database. It offers functions for calculating the distance between vectors and for similarity search.
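As a quick taste, here is a minimal sketch using made-up 3-dimensional vectors and the distance operators that pgvector documents: <-> for L2 distance, <#> for negative inner product, and <=> for cosine distance:

CREATE EXTENSION IF NOT EXISTS vector;

SELECT '[1,2,3]'::vector <-> '[4,5,6]'::vector AS l2_distance,
       '[1,2,3]'::vector <#> '[4,5,6]'::vector AS negative_inner_product,
       '[1,2,3]'::vector <=> '[4,5,6]'::vector AS cosine_distance;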

For the following demo, I converted all of Tembo’s blogs into document vectors using the following Python script that uses the langchain framework.

from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.pgvector import PGVector

import os


CONNECTION_STRING = "postgresql+psycopg2://postgres:password@localhost:5432/vector_db"
COLLECTION_NAME = 'my_collection'

embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 20)

files = os.listdir('./corpus')

for file in files:
    file_path = f"./corpus/{file}"
    print(f"Loading: {file_path}")
    loader = TextLoader(file_path)
    document = loader.load()
    texts = text_splitter.split_documents(document)
    sentence_embeddings = embeddings.embed_documents([t.page_content for t in texts[:5]])

    db = PGVector.from_documents(
        embedding=embeddings,
        documents=texts,
        collection_name=COLLECTION_NAME,
        connection_string=CONNECTION_STRING)

It basically loads each document, splits it into chunks, and then inserts the chunks into Postgres using the PGVector class. As a result, in my Postgres database called vector_db, I got two tables:

Show tables

  • langchain_pg_collection: contains information about all collections.
  • langchain_pg_embedding: contains all the resulting vectors.

The following picture shows part of the contents of the second table (langchain_pg_embedding):

Show vectors

The resulting vectors have 384 dimensions.

Are these sentences similar?

Let’s now play with these vectors.

Using pgvector we can search content that is similar to a query. For example, we can find content related to postgres 16.

First, we can obtain a vector that represents a query:

from langchain.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
print(embeddings.embed_query("What is new in postgres 16"))

Then we can search vectors stored in the database that are similar to the query vector. The tool for that is cosine distance, which in pgvector is represented with the <=> operator:

SELECT document, 1-(embedding <=> '[<your_vector_here>]') as cosine_similarity
FROM langchain_pg_embedding
ORDER BY cosine_similarity DESC
LIMIT 2;

The above query retrieves vectors/chunks of text ordered by how close they are (in terms of cosine distance) to the query vector. In my case, the most similar chunk of text was:

In case you missed it, Postgres 16 came out last week - and this year it arrived earlier than the last few years. There are many features that I’ve been looking forward to for the last few months and I’m excited to see them get into the hands of users. Before we dive into the specific features of this release, let’s discuss what a Postgres major release actually means.

[postgres-16]

Postgres Releases

The PostgreSQL Global Development Group releases a new major version every year with new features.

In addition, Postgres releases minor versions of each major release every 3 months or so with bug fixes and security fixes. No new features are released in minor versions, and that’s what makes major version releases so exciting as it’s the culmination of about a year’s worth of development work on the project.

Which is an excerpt from Postgres 16: The exciting and the unnoticed.

Let us look at what Postgres is doing behind the scenes, using explain analyze:

Limit  (cost=28.07..28.08 rows=2 width=641) (actual time=1.069..1.071 rows=2 loops=1)
  ->  Sort  (cost=28.07..28.53 rows=181 width=641) (actual time=1.067..1.068 rows=2 loops=1)
        Sort Key: ((embedding <=> '[<your_vector>]'::vector))
        Sort Method: top-N heapsort  Memory: 28kB
        ->  Seq Scan on langchain_pg_embedding  (cost=0.00..26.26 rows=181 width=641) (actual time=0.036..0.953 rows=181 loops=1)
Planning Time: 0.079 ms
Execution Time: 1.093 ms
(7 rows)


We can observe that Postgres is sequentially scanning all rows. Then it computes the cosine distance for all those rows and sorts them. Finally, it takes the first two rows.

The sequential scan could be avoided if we had an index. Indeed, we can create one thanks to pgvector, for example:

alter table langchain_pg_embedding alter column embedding type vector(384);

CREATE INDEX ON langchain_pg_embedding USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Running EXPLAIN ANALYZE on the same query again now shows an index scan:

Limit  (cost=5.01..5.11 rows=2 width=641) (actual time=0.175..0.179 rows=1 loops=1)
  ->  Index Scan using langchain_pg_embedding_embedding_idx2 on langchain_pg_embedding  (cost=5.01..13.49 rows=181 width=641) (actual time=0.172..0.175 rows=1 loops=1)
        Order By: (embedding <=> '[<your_vector>]'::vector)
Planning Time: 0.154 ms
Execution Time: 0.224 ms
(5 rows)

One thing to keep in mind is that these indexes are used for approximate nearest neighbor search. We’ll explore what that means in a future blog post. Let us know if that would be interesting for you.

Pgvector(ize)?

Ok, at this point you should now have a sense of what pgvector is, and how to use it together with Python. However, wouldn't it be great if the vectorizing step could happen all within Postgres?

Pg_vectorize is an extension being developed by Tembo that intends to streamline the process of generating vectors from the data in your Postgres tables. It uses a background worker to generate and update the embeddings in batches every N seconds. Also, if you need to find similar vectors, the extension can do that. All within Postgres. Isn't that a cool idea?

I invite you to check out the repository and stay tuned.

To wrap up...

In this post, we briefly discussed the concept of embeddings, why they are important, and how they can be generated using one of the multiple available libraries. We also explored how to store and query the resulting vectors using Postgres and the pgvector extension.

These concepts are relevant to leveraging a knowledge base in conjunction with LLMs in an emerging technique called RAG. Of course, when implementing a real-life solution, more factors need to be considered, and this post was just an introduction.

I invite everyone to try out pgvector (e.g. using the scripts in this post), and the different operations that it offers. Also, can you think of other uses of pgvector? Let us know your thoughts in @tembo_io.

Disclaimer

The first paragraph in this blog post was generated using ChatGPT. Here’s the prompt

· 5 min read
Jay Kothari

The maxim of building a product is to know your user. However, any company, big or small, will often have user data spread across various systems. A data platform team will often deploy pipelines that sync data from various sources into a data warehouse. As an alternative, Postgres supports the concept of a Foreign Data Wrapper. Let's dive into what this is and how it can help us.

In this blog, we'll look at clerk_fdw—a tool that bridges the gap between Clerk, a leading user management solution, and your very own Postgres Database. By the end, you'll discover how this integration can empower you to make data-driven decisions, optimize your pricing strategy, and refine your market approach. Let's get started!

What’s a Foreign Data Wrapper?

A foreign data wrapper is an extension available in PostgreSQL that allows you to bring ‘foreign data’ (i.e. data in a different Postgres DB, a different database like DB2, or even a different kind of data source, like an API) and query it the same way you would query a normal Postgres table. They are particularly useful when your data may be segregated into different databases, but are still related in ways that you could gather some useful information from them. In building a foreign data wrapper for Clerk.com, we have used Supabase Wrappers that make it easier to build Foreign Data Wrappers and interact with third-party data using SQL.
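To make the pattern concrete, here is a hedged sketch using the in-core postgres_fdw extension; the server, credentials, and table names are made up, but clerk_fdw (shown later in this post) follows the same wrapper/server/foreign-table steps:

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point at the remote Postgres instance
CREATE SERVER other_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'other-host', dbname 'otherdb');

-- Map the local user to remote credentials
CREATE USER MAPPING FOR CURRENT_USER
    SERVER other_pg
    OPTIONS (user 'readonly', password 'secret');

-- Expose a remote table locally
CREATE FOREIGN TABLE remote_orders (
    id    bigint,
    total numeric
)
SERVER other_pg
OPTIONS (schema_name 'public', table_name 'orders');

-- Query it like any local table
SELECT count(*) FROM remote_orders;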

If you take one thing away from this blog, it's that Postgres' Foreign Data Wrappers are a great tool for building an analytics platform based on Postgres. See examples of other FDWs in Trunk.

What’s Clerk?

Clerk is a user management tool. With Clerk, users experience a seamless sign-up and sign-in flow, whether they prefer using email, SMS, or even their favorite social media accounts. Its versatility and developer-friendly APIs make it an excellent choice for us at Tembo for both efficiency and a superior user experience.

The Power of Integration

Being able to access data from a User Management Tool like Clerk as part of your data platform is especially useful because it enables you to have a 360-degree view of the user experience on your product, without having to set up any complex data export pipelines from Clerk into other systems.

In fact, we built clerk_fdw at Tembo to address needs in our internal analytics pipeline. Here are some of the ways we are using it:

  • Run advanced analytics that combine internal data with user data from Clerk.
  • Understand user interaction patterns with our product.
  • Identify and engage with top users.

clerk

Setting up clerk_fdw

The first step would be installing the clerk_fdw extension. You can install this extension using trunk.

trunk install clerk_fdw

The next step would be to enable the extension in your postgres instance. You can do so using the following command:

create extension if not exists clerk_fdw;

Create the foreign data wrapper for clerk

create foreign data wrapper clerk_wrapper
handler clerk_fdw_handler
validator clerk_fdw_validator;

Connect to Clerk using your credentials

create server my_clerk_server
foreign data wrapper clerk_wrapper
options (
api_key '<clerk secret Key>');

Create Foreign Table:

User table

This table will store information about the users.

Note: The current limit is 500 users. We are working to increase this limitation in future releases.

create foreign table clerk_users (
user_id text,
first_name text,
last_name text,
email text,
gender text,
created_at bigint,
updated_at bigint,
last_sign_in_at bigint,
phone_numbers bigint,
username text
)
server my_clerk_server
options (
object 'users'
);

Organization Table

This table will store information about the organizations.

Note: The current limit is 500 organizations. We are working to increase this limitation in future releases.

create foreign table clerk_organizations (
organization_id text,
name text,
slug text,
created_at bigint,
updated_at bigint,
created_by text
)
server my_clerk_server
options (
object 'organizations'
);

Junction Table

This table connects clerk_users and clerk_organizations. It lists all users and their roles in each organization.

create foreign table clerk_organization_memberships (
user_id text,
organization_id text,
role text
)
server my_clerk_server
options (
object 'organization_memberships'
);

Dive into the Data

Now you can query your database and get useful information like the following (two sample queries appear after the list):

  • How many organizations have been created each week in the past
  • How many users have signed up in the past 30 days
  • What organizations is a user part of
  • All users and their roles in an organization
  • And more….
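For example, two of those questions translate into fairly plain SQL against the foreign tables defined above. This is a sketch; the conversion below assumes created_at is a Unix timestamp in milliseconds, so adjust it if your column stores seconds:

-- Users who signed up in the last 30 days
SELECT user_id, first_name, last_name, email
FROM clerk_users
WHERE to_timestamp(created_at / 1000.0) > now() - interval '30 days';

-- Number of members per organization
SELECT o.name, count(*) AS members
FROM clerk_organization_memberships m
JOIN clerk_organizations o USING (organization_id)
GROUP BY o.name
ORDER BY members DESC;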

Here are some of the charts we were able to make from the clerk foreign data wrapper using some synthetic data.

chart1 chart2

Conclusion

In conclusion, we believe that Postgres’ concept of Foreign Data Wrappers is more than just a technical integration—it's a game-changer that allows Postgres users to build data warehouse platforms that reach across all data sources in the business. It paves the way for businesses to harness critical insights directly from their operational databases, making informed decisions easier than ever before. See examples of other FDWs in Trunk

Give us a star and try out clerk_fdw by running the example in the README. If you hit any snags, please create an issue. We would greatly welcome contributions to the project as well.

· 18 min read
Steven Miller

back-in-time

A nice feature of AWS S3 is version history and lifecycle policies. When objects are updated or deleted, the old object version remains in the bucket, but it’s hidden. Old versions are deleted eventually by the lifecycle policy.

I would like something like that for my Postgres table data. We can use the temporal_tables extension for version history, and combine it with pg_partman to partition by time, automatically expiring old versions.

Data model

Let's say we have a table employees, and it looks like this:

       name       |  salary
------------------+----------
Bernard Marx | 10000.00
Lenina Crowne | 7000.00
Helmholtz Watson | 18500.00

We will add one more column to this table, sys_period, which is a time range. This range represents the time since which the row has been the current version. It is unbounded on the right side because all the rows in the employees table are the present version.

       name       |  salary  |             sys_period
------------------+----------+------------------------------------
Helmholtz Watson | 18500.00 | ["2023-09-28 13:30:19.24318+00",)
Bernard Marx | 11600.00 | ["2023-09-28 13:33:58.735932+00",)
Lenina Crowne | 11601.00 | ["2023-09-28 13:33:58.738827+00",)

We will make a new table, employees_history, to store previous versions. It will have the same columns as the employees table, but all the rows in sys_period are bounded on both the right and the left sides. These ranges represent when a row was the current version. We will configure temporal_tables to automatically create these rows whenever anything changes in the employees table.

     name      |  salary  |                            sys_period
---------------+----------+-------------------------------------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 13:30:19.18544+00","2023-09-28 13:33:58.683279+00")
Bernard Marx | 11200.00 | ["2023-09-28 13:33:58.683279+00","2023-09-28 13:33:58.731332+00")
Bernard Marx | 11400.00 | ["2023-09-28 13:33:58.731332+00","2023-09-28 13:33:58.735932+00")
Lenina Crowne | 7000.00 | ["2023-09-28 13:30:19.239152+00","2023-09-28 13:33:58.738827+00")

To automatically delete old versions, we'll add one more column to the employees_history table, created_at. We will use this information to expire old versions once they are older than our retention configuration, with the help of pg_partman.

Getting set up

This guide covers how to quickly try out Postgres extensions locally. I've followed that guide to set up my environment with temporal_tables and pg_partman.

I have a Dockerfile, two SQL scripts, and a file with Postgres configurations.

.
├── Dockerfile
├── 0_startup.sql
├── 1_create_versioned_table.sql
└── custom.conf

Dockerfile: We use Trunk to install pg_partman and temporal_tables. Then, we copy the three other files into the image.

FROM quay.io/tembo/tembo-local:latest

RUN trunk install pg_partman
RUN trunk install temporal_tables

COPY 0_startup.sql $PGDATA/startup-scripts

COPY 1_create_versioned_table.sql $PGDATA/startup-scripts

COPY custom.conf $PGDATA/extra-configs

0_startup.sql: Enables temporal_tables and pg_partman when Postgres starts.

CREATE EXTENSION IF NOT EXISTS temporal_tables;
CREATE EXTENSION IF NOT EXISTS pg_partman;

1_create_versioned_table.sql: Creates a sample table, then enables version history on it.

-- Sample: an existing table we want to enable versioning on
CREATE TABLE employees
(
name text NOT NULL PRIMARY KEY,
department text,
salary numeric(20, 2)
);

/*
Adding version history to the table,
first we need to add a time range to the existing table.
This represents "since when" has this row been current.
*/
ALTER TABLE employees ADD COLUMN sys_period tstzrange NOT NULL;

/*
Creating a time-partitioned version table
each row has the range the data was valid for,
and also the time this version was created.
*/
CREATE TABLE employees_history (
LIKE employees INCLUDING DEFAULTS EXCLUDING INDEXES EXCLUDING CONSTRAINTS,
created_at timestamptz NOT NULL DEFAULT now())
PARTITION BY RANGE (created_at);

-- Allow efficient querying of partition key and name
CREATE INDEX ON employees_history (created_at);

/*
Enable automatic partitioning with pg_partman, partitioning every 1 minute.

It's more realistic to partition daily or greater.
*/
SELECT create_parent('public.employees_history', 'created_at', 'native', '1 minute');

-- This connects employees table to employees_history
CREATE TRIGGER versioning_trigger
BEFORE INSERT OR UPDATE OR DELETE ON employees
FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period',
'employees_history',
true);

/*
Configure retention policy for employee history to keep old versions for 10 minutes.

It's more realistic to configure retention for 1 year.
*/
UPDATE part_config
SET retention = '10 minutes',
retention_keep_table = false,
retention_keep_index = false,
infinite_time_partitions = true
WHERE parent_table = 'public.employees_history';

custom.conf: our additions to the Postgres configuration.

# Enable pg_partman background worker
shared_preload_libraries = 'pg_partman_bgw'

# How many seconds between pg_partman background worker runs
# It's more realistic to run every 3600 seconds, or longer
pg_partman_bgw.interval = 10

# Which database pg_partman should target
pg_partman_bgw.dbname = 'postgres'

# It's best practice to use limited permissions for the background worker
# pg_partman_bgw.role = 'limitedrole'

# This was helpful when I was working on getting the settings working
# log_min_messages = 'DEBUG1'

With those four files in place, we can run Postgres like this:

docker build -t example-local-image .
docker run -it -d --name local-tembo -p 5432:5432 --rm example-local-image

In a separate shell, I connect into the Postgres container.

psql postgres://postgres:postgres@localhost:5432

Basic demo of saving old versions

After we are set up, we have version history and retention policy configured on the employees table, but both the employees table and the employees_history table are empty.

SELECT * FROM employees;
 name | department | salary | sys_period
------+------------+--------+------------
(0 rows)
SELECT * FROM employees_history;
 name | department | salary | sys_period | created_at
------+------------+--------+------------+------------
(0 rows)

Adding data:

INSERT INTO employees (name, department, salary)
VALUES ('Bernard Marx', 'Hatchery and Conditioning Centre', 10000);

INSERT INTO employees (name, department, salary)
VALUES ('Lenina Crowne', 'Hatchery and Conditioning Centre', 7000);

INSERT INTO employees (name, department, salary)
VALUES ('Helmholtz Watson', 'College of Emotional Engineering', 18500);

Now, the employees table has some data, and employees_history is still empty.

SELECT name, salary, sys_period FROM employees;
       name       |   salary  |             sys_period
------------------+-----------+------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 20:23:14.840624+00",)
Lenina Crowne | 7000.00 | ["2023-09-28 20:23:14.911528+00",)
Helmholtz Watson | 18500.00 | ["2023-09-28 20:23:14.913555+00",)
(3 rows)
SELECT * FROM employees_history;
 name | department | salary | sys_period | created_at
------+------------+--------+------------+------------
(0 rows)

Modifying data:

UPDATE employees SET salary = 11200 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11400 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11600 WHERE name = 'Bernard Marx';
UPDATE employees SET salary = 11601 WHERE name = 'Lenina Crowne';

Now, the employees_history table has past versions.

SELECT name, salary, sys_period FROM employees;
       name       |  salary  |             sys_period
------------------+----------+------------------------------------
Helmholtz Watson | 18500.00 | ["2023-09-28 20:23:14.913555+00",)
Bernard Marx | 11600.00 | ["2023-09-28 20:23:50.731597+00",)
Lenina Crowne | 11601.00 | ["2023-09-28 20:23:50.734214+00",)
(3 rows)
SELECT name, salary, sys_period FROM employees_history;
     name      |  salary  |                            sys_period
---------------+----------+-------------------------------------------------------------------
Bernard Marx | 10000.00 | ["2023-09-28 20:23:14.840624+00","2023-09-28 20:23:50.684293+00")
Bernard Marx | 11200.00 | ["2023-09-28 20:23:50.684293+00","2023-09-28 20:23:50.727283+00")
Bernard Marx | 11400.00 | ["2023-09-28 20:23:50.727283+00","2023-09-28 20:23:50.731597+00")
Lenina Crowne | 7000.00 | ["2023-09-28 20:23:14.911528+00","2023-09-28 20:23:50.734214+00")
(4 rows)

Looking up past versions

Let's say we want to look up Bernard's salary at a previous date. We can check the employees_history table to find the row where the time range matches our provided timestamp. However, this wouldn't find the correct salary if we provide a timestamp that is after the most recent update to Bernard's salary, since that row is in the employees table.

We can first create a view for this purpose. We only need to do this once, then we can query this view like a table going forward.

CREATE VIEW employee_history_view AS

SELECT name, department, salary, sys_period
FROM employees

UNION ALL

SELECT name, department, salary, sys_period
FROM employees_history;

Then, we can use this query to find Bernard's salary at any given date.

SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> TIMESTAMP WITH TIME ZONE '2023-09-28 20:23:30+00'
LIMIT 1;

@> is the containment operator; you might recognize it if you have used JSONB.
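In isolation, the operator just checks whether a value falls inside a range, e.g.:

-- Returns true: the timestamp is inside the range
SELECT tstzrange('2023-09-28 13:30:00+00', '2023-09-28 13:34:00+00')
       @> '2023-09-28 13:33:00+00'::timestamptz;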

Comparing to the employees_history table shown above, it is returning the correct value.

  salary
----------
10000.00
(1 row)

It also works to look up the current salary:

SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> now()::TIMESTAMP WITH TIME ZONE
LIMIT 1;
  salary
----------
11600.00
(1 row)
SELECT salary FROM employees WHERE name = 'Bernard Marx';
  salary
----------
11600.00
(1 row)

If I try to query a salary from the future, it will return the current salary. If I try to query a salary from before Bernard is known in the employees_history table, then I get an empty result.

Partitioning

What is partitioning? The Postgres documentation has detailed information on partitioning, but to summarize: partitioning means splitting what is logically one large table into smaller physical tables. Typically, this is done for query performance. In our case, we are partitioning in order to expire old versions.
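For reference, this is what bare declarative partitioning looks like without pg_partman; the measurements table here is made up, and pg_partman's job is essentially to keep creating and dropping such child tables for us:

-- A parent table partitioned by time, plus one manually created child partition
CREATE TABLE measurements (
    reading     numeric,
    recorded_at timestamptz NOT NULL
) PARTITION BY RANGE (recorded_at);

CREATE TABLE measurements_2023_09
    PARTITION OF measurements
    FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');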

Partitioning tables is something I’m familiar with from Tembo’s work in PGMQ, which is a queueing extension for Postgres.

Performance

Writes

We should expect write performance to be slower, since we are writing to two tables for every update.

I created a new table that does not have versioning enabled to compare write performance.

-- Create a table like employees
CREATE TABLE employees_write_test
AS TABLE employees
WITH NO DATA;

-- ...and insert one row
INSERT INTO employees_write_test (name, department, salary, sys_period)
VALUES ('Bernard Marx', 'Hatchery and Conditioning Centre', 11600.00, tstzrange(now(), null));

Then, I used EXPLAIN ANALYZE to compare the write performance. I ran the query a few times for each.

Without versioning:

EXPLAIN ANALYZE
UPDATE employees_write_test
SET salary = 11608 WHERE name = 'Bernard Marx';

Three samples:

 Planning Time: 1.654 ms
Execution Time: 1.540 ms

Planning Time: 0.760 ms
Execution Time: 0.707 ms

Planning Time: 1.707 ms
Execution Time: 2.079 ms

With versioning:

EXPLAIN ANALYZE
UPDATE employees
SET salary = 11610 WHERE name = 'Bernard Marx';

Three samples:

 Planning Time: 2.423 ms
Trigger versioning_trigger: time=2.430 calls=1
Execution Time: 4.783 ms

Planning Time: 2.311 ms
Trigger versioning_trigger: time=1.091 calls=1
Execution Time: 2.979 ms

Planning Time: 2.825 ms
Trigger versioning_trigger: time=1.711 calls=1
Execution Time: 5.686 ms

It's more than twice as slow on a single update. That's because we have to write two rows instead of one, there is more data to write (the time ranges), and there is some additional processing, for instance determining which range to put on each row. In the next section, I also compare how much time it takes to write 100,000 rows to each of these tables.

Reads

We created a view which is a union between employees and employees_history, then we query the view to find an employee's salary at a given time.

To generate some data, let's make a procedure that updates a salary 100,000 times in a row. The example below uses PL/pgSQL. By default, PL/pgSQL functions run as a single transaction, so a function would only result in a single update to the employees_history table. For this reason, I am using a procedure with COMMIT so that each increment is a separate transaction; this way we also get 100,000 updates to the employees_history table. I had to explain that nuance to ChatGPT in order for this procedure to be produced properly.

-- Table name and employee name as inputs
CREATE OR REPLACE PROCEDURE increment_salary(p_name text, p_table_name text)
LANGUAGE plpgsql AS $$
DECLARE
v_salary numeric(20,2);
i integer;
v_sql text;
BEGIN
-- Dynamically construct the SQL to get the current salary
v_sql := format('SELECT salary FROM %I WHERE name = $1', p_table_name);
EXECUTE v_sql INTO v_salary USING p_name;

-- Loop 100 thousand times
FOR i IN 1..100000
LOOP
-- Increment the salary
v_salary := v_salary + 1;

-- Dynamically construct the SQL to update the salary
v_sql := format('UPDATE %I SET salary = $2 WHERE name = $1', p_table_name);
EXECUTE v_sql USING p_name, v_salary;

COMMIT; -- Commit the transaction, triggering the versioning procedure
END LOOP;
END
$$;

Run the procedure:

CALL increment_salary('Bernard Marx', 'employees');

This took 55 seconds to run on my laptop. I also tried it on the table without versioning enabled, and in that case it took 38 seconds. I ran it a couple more times on the table with versioning enabled, so that the versions would be distributed across multiple partitions. Now we have an employees_history table that's populated with many rows for Bernard.

SELECT count(*) FROM employees_history WHERE name = 'Bernard Marx';
 count
--------
300000
(1 row)

Let's run the same type of query command we ran before, with EXPLAIN ANALYZE. I picked a timestamp that will not be found to ensure it's as slow as possible.

EXPLAIN ANALYZE
SELECT salary
FROM employee_history_view
WHERE name = 'Bernard Marx'
AND sys_period @> TIMESTAMP WITH TIME ZONE '2023-09-28 15:28:25+00'
LIMIT 1;

Simplified query plan output:

Limit
-> Append
-> Bitmap Heap Scan on employees
Recheck Cond: (name = 'Bernard Marx'::text)
Filter: (sys_period @> '...')
Rows Removed by Filter: 1
Heap Blocks: exact=1
-> Bitmap Index Scan on employees_pkey
Index Cond: (name = 'Bernard Marx'::text)

... Empty partitions omitted ...

-> Seq Scan on employees_history_p2023_09_29_0030
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 31

-> Seq Scan on employees_history_p2023_09_29_0031
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 99969

... Empty partitions omitted ...

-> Seq Scan on employees_history_p2023_09_29_0035
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 97393

-> Seq Scan on employees_history_p2023_09_29_0036
Filter: ((sys_period @> '...') AND (name = 'Bernard Marx'::text))
Rows Removed by Filter: 102607

... Empty partitions omitted ...

Planning Time: 12.427 ms
Execution Time: 262.706 ms
(47 rows)

This query took 263 milliseconds. We notice this query needs to scan all partitions, because we are partitioning by created_at, and querying sys_period. We can improve the speed with indexes.

If this were a real workload, I doubt that employees' salaries would be updated so frequently, or at least that's been the case in my personal experience. However, if it's a big company, then there could be a lot of employees. In that case, it would be best to add an index on the name (or, more realistically, the employee ID) in the employees_history table. Then, within each partition, the index would find only the rows for the employee being queried, and scanning the remaining rows, probably zero, one, or two, would quickly locate the correct salary.
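A sketch of that index; creating it on the parent table makes Postgres cascade it to every partition:

-- Speed up the per-partition lookup by employee name
CREATE INDEX ON employees_history (name);

-- Optionally, a GiST index on (name, sys_period) could also serve the range
-- predicate, but indexing text with GiST requires the btree_gist extension:
-- CREATE EXTENSION btree_gist;
-- CREATE INDEX ON employees_history USING gist (name, sys_period);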

Expiring old versions

Earlier in this blog, we configured pg_partman to partition in 1-minute increments, to expire partitions that are older than 10 minutes, and to run its background worker every 10 seconds. On each run, any partition older than 10 minutes is deleted by the pg_partman background worker.

With this query, I can check how many rows and the total data size in each partition.

-- This query was produced by ChatGPT 4 with the prompt:
-- "How can I check the number of rows in each partition of employees_history?"
SELECT
child.relname AS partition_name,
pg_total_relation_size(child.oid) AS total_size,
pg_relation_size(child.oid) AS data_size,
pg_stat_user_tables.n_live_tup AS row_count
FROM
pg_inherits
JOIN
pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN
pg_class child ON pg_inherits.inhrelid = child.oid
LEFT JOIN
pg_stat_user_tables ON child.oid = pg_stat_user_tables.relid
WHERE
parent.relname='employees_history'
ORDER BY
partition_name;

In order to check that old versions are being dropped, I ran the procedure to create a lot of salary increments several times in a row.

Then, running the above query, I find an output like this:

           partition_name           | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2204 | 16384 | 0 | 0
employees_history_p2023_09_28_2205 | 16384 | 0 | 0
employees_history_p2023_09_28_2206 | 16384 | 0 | 0
employees_history_p2023_09_28_2207 | 16384 | 0 | 0
employees_history_p2023_09_28_2208 | 16384 | 0 | 0
employees_history_p2023_09_28_2209 | 16384 | 0 | 0
employees_history_p2023_09_28_2210 | 32768 | 8192 | 4
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13180928 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 868352 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
(16 rows)

In this output, we can see that we have 1 partition for every minute, and a total of 15 partitions. I have old versions expiring after 10 minutes. I thought it's interesting to note that pg_partman is preemptively creating partitions for the future, in this case 5 minutes into the future.

If you refer to the original setup steps, I configured infinite_time_partitions = true, which means partitions are created even when no data is being generated for them. I think this is the proper configuration, since we also have a retention policy that drops old partitions; the concern about accumulating infinite partitions over time does not apply, because the old tables keep getting dropped.

To confirm data was being deleted, I sampled the above query over time, and we can see the large body of inserts moving up into the oldest available partitions, then falling outside of the retention policy and being deleted.


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2207 | 16384 | 0 | 0
employees_history_p2023_09_28_2208 | 16384 | 0 | 0
employees_history_p2023_09_28_2209 | 16384 | 0 | 0
employees_history_p2023_09_28_2210 | 32768 | 8192 | 4
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
(16 rows)


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2211 | 9584640 | 7995392 | 68267
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
(16 rows)


partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2212 | 4489216 | 3719168 | 31733
employees_history_p2023_09_28_2213 | 13189120 | 11018240 | 94144
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
--------
131733
(1 row)

partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2214 | 876544 | 688128 | 5856
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
employees_history_p2023_09_28_2227 | 16384 | 0 | 0
employees_history_p2023_09_28_2228 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
-------
5856
(1 row)

partition_name | total_size | data_size | row_count
------------------------------------+------------+-----------+-----------
employees_history_default | 16384 | 0 | 0
employees_history_p2023_09_28_2215 | 16384 | 0 | 0
employees_history_p2023_09_28_2216 | 16384 | 0 | 0
employees_history_p2023_09_28_2217 | 16384 | 0 | 0
employees_history_p2023_09_28_2218 | 16384 | 0 | 0
employees_history_p2023_09_28_2219 | 16384 | 0 | 0
employees_history_p2023_09_28_2220 | 16384 | 0 | 0
employees_history_p2023_09_28_2221 | 16384 | 0 | 0
employees_history_p2023_09_28_2222 | 16384 | 0 | 0
employees_history_p2023_09_28_2223 | 16384 | 0 | 0
employees_history_p2023_09_28_2224 | 16384 | 0 | 0
employees_history_p2023_09_28_2225 | 16384 | 0 | 0
employees_history_p2023_09_28_2226 | 16384 | 0 | 0
employees_history_p2023_09_28_2227 | 16384 | 0 | 0
employees_history_p2023_09_28_2228 | 16384 | 0 | 0
employees_history_p2023_09_28_2229 | 16384 | 0 | 0
(16 rows)

postgres=# select count(*) from employees_history;
count
-------
0
(1 row)

Thanks!

If you got this far, thank you for reading this! I hope that you are inspired to try out extensions on your own and see what they can do. The next time you have some problem to solve with your data, consider that maybe it could just be handled by a Postgres extension.

If you want to try extensions without any local setup, you should try Tembo Cloud at cloud.tembo.io.

Just use Postgres!

· 12 min read
Binidxaba

In my previous submission to this space, I described my experience with pgmq while using the Python library. In this post, I'll share what I found after inspecting the code.

So, first, I'll describe the general structure of the project. Then, I'll explain what happens when we install the pgmq extension. Finally, I'll describe how some of its functions work.

In this post, I'll be using version v0.25.0, which you can find here.

Project structure

After cloning the appropriate tag, we can see that the repository contains the following files:

$ ls -1
Cargo.lock
Cargo.toml
CONTRIBUTING.md
core
Dockerfile.build
examples
images
LICENSE
Makefile
pgmq.control
pgmq-rs
README.md
sql
src
tembo-pgmq-python
tests

The project uses pgrx. From pgrx's README, we know that the relevant files for the extension are Cargo.toml, pgmq.control and the src and sql directories:

$ tree sql src
sql
├── pgmq--0.10.2--0.11.1.sql
├── pgmq--0.11.1--0.11.2.sql
...
├── pgmq--0.8.0--0.8.1.sql
├── pgmq--0.8.1--0.9.0.sql
└── pgmq--0.9.0--0.10.2.sql
src
├── api.rs
├── errors.rs
├── lib.rs
├── metrics.rs
├── partition.rs
├── sql_src.sql
└── util.rs

0 directories, 7 files

Installing the pgmq extension

note

This section assumes that you have successfully installed the pre-requisites as described in CONTRIBUTING.md

To build the pgmq extension, we can do the following:

cargo build

Alternatively, to build and install the pgmq extension, we can do:

cargo pgrx install

In either case, we can see a shared library pgmq.so being created. The installation process also places the shared library in the lib directory of the Postgres installation, and the SQL files and the control file in the extension directory. In my case:

$ ls -1 /opt/postgres/share/extension/pgmq*
/opt/postgres/share/extension/pgmq--0.10.2--0.11.1.sql
...
/opt/postgres/share/extension/pgmq--0.9.0--0.10.2.sql
/opt/postgres/share/extension/pgmq.control

$ ls -1 /opt/postgres/lib/pgmq*
/opt/postgres/lib/pgmq.so

To test the extension, we can do:

cargo pgrx run

and it'll start a psql prompt. In the prompt, we can execute the create extension statement to start using pgmq:

-- List installed extensions
\dx

-- Enable pgmq
create extension pgmq;

-- List installed extensions again
\dx

The output will look something like:

pgmq=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

pgmq=# create extension pgmq;
CREATE EXTENSION

pgmq=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+---------------------------------------------------------------------
pgmq | 0.25.0 | public | A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

We can also list the available functions:

-- List available functions under pgmq schema
\df pgmq.*
pgmq=# \df pgmq.*
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------+------
pgmq | archive | boolean | queue_name text, msg_id bigint | func
pgmq | archive | TABLE(archive boolean) | queue_name text, msg_ids bigint[] | func
pgmq | create | void | queue_name text | func
pgmq | create_non_partitioned | void | queue_name text | func
pgmq | create_partitioned | void | queue_name text, partition_interval text DEFAULT '10000'::text, retention_interval text DEFAULT '100000'::text | func
pgmq | delete | boolean | queue_name text, msg_id bigint | func
pgmq | delete | TABLE(delete boolean) | queue_name text, msg_ids bigint[] | func
pgmq | drop_queue | boolean | queue_name text, partitioned boolean DEFAULT false | func
pgmq | list_queues | TABLE(queue_name text, created_at timestamp with time zone) | | func
pgmq | metrics | TABLE(queue_name text, queue_length bigint, newest_msg_age_sec integer, oldest_msg_age_sec integer, total_messages bigint, scrape_time timestamp with time zone) | queue_name text | func
pgmq | metrics_all | TABLE(queue_name text, queue_length bigint, newest_msg_age_sec integer, oldest_msg_age_sec integer, total_messages bigint, scrape_time timestamp with time zone) | | func
pgmq | pop | TABLE(msg_id bigint, read_ct integer, enqueued_at timestamp with time zone, vt timestamp with time zone, message jsonb) | queue_name text | func
pgmq | purge_queue | bigint | queue_name text | func
pgmq | read | TABLE(msg_id bigint, read_ct integer, enqueued_at timestamp with time zone, vt timestamp with time zone, message jsonb) | queue_name text, vt integer, "limit" integer | func
pgmq | read_with_poll | TABLE(msg_id bigint, read_ct integer, enqueued_at timestamp with time zone, vt timestamp with time zone, message jsonb) | queue_name text, vt integer, "limit" integer, poll_timeout_s integer DEFAULT 5, poll_interval_ms integer DEFAULT 250 | func
pgmq | send | bigint | queue_name text, message jsonb, delay integer DEFAULT 0 | func
pgmq | send_batch | TABLE(msg_id bigint) | queue_name text, messages jsonb[], delay integer DEFAULT 0 | func
pgmq | set_vt | TABLE(msg_id bigint, read_ct integer, enqueued_at timestamp with time zone, vt timestamp with time zone, message jsonb) | queue_name text, msg_id bigint, vt_offset integer | func
(18 rows)

With this, we can now explore the extension from the inside. And, if needed, recompile and reinstall the extension to play with it.

The internals

We know that when an extension is created with pgrx, it generates a lib.rs file. Let us explore it.

One of the first things we can see is that the other five files in the src/ directory are included:

pub mod api;
pub mod errors;
pub mod metrics;
pub mod partition;
pub mod util;

After reviewing these files a little bit, we can notice that there's also some relevant code in another module, the one in core/. For example, in src/partition.rs:

use pgmq_core::{
    errors::PgmqError,
    query::{
        assign_archive, assign_queue, create_archive, create_archive_index, create_index,
        create_meta, grant_pgmon_meta, grant_pgmon_queue, grant_pgmon_queue_seq, insert_meta,
    },
    types::{PGMQ_SCHEMA, QUEUE_PREFIX},
    util::CheckedName,
};

So, at this point we know that we can find the source code in two places: src/ and core/.

If we continue exploring lib.rs, we can see that a sql file (sql_src.sql) is executed when the extension is enabled:

CREATE TABLE pgmq.meta (
    queue_name VARCHAR UNIQUE NOT NULL,
    is_partitioned BOOLEAN NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL
);
...

We can actually see that table with psql:

-- List tables in the pgmq schema
\dt pgmq.*

-- List contents of pgmq.meta
select * from pgmq.meta;
pgmq-# \dt pgmq.*
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | meta | table | binidxaba
(1 row)

pgmq=# select * from pgmq.meta;
queue_name | created_at
------------+------------
(0 rows)

The following diagram shows what the pgmq schema looks like right after CREATE EXTENSION is executed:

after-create-extension

From this point, we can suspect that every time we create a queue, a new row is inserted into this table.

Let us see what pgmq.create() does...

pgmq.create()

Most of the functions provided by pgmq are defined in src/api.rs. In that file, we can find the function pgmq_create(queue_name: &str), and if we chase the call sequence, we can discover that the interesting function is init_queue(name: &str) in core/src/query.rs:

pub fn init_queue(name: &str) -> Result<Vec<String>, PgmqError> {
    let name = CheckedName::new(name)?;
    Ok(vec![
        create_queue(name)?,
        assign_queue(name)?,
        create_index(name)?,
        create_archive(name)?,
        assign_archive(name)?,
        create_archive_index(name)?,
        insert_meta(name)?,
        grant_pgmon_queue(name)?,
    ])
}

This function generates several SQL statements that are later executed in pgmq_create_non_partitioned using an SPI client.

I'll skip the details, but the SQL statements basically do the following (a rough sketch of the resulting DDL appears after the list):

  1. Create a table pgmq.q_<queue_name>.
  2. Assign the table to the pgmq extension.
  3. Create an index on the pgmq.q_<queue_name> table.
  4. Create a table pgmq.a_<queue_name>.
  5. Assign the table to the pgmq extension.
  6. Create an index on the pgmq.a_<queue_name> table.
  7. Insert a row on the pgmq.meta table.
  8. Grant privileges to pg_monitor.
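
To make that concrete, here is a rough, hand-written sketch of the kind of DDL those statements amount to for a queue named my_queue. This is not the exact SQL from core/src/query.rs; the column names and types are inferred from the function signatures and index names we see elsewhere in this post:

-- Approximate sketch only; the real statements live in core/src/query.rs
CREATE TABLE pgmq.q_my_queue (
    msg_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    read_ct INT DEFAULT 0 NOT NULL,
    enqueued_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
    vt TIMESTAMP WITH TIME ZONE NOT NULL,
    message JSONB
);
CREATE INDEX q_my_queue_vt_idx ON pgmq.q_my_queue (vt ASC);

CREATE TABLE pgmq.a_my_queue (
    msg_id BIGINT PRIMARY KEY,
    read_ct INT DEFAULT 0 NOT NULL,
    enqueued_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
    archived_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
    vt TIMESTAMP WITH TIME ZONE NOT NULL,
    message JSONB
);
CREATE INDEX archived_at_idx_my_queue ON pgmq.a_my_queue (archived_at);

INSERT INTO pgmq.meta (queue_name, is_partitioned, created_at)
VALUES ('my_queue', false, now());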

We can see the effects of this in psql using the following lines:

-- Create a queue
select pgmq.create('my_queue');

-- List tables
\dt pgmq.*

-- List indexes
\di pgmq.*

-- See the contents of pgmq.meta
select * from pgmq.meta;

The output will show something like:

pgmq=# select pgmq.create('my_queue');
create
--------

(1 row)

pgmq=# \dt pgmq.*;
List of relations
Schema | Name | Type | Owner
--------+------------+-------+-----------
pgmq | a_my_queue | table | binidxaba
pgmq | meta | table | binidxaba
pgmq | q_my_queue | table | binidxaba
(3 rows)

pgmq=# \di pgmq.*
List of relations
Schema | Name | Type | Owner | Table
--------+--------------------------+-------+-----------+------------
pgmq | a_my_queue_pkey | index | binidxaba | a_my_queue
pgmq | archived_at_idx_my_queue | index | binidxaba | a_my_queue
pgmq | meta_queue_name_key | index | binidxaba | meta
pgmq | q_my_queue_pkey | index | binidxaba | q_my_queue
pgmq | q_my_queue_vt_idx | index | binidxaba | q_my_queue
(5 rows)

pgmq=# select * from pgmq.meta;
queue_name | is_partitioned | created_at
------------+----------------+-------------------------------
my_queue | f | 2023-09-18 23:35:38.163096-06
(1 row)

The following diagram shows what the pgmq schema looks like at this point:

complete

For the queue my_queue, we can see the underlying table and the corresponding archive table. Each table has an index associated with the primary key. The pgmq.q_my_queue table also has an index on the vt column, and pgmq.a_my_queue has an index on the archived_at column.

We can suspect that the pgmq.q_my_queue table is used in the send and read operations. Let us look at those two functions.

pgmq.send()

We can explore the send operation in a similar way. The relevant SQL is straightforward. It just inserts a new row in the underlying table:

INSERT INTO {PGMQ_SCHEMA}.{QUEUE_PREFIX}_{name} (vt, message)
VALUES {values}
RETURNING msg_id;

pgmq-send
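
For example, assuming the my_queue queue from earlier, sending messages from psql looks like this (the JSON payloads are just made-up examples):

-- Enqueue one message; pgmq.send returns the new msg_id
select pgmq.send('my_queue', '{"task": "welcome_email", "user_id": 42}');

-- Enqueue several messages at once; send_batch returns one msg_id per message
select * from pgmq.send_batch('my_queue', ARRAY['{"a": 1}', '{"b": 2}']::jsonb[]);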

note

At this point, we can see the following pattern in the pgmq project:

  • the exposed SQL functions are defined in src/api.rs, and
  • the underlying SQL statements are defined in core/src/query.rs

pgmq.read()

So, let's see. If I were the one programming pgmq.read(), I would perhaps do something like "get the first {limit} rows from the queue table whose {vt} has already expired, and for those rows, also update the visibility timeout to now() + {vt}." Naively, maybe something like:

UPDATE pgmq.q_my_queue
SET
    vt = clock_timestamp() + interval '10 seconds',
    read_ct = read_ct + 1
WHERE
    msg_id IN (
        SELECT msg_id
        FROM pgmq.q_my_queue
        WHERE vt <= clock_timestamp()
        ORDER BY msg_id ASC
        LIMIT 1
    );

In reality, pgmq.read is more interesting than that 😅. It performs the following DML:

WITH cte AS
(
    SELECT msg_id
    FROM {PGMQ_SCHEMA}.{QUEUE_PREFIX}_{name}
    WHERE vt <= clock_timestamp()
    ORDER BY msg_id ASC
    LIMIT {limit}
    FOR UPDATE SKIP LOCKED
)
UPDATE {PGMQ_SCHEMA}.{QUEUE_PREFIX}_{name} t
SET
    vt = clock_timestamp() + interval '{vt} seconds',
    read_ct = read_ct + 1
FROM cte
WHERE t.msg_id = cte.msg_id
RETURNING *;

pgmq-read

Firstly, in pgmq's version, there is a CTE (Common Table Expression) to obtain the first {limit} message IDs whose vt has expired. (It would be interesting to discuss why pgmq developers used a CTE, but we can explore that in another post.)

There are two crucial things to notice in the CTE. One is the order by clause that ensures the FIFO ordering. The other one is the FOR UPDATE SKIP LOCKED clause, claiming the rows no one else has claimed. This part is essential because it ensures correctness in the case of concurrent pgmq.read() operations.

The next step in the DML is to update the corresponding rows with a new vt value by adding the supplied {vt} to the current timestamp. Additionally, the read_ct value is incremented by 1. What is the use of this counter? In general, we can suspect that there is a problem processing a given message if it has a high read_ct value because users usually archive the message after successfully processing it. So, ideally, a message is only read once.
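
From the SQL side, a call to pgmq.read() supplies the visibility timeout and the batch size; the values below are just illustrative:

-- Claim up to 5 messages and hide them from other readers for 30 seconds
select msg_id, read_ct, vt, message
from pgmq.read('my_queue', 30, 5);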

pgmq.archive()

The next stage in the lifecycle of a message is archiving it. For that, pgmq uses the following insert statement:

WITH archived AS (
    DELETE FROM {PGMQ_SCHEMA}.{QUEUE_PREFIX}_{name}
    WHERE msg_id = ANY($1)
    RETURNING msg_id, vt, read_ct, enqueued_at, message
)
INSERT INTO {PGMQ_SCHEMA}.{ARCHIVE_PREFIX}_{name} (msg_id, vt, read_ct, enqueued_at, message)
SELECT msg_id, vt, read_ct, enqueued_at, message
FROM archived
RETURNING msg_id;

Essentially, it deletes the message with the provided msg_id from the queue table, and then the message is placed in the corresponding archive table.

pgmq-archive

One interesting thing to notice is that pgmq.archive() can be used to archive a batch of messages too:

select pgmq.archive('my_queue', ARRAY[3, 4, 5]);
pgmq=# select pgmq.archive('my_queue', ARRAY[3, 4, 5]);
pgmq_archive
--------------
t
t
t
(3 rows)

That is achieved in pgrx by declaring two functions with the same name using the pg_extern attribute macro, as follows:

#[pg_extern(name = "archive")]
fn pgmq_archive(queue_name: &str, msg_id: i64) -> Result<Option<bool>, PgmqExtError> {
    //...
}

#[pg_extern(name = "archive")]
fn pgmq_archive_batch(
    queue_name: &str,
    msg_ids: Vec<i64>,
) -> Result<TableIterator<'static, (name!(archive, bool),)>, PgmqExtError> {
    //...
}

pgmq.drop_queue()

Finally, let's talk about pgmq.drop_queue(). It essentially executes multiple statements (a quick usage example follows the list):

  1. Unassign the pgmq.q_<queue_name> table from the extension.
  2. Unassign the pgmq.a_<queue_name> table from the extension.
  3. Drop the table pgmq.q_<queue_name>.
  4. Drop the table pgmq.a_<queue_name>.
  5. Delete the corresponding row from the pgmq.meta table.
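
Continuing with the queue created earlier, the usage is simply:

-- Drop the queue table, its archive table, and its row in pgmq.meta
select pgmq.drop_queue('my_queue');

-- The metadata row is gone
select * from pgmq.meta;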

Nothing surprising in this one, and with it, we conclude our tour.

Conclusion

In this post, we explored how the pgrx tool is used to generate the pgmq extension, how the metadata objects are created, and how they are used in the basic send, read and archive operations. At least from an explorer's perspective, the internals of the extension are currently easy to read and understand.

I invite everyone to explore how the other pgmq functions work. You can explore the code at https://github.com/tembo-io/pgmq. And you can learn more about pgrx at: https://github.com/pgcentralfoundation/pgrx.

· 13 min read
Steven Miller

Before working for a Postgres company, I had never used extensions.

Now, I'm part of a team working to fully automate turning on any extension. I didn't find great resources to explain the process of turning on an extension, and why it varies between different extensions.

I want to share my mental model for the different types of extensions, how to know what type of extension you're working with, and how to get it turned on.

Turn on an extension

one-does-not-simply

What's traditionally involved:

  • Find the extension you want
  • Figure out how to build it
  • Sometimes, installation of dependencies (for example with apt-get or yum)
  • Sometimes, installation of other extensions (goto ‘figure out how to build it’)
  • Install your extension
  • Sometimes, load a library
  • Sometimes, provide extension-specific configurations
  • Sometimes, run the CREATE EXTENSION command

Building and installing extensions is well covered by other resources. In this blog, I want to focus on steps to get an extension up and running after it's installed, and how I believe that all extensions fit into four mostly-tidy categories.

Terminology

Extensions consist of SQL and / or libraries.

A library simply means compiled code, for example written in C or Rust.

SQL objects (let's just call them SQL) are additions to SQL, for example new functions and data types. These are often implemented by a library, but can also be implemented in other ways, for example using a procedural language like PL/pgSQL.

Hooks: A Postgres feature informally called hooks can be used to connect into Postgres' existing functionality. Hooks allow for overwriting default Postgres functionality, or calling back into an extension's code at the appropriate time. For example, one type of hook can modify Postgres start up behavior to launch a background worker, and a different type of hook can be used to redirect queries to a different table.

note

Sometimes extensions are instead referred to as 'modules', but I like to simply refer to everything as an 'extension'. Feel free to @ me on X to tell me why that's wrong (@sjmiller609).

Enter the matrix

im-in-matrix

A big part of what I have been working on is fully automating enabling any extension. In order to do that, we have to understand exactly how extensions vary. We can break it down into a 2x2 matrix by defining two boolean categories.

Requires LOAD true or false and requires CREATE EXTENSION true or false:

                      | Requires CREATE EXTENSION                               | Does not require CREATE EXTENSION
----------------------+---------------------------------------------------------+-----------------------------------------------------------
Requires LOAD         | Extensions that use SQL and their libraries have hooks  | Extensions that do not use SQL, may or may not have hooks
Does not require LOAD | SQL-only extensions, and SQL + libraries without hooks  | Output plugins

By categorizing an extension into this 2x2 matrix, we can know how to turn it on.

LOAD

LOAD is a command that tells Postgres to load a library, meaning make the code accessible to Postgres by loading the compiled code on disk into memory. If a library has hooks, performing a load will activate the hooks.

Requires LOAD: true means you have to do one of the following steps to load a library:

  • LOAD: using the LOAD command directly loads a library for the current connection only
  • session_preload_libraries: configuration, specifies which libraries to LOAD for new connections
  • shared_preload_libraries: configuration, specifies which libraries to LOAD at server start, and therefore requires a restart
note

Even though code is loaded in other ways during CREATE EXTENSION, that does not count as requires LOAD: true under this definition; I mean that the user must do something other than CREATE EXTENSION to load in libraries. Also, we are conflating local_preload_libraries with session_preload_libraries to simplify things in this blog post.

For example, if you installed the extension auto explain, then you may have a library file called auto_explain.so in your library directory, which can be found with pg_config --pkglibdir. Libraries are not always named exactly the same as the extension.

$ trunk install auto_explain

Using pkglibdir: /var/lib/postgresql/data/tembo/15/lib

[+] auto_explain.so => /var/lib/postgresql/data/tembo/15/lib
$ ls $(pg_config --pkglibdir) | grep auto_explain
auto_explain.so

Auto explain can be loaded into your session with LOAD 'auto_explain';. The name in this command always matches exactly the name of the library file, minus the file type (.so in this example). With a couple of configurations, this extension will then automatically log the EXPLAIN ANALYZE output for long-running queries.

postgres=# LOAD 'auto_explain';
LOAD
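
The "couple of configurations" are auto_explain's own settings. For example, to log every statement in the current session along with its actual execution statistics:

-- Log plans for all statements (threshold of 0 ms)
SET auto_explain.log_min_duration = 0;
-- Include actual timings and row counts, like EXPLAIN ANALYZE
SET auto_explain.log_analyze = true;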

However, the LOAD command is not typically used directly, and many extensions require that you do not load them this way. Instead, the Postgres configuration shared_preload_libraries is typically used.

postgres=# LOAD 'pg_cron';
ERROR: pg_cron can only be loaded via shared_preload_libraries
HINT: Add pg_cron to the shared_preload_libraries configuration variable in postgresql.conf.

The best reason to use LOAD directly is for debugging. It can be nice to LOAD on-demand while troubleshooting.

info

What to do when an extension requires load:

Extensions with requires LOAD: true can always be configured in shared_preload_libraries, but this configuration requires a restart to take effect. Some extensions can be loaded without a restart using LOAD directly, but in that case it's usually better to use the session_preload_libraries configuration and reload the Postgres configuration with SELECT pg_reload_conf(). You should run LOAD directly only when you are intentionally loading for the current connection alone.
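
As a sketch, the three options look like this (using ALTER SYSTEM here; editing postgresql.conf directly also works):

-- 1. Current connection only
LOAD 'auto_explain';

-- 2. All new connections, no restart needed; reload the configuration afterwards
ALTER SYSTEM SET session_preload_libraries = 'auto_explain';
SELECT pg_reload_conf();

-- 3. Every backend, takes effect only after a server restart
ALTER SYSTEM SET shared_preload_libraries = 'pg_cron';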

CREATE EXTENSION

When you run CREATE EXTENSION, this basically just runs an extension's SQL script. The script will typically create new SQL objects such as functions, data types, operators and so on.

CREATE EXTENSION looks at the extension's control file, which is installed to the extension directory of sharedir.

$ trunk install pg_jsonschema;

Using pkglibdir: "/var/lib/postgresql/data/tembo/15/lib"
Using sharedir: "/var/lib/postgresql/data/tembo"

[+] pg_jsonschema.so => /var/lib/postgresql/data/tembo/15/lib
[+] extension/pg_jsonschema--0.1.4.sql => /var/lib/postgresql/data/tembo
[+] extension/pg_jsonschema.control => /var/lib/postgresql/data/tembo

sharedir can be located with pg_config --sharedir

$  ls $(pg_config --pkglibdir) | grep pg_jsonschema
pg_jsonschema.so

$ ls $(pg_config --sharedir)/extension | grep pg_jsonschema
pg_jsonschema--0.1.4.sql
pg_jsonschema.control

The information in a control file is used to determine what start up or upgrade scripts to run. We'll cover upgrades in-depth in a future blog, so let's focus on first-time enabling. For example, in the above installation output, we notice a file pg_jsonschema--0.1.4.sql. Postgres knows to run this because the name of the control file matches the name of the script suffixed by the default_version defined in the control file.

$ cat $(pg_config --sharedir)/extension/pg_jsonschema.control
comment = 'JSON schema validation on json and jsonb data types'
default_version = '0.1.4'
module_pathname = '$libdir/pg_jsonschema'
relocatable = false
superuser = true

When running CREATE EXTENSION, the extension name always matches exactly the name of a control file, less the .control file type.

postgres=# CREATE EXTENSION pg_jsonschema;
CREATE EXTENSION

I mentioned that a start up script creates new SQL, including new functions. For example in the case of pg_jsonschema, the start up script pg_jsonschema--0.1.4.sql includes the following SQL to create a new function called jsonb_matches_schema. Even though we have a library file, we don't need LOAD because CREATE FUNCTION is another way to load code from a file. This is an example of requires LOAD: false, requires CREATE EXTENSION: true.

CREATE FUNCTION ... AS 'obj_file' documentation

obj_file is the name of the shared library file containing the compiled [code]

CREATE FUNCTION "jsonb_matches_schema"(
"schema" json,
"instance" jsonb
) RETURNS bool
IMMUTABLE STRICT
LANGUAGE c
AS 'MODULE_PATHNAME', 'jsonb_matches_schema_wrapper';
info

You can always know whether or not an extension requires CREATE EXTENSION by the presence of a control file in $(pg_config --sharedir)/extension

Hooks that require a restart

An extension is in the category requires CREATE EXTENSION: true and requires LOAD: true if the extension has libraries that use hooks which require a restart and it has a control file.

You will be able to identify that this is the case when the extension's documentation mentions both CREATE EXTENSION and shared_preload_libraries. Sometimes an error message or hint is provided if you run CREATE EXTENSION before loading the library, or if you try to run LOAD directly, but you can't count on that.

For example, both pg_cron and pg_partman have background workers. These are examples of extensions using hooks in the start up process of Postgres. So, in both of these cases the user is expected to configure shared_preload_libraries to start the background worker, then run CREATE EXTENSION on a cluster where that background worker is already running.
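
So the sequence for this category is roughly the following (pg_cron shown as an example; some extensions also need extension-specific settings before the restart):

-- Register the library so its hooks run at server start
ALTER SYSTEM SET shared_preload_libraries = 'pg_cron';
-- ... restart Postgres so the background worker is launched ...

-- Now the SQL objects can be created
CREATE EXTENSION pg_cron;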

LOAD is needed when there isn't a control file

In the case of auto_explain, it uses hooks that do not require a restart. In this case, there is no control file and no extra SQL objects to be created. So LOAD is required simply because we have to load it into memory somehow. To demonstrate, it is technically possible to make a control file for auto_explain to allow for CREATE EXTENSION behavior instead of LOAD:

auto_explain.control:

comment = 'auto explain'
default_version = '0.0.1'
module_pathname = '$libdir/auto_explain'
relocatable = false
superuser = true

auto_explain--0.0.1.sql

LOAD 'auto_explain';
caution

In practice, do not use LOAD in an extension start up script to activate hooks. LOAD is only applicable for the current connection.

postgres=# CREATE EXTENSION auto_explain;
CREATE EXTENSION

postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
-----------------+---------+------------+------------------------------
auto_explain | 0.0.1 | public | auto explain
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET

After running the above, now my subsequent queries have their EXPLAIN ANALYZE logged.

So, if that could work, why not just have control files for all extensions?

Having a control file requires version upgrade handling.

When you have a control file, you also have to write upgrade scripts for every new version. In the case of pg_cron, we can find all these files in sharedir. When enabling version 1.5, it will run pg_cron--1.0.sql, then each migration script up to 1.5.

pg_cron--1.0--1.1.sql
pg_cron--1.0.sql
pg_cron--1.1--1.2.sql
pg_cron--1.2--1.3.sql
pg_cron--1.3--1.4.sql
pg_cron--1.4-1--1.5.sql
pg_cron--1.4--1.4-1.sql
pg_cron.control
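
As a side note, you can ask Postgres which upgrade paths it can derive from those files, and move an already-enabled extension between versions:

-- Show the chain of scripts Postgres would run to reach 1.5
SELECT source, target, path
FROM pg_extension_update_paths('pg_cron')
WHERE target = '1.5';

-- Apply the intermediate upgrade scripts
ALTER EXTENSION pg_cron UPDATE TO '1.5';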

Since that's not really applicable to auto_explain, which just logs output and has nothing to migrate or handle between versions, it's cleaner to not have a control file. Upgrading auto_explain only involves replacing the library, then loading it again.

info

Upgrade logic is not applicable for extensions that do not require CREATE EXTENSION. These cases just involve re-loading a new version of the library.

You don't load hooks during CREATE EXTENSION

It made sense to me that hooks requiring a restart have to be configured in shared_preload_libraries. But for extensions that do not require a restart, it's not obvious why the hooks can't just be loaded during the CREATE EXTENSION start up script, like I just demonstrated is possible with auto_explain.

Even though it's technically possible to LOAD hooks during CREATE EXTENSION, it's a bad idea.

First of all, when using the LOAD command directly, it's only applicable to the current connection. So, in the above example with auto explain, the queries are only logged in the connection where I ran CREATE EXTENSION. To apply to all connections without a restart, it would need to go into session_preload_libraries. It is technically possible to do that inside of CREATE EXTENSION by doing ALTER SYSTEM SET session_preload_libraries then SELECT pg_reload_conf() in your start up script, but it is not a good approach for CREATE EXTENSION to automatically perform a configuration update. For one thing, it would confuse a user to change a config on the fly, and for another, there is currently no mechanism to automatically merge multi-value, comma-separated configurations like session_preload_libraries.

info

The 2x2 matrix makes it easier to understand how to enable an extension.

Just ask yourself "do I need to run CREATE EXTENSION?" determined by presence of a control file, and "do I need to do a LOAD?" determined by any mention of LOAD, shared_preload_libraries, or session_preload_libraries in the extension's documentation or an error message.

In all cases of needing a LOAD, you can get away with setting it in shared_preload_libraries. You can optimize to avoid restarts in some cases.
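
If you are not sure what is already configured, you can check from psql:

-- What loads at server start (changes require a restart)
SHOW shared_preload_libraries;

-- What loads for new sessions (changes only require a reload)
SHOW session_preload_libraries;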

Output plugins

There are some extensions, for example wal2json, that require neither CREATE EXTENSION nor LOAD. In all known cases so far, these are output plugins. It's more of a stretch to call these 'extensions', but since they provide additional functionality to Postgres, that counts in my book.

In the case of output plugins, the library is loaded when a replication slot is created:

PostgreSQL documentation

SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding', false, true);
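
For wal2json specifically, the equivalent would look something like this, assuming logical replication is enabled (wal_level = logical) and with an arbitrary slot name:

-- Creating the slot is what loads the wal2json output plugin
SELECT * FROM pg_create_logical_replication_slot('wal2json_slot', 'wal2json');

-- Consume changes decoded by the plugin as JSON
SELECT * FROM pg_logical_slot_get_changes('wal2json_slot', NULL, NULL);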

Installing extensions

Some of the above examples use the free and open source Trunk project that Tembo created, which allows us to skip the build process. It also installs extension dependencies, and provides metadata about other dependencies. When I'm trying out extensions, I am starting from one of Tembo’s container images to handle the system dependencies installation.

I wrote this guide for trying out extensions locally. If you have any issues just reach out on our community Slack channel and we can help.

Perhaps it's not so bad after all...

files-in-computer

Automate everything

We want to make it possible to automatically install and turn on any Postgres extension. For this reason, we are seeking to qualify all known extensions by these two dimensions: requires CREATE EXTENSION true or false, and requires LOAD true or false.

To enable the community, that metadata is being published on Trunk. On Tembo Cloud, we leverage that information to automatically enable extensions. Currently, we've got this working for over 150 extensions.

Dear experts, tell me how I'm wrong (seriously!)

I'm serious that I want you to tell me where this is incorrect! If you're a Postgres extensions expert, or maybe just know a thing or two about extensions that seems to conflict with something in this blog, please reach out on X @sjmiller609 and let me know. Even if it's just a minor correction or subjective information, I'd love to hear from you. I also want to hear if there is an easier mental model than this. I hope this blog can serve as a minimal yet comprehensive explanation of what it takes to get extensions turned on.

Another way to contribute is to click the "Edit this page" link below, and suggest changes. I will happily accept improvements to this blog.