Announcing ParadeDB partnership: Search and Analytics for Postgres

Oct 3, 2024 4 min read
blog post hero image

Today, we’re excited to announce the launch of two new Tembo stacks, powered by ParadeDB.

  • The ParadeDB Stack — Designed as a search stack, it is powered by the ParadeDB extensions (pg_search and pg_analytics) and is designed as an Elasticsearch alternative for customers needing high-quality full-text search over Postgres data.
  • The Analytics Stack — Gives developers a data lake experience on Postgres. It is powered by the ParadeDB pg_analytics extension that brings DuckDB into Postgres and is designed for high-performance analytics over object storages like S3 and table formats like Iceberg.
Selecting the ParadeDB Stacks in Tembo Cloud

ParadeDB is Search and Analytics for Postgres

ParadeDB is an Elasticsearch alternative built on Postgres. ParadeDB is built as two extensions, pg_search and pg_analytics, which together bring fast full-text search and high-performance analytics to Postgres.

ParadeDB and Tembo Stacks - A match made in heaven

Tembo is on a mission to enable developers to do more (dare we say, everything?) on Postgres. Every time developers bring in a new technologies, it becomes a piece of software that needs to be learned, mastered and maintained. This becomes a huge cost in the form of cognitive overhead for the team in addition to the time and resources it takes to set up, manage and maintain it. The system also gets expensive and much harder to debug due to the sheer number of tools involved.

However, just using a use-case optimized flavor of Postgres, which is already deployed in most organizations is a much better fit. That’s why we created Tembo Stacks - which are Postgres instances with configurations and extensions which optimize Postgres for a particular workload. And ParadeDB does exactly that for search, allowing users to use Postgres instead of Elasticsearch.

ParadeDB is one of the fastest-growing open source database projects on GitHub, with over 50,000 deployments. Their extensions are deployed with some of the largest organizations in the world powering production workloads at scale.

Our teams share a joint vision for the future of the Postgres ecosystem and we couldn’t imagine a better partnership than to combine the Tembo Cloud platform, UX and our operational expertise with ParadeDB’s feature rich extensions. Our joint customers will now be able to access a fully managed version of ParadeDB on AWS directly on the Tembo platform. We’ll soon be adding support for GCP and Azure to our platform too.

ParadeDB Stack

The ParadeDB stack allows you to create efficient full-text search indexes powered by the BM25 algorithm. That helps you get a much more performant and feature rich full-text search experience compared to the Postgres native ts_vector and a much simpler experience compared to adding something as complicated as Elasticsearch to your stack.

To get started, you can simply deploy the ParadeDB stack on Tembo and create indexes using the paradedb.create_bm25 function and run searches via the search_idx.search function. Please refer to the docs for full instructions.


CALL paradedb.create_bm25(
  index_name => 'search_idx',
  schema_name => 'public',
  table_name => 'mock_items',
  key_field => 'id',
  text_fields => paradedb.field('description') || paradedb.field('category'),
  numeric_fields => paradedb.field('rating')
);

SELECT description, rating, category
FROM search_idx.search(
  '(description:keyboard OR category:electronics) AND rating:>2',
  limit_rows => 5
);

Analytics Stack

In today’s world, S3 has become a common destination to store your non-operational data. But, traditionally, it’s been pretty hard for Postgres users to query their S3 data efficiently. However, thanks to Postgres extensibility and DuckDB’s high performance analytical query engine, this can now be done via the pg_analytics extension. This extension allows you to run efficient queries on parquet or Iceberg data stored in object stores like S3.

The Tembo Analytics Stack ships with pg_analytics and other extensions optimized for analytics workloads such as hydra_columnar allowing you to run efficient analytics queries right from Postgres.

To get started, you can simply deploy the Analytics stack on Tembo and create a foreign table which points to your S3 data. Then, you should be able to immediately query the data without needing to define the columns and types for the table.


CREATE FOREIGN TABLE trips ()
  SERVER parquet_server
  OPTIONS (files 's3://tembo-demo-bucket/yellow_tripdata_2024-01.parquet');

SELECT vendorid, passenger_count, trip_distance FROM trips LIMIT 1;

Get started!

Log into cloud.tembo.io and select either the ParadeDB or the Analytics stack to get started. You can try these Stacks under our free trial or jump right into a paid plan. Check out the getting-started guides for both of these Stacks:

We’re excited for you to try the new ParadeDB-powered Stacks - let us know what you think! In upcoming weeks, we will be sharing benchmarks around how ParadeDB makes search and analytics faster for Postgres users.