pg_auto_dw is an auto-data warehouse extension to use GenAI to inspect transactional Postgres schemas, and automate the creation of well-formed data warehouse, using Data vault modeling.
Our objective is to help software engineers access high-quality data warehousing when they don’t have access to an expert data architect. I have wanted this badly for years as a dev + data engineer, and my messy data warehouses that usually feature a pile of ingested tables, and ad-hoc derivative tables :)
We’re partnering with our friend Johnathan Pickard from Analyzer1 to iterate with us on this project, as he will transition his previously built automations to be as fully-automated as GenAI makes possible.
But why Postgres for data warehousing?
- Postgres for everything — Postgres has historically been the de-facto operational store—and now people are beginning to use it for analytics and data warehousing.
- Efficient data warehouse schemas - One of the key things for DWH performance is the data structure and transformations going from operations to analytics. There are best practices such as Kimball or Data Vault that many software engineers aren’t highly familiar with.
- Doing transformations in Postgres - If Postgres is your operational store and your analytics store, why not rely on Postgres to do the transformations. With LLMs, we had the hunch that we could do this automatically.
And why Postgres for data warehouse automation?
We do not believe that automating the creation of data warehousing structures should be offloaded and partitioned at the analytic edge. Classically, operational data is offloaded at interval to be processed by a dedicated data team.
This approach has disadvantages. Firstly, it makes operational analytics an afterthought or at least an expensive and time consuming endeavor. Secondly, it creates a very broad data sprawl which directly conflicts with the paradigm of a unified data model.
Postgres stands as the world’s most popular, comprehensive, and extensible open-source database, with its growing trend reinforcing its future-proof status.
How the extension works
If your source tables are well-structured and tables and fields are well-named, constructing a data warehouse can be achieved with a single call to the extension.
SELECT auto_dw.go();
Loading graph...
flowchart LR Start(("Start")) --> ext["Install Extension"] ext -- #10711; --> build["auto_dw.go()"] build -- #10711; --> DW[("DW Created")] DW --> Done(("Done"))
Informative Functions
These functions do not affect the database:
health()
— Understand DW healthsource_tables()
- Understand the status of all tablessource_columns()
— Understand the status of all table columns
Interactive Functions
These functions affect the data warehouse portion of the database:
source_include(object_pattern)
— Add source objectssource_exclude(object_pattern)
— Remove source objectsupdate_context(object, context)
— Provide additional context to fine tune the automationgo(flag, status)
— Initiates the build and dataflows
Join us on the Journey
We are just getting started. Our development process is transparent and open for collaboration on GitHub.
We invite developers, data scientists, and technology enthusiasts to join us. Your insights and contributions are invaluable as we push the boundaries of what is possible.
Stay tuned for upcoming demos and update blog posts related to building and using this extension.