โ About
So you're both a - maybe early - PostgreSQL fan... and a recent duckdb
adopter.
You like both databases for their strengths and ecosystems... and wonder how it would be
possible to seamlessly send data from/to each other databases... without having to code anything, I mean nothing more that playing
sql shell
commands in a terminal
: no Python, no Java,...
๐ Well this is exactly what I will talk about in this post thanks to :
๐ค ... but why this post ?
The 3 main reasons of this article and why to pull/put from PostgreSQL
/duckdb
at this point are :
DuckDB | PostgreSQL | |
---|---|---|
Database Model | Columnar database | Relational database |
License | MIT | BSD like |
Serverless | Yes | No |
๐ฏ What we'll do
We will, only from terminal :
- ๐ Install & boot a containerized PostgreSQL database (with Podman)
- ๐ Create a database
- ๐ Create and feed a little table
- ๐ช Read the
psql
table fromduckdb
- ๐๏ธ Export the
psql
table to a parquet file - ๐ฌ Inspect
parquet
file withparquet-cli
Also we will do the reverse move :
- Create a table in PostgreSQL from within
duckdb
- Test table contents from
sql
๐ฟ Demo
๐ shell
scripts
Install & boot a PostgreSQL instance:
export PGPASSWORD=mysecretpassword # Boot a postgresql instance podman run --name postgres -e POSTGRES_PASSWORD=$PGPASSWORD -d\ -p 5432:5432 docker.io/library/postgres # Check container status podman ps -a
Install psql
so we can reach PostgrSQL from outside the contenair:
# Install `psql` on the host so the database can be accessed # from outside de container sudo apt install -y postgresql-client
Now, create some PostgreSQL objects:
# Create a demo database psql -h localhost -p 5432 -U postgres -c "CREATE DATABASE demo;" # Create a table psql -h localhost -p 5432 -U postgres -d demo\ -c "CREATE table customers(id varchar primary key);" # Feed the PostgreSQL table with some data psql -h localhost -p 5432 -U postgres -d demo\ -c "insert into customers values \ ('Duffy duck'),\ ('Daisy Duck'),\ ('Donald Duck'),\ ('Ludwig Von Drake');"
Install duckdb
:
# (Quick and dirty) duckdb install wget https://github.com/duckdb/duckdb/releases/download/v1.0.0/duckdb_cli-linux-amd64.zip unzip duckdb_cli-linux-amd64.zip cp duckdb /usr/bin/ rm duckdb duckdb_cli-linux-amd64.zip
Now do the fun stuff...
Reach PostgreSQL database from duckdb
Let's reach postgres database from duckdb
:
duckdb -c "ATTACH 'dbname=demo user=postgres password=mysecretpassword host=127.0.0.1'\ AS db (TYPE POSTGRES, READ_ONLY); show all tables; select * from db.customers; COPY db.customers TO 'db.customers.parquet' (FORMAT PARQUET);"
... then check the output parquet
file:
ls -ltr file db.customers.parquet
... and read the resulting parquet
file from duckdb
:
duckdb -c "select * from 'db.customers.parquet';"
Test resulting parquet
file with parquet-cli
:
pip install parquet-cli
parq -h
parq db.customers.parquet --count
parq db.customers.parquet --head
parq db.customers.parquet --tail
duckdb
โก๏ธ PostgreSQL
Let's:
- "Attach" the remote PostgreSQL instance from
duckdb
runtime - Create a table
- Feed the table
- Select table contents from
psql
duckdb -c "ATTACH 'dbname=demo user=postgres password=mysecretpassword host=127.0.0.1'</span>
AS db (TYPE POSTGRES);</span>
create table db.heroes(name varchar primary key);</span>
insert into db.heroes values</span>
('Dumbo'),</span>
('Man-Elephant'),</span>
('Tantra'),</span>
('Elephant Man'),</span>
('The Elephantmen'),</span>
('Mammomax') ;
"
psql -h localhost -p 5432 -U postgres -d demo</span>
-c "select * from heroes;"
โ๏ธ More about DuckDB
vs PostgreSQL
See below this very synthetic breakdown from influxdata:
Top comments (2)