I was recently loading a lot of files into PostgreSQL, and I needed to do it fast and without consuming many resources.
The requirements I had were the following:
- Low footprint, not reading all lines of a file at once
- Easy to load data
- Write data efficiently into PostgreSQL using the COPY FROM statement
With these requirements in mind, I came up with the idea of using streams in Node.js because:
- It's efficient
- It's simple
Let's say I have the following file:
```csv
first_name, last_name, country
Karie,Zorn,Ukraine
Thebault,Flicker,France
Danya,Atcock,Canada
Lorne,Westmorland,Russia
Page,Greeve,Canada
Rene,Riccardini,Serbia
Adair,Mullin,Philippines
Della,Gumb,China
Charlie,Swadlinge,Cambodia
Celka,Karlowicz,Canada
```
And the following table in PostgreSQL:
```sql
CREATE EXTENSION "uuid-ossp";

CREATE TABLE PEOPLE(
  id uuid default uuid_generate_v4(),
  first_name varchar(30) NOT NULL,
  last_name varchar(30) NOT NULL,
  country varchar(30) NOT NULL
);
```
Since we're going to use curl in the COPY statement to fetch the file, we need to extend the official PostgreSQL Docker image with the following:
```dockerfile
FROM postgres:10.4
RUN apt-get update && apt-get install -y curl
```
I named this file postgresql.Dockerfile. Once you have it, we can get everything running with the following commands:
```sh
docker build -f postgresql.Dockerfile -t postgres-with-curl .

docker run --restart always --name postgres --network host -d postgres-with-curl

docker run --network host \
  -e MINIO_ACCESS_KEY=user \
  -e MINIO_SECRET_KEY=password \
  --restart always --name minio \
  -d minio/minio:RELEASE.2018-06-22T23-48-46Z server /data
```
So let's do it. First, we need to create the connections for PostgreSQL and Minio:
```js
const pg = require("pg")
const minio = require("minio")

const minioClient = new minio.Client({
  accessKey: "user",
  secretKey: "password",
  secure: false,
  endPoint: "localhost",
  port: 9000,
  region: "us-east-1",
})

const pgPool = new pg.Pool({
  user: "postgres",
  password: "postgres",
  host: "localhost",
  port: 5432,
  database: "postgres",
})
```
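Before going further, a quick sanity check can save some debugging. This is an optional sketch of mine (the `checkConnections` name is hypothetical, not part of the original setup) that just confirms both services are reachable:

```js
// Optional sanity check: fails fast if either service is unreachable.
async function checkConnections() {
  await pgPool.query("SELECT 1") // trivial round trip to PostgreSQL
  await minioClient.listBuckets() // trivial round trip to Minio
  console.log("PostgreSQL and Minio are reachable")
}
```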
Then we're going to do basically 3 things:
- Read the file
- Upload it to Minio
- Execute the COPY statement in the PostgreSQL instance
```js
const fs = require("fs")
const path = require("path")

const TABLE_NAME = "public.people"

async function main() {
  // Stream the file instead of reading it into memory all at once
  const input = fs.createReadStream(path.join(__dirname, "people.csv"), {
    encoding: "utf-8",
  })

  const bucketName = "tmp"
  const objectName = "new.csv"

  await minioClient.putObject(bucketName, objectName, input)

  const csvUrl = await minioClient.presignedGetObject(
    bucketName,
    objectName,
    10000, // duration in seconds of the url
  )

  const copyDataSql = `
    COPY ${TABLE_NAME} (FIRST_NAME, LAST_NAME, COUNTRY)
    FROM PROGRAM 'curl "${csvUrl}"'
    WITH (FORMAT CSV, HEADER, QUOTE '"', DELIMITER ',');`

  const client = await pgPool.connect()
  await client.query(copyDataSql)
  client.release()
  await pgPool.end()
}

main()
```
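One thing the script leaves behind is the uploaded object. If the file is only needed for the COPY, a small cleanup at the end is enough; here's a sketch of mine using the client's removeObject call (the `cleanup` helper is hypothetical, not part of the original script):

```js
// Cleanup sketch: delete the temporary object once the COPY has finished.
async function cleanup(bucketName, objectName) {
  await minioClient.removeObject(bucketName, objectName)
}
```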
If we execute the code above with Node.js, we should see the rows in the database with the following query:
```sql
SELECT * FROM PUBLIC.PEOPLE;
```
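The same check can also be done from Node.js. This is a small sketch of mine (the `countPeople` name is hypothetical), assuming the pool from above has not been ended yet:

```js
// Verification sketch: count the rows loaded from the sample file.
async function countPeople() {
  const { rows } = await pgPool.query("SELECT count(*) FROM public.people")
  console.log(rows[0].count) // prints "10" for the sample file above
}
```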
The script can fail for the following reasons:
- Minio is not up
- PostgreSQL is not up
- The PostgreSQL table has not been created
- The Minio bucket "tmp" has not been created (see the setup sketch below)
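For the last two, a one-time setup script is handy. This is a sketch of my own, not part of the original post's code: it reuses the connections from above, and the IF NOT EXISTS guards make it safe to run repeatedly:

```js
// One-time setup sketch: creates the table and the "tmp" bucket if missing.
async function setup() {
  await pgPool.query(`
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
    CREATE TABLE IF NOT EXISTS people(
      id uuid DEFAULT uuid_generate_v4(),
      first_name varchar(30) NOT NULL,
      last_name varchar(30) NOT NULL,
      country varchar(30) NOT NULL
    );
  `)

  // bucketExists resolves to a boolean in recent minio-js releases;
  // older versions used a callback/error style instead.
  const exists = await minioClient.bucketExists("tmp")
  if (!exists) await minioClient.makeBucket("tmp", "us-east-1")
}
```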
This technique can be useful when we have no access to the disk of the PostgreSQL server and we have to perform a batch load.
Thanks for reading.