Dviejo

Loading data into PostgreSQL using Node.js and Minio

I was recently loading a lot of files into PostgreSQL, and I needed to do it fast and without many resources.

The requirements I had were the following:

  1. Low memory footprint, without reading a whole file into memory at once
  2. Easy to load data
  3. Write data efficiently into PostgreSQL using the `COPY ... FROM` command

With these requirements in mind, I decided to use streams in NodeJS because:

  1. It's efficient
  2. It's simple

Let's say I have the following file:

```
first_name,last_name,country
Karie,Zorn,Ukraine
Thebault,Flicker,France
Danya,Atcock,Canada
Lorne,Westmorland,Russia
Page,Greeve,Canada
Rene,Riccardini,Serbia
Adair,Mullin,Philippines
Della,Gumb,China
Charlie,Swadlinge,Cambodia
Celka,Karlowicz,Canada
```

And the following table in PostgreSQL:

```sql
CREATE EXTENSION "uuid-ossp";

CREATE TABLE PEOPLE(
    id uuid DEFAULT uuid_generate_v4(),
    first_name varchar(30) NOT NULL,
    last_name varchar(30) NOT NULL,
    country varchar(30) NOT NULL
);
```

Since the `COPY` command is going to use curl to fetch the file, we're going to extend the PostgreSQL Docker image so the binary is available:

```dockerfile
FROM postgres:10.4

RUN apt-get update && apt-get install -y curl
```

I named this file postgresql.Dockerfile. Once you have it, get everything running with the following commands:

```shell
docker build -f postgresql.Dockerfile -t postgres-with-curl .

docker run --restart always \
  --name postgres \
  --network host \
  -d postgres-with-curl

docker run --network host \
  -e MINIO_ACCESS_KEY=user \
  -e MINIO_SECRET_KEY=password \
  --restart always \
  --name minio \
  -d minio/minio:RELEASE.2018-06-22T23-48-46Z server /data
```
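If you prefer a declarative setup, the same two containers can be sketched as a compose file (the service names are my own choice; everything else mirrors the commands above):

```yaml
version: "3"
services:
  postgres:
    build:
      context: .
      dockerfile: postgresql.Dockerfile
    network_mode: host
    restart: always
  minio:
    image: minio/minio:RELEASE.2018-06-22T23-48-46Z
    command: server /data
    environment:
      MINIO_ACCESS_KEY: user
      MINIO_SECRET_KEY: password
    network_mode: host
    restart: always
```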

So let's do it. First we need to create the clients for PostgreSQL and Minio:

```javascript
const pg = require("pg")
const minio = require("minio")

const minioClient = new minio.Client({
  accessKey: "user",
  secretKey: "password",
  secure: false,
  endPoint: "localhost",
  port: 9000,
  region: "us-east-1",
})

const pgPool = new pg.Pool({
  user: "postgres",
  password: "postgres",
  host: "localhost",
  port: 5432,
  database: "postgres",
})
```

Then we're going to do basically 3 things:

  1. Read the file
  2. Upload to minio
  3. Execute SQL in PostgreSQL instance
```javascript
const fs = require("fs")
const path = require("path")

const TABLE_NAME = "public.people"

async function main() {
  const input = fs.createReadStream(path.join(__dirname, "people.csv"), {
    encoding: "utf-8",
  })
  const bucketName = "tmp"
  const objectName = "new.csv"

  // Stream the file into Minio, then get a temporary URL for it
  await minioClient.putObject(bucketName, objectName, input)
  const csvUrl = await minioClient.presignedGetObject(
    bucketName,
    objectName,
    10000 // duration in seconds of the url
  )

  // Let PostgreSQL pull the file itself via curl
  const copyDataSql = `
    COPY ${TABLE_NAME} (FIRST_NAME, LAST_NAME, COUNTRY)
    FROM PROGRAM 'curl "${csvUrl}"'
    WITH (FORMAT CSV, HEADER, QUOTE '"', DELIMITER ',');`

  const client = await pgPool.connect()
  await client.query(copyDataSql)
  client.release()
  await pgPool.end()
}

main()
```

If we execute this code with Node.js, we should see the rows in the database with the following query:

```sql
SELECT * FROM PUBLIC.PEOPLE;
```

The script can fail for any of the following reasons:

  • Minio is not up
  • PostgreSQL is not up
  • The PostgreSQL table has not been created
  • The Minio bucket "tmp" has not been created
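The last failure can be guarded against in the script itself. Here is a sketch of a helper (my own addition, not in the original script) that creates the bucket when it is missing, using the Minio client's `bucketExists` and `makeBucket` methods, which return promises in recent minio-js versions:

```javascript
// Create the target bucket if it does not already exist.
// `client` is any object exposing minio-style bucketExists/makeBucket.
async function ensureBucket(client, bucketName, region = "us-east-1") {
  const exists = await client.bucketExists(bucketName)
  if (!exists) {
    await client.makeBucket(bucketName, region)
  }
  return !exists // true if the bucket had to be created
}
```

In `main()` you would call `await ensureBucket(minioClient, bucketName)` before `putObject`.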

This technique can be useful when we have no control over the disk of the PostgreSQL server and still need to perform a batch load.

Thanks for reading.
