PostgreSQL as NoSQL

H I M A N C H A L I T E C H L E A D D B E @ I N M O B I PostgreSQL as NoSQL

What’s on the plate today ?  What is the requirement for Schemaless database ?  Data type and volume that has to be handled.  How can we use PostgreSQL to store Schemaless data ?  Performance of PostgreSQL Schemaless option.

What is Schemaless data ?  It doesn’t mean unstructured, but it’s structured over each document  Each Document is a pair of key-values in a hierarchical structure of arrays  The application needs to be aware about this structure and handle it if it’s not getting as expected  Modern Schemaless DBs don’t use SQL and hence are termed NoSQL DB. (confusing term)

Requirement for Schemaless data  Introduction of new reporting data for merchant where dimension is not fixed  Currently there are 15 known fields, where data varies from 5-15 for different merchants  In future , chances of new dimension addition  Data size to start with is 1 million per hour

Issue with current schema DB  Fixed schema definition  Too many null values  Adding new column will need schema changes  May be updates in old rows will be required So, should we move to NoSQL ??

If PostgreSQL can solve it !!  3 different ways provided by PostgreSQL for Document types  XML  hstore  JSON

hstore : Smart contrib module  A hierarchical storage type for PostgreSQL  Key Value store with ACID compliance  Maps String Keys to String Values or other hstore values  Rich in its own functions like..  h -> “a”  h?->”a”  h@>”a->2”  Indexing available : GiST and GIN  Indexes whole hierarchy , not just a key  Expression indexes are also supported

hstore continues…. CREATE TABLE my_store ( id character varying(1024) NOT NULL, doc hstore, CONSTRAINT my_store_pkey PRIMARY KEY (id) ); CREATE INDEX my_store_doc_idx_gist ON my_store USING gist(doc) ; SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at FROM my_store WHERE doc @> ‘created_at=>23/12/2013’; SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at FROM my_store WHERE doc @> ‘is_active=>:t’ AND doc ? ‘has_address’ ORDER BY doc -> ‘created_at’ DESC; SELECT doc -> ‘text’ as merchant, doc -> ‘created_at’ as created_at FROM my_store WHERE doc @> ‘is_active=>:t’ AND doc ?| ARRAY[‘has_address’, ‘has_payoption’] ;

JSON  Storage type of famous NoSQL DBs like MongoDB , CouchDB  PostgreSQL introduced JSON type in 9.2; Stored as text but with validation for JSON (--To be fixed in 9.4)  Function like row_to_json , array_to_json  But JSON Production and Processing absent  9.3 Came with many new features  to_json(any)  Json_agg(record)  Many Extraction Functions and operators

JSON continues…  JSONB being introduced in 9.4  JSONB Indexing  Tables can be unlogged for further performance increase at the cost of reliability

hstore and json  hstore_to_json(hstore)  hstore_to_json_loose(hstore)

Some numbers..  Data used : id number, other fields text  Record 2 Million , average 64bytes each  Read CSV File, parse into appropriate format , Insert into DB  Write Speed in Records/Sec  hstore : 4600 r/s  hstore(GiST) : 3000 r/s  hstore (GIN) : 1700 r/s  JSON : 4600 r/s  MongoDB : 4000 r/s • DB Size (in MB table/indexes)  hstore : 300/0  hstore(GiST) : 300/71  hstore(GIN) : 300/700  JSON : 300/60  MongoDB : 1800/200

Write Speed 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Records/Second Records/Second

Data Size 0 200 400 600 800 1000 1200 1400 1600 1800 2000 Index(MB) Table(MB)

 Select query on primary key  Fetch Time in Milliseconds  hstore : 320  hstore(GiST) : 190  hstore(GIN) : 180  JSON : 20  MongoDB : 40  Select query on Name (filter for 100 names)  Fetch Time in Milliseconds  hstore : 350  hstore(GiST) : 150  Hstore(GIN) : 140  JSON : 10000  MongoDB : 450

Select Query Fetch on PK 0 50 100 150 200 250 300 350 MilliSeconds

Select Query Fetch on Text 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 MilliSeconds MilliSeconds

Some conclusions..  PostgreSQL can be used as schemaless DB  PostgreSQL’s relational data storage is very efﬁcient.  Build indexes using expressions on common used fields  In hstore GiST index is much more efficient than GIN in our case  GiST/GIN accelerates every field not just the Primary Key  GIN indexes are best for static data because lookups are faster.  For dynamic data, GiST indexes are faster to update.  Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000.

So what you got in PostgreSQL  Use the features of relational DB in your Schemaless world !  Use Constraints  Use Transactions  Use Indexing  Do Joins on keys  And with all these get NoSQL(Schemaless data) requirement fulfilled as well.  Save the new DB migration cost and time  ---  As these are PostgreSQL specific features, migration to other RDBMS not possible.

Questions??? No offence to NoSQL DBs  Thank You !! himamahi09@gmail.com

PostgreSQL as NoSQL

More Related Content

What's hot

Similar to PostgreSQL as NoSQL

Recently uploaded

PostgreSQL as NoSQL