PostgreSQL: So powerful, you won’t believe it’s open source Reuven M. Lerner • reuven@lerner.co.il Database 2011 January 13th, 2011
Who am I? • Web developer, software architect, consultant • Linux Journal columnist since 1996 • Mostly Ruby on Rails + PostgreSQL, but also Python, PHP, jQuery, and lots more... • PostgreSQL user since (at least) 1997
Lots of options! • Oracle • Microsoft SQL Server • IBM DB2 • MySQL • PostgreSQL
How do you choose? • Integrity (ACID compliance) • Data types • Functionality • Tools • Extensibility • Documentation • Community
PostgreSQL • Very fast, very scalable. (Just ask Skype.) • Amazingly flexible, easily extensible. • Rock-solid — no crashes, corruption, security issues for years • Ridiculously easy administration • It also happens to be free (MIT/BSD)
Brief history • Ingres (Stonebreaker, Berkeley) • Postgres (Stonebreaker, Berkeley) • PostgreSQL project = Postgres + SQL • About one major release per year • Version 8.x — Windows port, recovery • Version 9.0 — hot replication, upgrades
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
PostgreSQL
Data types • Boolean • Numeric (integer, float, decimal) • (var)char, text (infinitely large), binary • sequences (guaranteed unique) • Date/time and time intervals • IP addresses, XML, enums, arrays
Or create your own!
Or create your own! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
Or create your own! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
Or create your own! CREATE TYPE Person AS (first_name TEXT, last_name TEXT); CREATE TABLE Members (group_id INTEGER, member Person);
PostGIS • Some people took this all the way • Want to include geographical information? • No problem — we’ve got PostGIS! • Complete GIS solution, with data types and functions • Keeps pace with main PostgreSQL revisions
Flexible internal types • For example, text/binary storage • Inline or in a separate table (TOAST) • Compressed or not ALTER TABLE Foo ALTER COLUMN mytext SET STORAGE EXTERNAL;
Foreign keys that work CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500);
Foreign keys that work CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
Foreign keys that work CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
Foreign keys that work CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
Foreign keys that work CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
Custom validity checks CREATE TABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500);
Custom validity checks CREATE TABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500); ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"
Timestamp vs. Interval testdb=# select now(); now ------------------------------- 2010-10-31 08:58:23.365792+02 (1 row) Point in time testdb=# select now() - interval '3 days'; ?column? ------------------------------- 2010-10-28 08:58:28.870011+02 Difference between (1 row) points in time
Built-in functions • Math • Text processing (including regexps) • Date/time calculations • Conditionals (CASE, COALESCE, NULLIF) for use in queries • Extensive library of geometrical functions
Or write your own! • PL/pgSQL • PL/Perl • PL/Python • PL/Ruby • PL/R • PL/Tcl
Triggers • Write a function, and use it as a trigger • The function can run before or after inserts, deletes, or updates • What do they do? • Stop certain actions from happening • Change the data before it’s really used
Windowing functions • Calculate aggregate amounts on each row, taking other rows into consideration • For example: Rank each person’s salary within their department
Smart indexing • Functional indexes CREATE UNIQUE INDEX on People(lower(email_address)); • Conditional indexes CREATE INDEX on People WHERE is_admin = true; • Background (concurrent) indexing
Transactions • In PostgreSQL from the beginning • Use transactions for just about anything: BEGIN DROP TABLE DVDs; ROLLBACK; SELECT * FROM DVDs; -- Works!
Savepoints (or, sub-transactions) BEGIN; INSERT INTO table1 VALUES (1); SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (2); ROLLBACK TO SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (3); COMMIT;
MVCC • Readers and writers don’t block each other • “Multi-version concurrency control” • xmin, xmax on each tuple; rows are those tuples with txid_current between them • Old versions stick around until vacuumed • Autovacuum removes even this issue
Full-text indexing • Built-in, Unicode-aware full-text indexing has been in PostgreSQL for a few years • It’s simply a different type of index, as well as a different search operator (@@) • Custom dictionaries for different languages, indicating stop words and stemming rules
CTEs • Adds a “WITH” statement, which defines a sorta-kinda temp table • You can then query that same temp table • Makes many queries easier to read, write, without a real temp table • Better yet: CTEs can be recursive, for everything from Fibonacci to org charts
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
PITR • Store WALs (write-ahead logs) to disk • Normally, the server will apply all WALs when it starts up • By choosing which WALs to apply, you can bring the database up to date until a certain point in time
Streaming replication • In 8.4, you could use “log shipping” to send the WALs from one server to another • When the primary server went down, you could bring up this secondary • In 9.0, this can happen over the network • Faster, more reliable, and easier to configure than previously
Hot standby • In 8.4, the secondary server could not answer queries when it was receiving WALs from the primary • 9.0 introduced “hot standby”: Secondary server can answer read-only queries • Can help with balancing certain queries
It’s all in the tables • Databases are good at working with tables • So PostgreSQL stores info in tables • Configuration parameters • Objects (tables, indexes, functions, etc.) • Statistics for the optimizer • Look through pg_catalog ... and learn!
Client libraries • libpq (in C) • Java (JDBC) • Others by 3rd • .NET (npgsql) parties: • ODBC • Python • JavaScript (!) • Ruby • Just about any language you can • Perl imagine
Want to learn more? • Mailing lists, wikis, and blogs • All at http://postgresql.org/ • http://planetpostgresql.org • PostgreSQL training, consulting, development, hand-holding, and general encouragement
Thanks! (Any questions?) reuven@lerner.co.il http://www.lerner.co.il/ 054-496-8405 “reuvenlerner” on Skype/AIM

PostgreSQL talk, Database 2011 conference

  • 1.
    PostgreSQL: So powerful,you won’t believe it’s open source Reuven M. Lerner • reuven@lerner.co.il Database 2011 January 13th, 2011
  • 2.
    Who am I? •Web developer, software architect, consultant • Linux Journal columnist since 1996 • Mostly Ruby on Rails + PostgreSQL, but also Python, PHP, jQuery, and lots more... • PostgreSQL user since (at least) 1997
  • 3.
    Lots of options! •Oracle • Microsoft SQL Server • IBM DB2 • MySQL • PostgreSQL
  • 4.
    How do youchoose? • Integrity (ACID compliance) • Data types • Functionality • Tools • Extensibility • Documentation • Community
  • 5.
    PostgreSQL • Very fast,very scalable. (Just ask Skype.) • Amazingly flexible, easily extensible. • Rock-solid — no crashes, corruption, security issues for years • Ridiculously easy administration • It also happens to be free (MIT/BSD)
  • 6.
    Brief history • Ingres(Stonebreaker, Berkeley) • Postgres (Stonebreaker, Berkeley) • PostgreSQL project = Postgres + SQL • About one major release per year • Version 8.x — Windows port, recovery • Version 9.0 — hot replication, upgrades
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
    Data types • Boolean •Numeric (integer, float, decimal) • (var)char, text (infinitely large), binary • sequences (guaranteed unique) • Date/time and time intervals • IP addresses, XML, enums, arrays
  • 14.
  • 15.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
  • 16.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT);
  • 17.
    Or create yourown! CREATE TYPE Person AS (first_name TEXT, last_name TEXT); CREATE TABLE Members (group_id INTEGER, member Person);
  • 18.
    PostGIS • Some peopletook this all the way • Want to include geographical information? • No problem — we’ve got PostGIS! • Complete GIS solution, with data types and functions • Keeps pace with main PostgreSQL revisions
  • 19.
    Flexible internal types •For example, text/binary storage • Inline or in a separate table (TOAST) • Compressed or not ALTER TABLE Foo ALTER COLUMN mytext SET STORAGE EXTERNAL;
  • 20.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500);
  • 21.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
  • 22.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
  • 23.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey"
  • 24.
    Foreign keys thatwork CREATE TABLE DVDs (id SERIAL, title TEXT, store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('Attack of the Killer Tomatoes', 500); ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores". ERROR: insert or update on table "dvds" violates foreign key constraint "dvds_store_id_fkey" DETAIL: Key (store_id)=(500) is not present in table "stores".
  • 25.
    Custom validity checks CREATETABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500);
  • 26.
    Custom validity checks CREATETABLE DVDs (id SERIAL, title TEXT check (length(title) > 3), store_id INTEGER REFERENCES Stores); INSERT INTO DVDs (title, store_id) VALUES ('AB', 500); ERROR: new row for relation "dvds" violates check constraint "dvds_title_check"
  • 27.
    Timestamp vs. Interval testdb=#select now(); now ------------------------------- 2010-10-31 08:58:23.365792+02 (1 row) Point in time testdb=# select now() - interval '3 days'; ?column? ------------------------------- 2010-10-28 08:58:28.870011+02 Difference between (1 row) points in time
  • 28.
    Built-in functions • Math •Text processing (including regexps) • Date/time calculations • Conditionals (CASE, COALESCE, NULLIF) for use in queries • Extensive library of geometrical functions
  • 29.
    Or write yourown! • PL/pgSQL • PL/Perl • PL/Python • PL/Ruby • PL/R • PL/Tcl
  • 30.
    Triggers • Write afunction, and use it as a trigger • The function can run before or after inserts, deletes, or updates • What do they do? • Stop certain actions from happening • Change the data before it’s really used
  • 31.
    Windowing functions • Calculateaggregate amounts on each row, taking other rows into consideration • For example: Rank each person’s salary within their department
  • 32.
    Smart indexing • Functionalindexes CREATE UNIQUE INDEX on People(lower(email_address)); • Conditional indexes CREATE INDEX on People WHERE is_admin = true; • Background (concurrent) indexing
  • 33.
    Transactions • In PostgreSQLfrom the beginning • Use transactions for just about anything: BEGIN DROP TABLE DVDs; ROLLBACK; SELECT * FROM DVDs; -- Works!
  • 34.
    Savepoints (or, sub-transactions) BEGIN; INSERT INTOtable1 VALUES (1); SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (2); ROLLBACK TO SAVEPOINT my_savepoint; INSERT INTO table1 VALUES (3); COMMIT;
  • 35.
    MVCC • Readers andwriters don’t block each other • “Multi-version concurrency control” • xmin, xmax on each tuple; rows are those tuples with txid_current between them • Old versions stick around until vacuumed • Autovacuum removes even this issue
  • 36.
    Full-text indexing • Built-in,Unicode-aware full-text indexing has been in PostgreSQL for a few years • It’s simply a different type of index, as well as a different search operator (@@) • Custom dictionaries for different languages, indicating stop words and stemming rules
  • 37.
    CTEs • Adds a“WITH” statement, which defines a sorta-kinda temp table • You can then query that same temp table • Makes many queries easier to read, write, without a real temp table • Better yet: CTEs can be recursive, for everything from Fibonacci to org charts
  • 38.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 39.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 40.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 41.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 42.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 43.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 44.
    WITH regional_sales AS( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
  • 45.
    PITR • Store WALs(write-ahead logs) to disk • Normally, the server will apply all WALs when it starts up • By choosing which WALs to apply, you can bring the database up to date until a certain point in time
  • 46.
    Streaming replication • In8.4, you could use “log shipping” to send the WALs from one server to another • When the primary server went down, you could bring up this secondary • In 9.0, this can happen over the network • Faster, more reliable, and easier to configure than previously
  • 47.
    Hot standby • In8.4, the secondary server could not answer queries when it was receiving WALs from the primary • 9.0 introduced “hot standby”: Secondary server can answer read-only queries • Can help with balancing certain queries
  • 48.
    It’s all inthe tables • Databases are good at working with tables • So PostgreSQL stores info in tables • Configuration parameters • Objects (tables, indexes, functions, etc.) • Statistics for the optimizer • Look through pg_catalog ... and learn!
  • 49.
    Client libraries • libpq(in C) • Java (JDBC) • Others by 3rd • .NET (npgsql) parties: • ODBC • Python • JavaScript (!) • Ruby • Just about any language you can • Perl imagine
  • 50.
    Want to learnmore? • Mailing lists, wikis, and blogs • All at http://postgresql.org/ • http://planetpostgresql.org • PostgreSQL training, consulting, development, hand-holding, and general encouragement
  • 51.
    Thanks! (Any questions?) reuven@lerner.co.il http://www.lerner.co.il/ 054-496-8405 “reuvenlerner” on Skype/AIM