Waiting for PostgreSQL 19 – Add date and timestamp variants of random(min, max).

On 9th of September 2025, Dean Rasheed committed patch:

Add date and timestamp variants of random(min, max).

This adds 3 new variants of the random() function:

random(min date, max date) returns date
random(min timestamp, max timestamp) returns timestamp
random(min timestamptz, max timestamptz) returns timestamptz

Each returns a random value x in the range min <= x <= max.

Author: Damien Clochard <damien@dalibo.info>
Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/f524d8cab5914613d9e624d9ce177d3d@dalibo.info
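
Judging purely from the signatures quoted above, usage is as simple as this (the bounds below are arbitrary examples):

-- each call returns one random value in the inclusive range min <= x <= max
SELECT random(date '2024-01-01', date '2024-12-31');
SELECT random(timestamp '2024-01-01 00:00', timestamp '2024-12-31 23:59');
SELECT random(timestamptz '2024-01-01 00:00+00', now());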

Continue reading Waiting for PostgreSQL 19 – Add date and timestamp variants of random(min, max).

Who logged in to the system from multiple countries within 2 hours?

Yesterday someone posted a set of queries for interviews, all centered on answering business-like questions from a database.

Today this post is hidden behind some “subscribe to read more” thing, so I will not even link it, but one question there caught my eye.

Since I can't copy-paste the text, I'll try to write what I remember:

Given a table sessions, with columns user_id, login_time, and country_id, list all cases where a single account logged in to the system from more than one country within a 2-hour time frame.

The idea behind it is that this would be a tool to find hacked accounts, based on the idea that you generally can't change countries within 2 hours. Which is somewhat true.

The solution in the blog post suggested joining the sessions table with itself, using some inequality condition. I think we can do better…
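
Without spoiling the full post: one plausible self-join-free approach (a sketch only; the column types are assumed, as the original question didn't specify them) uses window functions:

SELECT user_id, login_time, country_id
FROM (
    SELECT user_id,
           login_time,
           country_id,
           lag(country_id) OVER w AS prev_country_id,
           lag(login_time) OVER w AS prev_login_time
    FROM sessions
    WINDOW w AS (PARTITION BY user_id ORDER BY login_time)
) x
WHERE prev_country_id IS DISTINCT FROM country_id
  AND login_time - prev_login_time <= interval '2 hours';

Note that this sketch only compares consecutive logins per user, which is enough to flag any account that switched countries within the window with a single pass over the table.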

Continue reading Who logged in to the system from multiple countries within 2 hours?

Waiting for PostgreSQL 19 – Display Memoize planner estimates in EXPLAIN

On 29th of July 2025, David Rowley committed patch:

Display Memoize planner estimates in EXPLAIN

There've been a few complaints that it can be overly difficult to figure out why the planner picked a Memoize plan. To help address that, here we adjust the EXPLAIN output to display the following additional details:

1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio

Technically #4 can be calculated using #1, #2 and #3, but it's not a particularly obvious calculation, so we opt to display it explicitly. The original patch by Lukas Fittl only displayed the hit ratio, but there was a fear that might lead to more questions about how that was calculated. The idea with displaying all 4 is to be transparent which may allow queries to be tuned more easily. For example, if #2 isn't correct then maybe extended statistics or a manual n_distinct estimate can be used to help fix poor plan choices.

Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
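
As an illustration (table and column names here are invented, and the exact EXPLAIN lines may differ from what the committed patch prints), this is the kind of join where a Memoize node tends to show up, plus the manual n_distinct fix the commit message mentions:

-- the inner side is repeatedly looked up by customer_id; under a nested
-- loop the planner may place a Memoize node over the inner index scan
EXPLAIN
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id;

-- if the "distinct keys" estimate is off, overriding n_distinct can help:
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 50);
ANALYZE orders;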

Continue reading Waiting for PostgreSQL 19 – Display Memoize planner estimates in EXPLAIN

pg_dump speed across versions

Got interested recently in the speed of pg_dump. Specifically, whether, over the years, it has become faster, and if yes, by how much.

A couple of years ago I was in a position where we needed to run pg_dump, and I found some inefficiencies, which later got patched. This was around version 12. So, how does the situation look now? Interestingly, not so great…

Continue reading pg_dump speed across versions

Waiting for PostgreSQL 18 – Add function to get memory context stats for processes

On 8th of April 2025, Daniel Gustafsson committed patch:

Add function to get memory context stats for processes

This adds a function for retrieving memory context statistics and information from backends as well as auxiliary processes. The intended use case is cluster debugging when under memory pressure or unanticipated memory usage characteristics.

When calling the function it sends a signal to the specified process to submit statistics regarding its memory contexts into dynamic shared memory. Each memory context is returned in detail, followed by a cumulative total in case the number of contexts exceeds the max allocated amount of shared memory. Each process is limited to use at most 1Mb memory for this.

A summary can also be explicitly requested by the user, this will return the TopMemoryContext and a cumulative total of all lower contexts.

In order to not block on busy processes the caller specifies the number of seconds during which to retry before timing out. In the case where no statistics are published within the set timeout, the last known statistics are returned, or NULL if no previously published statistics exist. This allows dashboard type queries to continually publish even if the target process is temporarily congested. Context records contain a timestamp to indicate when they were submitted.

Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
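
A sketch of how a call might look, assuming the pg_get_process_memory_contexts(pid, summary, timeout) signature as documented for PostgreSQL 18 (the choice of target process below is arbitrary):

-- summary of memory contexts of the checkpointer, retrying for up to 5 seconds
SELECT *
FROM pg_get_process_memory_contexts(
         (SELECT pid FROM pg_stat_activity WHERE backend_type = 'checkpointer'),
         true,   -- summary: TopMemoryContext plus a cumulative total
         5.0);   -- seconds to retry before timing out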

Continue reading Waiting for PostgreSQL 18 – Add function to get memory context stats for processes

Waiting for PostgreSQL 18 – Allow NOT NULL constraints to be added as NOT VALID

On 7th of April 2025, Álvaro Herrera committed patch:

Allow NOT NULL constraints to be added as NOT VALID

This allows them to be added without scanning the table, and validating them afterwards without holding access exclusive lock on the table after any violating rows have been deleted or fixed.

Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid not-null constraint validates that constraint. ALTER TABLE .. VALIDATE CONSTRAINT is also supported. There are various checks on whether an invalid constraint is allowed in a child table when the parent table has a valid constraint; this should match what we do for enforced/not enforced constraints.

pg_attribute.attnotnull is now only an indicator for whether a not-null constraint exists for the column; whether it's valid or invalid must be queried in pg_constraint. Applications can continue to query pg_attribute.attnotnull as before, but now it's possible that NULL rows are present in the column even when that's set to true.

For backend internal purposes, we cache the nullability status in CompactAttribute->attnullability that each tuple descriptor carries (replacing CompactAttribute.attnotnull, which was a mirror of Form_pg_attribute.attnotnull). During the initial tuple descriptor creation, based on the pg_attribute scan, we set this to UNRESTRICTED if pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we update the latter to VALID or INVALID depending on the pg_constraint scan. This flag is also copied when tupledescs are copied.

Comparing tuple descs for equality must also compare the CompactAttribute.attnullability flag and return false in case of a mismatch.

pg_dump deals with these constraints by storing the OIDs of invalid not-null constraints in a separate array, and running a query to obtain their properties. The regular table creation SQL omits them entirely. They are then dealt with in the same way as "separate" CHECK constraints, and dumped after the data has been loaded. Because no additional pg_dump infrastructure was required, we don't bump its version number.

I decided not to bump catversion either, because the old catalog state works perfectly in the new world. (Trying to run with new catalog state and the old server version would likely run into issues, however.)

System catalogs do not support invalid not-null constraints (because commit 14e87ffa5c54 didn't allow them to have pg_constraint rows anyway.)

Author: Rushabh Lathia <rushabh.lathia@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Tested-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
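
In practice this enables the same two-step pattern we already know from CHECK constraints and foreign keys; a sketch (table and constraint names below are invented for illustration):

-- add the constraint without a full-table scan or long-held lock
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_not_null
    NOT NULL customer_id NOT VALID;

-- later, after fixing or deleting any violating rows:
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_not_null;

-- alternatively, per the commit message, SET NOT NULL validates an
-- existing invalid not-null constraint:
ALTER TABLE orders ALTER COLUMN customer_id SET NOT NULL;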

Continue reading Waiting for PostgreSQL 18 – Allow NOT NULL constraints to be added as NOT VALID

Waiting for PostgreSQL 18 – Add modern SHA-2 based password hashes to pgcrypto.

On 5th of April 2025, Álvaro Herrera committed patch:

Add modern SHA-2 based password hashes to pgcrypto.

This adapts the publicly available reference implementation on https://www.akkadia.org/drepper/SHA-crypt.txt and adds the new hash algorithms sha256crypt and sha512crypt to crypt() and gen_salt() respectively.

Author: Bernd Helmle <mailings@oopsware.de>
Reviewed-by: Japin Li <japinli@hotmail.com>
Discussion: https://postgr.es/m/c763235a2757e2f5f9e3e27268b9028349cef659.camel@oopsware.de
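
Usage follows the existing crypt()/gen_salt() pattern; a quick sketch (the password value is obviously just an example):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- hash a password with the new sha512crypt algorithm
SELECT crypt('correct horse battery staple', gen_salt('sha512crypt'));

-- verify by re-hashing the candidate with the stored hash as the salt
SELECT crypt('correct horse battery staple', stored) = stored AS matches
FROM (SELECT crypt('correct horse battery staple', gen_salt('sha256crypt')) AS stored) s;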

Continue reading Waiting for PostgreSQL 18 – Add modern SHA-2 based password hashes to pgcrypto.

Waiting for PostgreSQL 18 – Non text modes for pg_dumpall, correspondingly change pg_restore

On 4th of April 2025, Andrew Dunstan committed patch:

Non text modes for pg_dumpall, correspondingly change pg_restore

pg_dumpall acquires a new -F/--format option, with the same meanings as pg_dump. The default is p, meaning plain text. For any other value, a directory is created containing two files, globals.dat and map.dat. The first contains SQL for restoring the global data, and the second contains a map from oids to database names. It will also contain a subdirectory called databases, inside which it will create archives in the specified format, named using the database oids.

In these cases the -f argument is required.

If pg_restore encounters a directory containing globals.dat, and no toc.dat, it restores the global settings and then restores each database.

pg_restore acquires two new options: -g/--globals-only which suppresses restoration of any databases, and --exclude-database which inhibits restoration of particular database(s) in the same way the same option works in pg_dumpall.

Author: Mahendra Singh Thalor <mahi6run@gmail.com>
Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/cb103623-8ee6-4ba5-a2c9-f32e3a4933fa@dunslane.net
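
On the command line that might look like this (a sketch based on the options named above; paths are invented, and pg_restore connection options are the usual ones):

# dump the whole cluster in directory format
pg_dumpall -F d -f /backups/cluster

# restore only roles, tablespaces and other globals
pg_restore -g -d postgres /backups/cluster

# restore everything except one database
pg_restore --exclude-database=testdb -d postgres /backups/cluster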

Continue reading Waiting for PostgreSQL 18 – Non text modes for pg_dumpall, correspondingly change pg_restore

Waiting for PostgreSQL 18 – Allow json{b}_strip_nulls to remove null array elements

On 5th of March 2025, Andrew Dunstan committed patch:

Allow json{b}_strip_nulls to remove null array elements

An additional parameter ("strip_in_arrays") is added to these functions. It defaults to false. If true, then null array elements are removed as well as null valued object fields. JSON that just consists of a single null is not affected.

Author: Florents Tselai <florents.tselai@gmail.com>

Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com
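
A quick illustration of the difference (expected outputs shown as comments):

-- default: null array elements are kept, only null object fields go
SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}');
-- {"b": [1, null, 2]}

-- with strip_in_arrays = true, null array elements are stripped too
SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}', true);
-- {"b": [1, 2]}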

Continue reading Waiting for PostgreSQL 18 – Allow json{b}_strip_nulls to remove null array elements

Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes

On 3rd of March 2025, Tomas Vondra committed patch:

Allow parallel CREATE INDEX for GIN indexes

Allow using parallel workers to build a GIN index, similarly to BTREE and BRIN. For large tables this may result in significant speedup when the build is CPU-bound.

The work is divided so that each worker builds index entries on a subset of the table, determined by the regular parallel scan used to read the data. Each worker uses a local tuplesort to sort and merge the entries for the same key. The TID lists do not overlap (for a given key), which means the merge sort simply concatenates the two lists. The merged entries are written into a shared tuplesort for the leader.

The leader needs to merge the sorted entries again, before writing them into the index. But this way a significant part of the work happens in the workers, and the leader is left with merging fewer large entries, which is more efficient.

Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for GIN indexes (e.g. uniqueness checks).

Original patch by me, with reviews and substantial improvements by Matthias van de Meent, certainly enough to make him a co-author.

Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
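
Whether workers are actually used is governed by the same knobs as other parallel index builds; a sketch (table and column names assumed, with fts being a tsvector column):

-- allow more workers for maintenance commands in this session
SET max_parallel_maintenance_workers = 4;

-- or pin the table's planned worker count explicitly
ALTER TABLE documents SET (parallel_workers = 4);

CREATE INDEX documents_fts_gin ON documents USING gin (fts);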

Continue reading Waiting for PostgreSQL 18 – Allow parallel CREATE INDEX for GIN indexes