Posted on Feb 6, 2021 • Edited on Feb 8, 2021

Compare table structures in two schemata

Bugs are the bane of programmers' existence. But like death and taxes... well, death, anyway*; you can't get away from them entirely. This is especially true of large application endeavors including those with multiple schemata, which is what I'll be discussing here.

So, our application is using multiple tenant schemata for our clients and the structure must match. We encountered a situation that possibly could have included checking table structures between our template schema and a customer schema. This led to the development of this query.

The query utilizes the information_schema which should be present in most RDBMS engines, so this should be fairly portable.

To use this query, you'll need to set two to three parameters (depending on the number of tables you wish to compare).

Let's now examine the query:

with check_params as ( select 'my-schema-to-check' as check_schema, -- put schema to check here! 'my-known-good-schema' as main_schema, -- put the known good schema here null::text as check_table -- put table name to check here (or null for all)! ), display_table as ( select coalesce(mn.table_name, ck.table_name) as table_name, case when mn.table_schema is null then ck.table_name || ' <null>' when ck.table_schema is null then 'null ' || ' <' || mn.table_name || '>' else mn.table_name end::text as disp_table_name from ( select table_schema, table_name from information_schema.tables cross join check_params cp where table_schema = cp.main_schema ) as mn full join ( select table_schema, table_name from information_schema.tables cross join check_params cp where table_schema = cp.check_schema ) as ck on ck.table_name = mn.table_name ), full_table_compare as ( select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name", (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name", case when c.column_name is distinct from t.column_name then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>' else t.column_name end::text as "column_name", case when c.udt_name is distinct from t.udt_name then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>' else t.udt_name end::text as column_type, case when c.character_maximum_length is distinct from t.character_maximum_length then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>' else t.character_maximum_length end::text as text_length, case when c.numeric_precision is distinct from t.numeric_precision then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>' else t.numeric_precision end::text as precision, case when c.numeric_scale is distinct from t.numeric_scale then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>' else t.numeric_scale end::text as scale, case when c.is_nullable is distinct from t.is_nullable then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>' else t.is_nullable end::text as is_nullable, case when c.column_default is distinct from t.column_default then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>' else t.column_default end::text as col_default from ( -- Get info from the "good" schema select cl.table_schema, cl.table_name, cl.column_name, cl.udt_name, coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length, coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision, coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale, coalesce(cl.is_nullable::text, 'null')::text as is_nullable, regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default, cp.check_schema, cp.main_schema from information_schema.columns cl cross join check_params cp where cl.table_schema = cp.main_schema and cl.table_name = coalesce(cp.check_table, cl.table_name) ) as t full join ( -- Get info from "suspect" schema select cl.table_schema, cl.table_name, cl.column_name, cl.udt_name, coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length, coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision, coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale, coalesce(cl.is_nullable::text, 'null')::text as is_nullable, regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default, cp.check_schema, cp.main_schema from information_schema.columns cl cross join check_params cp where cl.table_schema = cp.check_schema and cl.table_name = coalesce(cp.check_table, table_name) ) as c on c.table_name = t.table_name and c.column_name = t.column_name ) select schema_name, table_name, column_name, column_type, text_length, precision, scale, is_nullable, col_default from full_table_compare -- only report the differences where table_name ~ '<' or column_name ~ '<' or column_type ~ '<' or text_length ~ '<' or precision ~ '<' or scale ~ '<' or is_nullable ~ '<' or col_default ~ '<' order by table_name, column_name;

Wow.

Let's break it down.

check_params

with check_params as ( select 'my-schema-to-check' as check_schema, -- put schema to check here! 'my-known-good-schema' as main_schema, -- put the known good schema here null::text as check_table -- put table name to check here (or null for all)! )

This is the CTE that will have our parameters. This was created so that we wouldn't have to attempt to try script variables and to only have to define the values once. Just replace the strings with the requisite schema names to compare all tables in both schemas. Replace the null::text for the check_table param with a table name to constrain to that one table.

display_table

display_table as ( select coalesce(mn.table_name, ck.table_name) as table_name, case when mn.table_schema is null then ck.table_name || ' <null>' when ck.table_schema is null then 'null ' || ' <' || mn.table_name || '>' else mn.table_name end::text as disp_table_name from ( select table_schema, table_name from information_schema.tables cross join check_params cp where table_schema = cp.main_schema ) as mn full join ( select table_schema, table_name from information_schema.tables cross join check_params cp where table_schema = cp.check_schema ) as ck on ck.table_name = mn.table_name )

This CTE is designed to help with output. Because we are comparing if a table does or does not exist as well as its column existing or not, we have an issue. The table name is replicated on the information_schema.columns record for each column. So we want to display it correctly regardless of presence/absence of the table itself or for a column mismatch.

Why a full join? We want to compare both ways.

full_table_compare

full_table_compare as ( select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name", (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name", case when c.column_name is distinct from t.column_name then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>' else t.column_name end::text as "column_name", case when c.udt_name is distinct from t.udt_name then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>' else t.udt_name end::text as column_type, case when c.character_maximum_length is distinct from t.character_maximum_length then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>' else t.character_maximum_length end::text as text_length, case when c.numeric_precision is distinct from t.numeric_precision then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>' else t.numeric_precision end::text as precision, case when c.numeric_scale is distinct from t.numeric_scale then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>' else t.numeric_scale end::text as scale, case when c.is_nullable is distinct from t.is_nullable then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>' else t.is_nullable end::text as is_nullable, case when c.column_default is distinct from t.column_default then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>' else t.column_default end::text as col_default from ( -- Get info from the "good" schema select cl.table_schema, cl.table_name, cl.column_name, cl.udt_name, coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length, coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision, coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale, coalesce(cl.is_nullable::text, 'null')::text as is_nullable, regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default, cp.check_schema, cp.main_schema from information_schema.columns cl cross join check_params cp where cl.table_schema = cp.main_schema and cl.table_name = coalesce(cp.check_table, cl.table_name) ) as t full join ( -- Get info from "suspect" schema select cl.table_schema, cl.table_name, cl.column_name, cl.udt_name, coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length, coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision, coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale, coalesce(cl.is_nullable::text, 'null')::text as is_nullable, regexp_replace(cl.column_default, cp.check_schema || '\.', '', 'g') as column_default, cp.check_schema, cp.main_schema from information_schema.columns cl cross join check_params cp where cl.table_schema = cp.check_schema and cl.table_name = coalesce(cp.check_table, table_name) ) as c on c.table_name = t.table_name and c.column_name = t.column_name )

So there's two main parts to this query.

schema information

 select cl.table_schema, cl.table_name, cl.column_name, cl.udt_name, coalesce(cl.character_maximum_length::text, 'null')::text as character_maximum_length, coalesce(cl.numeric_precision::text, 'null')::text as numeric_precision, coalesce(cl.numeric_scale::text, 'null')::text as numeric_scale, coalesce(cl.is_nullable::text, 'null')::text as is_nullable, regexp_replace(cl.column_default, cp.main_schema || '\.', '', 'g') as column_default, cp.check_schema, cp.main_schema from information_schema.columns cl cross join check_params cp where cl.table_schema = cp.main_schema and cl.table_name = coalesce(cp.check_table, cl.table_name)

We need to get specific information from the information_schema regarding the columns for the target table(s) in the main or good or master or template schema (whatever you with to call it). This is by far not exhaustive, but it's a good place to start for basic structure comparisons. We also need to get the same information for the target, suspect, bad, or check schema (again, whatever you with to call it).

This information is:

table_schema : Namespace of the table
table_name : Name of the table
column_name : Name of the column
udt_name : Name of the (internal) data type of the column
character_maximum_length : Len of a char or varchar column. This will be null for a text type column.
numeric_precision : Precision of a numeric/decimal column
numeric_scale : Scale of a numeric/decimal column
is_nullable : YES if it can contain null else NO
column_default : The default value for the column

Some operations are done at these queries to make the comparison logic easier such as type changes or string scrubbing.

Full joins are again used to allow for bi-directional comparisons.

comparison logic

select coalesce(t.check_schema, c.check_schema) || ' <' || coalesce(t.main_schema, c.main_schema) || '>' as "schema_name", (select dt.disp_table_name from display_table dt where dt.table_name = coalesce(t.table_name, c.table_name)) as "table_name", case when c.column_name is distinct from t.column_name then coalesce(c.column_name, 'null') || ' <' || coalesce(t.column_name, 'null') || '>' else t.column_name end::text as "column_name", case when c.udt_name is distinct from t.udt_name then coalesce(c.udt_name, 'null') || ' <' || coalesce(t.udt_name, 'null') || '>' else t.udt_name end::text as column_type, case when c.character_maximum_length is distinct from t.character_maximum_length then coalesce(c.character_maximum_length, 'null') || ' <' || coalesce(t.character_maximum_length, 'null') || '>' else t.character_maximum_length end::text as text_length, case when c.numeric_precision is distinct from t.numeric_precision then coalesce(c.numeric_precision, 'null') || ' <' || coalesce(t.numeric_precision, 'null') || '>' else t.numeric_precision end::text as precision, case when c.numeric_scale is distinct from t.numeric_scale then coalesce(c.numeric_scale, 'null') || ' <' || coalesce(t.numeric_scale, 'null') || '>' else t.numeric_scale end::text as scale, case when c.is_nullable is distinct from t.is_nullable then coalesce(c.is_nullable, 'null') || ' <' || coalesce(t.is_nullable, 'null') || '>' else t.is_nullable end::text as is_nullable, case when c.column_default is distinct from t.column_default then coalesce(c.column_default, 'null') || ' <' || coalesce(t.column_default, 'null') || '>' else t.column_default end::text as col_default

The rubber meets the road here. This is where all of the comparisons of the records from the two schemata are done. Basically, it checks for differences and reports them in a specified manner. Any difference will be reported as
suspect_schema_value <good_schema_value> meaning that the "good" values will be surrounded by < and > characters.

If a column has a null value it will appear as null or <null> depending if the value originated from the "suspect" or "good" schema.

If the values are identical, then there will only be the plaintext value listed as no other indicators are necessary.

The use of IS DISTINCT FROM was crucial to simplify the comparisons. This is an operator that evaluates null as if it were a value instead of, well... null. So if we consider null to have a value then x IS DISTINCT FROM y would be similar to coalesce(x, 'x') != coalesce(y, 'y') and x IS NOT DISTINCT FROM y would be equivalent to coalesce(x, 'x') != coalesce(y, 'y'). Check out the document link for the official explanation.

assembly

select schema_name, table_name, column_name, column_type, text_length, precision, scale, is_nullable, col_default from full_table_compare -- only report the differences where table_name ~ '<' or column_name ~ '<' or column_type ~ '<' or text_length ~ '<' or precision ~ '<' or scale ~ '<' or is_nullable ~ '<' or col_default ~ '<' order by table_name, column_name;

This is a lot of data and, on a good day, it will be mostly if not all in sync. So we're really only interested in the differences. This final query only pulls the records from the output of the full_table_compare CTE but only those records where any column contains a < character denoting a difference. We're checking every column except schema_name as it will always have the format suspect_schema <good_schema> as an indicator of the origin of reported values.

Demo time!

Let's start by creating two schemata:

postgres=# create schema eep; CREATE SCHEMA postgres=# create schema opp; CREATE SCHEMA

Now we'll create some sample tables to play with.

postgres=# create table eep.ork postgres-# ( postgres(# id bigserial primary key, postgres(# label text, postgres(# created_ts timestamptz, postgres(# data jsonb postgres(# ); CREATE TABLE postgres=# create table eep.oof postgres-# ( postgres(# uniq_label text primary key postgres(# ); CREATE TABLE postgres=# create table opp.ork postgres-# ( postgres(# id bigserial primary key, postgres(# label text, postgres(# created_ts timestamptz, postgres(# data jsonb postgres(# ); CREATE TABLE postgres=# create table opp.ah_ah postgres-# ( postgres(# id uuid primary key, postgres(# authentication_header jsonb postgres(# ); CREATE TABLE

We'll change our query parameters to say the eep schema is "good" and opp is "suspect". We'll also set it to only look at the ork table.

with check_params as ( select 'opp' as check_schema, -- put schema to check here! 'eep' as main_schema, -- put the known good schema here 'ork'::text as check_table -- put table name to check here (or null for all)! ),

Running the query, we see the following output:

 schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default -------------+------------+-------------+-------------+-------------+-----------+-------+-------------+------------- (0 rows)

素晴らしい! No differences!

Let's expand to all of the tables.

with check_params as ( select 'opp' as check_schema, -- put schema to check here! 'eep' as main_schema, -- put the known good schema here null::text as check_table -- put table name to check here (or null for all)! ),

Running the query, we now see:

 schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default -------------+--------------+------------------------------+--------------+-------------+-------------+-------------+-------------+------------- opp <eep> | ah_ah <null> | authentication_header <null> | jsonb <null> | null <null> | null <null> | null <null> | YES <null> | opp <eep> | ah_ah <null> | id <null> | uuid <null> | null <null> | null <null> | null <null> | YES <null> | opp <eep> | null <oof> | null <uniq_label> | null <text> | null <null> | null <null> | null <null> | null <NO> | (3 rows)

So, let's examine the output.

First, there's the schema_name column. This shows the suspect schema followed by the good schema (surrounded by < and > chars).

Next we have the table_name column. Note there there is an ah_ah table that is in the opp schema, but not in the eep schema. Also, tere is an oof table that is in the eep schema that is not in the opp schema.

The rest of the columns show the differences at each column: type, length, precision, scale, nullable, default.

So we can tell missing tables, but how about actual differences?

Let's change opp.ork a bit.

Change the label from text to varchar(256)

postgres=# alter table opp.ork alter column label set data type varchar(256); ALTER TABLE

Add a new column updated_ts

postgres=# alter table opp.ork add column updated_ts timestamptz; ALTER TABLE

Just for fun, let's also alter eep.ork.

Add an audited column

postgres=# alter table eep.ork add column audited boolean not null default false; ALTER TABLE

Now let's constrain the query back to only the ork table:

with check_params as ( select 'opp' as check_schema, -- put schema to check here! 'eep' as main_schema, -- put the known good schema here 'ork'::text as check_table -- put table name to check here (or null for all)! ),

Now, running the query, we see:

 schema_name | table_name | column_name | column_type | text_length | precision | scale | is_nullable | col_default -------------+------------+-------------------+--------------------+-------------+-------------+-------------+-------------+-------------- opp <eep> | ork | label | varchar <text> | 256 <null> | null | null | YES | opp <eep> | ork | null <audited> | null <bool> | null <null> | null <null> | null <null> | null <NO> | null <false> opp <eep> | ork | updated_ts <null> | timestamptz <null> | null <null> | null <null> | null <null> | YES <null> | (3 rows)

So we've caught the extra column audited in eep.ork, the extra column updated_ts in opp.ork and the change to the common label column.

In closing, this could prove a useful tool to compare tables between schemata. This could be the basis for extending comparisons across databases on the same engine as well.

Hopefully this will prove useful and be helpful to database developers.

*I always pay my tax. Really!