My org uses GCP's managed Postgres (Cloud SQL) as our production DB. For the past few months we've been busy with projects unrelated to the database, so no one was monitoring anything. (Screenshot below; the first row is the primary, the next two are its read replicas in different regions.)
Last month we noticed that our primary instance had ballooned to almost 3 TB of storage consumption, while the two read-only replicas attached to it were sitting at ~260 GB.
Looking at the history, we did have a period of high usage (~50x the usual) in early December, and the screenshot below shows storage consumption starting to increase linearly around then. Even so, it should not keep growing at this pace.
We ran a full vacuum of the tables on the primary yesterday, and the read replicas have dropped to ~120 GB of storage usage. However, the primary instance is still consuming ~3 TB. (Screenshots below.)
Running the query that fetches disk usage per table (shared below) shows that the data itself should indeed only be around 120 GB.
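For reference, the vacuum we ran was essentially the following, table by table (the table name here is just a placeholder, not one of our actual tables):

-- VACUUM FULL rewrites the table into new relation files, so dead space
-- should be returned to the OS once the old files are removed
VACUUM (FULL, VERBOSE, ANALYZE) "public"."some_big_table";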
Open to any ideas on what is making the primary instance use that much disk space. What should we do (apart from creating a new cluster) to reclaim it?
We might be OK if the dynamically-sized disk at least stops growing daily at this pace, until we can free up some manpower to either spin up a new cluster or find another solution.
SQL query used for fetching disk storage per table:
SELECT
    table_name,
    pg_size_pretty(table_size)   AS table_size,
    pg_size_pretty(indexes_size) AS indexes_size,
    pg_size_pretty(total_size)   AS total_size
FROM (
    SELECT
        table_name,
        pg_table_size(table_name)          AS table_size,
        pg_indexes_size(table_name)        AS indexes_size,
        pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
        FROM information_schema.tables
    ) AS all_tables
    ORDER BY total_size DESC
) AS pretty_sizes;
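For comparison (not part of the output above), a quick sanity check of the overall per-database size on the primary would be something along these lines:

-- Sums each database's data files; a large gap between this total and the
-- instance's reported disk usage would suggest the space is being consumed
-- outside the tables themselves
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;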