
My org is using GCP's managed Postgres service as our production DB. For the past few months we've been busy with projects not related to the DB specifically, so no one was monitoring anything. (Screenshot below; the first row is the primary, the next two are its read replicas in different regions.)

[Screenshot: storage usage]

Last month, we noticed that our primary instance had ballooned to almost 3 TB of storage consumption, while the two read-only replicas attached to it were sitting at ~260 GB of storage usage.

Looking at the history, we did have a spike in usage (~50x the usual) in early December, where (screenshot below) storage consumption started increasing linearly. Even so, it should not still be growing at this pace.

[Screenshot: storage]

We ran a full vacuum of the tables on the primary yesterday, and the read replicas have since dropped to 120 GB of storage usage. However, the primary instance is still consuming 3 TB. (Screenshots below.)

[Screenshot: replica storage drop]

[Screenshot: primary storage drop]
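As a quick sanity check that the vacuum actually cleared the dead rows, something like this could be run against the standard stats view (a sketch; the LIMIT is arbitrary):

-- Tables still carrying many dead tuples after the vacuum:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;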

Running the query to fetch disk usage per table (shared below) shows that usage should indeed be around 120 GB.

Open to any ideas on what is making the primary instance use that much disk space. What can we do (apart from creating a new cluster) to reclaim the space?

We might be OK if the auto-growing disk at least stops increasing at this daily pace, until we can free up some manpower to either spin up a new cluster or implement some other solution.


SQL query used for fetching disk storage per table:

SELECT table_name,
       pg_size_pretty(table_size)   AS table_size,
       pg_size_pretty(indexes_size) AS indexes_size,
       pg_size_pretty(total_size)   AS total_size
FROM (
    SELECT table_name,
           pg_table_size(table_name)          AS table_size,
           pg_indexes_size(table_name)        AS indexes_size,
           pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
        FROM information_schema.tables
    ) AS all_tables
    ORDER BY total_size DESC
) AS pretty_sizes;
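Note that this query only sees relations in the database it is run against, so a cross-check against whole-database numbers may help locate the gap. A minimal sketch using standard catalog functions (nothing here is specific to our schema):

-- Total size of every database on the instance:
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS db_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- Sum of all relation sizes in the connected database
-- (pg_total_relation_size already folds in indexes and TOAST):
SELECT pg_size_pretty(sum(pg_total_relation_size(oid))) AS relations_total
FROM pg_class
WHERE relkind IN ('r', 'm');

If pg_database_size reports far more than the relation sum, orphaned files inside the database directory would be a suspect; if all databases together account for only ~120 GB, the missing space lives outside the databases entirely (e.g. WAL, temp files, server logs).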
Comments:

  • As a sysadmin, my usual guess is that when disk usage is not explained by the amount of data in the tables, it is being used by other things: for example log files, auditing / transaction logs, maybe even file exports / uploads, (point-in-time) backups, etc.
  • On MySQL I'd immediately look at binlog retention, particularly with a replication setup like this. I don't know how Postgres handles resync after failure, but it almost certainly retains the same sort of information.
  • Does your GCP instance have protection, snapshots, etc. in place? Those components may also be included in the space. If there are multiple instances, then redundancy is almost certain to be included.
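Following up on the transaction-log comments above: Postgres's equivalent of the binlog is the WAL, and a stale replication slot (for example one left behind by a broken replica) will pin WAL indefinitely. A sketch of the standard checks (pg_ls_waldir() requires pg_monitor membership and may not be exposed on a managed instance):

-- Replication slots that prevent WAL from being recycled:
SELECT slot_name, slot_type, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Size of the WAL directory itself:
SELECT pg_size_pretty(sum(size)) AS wal_total
FROM pg_ls_waldir();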
