Interpreting Supabase Grafana CPU charts #27022

TheOtherBrian1 (Maintainer) · 2024-06-05T04:46:27Z

Guide for setting up Supabase Grafana

Examples of stressed CPU
High busy kernel CPU (yellow). Symptomatic of extension- or connection-related issues.

High busy user CPU (blue). Symptomatic of problematic queries or database overload.

High busy IOWait (red). Because disk is so much slower than CPU, even a relatively small amount of IOWait is a sign of inadequate memory or disk IOPS/throughput.

Completely overwhelmed. Symptomatic of misuse and problematic data access patterns.

Interpreting charts

The CPU chart shows four metrics of interest:

Yellow: CPU time spent in kernel space (privileged OS operations). If this is high, your app may be connecting and disconnecting too aggressively. It can also be symptomatic of extension-related errors.
Blue: CPU time spent in user space, which mainly reflects regular queries. For optimization tips, check out the links at the end of the page.
Red: Cycles the CPU spent idle because it was waiting on IO tasks. Even a small amount of red often implies disk or, indirectly, memory problems.
Green: CPU cycles spent idle.

As CPU usage approaches 100%, queries and database tasks will begin to throttle because they cannot get enough time on the CPU.
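To follow up on a yellow (kernel) or red (IOWait) reading, the two checks below may help. They are a minimal sketch and not part of the original guide: the first looks for connection churn in pg_stat_activity, and the second estimates how often table reads are served from memory using pg_statio_user_tables. The one-minute window is an arbitrary choice for illustration.

```sql
-- Connection churn: many recently started client backends can indicate
-- that the app is connecting and disconnecting too aggressively.
SELECT
  usename AS role_name,
  application_name,
  state,
  count(*) AS connections,
  count(*) FILTER (
    WHERE backend_start > now() - interval '1 minute'  -- arbitrary window
  ) AS started_last_minute
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY usename, application_name, state
ORDER BY connections DESC;

-- Table cache hit ratio: a low value suggests reads are going to disk,
-- which is consistent with IOWait caused by insufficient memory.
SELECT
  sum(heap_blks_hit)::numeric
    / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS table_hit_ratio
FROM pg_statio_user_tables;
```

Both queries only sample the current state, so it is worth running them while the CPU chart is actually showing the symptom.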

Identifying problematic queries:

You should review the query performance advisor. Alternatively, you can query the pg_stat_statements view directly. If you do, focus on queries that produce extremes:

largest standard deviation (stddev_exec_time)
longest individual runtime (max_exec_time)
largest mean runtime (mean_exec_time)

PostgreSQL produces a lot of background activity, so it's preferable to focus only on requests made by your application. This usually means filtering results to just the postgres role and, if you're using the DB API, the API roles, too.

Frequently called queries are the most insightful, so limit your observations to the 15 to 30 most-called queries. If this window is too narrow, expand it as needed.
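One way to combine these two ideas is to first narrow the view to the most-called statements and then rank that subset by its extremes. The query below is a sketch under assumptions: it uses the PostgreSQL 13+ column names (mean_exec_time, max_exec_time, stddev_exec_time), and the limit of 30 and the role list are illustrative values to adjust.

```sql
-- Step 1: keep only the 30 most-called statements from application roles.
WITH most_called AS (
  SELECT
    r.rolname AS role_name,
    s.query,
    s.calls,
    s.mean_exec_time,
    s.max_exec_time,
    s.stddev_exec_time
  FROM pg_stat_statements s
  JOIN pg_authid r ON s.userid = r.oid
  WHERE r.rolname IN ('postgres', 'anon', 'authenticated', 'service_role')
  ORDER BY s.calls DESC
  LIMIT 30
)
-- Step 2: within that subset, surface the most erratic statements first.
SELECT *
FROM most_called
ORDER BY stddev_exec_time DESC, max_exec_time DESC;
```

Changing the final ORDER BY to max_exec_time or mean_exec_time gives the other two views of the same subset.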

Examples of how to find the most problematic queries

-- Find the top 15 longest-running queries on average (ties broken by call count)
-- among the postgres, anon, authenticated, and service_role roles
SELECT
  r.rolname AS user,
  s.query,
  s.calls,
  s.mean_exec_time,
  s.total_exec_time
FROM pg_stat_statements s
JOIN pg_authid r ON s.userid = r.oid
WHERE r.rolname IN ('anon', 'authenticated', 'service_role', 'postgres')
ORDER BY s.mean_exec_time DESC, s.calls DESC
LIMIT 15;

Find the 15 queries from the postgres, anon, authenticated, and service_role roles that have an average runtime over 200ms and the most extreme standard deviations.

SELECT
  r.rolname AS user,
  s.query,
  s.min_exec_time AS min_time_ms,
  s.max_exec_time AS max_time_ms,
  s.mean_exec_time AS avg_time_ms,
  s.stddev_exec_time AS stddev_time_ms
FROM pg_stat_statements s
JOIN pg_authid r ON s.userid = r.oid
WHERE r.rolname IN ('anon', 'authenticated', 'service_role', 'postgres')
  AND s.mean_exec_time > 200
ORDER BY s.stddev_exec_time DESC
LIMIT 15;

Special emphasis should be placed on queries that have a higher likelihood of CPU usage:

Many joins, especially those without FK indexes on join columns
Functions that utilize loops
Window and aggregate functions
ORDER BY without indexes
Queries referencing LIKE/ILIKE
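To check whether a suspect query actually falls into one of these patterns, its plan can be inspected with EXPLAIN. The example below is a sketch only: the customers and orders tables, and the orders.customer_id foreign key, are hypothetical names chosen for illustration.

```sql
-- Hypothetical schema: orders.customer_id references customers.id.
-- Note: EXPLAIN ANALYZE really executes the statement, so avoid running
-- it on writes or very expensive queries without care.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id, count(*) AS order_count
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY order_count DESC
LIMIT 20;

-- PostgreSQL does not automatically index the referencing column of a
-- foreign key. If the plan shows a sequential scan on orders for the
-- join, an index on the join column is a reasonable thing to try.
CREATE INDEX IF NOT EXISTS orders_customer_id_idx
  ON orders (customer_id);
```

Re-running the EXPLAIN after creating the index shows whether the planner picks it up and whether the reported execution time changes.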

Besides the query explorer, queries that generate timeout and duration events in the logs can also be of interest. You can go to the Logs Explorer and run the following query to identify these events:

select
  cast(postgres_logs.timestamp as datetime) as timestamp,
  event_message,
  parsed.error_severity,
  parsed.user_name,
  parsed.query,
  parsed.detail,
  parsed.hint,
  parsed.sql_state_code,
  parsed.backend_type
from postgres_logs
  cross join unnest(metadata) as metadata
  cross join unnest(metadata.parsed) as parsed
where regexp_contains(event_message, 'duration|timeout')
order by timestamp desc
limit 100;

Optimizing:

Optimize your queries.
Add indexes where possible.
Increase the compute size.
Distribute load by using read replicas.
Preprocess CPU-intensive queries with materialized views.
If you use OFFSET for pagination, switch to keyset pagination instead.
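The last two suggestions can be sketched as follows. This is an illustration under assumptions, not the guide's own code: the orders table, its id, created_at, and total columns, the literal cursor values, and the daily_order_totals view are all hypothetical.

```sql
-- OFFSET pagination: the database still reads and discards 10000 rows.
SELECT id, created_at, total
FROM orders
ORDER BY created_at DESC, id DESC
LIMIT 20 OFFSET 10000;

-- Keyset pagination: pass the sort key of the last row from the previous
-- page (example values shown) and continue from there instead.
SELECT id, created_at, total
FROM orders
WHERE (created_at, id) < ('2024-06-01 12:00:00+00', 48213)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- An index matching the sort order lets the keyset query seek directly.
CREATE INDEX IF NOT EXISTS orders_created_at_id_idx
  ON orders (created_at DESC, id DESC);

-- Materialized view: precompute an expensive aggregate once and read the
-- stored result instead of recomputing it on every request.
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT
  date_trunc('day', created_at) AS day,
  count(*) AS order_count,
  sum(total) AS revenue
FROM orders
GROUP BY 1;

-- A unique index is required for REFRESH ... CONCURRENTLY, which
-- refreshes the view without blocking readers.
CREATE UNIQUE INDEX daily_order_totals_day_idx
  ON daily_order_totals (day);

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```

The trade-off with a materialized view is staleness: results are only as fresh as the last refresh, so the refresh schedule has to match how current the data needs to be.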

Other useful Supabase Grafana guides:

DavidMelnychuk · 2025-05-29T22:14:42Z

Very helpful, this and the other Grafana guides should be added as links in the Supabase docs!
