Posted on Jun 25

Database optimization best practices

#postgressql #sql #database #productivity

Imagine your database as a wild animal sanctuary: some queries lumber like tortoises, while others sprint like cheetahs. Your job as a DBA is to coax every query into channeling its inner cheetah—fast, efficient, and resource-savvy. In this KB article, you’ll discover practical techniques, vibrant code examples, ASCII-art execution plans, and Mermaid flowcharts that transform sluggish SQL into scalpels of performance. fileciteturn0file0

1. Measure Twice, Cut Once: EXPLAIN & ANALYZE

Before refactoring, know your enemy. Use:

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

to expose hidden bottlenecks: row estimates, buffer hits vs. reads, and CPU vs. I/O costs.

 QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=150.00..500.00 rows=1000 width=64) (actual time=12.345..45.678 rows=950 loops=1) Hash Cond: (t1.id = t2.foreign_id) Buffers: shared hit=2000 read=1500

This tells you whether your query is CPU-bound, I/O-bound, or suffering from bad cardinality estimates.

2. Indexing Mastery: More Than Just B‑Trees

2.1 Partial & Expression Indexes

Target hot filter patterns without bloating:

CREATE INDEX idx_active_users ON users((lower(email))) WHERE status = 'active';

2.2 BRIN for Time-Series

Massive append-only tables? Try BRIN:

CREATE TABLE logs ( ts TIMESTAMPTZ, event JSONB ) PARTITION BY RANGE (ts); CREATE INDEX ON logs USING BRIN (ts);

This lightweight index slashes size at scale.

3. Encapsulate Complexity: Stored Functions & Views

Rather than embedding 10 JOINs in every API call, wrap logic in a function or view:

CREATE OR REPLACE FUNCTION daily_sales_summary(day DATE) RETURNS TABLE(user_id UUID, total DECIMAL) AS $$ BEGIN RETURN QUERY SELECT s.user_id, SUM(amount) FROM sales s WHERE date_trunc('day', s.ts) = day GROUP BY s.user_id; END; $$ LANGUAGE plpgsql;

The planner can optimize a stable function more aggressively than ad-hoc SQL.

4. Aggregations & Windows: Tricks of the Trade

4.1 Materialized Aggregates

For metrics dashboards, precompute:

CREATE MATERIALIZED VIEW mv_user_errors AS SELECT user_id, COUNT(*) AS error_count FROM events WHERE error_flag GROUP BY user_id; REFRESH MATERIALIZED VIEW CONCURRENTLY mv_user_errors;

4.2 Window Functions vs. GROUP BY

When you need both raw rows and aggregates:

SELECT order_id, amount, SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer FROM orders;

Use an index on (customer_id, amount) to speed windows.

5. Partitioning & Parallelism: Scale Out Safely

5.1 Declarative Partitioning

Split by time or key:

CREATE TABLE metrics ( ts DATE, value DOUBLE PRECISION ) PARTITION BY RANGE (ts); CREATE TABLE metrics_2025_q1 PARTITION OF metrics FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

5.2 Harness Parallel Queries

Enable in postgresql.conf:

max_parallel_workers_per_gather = 4

Then large scans auto-split across CPUs.

6. Housekeeping: VACUUM, ANALYZE & Maintenance

6.1 Autovacuum Tuning

Ensure autovacuum thresholds fit your workload. For high-churn tables:

ALTER TABLE big_table SET ( autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.02 );

6.2 Fillfactor for Write-Heavy Tables

Reserve free space to reduce page splits:

ALTER TABLE logs SET (fillfactor = 70);

7. Real‑World Case Study: 80% Speedup

Scenario: A nightly report took 10 minutes. By applying:

Partial index on status
Function-based view
Partition pruning on date
Autovacuum tuning

we tracked its execution plan changes:

-- Before: Seq Scan on orders (time: 600s) -- After: Index Only Scan using idx_status_date (time: 120s)

From 10 min → 2 min: a success story to inspire your own triumphs.

Key Takeaways

Measure first with EXPLAIN ANALYZE (BUFFERS).
Index smartly: partial, expression, BRIN.
Encapsulate complex logic in functions/views.
Precompute heavy aggregates with materialized views.
Partition & parallelize for scale.
Maintain: VACUUM, ANALYZE, and fillfactor.

8. Beyond the Basics: Advanced Techniques

8.1 Adaptive Query Plans with pg_stat_statements

Track your most expensive statements:

CREATE EXTENSION pg_stat_statements; SELECT query, calls, total_time, mean_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;

Use this insight to prioritize optimizations.

8.2 Plan Stability with Prepared Statements

For queries with variable patterns, prepared statements lock in good plans:

PREPARE fast_search(text) AS SELECT * FROM products WHERE description ILIKE $1; EXECUTE fast_search('%widget%');

8.3 In-Memory Speed with UNLOGGED Tables

Temp-heavy data can live in RAM:

CREATE UNLOGGED TABLE temp_hits AS SELECT ...;

8.4 Smart Caching Layers

Combine Redis or PGSQL's native caching:

DO $$ BEGIN PERFORM pg_prewarm('hot_table'); END; $$;

Creative Corner: Visualizing Data Flow

Bring your diagrams to life—they guide both your brain and your team.

Final Thoughts: The Art of Performance

Optimizing SQL is equal parts science and art. It’s a continuous journey: measure, tweak, observe, and repeat. With these techniques—from core index strategies to creative caching and plan management—you’re equipped to turn any tortoise into a cheetah.
Remember: the fastest query is the one you never run. Cache wisely, precompute where it counts, and let your database shine.

DEV Community