PostgreSQL Performance Tuning: From Slow Queries to Sub-Second Responses
Sabin Shrestha
Full-Stack Developer
A well-tuned PostgreSQL database can serve tens of thousands of queries per second on a single node. A poorly configured one struggles with hundreds. Let's explore how to optimize PostgreSQL for production workloads.
Understanding Query Execution
Before optimizing, you need to understand how PostgreSQL executes queries. The EXPLAIN ANALYZE command is your best friend:
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id
ORDER BY post_count DESC
LIMIT 10;
Output breakdown:
Limit  (cost=1234.56..1234.78 rows=10 width=40) (actual time=45.123..45.156 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1250.00 rows=5000 width=40) (actual time=45.120..45.145 rows=10 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1100.00..1150.00 rows=5000 width=40) (actual time=43.000..44.500 rows=4850 loops=1)
              Group Key: u.id
              ->  Hash Left Join  (cost=100.00..1000.00 rows=50000 width=36) (actual time=1.234..35.000 rows=48500 loops=1)
                    Hash Cond: (p.author_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..500.00 rows=30000 width=8) (actual time=0.010..10.000 rows=30000 loops=1)
                    ->  Hash  (cost=80.00..80.00 rows=5000 width=36) (actual time=1.200..1.200 rows=4850 loops=1)
                          Buckets: 8192  Batches: 1  Memory Usage: 350kB
                          ->  Seq Scan on users u  (cost=0.00..80.00 rows=5000 width=36) (actual time=0.005..0.800 rows=4850 loops=1)
                                Filter: (created_at > '2025-01-01'::date)
                                Rows Removed by Filter: 150
Planning Time: 0.250 ms
Execution Time: 45.200 ms
Key metrics to watch:
- Seq Scan - Full table scan (often bad for large tables)
- actual time - Real execution time
- rows - Number of rows processed
- loops - Number of times the node executed
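EXPLAIN accepts options beyond ANALYZE; BUFFERS in particular reports, per node, how many pages came from the shared buffer cache versus disk, which is often the real difference between a 3ms and a 300ms query. A quick sketch on the same query:

```sql
-- BUFFERS adds "shared hit" (cache) and "read" (disk) counts
-- to each plan node, exposing I/O hot spots
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.author_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id
ORDER BY post_count DESC
LIMIT 10;
```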
Indexing Strategies
B-Tree Indexes (Default)
Best for equality and range queries:
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters!)
CREATE INDEX idx_posts_author_created ON posts(author_id, created_at DESC);
-- Partial index (smaller, faster)
CREATE INDEX idx_posts_published ON posts(created_at DESC)
WHERE published = true;
Index Column Order Matters
For a composite index (a, b, c):
| Query Pattern | Uses Index? |
|--------------|-------------|
| WHERE a = 1 | ✅ Yes |
| WHERE a = 1 AND b = 2 | ✅ Yes |
| WHERE a = 1 AND b = 2 AND c = 3 | ✅ Yes |
| WHERE b = 2 | ❌ No |
| WHERE a = 1 AND c = 3 | ⚠️ Partial (only a) |
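You can verify each row of this table directly with EXPLAIN. The table and index names below are hypothetical placeholders:

```sql
-- Hypothetical composite index matching the table above
CREATE INDEX idx_t_abc ON t(a, b, c);

-- Uses the index: the leading column "a" is constrained
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;

-- Cannot use the index efficiently: "a" is not constrained,
-- so expect a sequential scan on a large table
EXPLAIN SELECT * FROM t WHERE b = 2;
```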
GIN Indexes for Arrays and JSONB
-- For array columns
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
-- For JSONB columns
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
-- Query using the index
SELECT * FROM posts WHERE tags @> ARRAY['typescript', 'nodejs'];
SELECT * FROM users WHERE metadata @> '{"role": "admin"}';
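If a JSONB column is only ever queried with the @> containment operator, the jsonb_path_ops operator class (a built-in GIN variant) trades generality for a smaller, faster index. The index name here is an assumption:

```sql
-- jsonb_path_ops produces a smaller GIN index than the default
-- operator class, but it supports only the @> operator
CREATE INDEX idx_users_metadata_path ON users
USING GIN(metadata jsonb_path_ops);
```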
Expression Indexes
Index computed values:
-- Index for case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Query that uses it
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
Covering Indexes (Include)
Include non-key columns to avoid table lookups:
-- Include columns needed by the query
CREATE INDEX idx_posts_author_covering ON posts(author_id)
INCLUDE (title, created_at);
-- This query can be answered entirely from the index
SELECT title, created_at FROM posts WHERE author_id = 123;
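You can confirm the covering index is doing its job by looking for an Index Only Scan in the plan. Note that PostgreSQL can skip the heap only for pages the visibility map marks all-visible, so the benefit is largest on recently vacuumed tables:

```sql
-- Expect "Index Only Scan using idx_posts_author_covering";
-- a high "Heap Fetches" count means the table needs a VACUUM
EXPLAIN ANALYZE
SELECT title, created_at FROM posts WHERE author_id = 123;
```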
Query Optimization Techniques
Avoid SELECT *
-- Bad: fetches all columns
SELECT * FROM posts WHERE author_id = 123;
-- Good: fetch only needed columns
SELECT id, title, created_at FROM posts WHERE author_id = 123;
Use EXISTS Instead of IN for Large Sets
-- Slower with large subquery results
SELECT * FROM users
WHERE id IN (SELECT author_id FROM posts WHERE published = true);
-- Faster: stops at first match
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM posts p
WHERE p.author_id = u.id AND p.published = true
);
Optimize Pagination
Offset-based pagination is slow for large offsets:
-- Slow: scans and discards 10000 rows
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Fast: cursor-based pagination
SELECT * FROM posts
WHERE created_at < '2025-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
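One caveat: if created_at is not unique, a plain timestamp cursor can skip or repeat rows at page boundaries. Adding a unique tie-breaker column fixes that; the id value 98765 below is a placeholder for the last id seen on the previous page:

```sql
-- Row comparison: created_at < cursor, OR equal created_at
-- with a smaller id; pairs with an index on (created_at DESC, id DESC)
SELECT * FROM posts
WHERE (created_at, id) < ('2025-01-15 10:30:00', 98765)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```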
Use CTEs Wisely
CTEs can act as optimization fences. Before PostgreSQL 12 they were always materialized; since version 12 the planner inlines them by default unless you write MATERIALIZED:
-- This CTE is materialized (computed once)
WITH recent_posts AS MATERIALIZED (
SELECT * FROM posts WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_posts WHERE author_id = 123;
-- Better: let the planner optimize
SELECT * FROM posts
WHERE created_at > NOW() - INTERVAL '7 days'
AND author_id = 123;
Connection Pooling
PostgreSQL forks a new backend process for each connection, each carrying several megabytes of overhead. Use connection pooling:
PgBouncer Configuration
; pgbouncer.ini
[databases]
myapp = host=localhost port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; Pool mode: transaction is most efficient
pool_mode = transaction
; Connection limits
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
; Timeouts
server_connect_timeout = 15
server_idle_timeout = 600
server_lifetime = 3600
Application-Level Pooling (Prisma)
// prisma/schema.prisma
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
}
// Connection pool in URL
// DATABASE_URL="postgresql://user:pass@localhost:5432/myapp?connection_limit=20&pool_timeout=10"
PostgreSQL Configuration Tuning
Key settings in postgresql.conf:
# Memory Settings
shared_buffers = 256MB # 25% of RAM for dedicated DB server
effective_cache_size = 768MB # 75% of RAM
work_mem = 16MB # Per-operation memory
maintenance_work_mem = 128MB # For VACUUM, CREATE INDEX
# Write-Ahead Log
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 1GB
# Query Planner
random_page_cost = 1.1 # Lower for SSD
effective_io_concurrency = 200 # Higher for SSD
# Connections
max_connections = 100 # Keep low, use pooling
# Logging
log_min_duration_statement = 100 # Log queries > 100ms
log_statement = 'none'
log_checkpoints = on
Memory Formula
shared_buffers = RAM * 0.25
effective_cache_size = RAM * 0.75
work_mem = (RAM - shared_buffers) / (max_connections * 3)
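Plugging in concrete numbers makes the formula tangible. Assuming a dedicated server with 16 GB of RAM and the max_connections = 100 from the configuration above:

```
shared_buffers       = 4GB     # 16 GB * 0.25
effective_cache_size = 12GB    # 16 GB * 0.75
work_mem             = 40MB    # (16 GB - 4 GB) / (100 * 3) ≈ 40 MB
```

The factor of 3 in the work_mem divisor is a safety margin: a single complex query can run several sort or hash operations at once, each allowed its own work_mem.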
Monitoring and Maintenance
Identify Slow Queries
-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries
SELECT
round(total_exec_time::numeric, 2) as total_time_ms,
calls,
round(mean_exec_time::numeric, 2) as avg_time_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) as percentage,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
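After a deploy or a configuration change, it helps to reset the collected statistics so subsequent readings reflect only the current workload; the extension ships a function for this:

```sql
-- Clear all pg_stat_statements counters
SELECT pg_stat_statements_reset();
```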
Check Index Usage
-- Find unused indexes
SELECT
schemaname,
relname as table_name,
indexrelname as index_name,
idx_scan as times_used,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
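Once an index is confirmed unused (check replicas too, since statistics are per-server), it can be dropped without blocking writes. The index name here is illustrative:

```sql
-- CONCURRENTLY avoids taking an exclusive lock on the table
DROP INDEX CONCURRENTLY idx_posts_unused;
```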
Find Missing Indexes
-- Tables with sequential scans
SELECT
schemaname,
relname as table_name,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_live_tup as row_count
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
AND n_live_tup > 10000
ORDER BY seq_tup_read DESC;
Regular Maintenance
-- Analyze updates statistics
ANALYZE posts;
-- Vacuum reclaims dead tuples
VACUUM ANALYZE posts;
-- Reindex if bloated
REINDEX INDEX idx_posts_author_id;
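Plain REINDEX locks the table against writes for the duration of the rebuild. On PostgreSQL 12 and later, the CONCURRENTLY option avoids that:

```sql
-- Builds a replacement index side-by-side and swaps it in,
-- allowing writes to continue (at the cost of more total work)
REINDEX INDEX CONCURRENTLY idx_posts_author_id;
```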
Real-World Optimization Example
Before optimization:
EXPLAIN ANALYZE
SELECT p.*, u.name as author_name
FROM posts p
JOIN users u ON u.id = p.author_id
WHERE p.published = true
AND p.tags @> ARRAY['typescript']
ORDER BY p.created_at DESC
LIMIT 20;
-- Execution Time: 450ms
After optimization:
-- 1. Add a partial covering index for the sort
CREATE INDEX idx_posts_published_created ON posts(created_at DESC)
INCLUDE (title, author_id)
WHERE published = true;
-- 2. Add a partial GIN index for tags
CREATE INDEX idx_posts_published_tags ON posts USING GIN(tags)
WHERE published = true;
-- 3. Rewrite query
SELECT p.id, p.title, p.created_at, u.name as author_name
FROM posts p
JOIN users u ON u.id = p.author_id
WHERE p.published = true
AND p.tags @> ARRAY['typescript']
ORDER BY p.created_at DESC
LIMIT 20;
-- Execution Time: 3ms
Conclusion
PostgreSQL performance optimization involves:
- Understand your queries - Use EXPLAIN ANALYZE
- Index strategically - Right index type, right columns, right order
- Optimize queries - Avoid anti-patterns
- Configure wisely - Tune memory and connections
- Monitor continuously - Catch issues early
- Maintain regularly - VACUUM and ANALYZE
A 150x improvement (450ms to 3ms) is achievable with proper optimization. Start with the slowest queries and work your way through.