PostgreSQL Internals Series, Part 3: Indexes & B-Trees
How PostgreSQL indexes work — B-tree structure, scans, deduplication, index types, bloat detection, and when the planner uses them.
Indexes are the primary tool for making queries fast. A sequential table scan reads every tuple; an index scan jumps directly to matching rows. But indexes have overhead: they consume disk space, slow down writes (to maintain them), and can become bloated.
This part explores how B-tree indexes actually work, how the planner decides whether to use them, and why your index might be much larger than you expect.
B-Tree Basics
PostgreSQL’s default index type is B-tree (Balanced Tree). Unlike binary trees, B-trees have many children per node, making them cache-efficient.
A B-tree has this structure:
[50 | 100 | 150] ← Internal node
/ | \
[10 | 30] [60 | 80] [120 | 130] [160 | 200]
/ | \ / | \ / | \ / | \
Leaf nodes with actual key → row pointers
Each internal node contains keys and pointers to children. Each leaf node contains keys and pointers to heap tuples (their (block, offset) from Part 1).
Key properties:
- Balanced: All leaf nodes are at the same depth
- Sorted: Keys are in ascending order within each node
- Branching: Each node can have many children (not just 2)
- Efficient: Search is O(log N) — with 1M keys, at most ~4 nodes visited
Creating an Index
CREATE TABLE users (
id BIGINT,
name TEXT,
email TEXT,
age INT
);
-- Single-column index
CREATE INDEX idx_users_id ON users(id);
-- Multi-column index (composite key)
CREATE INDEX idx_users_name_age ON users(name, age);
-- Partial index (only rows matching WHERE)
CREATE INDEX idx_active_users ON users(id) WHERE age > 18;
-- Covering index (include non-key columns)
CREATE INDEX idx_users_id_with_name ON users(id) INCLUDE (name);
Index Scan Types
The planner has several ways to use an index:
Index Scan
Lookup keys in the index, get heap (block, offset), fetch from heap.
EXPLAIN ANALYZE
SELECT * FROM users WHERE id = 42;
Output might show:
Index Scan using idx_users_id on users (cost=0.42..8.44)
Index Cond: (id = 42)
Cost: Fast index lookup (0.42) + heap fetch (8.44).
Index-Only Scan
The covering index approach. The index contains all columns needed for the query, so no heap fetch is required.
EXPLAIN ANALYZE
SELECT id, name FROM users WHERE id = 42;
With CREATE INDEX idx_users_id_with_name ON users(id) INCLUDE (name);:
Index Only Scan using idx_users_id_with_name on users (cost=0.42..0.44)
Index Cond: (id = 42)
Cost: Just index lookup (0.42), no heap fetch (0.44 is tiny overhead for page header checks).
Index-only scans are fast because:
- No random heap I/O
- Index pages are often cached
- Smaller than heap pages (more keys per page)
Bitmap Index Scan
Used when multiple indexes are combined or for range scans on columns with low cardinality:
EXPLAIN ANALYZE
SELECT * FROM users WHERE age > 18 AND age < 65;
Bitmap Index Scan on idx_users_age (cost=0.00..X)
Index Cond: (age > 18) AND (age < 65)
Bitmap Heap Scan on users (cost=X..Y)
Recheck Cond: (age > 18) AND (age < 65)
The bitmap is a memory-efficient representation of matching heap pages. The planner decides it’s faster than an index scan.
Seq Scan (no index)
If the query matches many rows, a sequential scan is faster than random heap I/O:
EXPLAIN ANALYZE
SELECT * FROM users WHERE age > 0;
Expected:
Seq Scan on users (cost=0.00..X)
Filter: (age > 0)
If the table is small or most rows match, scanning all pages sequentially is faster than jumping around with random I/O.
B-Tree Maintenance & Splits
As you insert rows, keys are added to leaf nodes. When a leaf fills up, it splits:
Before split: [10, 20, 30, 40, 50] (full node)
Insert 35
After split: [10, 20, 30] [35, 40, 50]
Key 35 promoted to parent
The middle key is promoted to the parent, and two new nodes are created. This keeps the tree balanced and logarithmic search time.
Cost of splits: When a node splits, its parent might also split (cascade up). In rare cases, the root splits, increasing tree depth by 1.
Composite Indexes
Multi-column indexes are useful for certain queries:
CREATE INDEX idx_users_name_age ON users(name, age);
This index can be used for:
WHERE name = 'Alice'✓ (leading column)WHERE name = 'Alice' AND age = 30✓ (both columns)WHERE age = 30✗ (age is not leading)
Rule: Indexes can only be used if the leading column(s) are in the WHERE clause.
For queries on both name and age independently, you need separate indexes:
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_age ON users(age);
Or use a covering index if you want one index for both queries:
CREATE INDEX idx_users_multi ON users(name) INCLUDE (age);
Key Deduplication
PostgreSQL’s B-trees use key deduplication to save space. When a key is repeated many times, it’s stored once with a counter:
Before dedup: [10, 10, 10, 10, 20, 20, 30]
After dedup: [10 (count=4), 20 (count=2), 30]
This reduces index size significantly for columns with low cardinality (many duplicates).
Other Index Types
PostgreSQL supports several index types beyond B-tree:
BRIN (Block Range Index)
Summarizes ranges of blocks (e.g., “blocks 0–1000 contain values 1–100”). Very small and useful for large, naturally sorted tables.
CREATE INDEX idx_events_time ON events USING BRIN (event_time);
Use case: Time-series data. If rows are inserted in time order, BRIN can cover billions of rows with a tiny index.
Hash
Distributes keys by hash value. Not recommended for most queries (no range scans).
CREATE INDEX idx_users_id ON users USING HASH (id);
GiST (Generalized Search Tree)
Supports geometric data (points, polygons, IP ranges). Complex but powerful.
CREATE INDEX idx_locations ON places USING GIST (location);
GIN (Generalized Inverted Index)
Optimized for searching within arrays and full-text documents.
CREATE INDEX idx_tags ON posts USING GIN (tags);
Index Bloat Detection
Like tables, indexes can become bloated with dead space. Monitor index size:
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Check if an index is being used:
SELECT
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0; -- Unused indexes
Unused indexes slow down writes. Drop them:
DROP INDEX CONCURRENTLY idx_unused; -- CONCURRENTLY = no lock
Rebuilding Bloated Indexes
If an index becomes bloated:
-- Check index size before
SELECT pg_size_pretty(pg_relation_size('idx_users_id'));
-- Rebuild (non-blocking)
REINDEX INDEX CONCURRENTLY idx_users_id;
-- Check size after (should be smaller)
SELECT pg_size_pretty(pg_relation_size('idx_users_id'));
Practical Example: Query with Index
CREATE TABLE orders (
order_id BIGINT,
customer_id BIGINT,
order_date DATE,
total DECIMAL,
status TEXT
);
INSERT INTO orders VALUES
(1, 100, '2024-01-01', 50.00, 'completed'),
(2, 101, '2024-01-02', 75.50, 'pending'),
(3, 100, '2024-01-03', 100.00, 'completed');
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(order_date);
-- Query 1: Benefit from idx_orders_customer
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 100;
-- Output: Index Scan using idx_orders_customer (cost=0.42..8.44)
-- Query 2: Benefit from idx_orders_date
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date = '2024-01-01';
-- Output: Index Scan using idx_orders_date (cost=0.29..8.40)
-- Query 3: Can't use both at once (would need composite index or bitmap scan)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 100 AND order_date = '2024-01-01';
-- Output: Bitmap Index Scan or fallback to one index
When NOT to Index
Don’t create an index if:
- The column has very low cardinality (few distinct values)
- The query selects most rows (seq scan is faster)
- The column is rarely queried
- The table is small (< 10K rows)
- The index duplicates an existing composite index
High-maintenance indexes:
- Multi-column indexes on frequently-updated columns
- Partial indexes with complex predicates (harder to understand)
- Too many indexes on one table (write performance suffers)
Key Takeaways
- B-trees are balanced, sorted trees with O(log N) search time.
- Index scans use the index to find rows, then fetch from heap.
- Index-only scans avoid heap fetches by including all needed columns in the index.
- Composite indexes work best with leading-column filters.
- Key deduplication reduces index size for columns with duplicates.
- BRIN indexes are tiny and great for large, naturally-sorted tables.
- Unused indexes waste space and slow writes. Monitor and drop them.
- Index bloat can be fixed with
REINDEX INDEX CONCURRENTLY.
Monitoring Indexes
Create a dashboard query to monitor index health:
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
CASE
WHEN idx_scan = 0 THEN 'UNUSED'
WHEN idx_tup_read / NULLIF(idx_tup_fetch, 0) > 100 THEN 'INEFFICIENT'
ELSE 'OK'
END AS status
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Next Steps
Indexes exist to make queries fast. But the planner decides whether to use them. It estimates the cost of different execution strategies and picks the cheapest.
Part 4 dives into the cost model: how PostgreSQL estimates costs, how statistics influence decisions, and why the planner sometimes picks seq scans over indexes.
Part 3 complete. Next: Query Planning & Optimization