Blog Posts

Elasticsearch Internals Series, Part 8: Write Path & Translog

The complete lifecycle of a write — from index request to durable disk storage. Translog as Elasticsearch's WAL, refresh vs flush, and tuning durability vs throughput.

Elasticsearch Internals Series, Part 7: Cluster Architecture & Replication

Node roles, primary vs replica shards, the write path from primary to replicas, split-brain prevention with quorum, and observing cluster recovery under node failure.

Elasticsearch Internals Series, Part 6: Aggregations & Analytics

How aggregations work internally using doc_values, bucket vs metric vs pipeline aggs, cardinality approximation with HyperLogLog++, and building analytics dashboards.

Elasticsearch Internals Series, Part 5: Query DSL Deep Dive

Query vs filter context, bool query anatomy, leaf queries, pagination strategies, and building a real product search from scratch.

Elasticsearch Internals Series, Part 4: Search Internals & Relevance Scoring

How a search query flows from client to shards and back, how BM25 calculates relevance scores, and how to debug scoring with the _explain API.

Elasticsearch Internals Series, Part 3: Document Storage & Mappings

How Elasticsearch stores fields in multiple representations — _source, inverted index, doc_values, fielddata — and why the wrong mapping kills performance.

Elasticsearch Internals Series, Part 2: Shards, Segments & Lucene

How Elasticsearch splits indexes into shards, how each shard is a Lucene index made of immutable segments, and why the refresh interval controls search freshness.

Elasticsearch Internals Series, Part 1: Inverted Index & Text Analysis

How Elasticsearch stores text for full-text search — inverted index structure, analyzers, tokenizers, token filters, and practical inspection with _analyze and _termvectors.

Elasticsearch Internals Series, Part 0: Overview

A roadmap through Elasticsearch 8.x internals — from inverted indexes to cluster replication. Why learning the engine makes you a better search engineer.