Kafka
20 items using Kafka
Projects
Clickstreamer — Real-time Clickstream Pipeline
End-to-end clickstream analytics pipeline using Kafka, Apache Flink, ClickHouse, and Grafana with a full Docker Compose setup.
Real-time Ingestion Pipeline
Featured — A high-throughput streaming ingestion platform built with Apache Flink and Kafka, processing 500k+ events/sec into ClickHouse.
Blog Posts
Debezium Series, Part 9: Production Concerns
Operating Debezium in production: offset management, failure recovery, monitoring connector lag, replication slot health, rebalancing, and the operational patterns that keep CDC pipelines healthy.
Debezium Series, Part 8: Transforms & Routing
Single Message Transforms (SMTs) for reshaping, filtering, and routing CDC events. Field extraction, topic routing, sensitive data masking, and when to reach for a stream processor.
Apache Pekko Series, Part 9: Production Best Practices
Running Pekko in production: Kafka connectors, OpenTelemetry distributed tracing, health checks, dispatcher tuning, Kubernetes deployment, and migrating from Akka.
Debezium Series, Part 7: Snapshotting
How Debezium captures existing data before streaming live changes. All snapshot modes explained — initial, never, always, when_needed — plus isolation guarantees and large-table strategies.
Debezium Series, Part 6: Handling Schema Changes
What happens when someone alters a table. DDL propagation, Schema Registry integration, breaking vs non-breaking changes, and strategies to evolve without downtime.
Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg
Landing CDC events into open table formats. Upsert and delete semantics with Delta Lake MERGE, Iceberg MERGE INTO, partition strategies, and JDBC sink for relational targets.
Debezium Series, Part 4: Source Connectors — PostgreSQL & MySQL
Deep dive into PostgreSQL (pgoutput) and MySQL (binlog) source connectors. Configuration reference, behavioral differences, and connector-specific gotchas.
Debezium Series, Part 3: Change Event Anatomy
Dissecting every field in a Debezium change event — before, after, op, source metadata, tombstones, and how the Kafka message key is structured.
Debezium Series, Part 2: Setting Up Debezium
Hands-on Docker Compose setup with PostgreSQL, Kafka, Kafka Connect, and the Debezium connector. See your first change event in under 10 minutes.
Debezium Series, Part 1: How CDC Works
Log-based vs query-based CDC, how PostgreSQL WAL and MySQL binlog work, what Debezium reads, and at-least-once delivery guarantees explained.
Debezium Series, Part 0: Overview
A practical guide to Change Data Capture with Debezium — from WAL internals to Delta Lake and Iceberg sinks. What you'll learn and why CDC matters.
Kafka Series, Part 6: Kafka Streams
Stream processing natively inside Kafka — KStream vs KTable, stateful aggregations, joins, windowing, and state stores.
Kafka Series, Part 5: Kafka Connect
Moving data in and out of Kafka without writing custom code — connectors, transforms, and running Connect in production.
Kafka Series, Part 4: Reliability & Operations
Replication, in-sync replicas, durability guarantees, and operational concerns for running Kafka in production.
Kafka Series, Part 3: Consumers & Consumer Groups
Reading from Kafka at scale — consumer groups, partition assignment, offset commits, and handling rebalances.
Kafka Series, Part 2: Producers
Writing to Kafka reliably — the producer API, batching, compression, delivery guarantees, and idempotent producers.
Kafka Series, Part 1: Topics, Partitions & Offsets
The core data model behind Kafka — how topics are structured, why partitions matter, and how offsets track consumer position.
Kafka Series, Part 0: Overview
What is Apache Kafka, what problem does it solve, and when should you use it? A roadmap for the series.