Kafka
20 items using Kafka
Projects
Clickstreamer — Real-time Clickstream Pipeline
End-to-end clickstream analytics pipeline using Kafka, Apache Flink, ClickHouse, and Grafana with a full Docker Compose setup.
Real-time Ingestion Pipeline
Featured — A high-throughput streaming ingestion platform built with Apache Flink and Kafka, processing 500k+ events/sec into ClickHouse.
Blog Posts
Debezium Series, Part 9: Production Concerns
Operating Debezium in production: offset management, failure recovery, monitoring connector lag, replication slot health, rebalancing, and the operational patterns that keep CDC pipelines healthy.
Debezium Series, Part 8: Transforms & Routing
Single Message Transforms (SMTs) for reshaping, filtering, and routing CDC events. Field extraction, topic routing, sensitive data masking, and when to reach for a stream processor.
Apache Pekko Series, Part 9: Production Best Practices
Running Pekko in production: Kafka connectors, OpenTelemetry distributed tracing, health checks, dispatcher tuning, Kubernetes deployment, and migrating from Akka.
Debezium Series, Part 7: Snapshotting
How Debezium captures existing data before streaming live changes. All snapshot modes explained — initial, never, always, when_needed — plus isolation guarantees and large-table strategies.
Debezium Series, Part 6: Handling Schema Changes
What happens when someone alters a table. DDL propagation, Schema Registry integration, breaking vs non-breaking changes, and strategies to evolve without downtime.
Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg
Landing CDC events into open table formats. Upsert and delete semantics with Delta Lake MERGE, Iceberg MERGE INTO, partition strategies, and JDBC sink for relational targets.
Debezium Series, Part 4: Source Connectors — PostgreSQL & MySQL
Deep dive into PostgreSQL (pgoutput) and MySQL (binlog) source connectors. Configuration reference, behavioral differences, and connector-specific gotchas.
Debezium Series, Part 3: Change Event Anatomy
Dissecting every field in a Debezium change event — before, after, op, source metadata, tombstones, and how the Kafka message key is structured.
Debezium Series, Part 2: Setting Up Debezium
Hands-on Docker Compose setup with PostgreSQL, Kafka, Kafka Connect, and the Debezium connector. See your first change event in under 10 minutes.
Debezium Series, Part 1: How CDC Works
Log-based vs query-based CDC, how PostgreSQL WAL and MySQL binlog work, what Debezium reads, and at-least-once delivery guarantees explained.
Debezium Series, Part 0: Overview
A practical guide to Change Data Capture with Debezium — from WAL internals to Delta Lake and Iceberg sinks. What you'll learn and why CDC matters.
Kafka Series, Part 6: Kafka Streams
Stream processing natively inside Kafka — KStream vs KTable, stateful aggregations, joins, windowing, and state stores.
Kafka Series, Part 5: Kafka Connect
Moving data in and out of Kafka without writing custom code — connectors, transforms, and running Connect in production.
Kafka Series, Part 4: Reliability & Operations
Replication, in-sync replicas, durability guarantees, and operational concerns for running Kafka in production.
Kafka Series, Part 3: Consumers & Consumer Groups
Reading from Kafka at scale — consumer groups, partition assignment, offset commits, and handling rebalances.
Kafka Series, Part 2: Producers
Writing to Kafka reliably — the producer API, batching, compression, delivery guarantees, and idempotent producers.
Kafka Series, Part 1: Topics, Partitions & Offsets
The core data model behind Kafka — how topics are structured, why partitions matter, and how offsets track consumer position.
Kafka Series, Part 0: Overview
What is Apache Kafka, what problem does it solve, and when should you use it? A roadmap for the series.