Streaming
42 items in Streaming
Projects
Clickstreamer — Real-time Clickstream Pipeline
End-to-end clickstream analytics pipeline using Kafka, Apache Flink, ClickHouse, and Grafana with a full Docker Compose setup.
Real-time Ingestion Pipeline
Featured: A high-throughput streaming ingestion platform built with Apache Flink and Kafka, processing 500k+ events/sec into ClickHouse.
Blog Posts
Debezium Series, Part 9: Production Concerns
Operating Debezium in production: offset management, failure recovery, monitoring connector lag, replication slot health, rebalancing, and the operational patterns that keep CDC pipelines healthy.
Debezium Series, Part 8: Transforms & Routing
Single Message Transforms (SMTs) for reshaping, filtering, and routing CDC events. Field extraction, topic routing, sensitive data masking, and when to reach for a stream processor.
Apache Pekko Series, Part 9: Production Best Practices
Running Pekko in production: Kafka connectors, OpenTelemetry distributed tracing, health checks, dispatcher tuning, Kubernetes deployment, and migrating from Akka.
Debezium Series, Part 7: Snapshotting
How Debezium captures existing data before streaming live changes. All snapshot modes explained — initial, never, always, when_needed — plus isolation guarantees and large-table strategies.
Apache Pekko Series, Part 8: CQRS & Projections
Separating write models from read models with CQRS. Pekko Projection — consuming the event journal to build materialized views, exactly-once processing, and offset tracking.
Debezium Series, Part 6: Handling Schema Changes
What happens when someone alters a table. DDL propagation, Schema Registry integration, breaking vs non-breaking changes, and strategies to evolve without downtime.
Apache Pekko Series, Part 7: Clustering & Distributed Actors
Running Pekko across multiple JVMs. Cluster membership, the gossip protocol, cluster sharding for stateful actors, and singleton actors — all with practical configuration examples.
Debezium Series, Part 5: Sink Connectors — Delta Lake & Iceberg
Landing CDC events into open table formats. Upsert and delete semantics with Delta Lake MERGE, Iceberg MERGE INTO, partition strategies, and JDBC sink for relational targets.
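The upsert and delete semantics this post covers can be sketched as a toy merge in Python. The event shape (`op`, `key`, `after`) is a simplified stand-in for a real Debezium change event, not the connector's actual wire format, and the dict plays the role of the MERGE target table:

```python
# Toy model of applying CDC events to a keyed table with upsert/delete
# semantics, the way a Delta Lake MERGE or Iceberg MERGE INTO would.
# The event shape is a simplified stand-in for a Debezium change event.

def apply_cdc_events(table: dict, events: list) -> dict:
    for event in events:
        if event["op"] in ("c", "u", "r"):       # create, update, snapshot read
            table[event["key"]] = event["after"]  # upsert
        elif event["op"] == "d":                  # delete removes the row
            table.pop(event["key"], None)
    return table
```

A real sink also has to order events per key and handle tombstones, but the matched/not-matched branching above is the core of the MERGE.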
Apache Pekko Series, Part 6: gRPC with Pekko
Protocol Buffers, generated Pekko service stubs, server and client setup, and bidirectional streaming. When to use gRPC instead of REST and how to run both side by side.
Debezium Series, Part 4: Source Connectors — PostgreSQL & MySQL
Deep dive into PostgreSQL (pgoutput) and MySQL (binlog) source connectors. Configuration reference, behavioral differences, and connector-specific gotchas.
Apache Pekko Series, Part 5: HTTP with Pekko
Build REST APIs with pekko-http's routing DSL. HTTP server setup, route composition, request and response marshalling, and integrating HTTP endpoints with an actor system.
Debezium Series, Part 3: Change Event Anatomy
Dissecting every field in a Debezium change event — before, after, op, source metadata, tombstones, and how the Kafka message key is structured.
Apache Pekko Series, Part 4: Streams & Reactive Processing
Source, Flow, and Sink — the building blocks of Pekko Streams. Backpressure by design, composable pipelines, and how to process data without dropping messages or crashing.
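The Source/Flow/Sink shape can be sketched with plain Python generators as a stand-in for Pekko Streams; pull-based generators give a crude form of backpressure, since the sink only pulls the next element when it is ready for it:

```python
# Toy Source -> Flow -> Sink pipeline using generators as a stand-in
# for Pekko Streams (a sketch of the shape, not the real API).

def source(n):                 # Source: emits the elements 1..n
    yield from range(1, n + 1)

def flow(upstream, f):         # Flow: transforms each element as it is pulled
    for item in upstream:
        yield f(item)

def sink(upstream):            # Sink: folds the stream into a single result
    return sum(upstream)

result = sink(flow(source(5), lambda x: x * 2))  # 2+4+6+8+10 = 30
```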
Debezium Series, Part 2: Setting Up Debezium
Hands-on Docker Compose setup with PostgreSQL, Kafka, Kafka Connect, and the Debezium connector. See your first change event in under 10 minutes.
Apache Pekko Series, Part 3: Persistence & Event Sourcing
How EventSourcedBehavior works in Pekko: journals, snapshots, and recovery. Build actors whose state survives restarts by recording every change as an immutable event.
Debezium Series, Part 1: How CDC Works
Log-based vs query-based CDC, how PostgreSQL WAL and MySQL binlog work, what Debezium reads, and at-least-once delivery guarantees explained.
Apache Pekko Series, Part 2: Actor Lifecycle & Supervision
How actors start, fail, and recover. Parent-child supervision hierarchies, restart vs stop vs escalate strategies, and building self-healing systems in Pekko.
Debezium Series, Part 0: Overview
A practical guide to Change Data Capture with Debezium — from WAL internals to Delta Lake and Iceberg sinks. What you'll learn and why CDC matters.
Apache Pekko Series, Part 0: Overview
A practical guide to building concurrent, distributed, and resilient systems with Apache Pekko — the open-source fork of Akka. What you'll learn and why Pekko matters.
Apache Pekko Series, Part 1: The Actor Model
What an actor is, how message passing replaces shared state, and how to create your first ActorSystem in Scala with Pekko Typed.
Kafka Series, Part 6: Kafka Streams
Stream processing natively inside Kafka — KStream vs KTable, stateful aggregations, joins, windowing, and state stores.
Kafka Series, Part 5: Kafka Connect
Moving data in and out of Kafka without writing custom code — connectors, transforms, and running Connect in production.
Kafka Series, Part 4: Reliability & Operations
Replication, in-sync replicas, durability guarantees, and operational concerns for running Kafka in production.
Kafka Series, Part 3: Consumers & Consumer Groups
Reading from Kafka at scale — consumer groups, partition assignment, offset commits, and handling rebalances.
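The partition-assignment idea behind consumer groups can be sketched in a few lines; this mirrors the spirit of Kafka's range assignor, not the actual client implementation:

```python
# Toy range-style assignment of partitions to the consumers in a group:
# sort the members, then hand out contiguous ranges as evenly as possible.

def assign_partitions(partitions: int, consumers: list) -> dict:
    members = sorted(consumers)
    per_member, extra = divmod(partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # first members absorb the remainder
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

A rebalance is then just rerunning this function over the new membership, which is why every membership change pauses consumption briefly.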
Kafka Series, Part 2: Producers
Writing to Kafka reliably — the producer API, batching, compression, delivery guarantees, and idempotent producers.
Kafka Series, Part 1: Topics, Partitions & Offsets
The core data model behind Kafka — how topics are structured, why partitions matter, and how offsets track consumer position.
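That data model fits in a small sketch: a topic is a set of append-only partition logs, the record key's hash picks the partition, and an offset is just a position in one partition's log (a toy model, not broker code):

```python
# Toy model of a Kafka topic: append-only partition logs, key-hash
# partitioning, and offsets as positions within one partition.

class Topic:
    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple:
        p = hash(key) % len(self.partitions)   # same key -> same partition
        self.partitions[p].append(value)       # append-only, never in place
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int) -> str:
        return self.partitions[partition][offset]
```

Because a key always hashes to the same partition, records for one key stay in order; across partitions there is no ordering guarantee.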
Kafka Series, Part 0: Overview
What is Apache Kafka, what problem does it solve, and when should you use it? A roadmap for the series.
Spark Streaming Series, Part 5: Operations and Tuning
Checkpointing, fault tolerance, exactly-once semantics, monitoring, and production performance tuning.
Spark Streaming Series, Part 4: Stateful Processing
Per-key state tracking across events, timeouts, and RocksDB state stores for complex streaming logic.
Spark Streaming Series, Part 3: Time, Watermarks, and Windows
Event time vs processing time, watermarks to handle late data, and window types for time-based aggregations.
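The watermark idea reduces to a small rule: track the maximum event time seen, subtract an allowed lateness, and drop anything older. This is a sketch of the concept behind `withWatermark`, not Spark's implementation:

```python
# Toy event-time watermark: watermark = max event time seen so far minus
# the allowed lateness; events behind the watermark are dropped as late.

def filter_late(events, allowed_lateness):
    max_event_time = float("-inf")
    kept = []
    for event_time, payload in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        if event_time >= watermark:            # on time (or tolerably late)
            kept.append((event_time, payload))
    return kept
```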
Spark Streaming Series, Part 2: Sources and Sinks
Reading from Kafka and files, writing to Delta Lake and databases — the connectors that power real-time pipelines.
Flink Series, Part 5: Performance & Production
Making Flink production-ready — diagnosing backpressure, tuning parallelism, sizing network buffers, and monitoring with metrics.
Spark Streaming Series, Part 1: Structured Streaming Fundamentals
The unbounded table model — how Spark Streaming treats streams as infinite DataFrames, with output modes, triggers, and how results are written out.
Flink Series, Part 4: Exactly-Once & Checkpointing
How Flink guarantees end-to-end correctness after failures — Chandy-Lamport barriers, two-phase commit, checkpoints vs savepoints.
Spark Streaming Series, Part 0: Overview
Stream processing with Apache Spark — from basics to Structured Streaming, the modern architecture for real-time data pipelines.
Flink Series, Part 3: State Management
How Flink stores and manages state — keyed vs operator state, state backends, TTL, and practical stateful patterns.
Flink Series, Part 2: Time & Windows
Flink's most powerful feature — temporal reasoning over streams. Event time, watermarks, and window types explained.
Flink Series, Part 1: DataStream API
The fundamental building block of Flink — how to read, transform, and write streams using the DataStream API.
Flink Series, Part 0: Overview
What is Apache Flink, what problem does it solve, and how does it differ from Spark Streaming? A roadmap for the series.
How Flink's Exactly-Once Semantics Actually Work
A deep dive into Flink's checkpointing mechanism and how it guarantees exactly-once processing even when jobs fail and restart.
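The checkpoint-and-replay idea at the heart of that post can be sketched as a toy loop: periodically snapshot (offset, state), and after a crash restore the last snapshot and reprocess from there. Because reprocessing is deterministic, the recovered state matches what a failure-free run would produce — the effect is exactly-once, even though some inputs are read twice. This is a hypothetical single-process sketch, not Flink's barrier-based mechanism:

```python
# Toy checkpoint/replay: snapshot (offset, state) every few records;
# on a (single) simulated crash, restore the snapshot and reprocess.

def run(log, checkpoint_every=2, crash_at=None):
    offset, state = 0, 0
    checkpoint = (0, 0)                     # last durable (offset, state)
    while offset < len(log):
        if crash_at is not None and offset == crash_at:
            offset, state = checkpoint      # recover from the checkpoint
            crash_at = None                 # only crash once
            continue
        state += log[offset]                # deterministic processing step
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, state)
    return state
```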