August 2024 View Project ↗

Clickstreamer — Real-time Clickstream Pipeline

End-to-end clickstream analytics pipeline using Kafka, Apache Flink, ClickHouse, and Grafana with a full Docker Compose setup.


Overview

Clickstreamer is a self-contained demo of a real-time clickstream analytics stack. Pageview events are generated and ingested via Kafka (using the Confluent Datagen connector), processed by Apache Flink, stored in ClickHouse for fast OLAP queries, and visualized in Grafana — all wired together with Docker Compose.

Architecture

[Confluent Datagen] → Kafka → [Flink Jobs] → ClickHouse → Grafana

Features

  • Event simulation — Confluent Datagen connector produces realistic pageview events into Kafka topics
  • Stream processing — Flink consumes Kafka topics, applies windowed aggregations, and sinks to ClickHouse
  • Fast analytics — ClickHouse columnar storage enables sub-second query response on millions of rows
  • Live dashboards — Grafana dashboards show real-time page traffic, top pages, and session metrics
  • One-command setup — Full stack runs with docker compose up

Tech Stack

  • Apache Kafka + Confluent Datagen — event streaming and source simulation
  • Apache Flink — stateful stream processing
  • ClickHouse — columnar OLAP database
  • Grafana — real-time visualization
  • Docker Compose — local orchestration
← Back to Projects