Blog Posts

Apache Pekko Series, Part 9: Production Best Practices

Running Pekko in production: Kafka connectors, OpenTelemetry distributed tracing, health checks, dispatcher tuning, Kubernetes deployment, and migrating from Akka.

Kubernetes Series, Part 0: Overview

What Kubernetes is, which problems it solves compared with bare metal and Docker, and a roadmap for running data workloads on K8s.

Kubernetes Series, Part 1: Core Concepts

Pods, Deployments, Services, ConfigMaps, and Namespaces — the essential vocabulary every K8s user must know.

Kubernetes Series, Part 2: Storage and Configuration

PersistentVolumes, PersistentVolumeClaims, StorageClasses, Secrets, and ConfigMaps — how stateful data workloads survive pod restarts.

Kubernetes Series, Part 3: Workload Patterns for Data Engineering

StatefulSets, Jobs, CronJobs, and DaemonSets — the right workload type for each data engineering use case.

Kubernetes Series, Part 4: Running Spark on Kubernetes

Submitting Spark jobs natively to K8s, the Spark Operator, executor resource sizing, and shuffle storage.

Kubernetes Series, Part 5: Running Flink and Kafka on Kubernetes

Deploying Flink with the Flink Kubernetes Operator and Kafka with Strimzi — the streaming stack on K8s.

Kubernetes Series, Part 6: Production Operations

Resource quotas, autoscaling (HPA/KEDA), monitoring with Prometheus and Grafana, and cluster cost management for data platforms.