September 2025

Kadita — Config-Driven Data Ingestion Platform

A Kubernetes-inspired, YAML-configured data platform that ingests from Postgres, MySQL, MongoDB, Jira, Zendesk, and S3 into an Apache Iceberg data lake.


Overview

Kadita is a configuration-driven data ingestion platform inspired by Kubernetes resource manifests. Instead of writing ingestion code for each data source, you declare your sources and tables in YAML, and Kadita handles the ingestion, landing the data in Apache Iceberg tables on S3.

Configuration Model

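Kadita separates DataSource from TableConfig, similar to how Kubernetes separates Deployment from Service. A DataSource describes the connection to a system; a TableConfig references a DataSource by name and declares which table to ingest and where to write it: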

# datasource.yaml
apiVersion: kadita/v1
kind: DataSource
metadata:
  name: my-postgres
spec:
  type: postgres
  host: db.example.com
  database: production

# table.yaml
apiVersion: kadita/v1
kind: TableConfig
metadata:
  name: users-table
spec:
  source: my-postgres
  table: users
  destination:
    format: iceberg
    path: s3://datalake/users/
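
To make the name-based link between the two manifests concrete, here is a minimal sketch of how a TableConfig could be joined to the DataSource it references. It is an illustration, not Kadita's actual implementation: `IngestionJob` and `resolve_jobs` are hypothetical names, and the manifests are written as plain Python dicts (the shape a YAML loader would produce) so the example has no dependencies.

```python
from dataclasses import dataclass

# The two manifests above, as they would look after YAML parsing.
MANIFESTS = [
    {
        "apiVersion": "kadita/v1",
        "kind": "DataSource",
        "metadata": {"name": "my-postgres"},
        "spec": {"type": "postgres", "host": "db.example.com", "database": "production"},
    },
    {
        "apiVersion": "kadita/v1",
        "kind": "TableConfig",
        "metadata": {"name": "users-table"},
        "spec": {
            "source": "my-postgres",
            "table": "users",
            "destination": {"format": "iceberg", "path": "s3://datalake/users/"},
        },
    },
]


@dataclass(frozen=True)
class IngestionJob:
    """Hypothetical pairing of one TableConfig with its resolved DataSource."""
    name: str
    source_type: str
    host: str
    database: str
    table: str
    destination_format: str
    destination_path: str


def resolve_jobs(manifests):
    """Join each TableConfig to the DataSource named in its spec.source."""
    # Index DataSource specs by metadata.name for lookup.
    sources = {
        m["metadata"]["name"]: m["spec"]
        for m in manifests
        if m["kind"] == "DataSource"
    }
    jobs = []
    for m in manifests:
        if m["kind"] != "TableConfig":
            continue
        spec = m["spec"]
        source_name = spec["source"]
        if source_name not in sources:
            raise KeyError(
                f"TableConfig {m['metadata']['name']!r} references "
                f"unknown DataSource {source_name!r}"
            )
        src = sources[source_name]
        jobs.append(
            IngestionJob(
                name=m["metadata"]["name"],
                source_type=src["type"],
                host=src["host"],
                database=src["database"],
                table=spec["table"],
                destination_format=spec["destination"]["format"],
                destination_path=spec["destination"]["path"],
            )
        )
    return jobs
```

Calling `resolve_jobs(MANIFESTS)` yields one job for `users-table`. Because the connection details live only in the DataSource, several TableConfigs can point at `my-postgres` without repeating the host or database.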

Supported Sources

Source      Type
PostgreSQL  Relational DB
MySQL       Relational DB
MongoDB     Document DB
Jira        SaaS API
Zendesk     SaaS API
S3 Files    Object Storage
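
One common way to support several source kinds behind a single config field is a registry keyed by the DataSource `spec.type` value. The sketch below assumes that pattern; it is not taken from Kadita's code. `CONNECTORS`, `register`, and `connector_for` are hypothetical names, and only the `postgres` type string appears in the configuration above, so it is the only one registered here. The connector body is a placeholder that returns a descriptive string rather than opening a connection.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a DataSource spec.type to a connector factory.
CONNECTORS: Dict[str, Callable[[dict], str]] = {}


def register(source_type: str):
    """Decorator that records a connector factory under a spec.type value."""
    def decorator(factory):
        CONNECTORS[source_type] = factory
        return factory
    return decorator


@register("postgres")
def postgres_connector(spec: dict) -> str:
    # Placeholder: a real connector would open a database connection here.
    return f"postgres://{spec['host']}/{spec['database']}"


def connector_for(spec: dict) -> str:
    """Look up and invoke the factory for the given DataSource spec."""
    source_type = spec.get("type")
    if source_type not in CONNECTORS:
        raise ValueError(f"unsupported source type: {source_type!r}")
    return CONNECTORS[source_type](spec)
```

With this shape, adding a new source means registering one more factory; the code that reads the manifests does not change.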

Tech Stack

  • Apache Iceberg — open table format for the data lake
  • S3 — storage layer
  • Python — ingestion engine
  • YAML — declarative configuration