Debezium Series, Part 6: Handling Schema Changes
What happens when someone alters a table. DDL propagation, Schema Registry integration, breaking vs non-breaking changes, and strategies to evolve without downtime.
Schema changes are inevitable. Columns get added, renamed, or dropped as applications evolve. In a traditional batch pipeline, a schema change is a scheduled event — you update the ETL script and rerun. In a CDC pipeline, schema changes happen in the middle of a live stream and must be handled without dropping data or breaking consumers.
What Happens on a Schema Change
When a table schema changes, Debezium detects it through the replication log and updates the Kafka message schema accordingly.
For PostgreSQL, the schema is embedded in each message (or registered in Schema Registry). When a column is added, subsequent events include the new column. For MySQL, the DDL statement is recorded in the schema history topic and replayed on connector restart.
The core problem: consumers that were built against the old schema may break when they encounter new schema events.
Non-Breaking vs Breaking Changes
Non-Breaking Changes (Safe)
These changes do not break existing consumers:
| Change | Impact |
|---|---|
| Add a nullable column with default | New field appears in after; old consumers ignore unknown fields |
| Add a new table | New topic created; existing consumers unaffected |
| Increase column length (e.g., VARCHAR(50) → VARCHAR(200)) | No change to event structure |
| Add an index | Not visible in events |
Breaking Changes (Dangerous)
These changes can break existing consumers:
| Change | Impact |
|---|---|
| Drop a column | Field disappears from events; consumers expecting it fail |
| Rename a column | Old field gone, new field appears; consumers using old name break |
| Change column type (e.g., INT → TEXT) | Value type in event changes; consumers fail on deserialization |
| Add a NOT NULL column without a default | Insert events start carrying the new field as required |
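The two tables can be approximated as a small check that compares a table's columns before and after a DDL. This is a minimal sketch, not part of Debezium: the `classify_change` helper and the column-map shape are hypothetical, and it ignores nullability, defaults, and precision or scale changes.

```python
from typing import Dict, List

# Hypothetical column description: column name -> SQL type string.
Columns = Dict[str, str]


def base_type(sql_type: str) -> str:
    """Strip the length/precision suffix: 'VARCHAR(50)' -> 'VARCHAR'."""
    return sql_type.split("(")[0].strip().upper()


def classify_change(old: Columns, new: Columns) -> List[str]:
    """Return the reasons a change is breaking; an empty list means safe."""
    reasons = []
    for name, old_type in old.items():
        if name not in new:
            # A rename looks identical to a drop from the consumer's side.
            reasons.append(f"column removed or renamed: {name}")
        elif base_type(new[name]) != base_type(old_type):
            reasons.append(f"type changed: {name} {old_type} -> {new[name]}")
    # Columns that exist only in `new` are treated as additive and safe here.
    return reasons
```

An empty result corresponds to the "safe" table; any entry corresponds to a row in the "dangerous" table.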
Schema Registry Integration
Without Schema Registry, the default JSON converter embeds the full schema in every message, which is verbose and makes schema evolution harder to coordinate. With Schema Registry, schemas are registered centrally and messages carry only a compact schema ID.
┌──────────┐   register schema   ┌─────────────────┐
│ Debezium │ ──────────────────► │ Schema Registry │
│          │ ◄──── schema ID ─── │                 │
└────┬─────┘                     └─────────────────┘
     │ Kafka message: [schema_id | payload]
     ▼
┌──────────┐  fetch schema by ID  ┌─────────────────┐
│ Consumer │ ───────────────────► │ Schema Registry │
└──────────┘                      └─────────────────┘
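The diagram abbreviates the message as [schema_id | payload]. In the Confluent wire format the value starts with a magic byte of 0, followed by the schema ID as a 4-byte big-endian integer, followed by the serialized payload. A minimal sketch of splitting that frame, where `split_confluent_frame` is a hypothetical helper name and decoding the Avro payload itself is left out:

```python
import struct
from typing import Tuple


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Split a Confluent-framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short for the 5-byte header")
    # ">BI" = big-endian, one unsigned byte (magic) + one unsigned 32-bit int (ID).
    magic, schema_id = struct.unpack(">BI", message[:5])
    if magic != 0:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]
```

A consumer would use the returned ID to fetch (and cache) the writer schema from Schema Registry before decoding the payload.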
Configuring Debezium with Schema Registry
{
"key.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url": "http://schema-registry:8081"
}
With Avro serialization, messages are binary-compact and schemas are managed centrally.
Compatibility Modes
Schema Registry enforces a compatibility mode per subject. With the default naming strategy, each topic has two subjects, <topic>-key and <topic>-value. The mode determines which schema changes are allowed:
| Mode | Allowed changes |
|---|---|
| BACKWARD | Consumers with new schema can read old messages. Add fields with defaults; delete fields. |
| FORWARD | Consumers with old schema can read new messages. Add fields; old consumers ignore new fields. |
| FULL | Both backward and forward. Only add optional fields with defaults. |
| NONE | No compatibility check. Any change allowed. |
For CDC pipelines, FORWARD or FULL is recommended — new events should be readable by consumers that haven’t been updated yet.
# Set compatibility for a subject
curl -X PUT http://schema-registry:8081/config/shop.public.orders-value \
-H "Content-Type: application/json" \
-d '{"compatibility": "FORWARD"}'
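To make the modes concrete, here is a deliberately simplified model of the check, assuming a schema is just a map from field name to "has a default". It ignores types, unions, and transitive modes, so it is a sketch of the rule rather than Schema Registry's actual implementation; `can_read` and `is_compatible` are hypothetical names.

```python
from typing import Dict

# Simplified schema model: field name -> whether the field has a default.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """A reader can decode writer data if every reader field that the
    writer did not produce has a default to fall back on."""
    return all(has_default
               for name, has_default in reader.items()
               if name not in writer)


def is_compatible(mode: str, old: Schema, new: Schema) -> bool:
    """Check a proposed schema against the previous version only."""
    if mode == "BACKWARD":   # new readers, old data
        return can_read(reader=new, writer=old)
    if mode == "FORWARD":    # old readers, new data
        return can_read(reader=old, writer=new)
    if mode == "FULL":
        return can_read(new, old) and can_read(old, new)
    if mode == "NONE":
        return True
    raise ValueError(f"unknown mode: {mode}")
```

In this model, adding a field without a default passes FORWARD but fails BACKWARD, while dropping a field without a default passes BACKWARD but fails FORWARD.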
Handling Specific Schema Changes
Adding a Column
ALTER TABLE orders ADD COLUMN discount NUMERIC(5,2) DEFAULT 0;
Debezium detects the change on the next WAL read. Subsequent events include the discount field. The Schema Registry registers a new schema version.
With FORWARD compatibility set on the subject, consumers that still read with the old schema can ignore the new field until they are updated. Consumers using strict schema validation will fail; update them before the DDL is applied, or use a lenient deserializer.
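The difference between strict validation and a lenient deserializer can be sketched as follows. The field set and function names are hypothetical, and the input is assumed to be the already-decoded `after` portion of an event as a plain dict.

```python
from typing import Any, Dict

# Hypothetical set of fields this consumer was built against.
KNOWN_FIELDS = ("id", "customer", "total")


def strict_row(after: Dict[str, Any]) -> Dict[str, Any]:
    """Reject any row whose fields differ from the expected set."""
    if set(after) != set(KNOWN_FIELDS):
        diff = sorted(set(after) ^ set(KNOWN_FIELDS))
        raise ValueError(f"unexpected or missing fields: {diff}")
    return dict(after)


def lenient_row(after: Dict[str, Any]) -> Dict[str, Any]:
    """Keep known fields, ignore unknown ones, fail only on missing ones."""
    missing = [f for f in KNOWN_FIELDS if f not in after]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {f: after[f] for f in KNOWN_FIELDS}
```

After the `discount` column is added, `strict_row` raises on every new event while `lenient_row` keeps working and simply drops the extra field.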
Dropping a Column
ALTER TABLE orders DROP COLUMN internal_note;
After the DROP, events no longer contain internal_note. Any consumer that requires this field will break. The safe sequence:
- Update all consumers to not require internal_note
- Deploy consumers
- Run the DDL
- Verify no consumer errors
- Remove internal_note from consumer code
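Before running the DDL, the first two steps can be verified with a simple pre-flight check. This sketch assumes you keep an inventory of which fields each consumer requires; the inventory shape, the consumer names, and `consumers_blocking_drop` are all hypothetical.

```python
from typing import Dict, List, Set


def consumers_blocking_drop(column: str,
                            required: Dict[str, Set[str]]) -> List[str]:
    """Return the consumers that still require `column`, sorted by name."""
    return sorted(name for name, fields in required.items()
                  if column in fields)


# Example inventory: consumer name -> fields it cannot work without.
required_fields = {
    "billing": {"id", "total"},
    "audit": {"id", "internal_note"},
}
blockers = consumers_blocking_drop("internal_note", required_fields)
```

The DROP is safe to run only once the returned list is empty.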
Renaming a Column
PostgreSQL does not propagate RENAME COLUMN through logical replication in a way that Debezium can detect atomically. The safest approach:
- Add the new column (customer_name)
- Copy data via a trigger or application logic
- Update consumers to read customer_name
- Drop the old column (customer)
This is a multi-step migration, not a single ALTER TABLE.
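While both columns exist, a consumer can read from either one. A minimal sketch of that transitional read, assuming the event's `after` section is available as a dict and that a not-yet-backfilled `customer_name` arrives as null; `read_customer_name` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def read_customer_name(after: Dict[str, Any]) -> Optional[str]:
    """Prefer the new column; fall back to the old one during migration."""
    value = after.get("customer_name")
    if value is None:
        # Column not present yet, or present but not backfilled for this row.
        value = after.get("customer")
    return value
```

Once the old column is dropped and all rows are backfilled, the fallback branch can be removed.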
Changing Column Type
Type changes are the most disruptive. An INT → BIGINT change is relatively safe (consumers expecting INT may overflow on very large values, but usually work). An INT → TEXT change is a hard break.
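The overflow risk after an INT → BIGINT change can be made explicit with a range guard in any consumer that still stores the value as a 32-bit integer. This is an illustrative sketch; `fits_int32` and `to_int32` are hypothetical helpers.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(value: int) -> bool:
    """True if the value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def to_int32(value: int) -> int:
    """Fail loudly instead of silently truncating an oversized value."""
    if not fits_int32(value):
        raise OverflowError(f"{value} does not fit in a 32-bit integer")
    return value
```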
Approach:
- Add a new column with the target type
- Backfill via application or migration script
- Switch application writes to the new column
- Update consumers
- Drop the old column
MySQL Schema History Topic
The Debezium MySQL connector records every DDL statement it reads from the binlog in the schema history topic. When the connector restarts, it replays this history to reconstruct the schema at the current binlog position.
"schema.history.internal.kafka.topic": "schema-changes.shop",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092"
Critical: this topic must never be deleted or compacted. Set cleanup.policy=delete with retention.ms=-1 (infinite retention):
kafka-configs.sh --alter \
--entity-type topics \
--entity-name schema-changes.shop \
--add-config cleanup.policy=delete,retention.ms=-1 \
--bootstrap-server kafka:9092
If the schema history topic is lost, the connector cannot reconstruct the schema and must be reset with a fresh snapshot.
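Why the history matters can be shown with a toy model of the replay. The real connector stores and parses full DDL statements; here each record is reduced to a hypothetical (position, operation, column, type) tuple, and `schema_at` is an illustrative name rather than a Debezium API.

```python
from typing import Dict, List, Tuple

# Toy history record: (binlog position, operation, column name, column type).
HistoryRecord = Tuple[int, str, str, str]


def schema_at(history: List[HistoryRecord], position: int) -> Dict[str, str]:
    """Replay recorded changes up to `position`; return column -> type."""
    columns: Dict[str, str] = {}
    for pos, op, column, col_type in sorted(history):
        if pos > position:
            break  # later DDL does not apply to events at this position
        if op == "ADD":
            columns[column] = col_type
        elif op == "DROP":
            columns.pop(column, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return columns
```

With an empty history the function returns an empty schema for every position, which mirrors why a lost topic leaves the connector unable to interpret binlog rows.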
PostgreSQL and DDL
PostgreSQL’s logical replication does not propagate DDL changes the same way MySQL does. Schema changes in PostgreSQL are detected by Debezium when the first event arrives after the DDL — because the WAL record includes the new row format.
This means:
- No separate schema history topic is needed for PostgreSQL
- Schema changes are picked up immediately on the next DML
- The schema embedded in events (or registered in Schema Registry) updates automatically
Practical Strategy: Schema Change Runbook
For production systems, follow a structured runbook for every schema change:
1. Pre-change
□ Check active consumers and their schema version tolerance
□ Verify Schema Registry compatibility mode is FORWARD or FULL
□ Test the DDL in staging with a running Debezium pipeline
2. Apply change
□ Run DDL in database
□ Verify Debezium connector status is still RUNNING
□ Inspect new event schema in Schema Registry
3. Post-change
□ Check consumer error rates in monitoring
□ Update consumers to handle new schema
□ Remove backward-compatibility code after all consumers are updated
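The "connector status is still RUNNING" step can be scripted against the Kafka Connect REST API, whose `GET /connectors/<name>/status` response reports a state for the connector and for each task. The sketch below only interprets an already-fetched response; the connector name in the example and the `unhealthy_parts` helper are hypothetical.

```python
from typing import Any, Dict, List


def unhealthy_parts(status: Dict[str, Any]) -> List[str]:
    """List the connector and task entries whose state is not RUNNING."""
    problems = []
    connector_state = status.get("connector", {}).get("state")
    if connector_state != "RUNNING":
        problems.append(f"connector: {connector_state}")
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            problems.append(f"task {task.get('id')}: {task.get('state')}")
    return problems


# Example response body, trimmed to the fields used above.
example_status = {
    "name": "shop-connector",
    "connector": {"state": "RUNNING"},
    "tasks": [{"id": 0, "state": "FAILED"}],
}
problems = unhealthy_parts(example_status)
```

Checking tasks as well as the connector matters because a connector can report RUNNING while its task has failed.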
Key Takeaways
- Non-breaking changes (add nullable column, add table) are safe; breaking changes (drop, rename, type change) require coordination
- Schema Registry with FORWARD or FULL compatibility prevents breaking consumers during schema evolution
- For dropping or renaming columns: update consumers first, then apply the DDL
- MySQL requires a persistent schema history topic — never delete it
- PostgreSQL detects schema changes automatically from the WAL; no schema history topic needed
- Treat schema changes as a deployment: plan, stage, and roll back if needed
Next: Snapshotting