· 6 min read ·

Kafka's Storage Layer Was Always the Problem: What Ursa Gets Right

Source: lobsters

For years, the canonical way to get Kafka data into a lakehouse was to run a second system. You’d stand up Flink or Spark Streaming, wire it to your Kafka cluster, configure a sink connector, manage checkpoints, and hope the pipeline stayed healthy. The data lived in Kafka’s native log format until something consumed it and rewrote it into Parquet. The cost of that gap, in engineering hours and operational surface area, has been enormous.

Ursa is a storage engine for Kafka that puts Apache Iceberg at the center rather than at the end of the pipeline. Instead of treating Iceberg as a downstream destination, Ursa makes it the actual storage medium. The distinction sounds subtle, but it changes the entire operational picture.

What Kafka’s Storage Actually Looks Like Today

Kafka’s storage has always been a log: an append-only sequence of segment files per partition, with .log, .index, and .timeindex files organized under each partition directory. Each segment has a base offset in its filename. Reads scan the index to find the position, then seek into the log file. Writes are purely sequential appends.

This model is extremely fast for sequential write throughput. It was designed for one job: accept a high-velocity stream of messages and make them available to consumers with minimal latency. It does that well. What it does poorly is everything else. The format is opaque to analytical engines. There’s no schema embedded in the file layer. Predicate pushdown, column projection, and partition pruning don’t exist at the storage level. Every consumer that wants analytics has to read the full stream.

Kafka 3.6 shipped KIP-405 Tiered Storage, which lets brokers offload older log segments to remote object storage like S3. This was a real improvement for retention economics, but it kept the same format. The data on S3 is still Kafka’s proprietary segment files. Trino can’t query it. Spark can’t read it without going through the Kafka consumer API. Tiered storage solved the cost problem; it didn’t solve the format problem.

Projects like WarpStream and AutoMQ went further, rebuilding Kafka’s storage layer on top of object storage with a disaggregated architecture. WarpStream stores everything in S3 and eliminates inter-broker replication entirely by relying on S3’s durability. AutoMQ uses a similar approach with its cloud-native stream storage. Both make Kafka cheaper and more elastic. Neither changes what the data looks like on disk.

What “Iceberg-First” Actually Means

An “Iceberg-first” storage engine is not a connector. This distinction matters. The Apache Iceberg Kafka Connect sink is widely used and works well: it consumes from Kafka topics, batches records, writes Parquet files to object storage, and commits Iceberg metadata. But it’s a consumer. It reads Kafka’s format, transforms the data, and writes it somewhere else. The original Kafka data and the Iceberg data are two separate things with two separate lifecycles.

Ursa inverts this. When a producer writes a message to Kafka, the message goes directly into Iceberg’s storage model. The broker doesn’t maintain a separate Kafka log; the Iceberg table is the log. This means the data is immediately queryable by any Iceberg-compatible engine the moment it’s committed, without any additional pipeline.

From the producer side, nothing changes. Producers speak the Kafka protocol. From the consumer side, offset-based reads still work. But the underlying bytes are organized as Iceberg data files, with Iceberg metadata tracking what exists where.

The Write Path Tension

The hard engineering problem here is reconciling Kafka’s streaming write model with Iceberg’s commit semantics.

Iceberg’s write path is inherently batchy. You write a set of data files to object storage, then you commit by atomically swapping in a new table metadata file that points to a new snapshot. The commit is what makes the data visible. This is how Iceberg achieves isolation and ACID semantics. It’s also why Iceberg is typically used with micro-batch or batch jobs rather than per-record writes.

Kafka producers expect acknowledgment at the record level (or small batch level). You can’t tell a Kafka producer “your write will be visible in 30 seconds when the next Iceberg commit runs.” You also can’t commit a new Iceberg snapshot for every individual record without overwhelming the metadata layer. A busy topic might receive millions of messages per second across thousands of partitions.

The likely approach in Ursa, and the one that makes the most architectural sense, is a two-phase model. Writes land in a fast, durable staging buffer, similar to a write-ahead log, which handles the per-record acknowledgment semantics. A background process compacts staged records into proper Iceberg data files and commits snapshots on a configurable cadence. The Iceberg table’s “latest” state might lag production writes by seconds or minutes, but durability is immediate.

This is conceptually similar to what Apache Hudi’s write path does with its timeline and compaction model, or what Delta Lake does with its transaction log and checkpoint mechanism. The difference is that in Ursa’s case, this buffer-and-commit model is baked into the broker itself, not bolted on by an external job.

Consumer Offsets and Iceberg Snapshots

Kafka consumers track position with offsets: an integer that identifies where in a partition’s log a consumer has read. This is fundamental to Kafka’s consumer group semantics, rebalancing, and exactly-once processing.

Iceberg tracks history with snapshot IDs and sequence numbers. A snapshot is a point-in-time view of the table. Sequence numbers are monotonically increasing across all changes to a given partition’s data.

For Ursa to be Kafka-protocol compatible, it has to maintain a mapping between Kafka offsets and Iceberg positions. One approach is to treat each committed record’s position within an Iceberg data file as its offset, with sequence numbers providing the ordering. Another is to maintain a lightweight offset index alongside the Iceberg metadata that maps offset ranges to data file positions and row groups.

This offset translation layer is probably the most performance-sensitive piece of the system. Every consumer fetch request that goes to the broker needs to resolve an offset to a file location quickly, without a full metadata scan. Iceberg’s partition statistics and column statistics can help prune the search space, but the mapping still needs to be fast.

Schema Evolution Without the Registry Dance

One underappreciated benefit of making Iceberg the storage layer is what happens to schema management. In a conventional Kafka setup, schemas live in a separate Schema Registry. Producers register schemas, consumers look them up. Schema evolution is governed by compatibility rules (backward, forward, full). The storage layer is oblivious to all of this; it just stores bytes.

Iceberg has schema evolution built into the table format. Column additions, column renames, type promotions, and column reordering are all tracked in the table metadata and handled transparently at read time. Parquet and ORC files written with an older schema can be read correctly by a reader expecting a newer schema because Iceberg stores column IDs that are stable across schema changes, not just column names or positions.

With Ursa, the schema lives with the data in a way that any Iceberg reader understands natively. You still need agreement between producers on what they’re writing, but consumers don’t need to consult an external registry for every read. Analytics queries against the table benefit from column statistics and schema history without any additional tooling.

The Broader Implication

What Ursa represents is a consolidation of two systems that most data teams have been running in parallel for years. The real-time pipeline and the analytical storage layer have historically been separate concerns with separate engineering ownership, separate SLAs, and separate failure modes. Every team that runs both Kafka and a Hive/Iceberg/Delta data lake has a connector layer in between that breaks, falls behind, or requires tuning.

Making Iceberg the native storage for Kafka is a bet that the overhead of that connector layer outweighs the complexity of integrating the two models at the storage layer. Given how central both Kafka and Iceberg have become to the modern data stack, that bet seems reasonable. The hard work is in the details: write path latency, offset semantics, compaction scheduling, and metadata scalability at high partition counts.

Projects like Apache Paimon, which started as Flink Table Store and explicitly targets the streaming-to-lakehouse use case, show that the appetite for this kind of unified architecture is real. Ursa approaches the same problem from the broker side rather than the compute side. Both directions are worth watching, and how they converge will shape what the data infrastructure stack looks like over the next few years.

Was this interesting?