Teaching OpenBSD's 30-Year-Old ext2fs Driver to Read ext4

OpenBSD ships with an ext2fs driver whose lineage goes back nearly three decades. It came from NetBSD, where it was written in the late 1990s as a BSD-licensed adaptation of the FFS code to the ext2 on-disk format that Remy Card designed for Linux, rather than as an import of Linux source. The driver has been maintained carefully since then, gained enough ext3 support to ignore the journal on read-only mounts, and quietly refused to touch anything with the ext4 extent feature flag set. A recent LWN article covers someone changing that, with notable help from an LLM.

The work is technically interesting independent of how it was written. Understanding why ext4 support requires non-trivial surgery to the existing driver means understanding one specific structural decision the ext4 developers made around 2006 to 2008: replacing indirect block pointers with an extent B-tree.

The Indirect Block Problem

ext2 stores file block locations in the inode’s i_block[15] array. The first twelve entries are direct, each holding the physical block number of a data block. Entry 12 holds a pointer to an indirect block, which is a full filesystem block containing block pointers. Entry 13 is doubly-indirect (pointer to a block of pointers to blocks). Entry 14 is triply-indirect.

This scheme works, but it has two costs at scale. First, metadata grows with file size regardless of layout: every data block needs its own 4-byte pointer, so even a perfectly contiguous 1 GB file on a filesystem with 4 KB blocks needs 262,144 pointers, held in 257 indirect blocks (about 1 MB of metadata). Second, lookups get deeper as files grow: any block beyond roughly the first 4 MB of that file sits behind the doubly-indirect tier, so a cold random read costs two metadata reads before the data block itself.
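To put numbers on the first cost, the C sketch below counts the pointer blocks a hole-free file needs under the classic ext2 layout, assuming 4-byte block pointers. The function name indirect_blocks_needed is our own illustration, not code from OpenBSD or the patch.

```c
#include <stdint.h>

#define EXT2_NDADDR 12   /* direct pointers: i_block[0..11] */

static uint64_t div_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/*
 * Count pointer (indirect) blocks needed to map a file of nblocks data
 * blocks with no holes, assuming 4-byte block pointers. Does not check
 * the triple-indirect capacity limit.
 */
uint64_t indirect_blocks_needed(uint64_t nblocks, uint32_t block_size)
{
    uint64_t p = block_size / 4;  /* pointers per indirect block */
    uint64_t total;

    if (nblocks <= EXT2_NDADDR)
        return 0;                 /* fits in the direct slots */
    nblocks -= EXT2_NDADDR;

    total = 1;                    /* the single-indirect block */
    if (nblocks <= p)
        return total;
    nblocks -= p;

    /* Doubly-indirect: one top block plus one pointer block per p data blocks. */
    if (nblocks <= p * p)
        return total + 1 + div_up(nblocks, p);
    total += 1 + p;
    nblocks -= p * p;

    /* Triply-indirect: top block, middle blocks, then leaf pointer blocks. */
    uint64_t leaves = div_up(nblocks, p);
    return total + 1 + div_up(leaves, p) + leaves;
}
```

For the 1 GB example, indirect_blocks_needed(262144, 4096) returns 257: one single-indirect block, one doubly-indirect block, and 255 second-level pointer blocks beneath it.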

OpenBSD’s ext2fs block mapper in ext2fs_bmap.c implements this traversal. Given a logical block number, it determines which tier of indirection applies, reads the appropriate indirect blocks from the buffer cache, and returns the physical block address. That file is the core of what needed to change.
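The tier selection described above can be sketched in C as follows. This is a hypothetical helper under our own naming, not OpenBSD's ext2fs_bmap.c: it only computes which i_block[] slot to start from and which entry to follow inside each indirect block, leaving the actual block reads (bread() in the kernel) to the caller.

```c
#include <stdint.h>

#define EXT2_NDADDR 12   /* direct pointers: i_block[0..11] */

/*
 * Hypothetical helper: classify logical block lbn under the ext2
 * indirect scheme. Returns the indirection depth (0 = direct,
 * 1..3 = single/double/triple) or -1 if lbn is out of range.
 * *slot gets the i_block[] entry to start from; idx[0..depth-1]
 * gets the entry to follow in each indirect block, top level first.
 */
int ext2_classify_lbn(uint64_t lbn, uint32_t block_size,
                      int *slot, uint32_t idx[3])
{
    uint64_t p = block_size / 4;  /* 4-byte pointers per indirect block */
    uint64_t span = 1;            /* data blocks reachable per tier */

    if (lbn < EXT2_NDADDR) {
        *slot = (int)lbn;
        return 0;
    }
    lbn -= EXT2_NDADDR;

    for (int depth = 1; depth <= 3; depth++) {
        span *= p;                /* p, p^2, p^3 blocks in this tier */
        if (lbn < span) {
            *slot = EXT2_NDADDR + depth - 1;      /* 12, 13 or 14 */
            for (int i = 0; i < depth; i++) {
                span /= p;        /* blocks covered by one entry here */
                idx[i] = (uint32_t)(lbn / span);
                lbn %= span;
            }
            return depth;
        }
        lbn -= span;              /* skip past this whole tier */
    }
    return -1;
}
```

With 4 KB blocks, logical block 6163 lands in the doubly-indirect tier (slot 13) with indices 5 and 7, so the mapper reads two indirect blocks before it can return a physical address.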

How ext4 Extents Change Everything

ext4 replaces the indirect scheme with an extent tree. The same 60 bytes of the inode that held i_block[0..14] now hold a 12-byte extent tree header (eh_magic, eh_entries, eh_max, eh_depth, eh_generation) followed by either leaf extents or index entries.

A leaf extent (struct ext4_extent) is 12 bytes:

struct ext4_extent {
    __le32 ee_block;    /* first logical block extent covers */
    __le16 ee_len;      /* number of blocks */
    __le16 ee_start_hi; /* high 16 bits of physical start block */
    __le32 ee_start_lo; /* low 32 bits of physical start block */
};

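The ee_len field doubles as a flag. Values up to 32,768 are initialized extents of that length, so an initialized extent maps at most 32,768 blocks (128 MB with 4 KB blocks); values above 32,768 mark an uninitialized (preallocated) extent whose length is ee_len - 32,768, which caps those at 32,767 blocks. A reader must return zeros for blocks in an uninitialized extent, since their on-disk contents are stale. Four extents fit directly in the inode. When a file needs more, the tree grows: index nodes (struct ext4_extent_idx) hold the physical block address of child nodes, and the header’s eh_depth field tells you how deep the tree is.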
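Decoding an extent entry is mostly a matter of respecting that length flag and widening the start address before combining its halves. The sketch below assumes the fields have already been converted from little-endian to host order; struct extent and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT4_EXT_INIT_MAX_LEN 32768u   /* 1 << 15 */

/* Host-order copy of struct ext4_extent (hypothetical name). */
struct extent {
    uint32_t ee_block;     /* first logical block covered */
    uint16_t ee_len;       /* raw length field, may carry the flag */
    uint16_t ee_start_hi;  /* high 16 bits of physical start */
    uint32_t ee_start_lo;  /* low 32 bits of physical start */
};

/* Raw values above 32768 mark an uninitialized (preallocated) extent. */
bool extent_is_uninit(const struct extent *ex)
{
    return ex->ee_len > EXT4_EXT_INIT_MAX_LEN;
}

/* Actual block count: 32768 itself is a full-length initialized extent. */
uint32_t extent_len(const struct extent *ex)
{
    return extent_is_uninit(ex) ? ex->ee_len - EXT4_EXT_INIT_MAX_LEN
                                : ex->ee_len;
}

/* 48-bit physical start; cast before shifting so the high half survives. */
uint64_t extent_pblk(const struct extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}
```

The cast in extent_pblk matters: shifting a 16-bit value by 32 after ordinary integer promotion is undefined behavior in C, which is exactly the kind of detail that survives a quick review.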

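The read algorithm is a B-tree traversal. Check the header magic (0xF30A), binary-search the current node for the last entry whose starting logical block is at most the target, follow that index entry’s child pointer, and repeat until reaching a node with eh_depth 0. In that leaf, the candidate extent covers the target if the logical block falls within [ee_block, ee_block + len), where len is the decoded length; the physical address is then ((ee_start_hi << 32) | ee_start_lo) + (logical - ee_block), with ee_start_hi widened to 64 bits before the shift. A target that no extent covers is a hole and reads as zeros.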
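The traversal can be sketched in portable C. This is an illustrative version under stated assumptions, not the patch's code: blocks are fetched through a hypothetical read_block_fn callback (a kernel driver would use bread() and the buffer cache instead), a toy memdisk stands in for the device, and a miss is reported as a hole so the caller can return zeros.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC        0xF30Au
#define EXT4_EXT_INIT_MAX_LEN 32768u
#define EXT4_MAX_DEPTH        5     /* sanity bound; Linux's EXT4_MAX_EXTENT_DEPTH */
#define EXT4_ROOT_SIZE        60    /* bytes of i_block[] in the inode */

/* Hypothetical block reader; a kernel driver would call bread() here. */
typedef const uint8_t *(*read_block_fn)(uint64_t pblk, void *ctx);

/* Toy memory-backed "disk" so the sketch runs outside a kernel. */
struct memdisk { const uint8_t *base; uint32_t block_size; uint64_t nblocks; };

const uint8_t *memdisk_read(uint64_t pblk, void *ctx)
{
    const struct memdisk *d = ctx;
    return pblk < d->nblocks ? d->base + pblk * d->block_size : NULL;
}

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Map logical block lblk through the extent tree rooted in the inode's
 * 60-byte i_block area. Returns 1 and fills *pblk / *uninit on a hit,
 * 0 for a hole, -1 for a corrupt or unreadable node.
 */
int ext4_ext_map(const uint8_t *root, uint32_t block_size,
                 read_block_fn rd, void *ctx,
                 uint32_t lblk, uint64_t *pblk, int *uninit)
{
    const uint8_t *node = root;
    size_t node_size = EXT4_ROOT_SIZE;
    int depth = -1;                       /* unknown until the root is read */

    for (;;) {
        /* Header: eh_magic, eh_entries, eh_max, eh_depth, eh_generation. */
        uint16_t entries = le16(node + 2), max = le16(node + 4);
        int d = le16(node + 6);
        if (le16(node) != EXT4_EXT_MAGIC || entries > max ||
            12 + 12 * (size_t)max > node_size || d > EXT4_MAX_DEPTH ||
            (depth >= 0 && d != depth - 1))
            return -1;
        depth = d;
        if (entries == 0)
            return 0;

        /* Last entry whose first logical block is <= lblk; index and
         * leaf entries both begin with a le32 logical block number. */
        const uint8_t *ent = node + 12;
        size_t lo = 0, hi = entries;
        while (hi - lo > 1) {
            size_t mid = lo + (hi - lo) / 2;
            if (le32(ent + 12 * mid) <= lblk)
                lo = mid;
            else
                hi = mid;
        }
        const uint8_t *e = ent + 12 * lo;
        if (le32(e) > lblk)
            return 0;                     /* lblk precedes every entry */

        if (depth == 0) {
            /* Leaf: ee_block, ee_len, ee_start_hi, ee_start_lo. */
            uint32_t first = le32(e), raw = le16(e + 4);
            uint32_t len = raw > EXT4_EXT_INIT_MAX_LEN
                         ? raw - EXT4_EXT_INIT_MAX_LEN : raw;
            if (lblk - first >= len)
                return 0;                 /* falls in a gap: hole */
            *pblk = ((uint64_t)le16(e + 6) << 32 | le32(e + 8))
                    + (lblk - first);
            *uninit = raw > EXT4_EXT_INIT_MAX_LEN;
            return 1;
        }
        /* Index: ei_block, ei_leaf_lo, ei_leaf_hi, ei_unused. */
        uint64_t child = (uint64_t)le16(e + 8) << 32 | le32(e + 4);
        if ((node = rd(child, ctx)) == NULL)
            return -1;
        node_size = block_size;
    }
}
```

Reading fields byte by byte sidesteps both alignment traps and host-endianness assumptions. The eh_max bound keeps a corrupt header from sending the search past the end of the node, and requiring each child's eh_depth to be exactly one less than its parent's guarantees the loop terminates.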

This is the code ext2fs_bmap.c needs to grow. Files with the EXT4_INODE_EXTENTS flag (bit 19 of i_flags, mask 0x00080000) take the extent path instead of the indirect block traversal. The two paths coexist because an ext4 volume can still contain files that use the legacy scheme, for example files created before extents were enabled on a filesystem converted from ext3.
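The dispatch itself is small. The sketch below uses the Linux mask values for the inode flag and the superblock feature bit; the enum and function names are hypothetical. It also treats an extent-flagged inode on a filesystem without the extents feature as corrupt, an inconsistency e2fsck also reports, rather than guessing which format the i_block area holds.

```c
#include <stdint.h>

#define EXT4_EXTENTS_FL               0x00080000u  /* i_flags bit 19 */
#define EXT4_FEATURE_INCOMPAT_EXTENTS 0x0040u      /* s_feature_incompat */

enum bmap_path { BMAP_INDIRECT, BMAP_EXTENTS, BMAP_CORRUPT };

/* Hypothetical dispatcher: pick the block-mapping path for one inode. */
enum bmap_path choose_bmap_path(uint32_t i_flags, uint32_t s_feature_incompat)
{
    if (!(i_flags & EXT4_EXTENTS_FL))
        return BMAP_INDIRECT;    /* legacy i_block[] pointers */
    if (!(s_feature_incompat & EXT4_FEATURE_INCOMPAT_EXTENTS))
        return BMAP_CORRUPT;     /* extent inode on a non-extent filesystem */
    return BMAP_EXTENTS;         /* i_block[] holds an extent tree root */
}
```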

What Else Needs to Change

The extent tree is the core change, but not the only one.

64-bit block numbers. ext4 filesystems with the INCOMPAT_64BIT superblock feature flag use 64-bit block addresses throughout, starting with s_blocks_count_hi in the superblock. Group descriptors gain _hi fields: bg_block_bitmap_hi, bg_inode_bitmap_hi, bg_inode_table_hi. The low 32 bits sit in the same positions as in the 32-byte ext2 group descriptor, so the layout is additive, but the descriptor itself grows to s_desc_size bytes (64 by default), which changes the stride of the descriptor table. Any code that reads group descriptors must use that stride and assemble the full 64-bit address when the flag is present.
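Reading one descriptor field correctly means getting both the stride and the high half. Here is a sketch for the inode table address, using the byte offsets from the ext4 disk layout documentation (bg_inode_table_lo at 0x08, bg_inode_table_hi at 0x28); the function name and the use of 0 as an error value are our assumptions.

```c
#include <stdint.h>

#define EXT4_FEATURE_INCOMPAT_64BIT 0x0080u

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Hypothetical helper: inode table block for group g, given the raw
 * group descriptor table. Without 64BIT the stride is 32 bytes; with
 * it, the stride is s_desc_size, which must be at least 64.
 */
uint64_t group_inode_table(const uint8_t *gdt, uint32_t g,
                           uint32_t incompat, uint16_t s_desc_size)
{
    int is64 = (incompat & EXT4_FEATURE_INCOMPAT_64BIT) != 0;
    if (is64 && s_desc_size < 64)
        return 0;                         /* invalid superblock; 0 is never a table */
    uint32_t dsz = is64 ? s_desc_size : 32;
    const uint8_t *d = gdt + (uint64_t)g * dsz;
    uint64_t blk = le32(d + 0x08);        /* bg_inode_table_lo */
    if (is64)
        blk |= (uint64_t)le32(d + 0x28) << 32;  /* bg_inode_table_hi */
    return blk;
}
```

Getting the stride wrong is the quieter failure: every group after the first then reads a neighbour's descriptor, which looks plausible until a file lands in that group.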

Large inodes. ext4’s default inode size is 256 bytes, double the 128-byte inode of the original ext2 format. The i_extra_isize field at byte offset 128 of each inode records how many bytes of extended fields follow that classic 128-byte core. That space holds the extra timestamp bits (nanoseconds and epoch extension in i_ctime_extra, i_mtime_extra, and i_atime_extra), the creation time i_crtime, the high half of the version counter, and in-inode extended attribute storage. For a read-only driver, getting the inode table stride right is essential: each inode slot is s_inode_size bytes wide as declared in the superblock, not a hardcoded 128.
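The stride rule and the i_extra_isize check both fit in a few lines. In this sketch, inode numbers start at 1 as in ext2, and field offsets (i_crtime at 0x90, for example) come from the ext4 disk layout; locate_inode and extra_field_present are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT2_GOOD_OLD_INODE_SIZE 128u

struct inode_loc {
    uint32_t group;        /* block group holding the inode */
    uint64_t block_off;    /* block index within that group's inode table */
    uint32_t byte_off;     /* byte offset inside that block */
};

/* Inode numbers start at 1; the slot stride is s_inode_size, not 128. */
struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                              uint16_t inode_size, uint32_t block_size)
{
    struct inode_loc loc;
    uint32_t index = (ino - 1) % inodes_per_group;
    uint64_t byte = (uint64_t)index * inode_size;

    loc.group = (ino - 1) / inodes_per_group;
    loc.block_off = byte / block_size;
    loc.byte_off = (uint32_t)(byte % block_size);
    return loc;
}

/*
 * A field at byte offset off (>= 128) of length len exists in this
 * inode only if it ends within 128 + i_extra_isize.
 */
bool extra_field_present(uint16_t i_extra_isize, uint32_t off, uint32_t len)
{
    return off + len <= EXT2_GOOD_OLD_INODE_SIZE + i_extra_isize;
}
```

With 256-byte inodes and 4 KB blocks, inode 17 starts at the beginning of the second inode-table block; a driver hardcoding 128 would look 2048 bytes into the first block instead and return another inode's data without any error.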

Metadata checksums. ext4 with RO_COMPAT_METADATA_CSUM stores CRC32c checksums over the superblock, group descriptors (truncated to 16 bits), block and inode bitmaps, each individual inode, extent tree blocks, directory blocks, and extended attribute blocks. A read-only driver can skip validating them and still function. Validating them gives the user confidence that the data coming off the disk is intact, which matters for a driver whose stated purpose is data access from foreign systems.
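For a driver that does validate, the primitive is plain CRC32c (Castagnoli). The sketch below is a slow bitwise version chosen for clarity over speed. The superblock check encodes our reading of the convention used by e2fsprogs and Linux (a seed of ~0, no final inversion, covering the bytes before s_checksum at offset 0x3FC), and it applies only when metadata_csum is enabled; treat it as an assumption to verify against a real image.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_SB_CSUM_OFFSET 0x3FC  /* s_checksum: last 4 bytes of the superblock */

/* Bitwise reflected CRC32c update (polynomial 0x82F63B78) with no
 * built-in pre- or post-inversion. */
uint32_t crc32c_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Conventional CRC-32C: init ~0, final inversion. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
    return ~crc32c_update(~0u, buf, len);
}

/* Assumed ext4 superblock convention: seed ~0, no final inversion. */
int ext4_sb_csum_ok(const uint8_t sb[1024])
{
    uint32_t want = (uint32_t)sb[0x3FC] | (uint32_t)sb[0x3FD] << 8 |
                    (uint32_t)sb[0x3FE] << 16 | (uint32_t)sb[0x3FF] << 24;
    return crc32c_update(~0u, sb, EXT4_SB_CSUM_OFFSET) == want;
}
```

The standard check value gives a quick self-test: crc32c over the nine bytes "123456789" is 0xE3069283.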

Feature flag gating. OpenBSD’s mount code rejects filesystems with unrecognized incompatible features by comparing the on-disk s_feature_incompat field against a compile-time mask, EXT2F_INCOMPAT_SUPP. Enabling ext4 support means extending that mask to include EXT2F_INCOMPAT_EXTENTS, EXT2F_INCOMPAT_64BIT, and EXT2F_INCOMPAT_FLEX_BG, the incompatible features a stock mke2fs -t ext4 sets beyond filetype, while leaving others such as inline data, encryption, casefold, and the Lustre-specific dirdata as rejection triggers until those are explicitly supported. Upstream Linux ext4 does not support dirdata either, and admitting it without parsing it could let the driver misread directory entries.
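The gating logic amounts to one mask operation. The sketch uses the s_feature_incompat bit values from the ext4 on-disk format under our own macro names (not OpenBSD's EXT2F_* identifiers) and an assumed read-only mask matching what stock mke2fs -t ext4 sets. It deliberately leaves RECOVER unsupported, on the assumption that a read-only driver which ignores the journal should not mount a volume with unreplayed transactions.

```c
#include <stdint.h>

/* s_feature_incompat bits from the ext4 on-disk format. */
#define INCOMPAT_FILETYPE    0x00002u
#define INCOMPAT_RECOVER     0x00004u   /* journal needs replay */
#define INCOMPAT_EXTENTS     0x00040u
#define INCOMPAT_64BIT       0x00080u
#define INCOMPAT_FLEX_BG     0x00200u
#define INCOMPAT_DIRDATA     0x01000u   /* Lustre-specific */
#define INCOMPAT_INLINE_DATA 0x08000u
#define INCOMPAT_ENCRYPT     0x10000u
#define INCOMPAT_CASEFOLD    0x20000u

/* Hypothetical read-only mask: the set a stock mke2fs -t ext4 produces. */
#define RO_INCOMPAT_SUPP (INCOMPAT_FILETYPE | INCOMPAT_EXTENTS | \
                          INCOMPAT_64BIT | INCOMPAT_FLEX_BG)

/* Return the incompat bits this driver does not understand (0 = mountable). */
uint32_t unsupported_incompat(uint32_t s_feature_incompat)
{
    return s_feature_incompat & ~(uint32_t)RO_INCOMPAT_SUPP;
}
```

Returning the offending bits rather than a boolean lets the mount path log exactly which feature caused the refusal.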

Vibe Coding at the Systems Level

The LWN article frames this as a “vibe coded” implementation, referencing the style of LLM-assisted programming that Andrej Karpathy described in early 2025: working through prompts rather than direct line-by-line authorship, letting the model handle struct layouts and algorithm bodies while the human focuses on architecture and verification.

Filesystem code is an interesting case for this approach because the specification is unusually complete. The ext4 disk layout documentation, the Linux kernel source in fs/ext4/, and the e2fsprogs userspace tools together constitute a reference detailed enough that an LLM can generate structurally correct extent tree traversal code. The OpenBSD codebase provides the surrounding scaffold: VFS hooks, buffer cache calls via bread(), mount infrastructure, inode locking. The human’s job is knowing what to ask for and how to test the output.

The risk in kernel space is the cost of subtle errors. A misread inode table stride causes silent wrong data rather than a crash. An off-by-one in the extent binary search returns a wrong physical block, and you get garbage reads on files whose extents happen to land in the affected range. The testing methodology matters enormously: mounting disk images created by a Linux system, comparing file contents byte-for-byte against known values, and probing edge cases like extent trees with eh_depth > 0 and extents that span block group boundaries.
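When a byte-for-byte comparison fails, the useful output is where it failed: translating the first bad offset into a logical block points straight at the extent (or indirect tier) that mapped it. A hypothetical test helper along those lines, assuming the reference copy was extracted on Linux:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical test helper: compare bytes read through the driver (got)
 * with a reference copy (want). Returns the first differing byte offset,
 * or -1 if identical; on a mismatch, *lblk receives the logical block
 * that contains it.
 */
long long first_mismatch(const uint8_t *got, const uint8_t *want, size_t len,
                         uint32_t block_size, uint64_t *lblk)
{
    for (size_t i = 0; i < len; i++) {
        if (got[i] != want[i]) {
            *lblk = i / block_size;
            return (long long)i;
        }
    }
    return -1;
}
```

Feeding that logical block back into the extent lookup, with tracing enabled, usually shows immediately whether the binary search, the length decoding, or the address assembly went wrong.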

This is a useful test of where LLM-generated systems code currently stands. The algorithmic structure of a B-tree traversal is well within what modern models produce reliably. The subtler parts require careful prompting and verification: assembling 64-bit addresses from split _lo/_hi fields in variable-size group descriptors, decoding the ee_len flag for uninitialized extents, and checking i_extra_isize before touching inode extension fields. The model can get the shape right while leaving correctness details to the human reviewer.

Why This Matters for OpenBSD

OpenBSD’s approach to foreign filesystem support has always been conservative: read-only where possible, mounting for data access rather than primary use. ext2fs fits that pattern, and ext4 read support continues it.

The practical argument is direct. ext4 was declared stable in Linux 2.6.28 at the end of 2008, and most major distributions made it their default within the next few years. Virtually every Linux USB drive, external disk, or data partition a user might want to access from an OpenBSD machine is therefore formatted as ext4, and mke2fs -t ext4 enables the extents feature by default. Because extents is an incompatible feature, the current driver refuses to mount such a volume at all, not merely the files that use extents.

The patch brings OpenBSD’s ext2fs driver from roughly 2001 compatibility to 2008 compatibility. That is not a criticism; read-only foreign filesystem support is low-priority maintenance work, and the OpenBSD developers allocate their effort carefully. What makes the vibe-coded approach notable here is that it compressed what would otherwise be a careful multi-week driver study into a faster iteration cycle, with the LLM handling the mechanical translation of the ext4 spec into C struct layouts and B-tree traversal code.

The work is a reasonable example of what read-only filesystem support actually requires: not a new driver, but a carefully bounded set of additions to an existing one, guided by a well-specified on-disk format and tested against real disk images. The 30-year-old driver gets a few hundred lines of new code, and OpenBSD users can finally mount the Linux disk sitting next to it on the shelf.