The ext4 Gap in OpenBSD's Filesystem Layer, and What It Takes to Close It
Source: lobsters
An article on LWN covers a developer’s attempt to add ext4 read support to OpenBSD using AI-assisted development, colloquially known as “vibe coding.” The combination is genuinely interesting: a notoriously conservative, correctness-focused operating system, a filesystem format change that breaks assumptions baked into a driver that has been largely dormant for years, and a development method that trades understanding for iteration speed. The technical gap being addressed is real, even if the approach is unconventional.
What OpenBSD’s ext2fs Driver Actually Does
OpenBSD has carried an ext2fs driver in its kernel tree for a long time, living under sys/ufs/ext2fs/. It descends from NetBSD’s implementation, which dates to the late 1990s and was adapted from the BSD FFS code rather than ported from Linux. The driver handles ext2 volumes correctly and mounts ext3 volumes by ignoring the journal, which is safe for read-only access when the filesystem was cleanly unmounted.
ext4, however, is a different story. The driver has no support for its on-disk format changes. If you try to mount an ext4 volume that uses extents, the driver will either refuse to mount or produce garbage, depending on exactly which feature flags are set. This has been the state of affairs since ext4 became the default filesystem in Ubuntu 9.10 in 2009, which means the gap has been open for roughly fifteen years.
For most OpenBSD users this is a background irritant rather than a showstopper. Dual-booting, sharing USB drives formatted on Linux, reading VM disk images: all of these hit the same wall. FreeBSD users have not had this problem for several years, which makes the gap more conspicuous.
The Extent Tree Is the Core Problem
The reason ext4 cannot simply be handled by updating the old ext2 driver with a few tweaks comes down to one incompatible on-disk format change: the extent tree.
ext2 and ext3 use a block map model for file data. Each inode contains a 60-byte i_block array that holds 12 direct block pointers, one single-indirect pointer, one double-indirect pointer, and one triple-indirect pointer. To find the physical blocks containing a file’s data, the kernel follows this tree of pointers. The BSD drivers know this model thoroughly; the whole block mapping path in ext2fs_bmap.c is built around it.
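The arithmetic behind the block map can be sketched in a few lines. This is an illustration only, assuming 4-byte block pointers; the helper name blockmap_level is hypothetical and is not taken from ext2fs_bmap.c.

```c
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12	/* direct pointers held in i_block */

/*
 * Hypothetical helper: how many levels of indirection are needed to
 * reach logical block lbn in the ext2/ext3 block map.
 * Returns 0 (direct) through 3 (triple-indirect), or -1 if lbn is
 * beyond what the map can address.
 */
static int
blockmap_level(uint64_t lbn, uint32_t block_size)
{
	uint64_t ptrs = block_size / 4;	/* pointers per indirect block */
	uint64_t span = ptrs;		/* blocks reachable at this level */
	int level;

	if (lbn < EXT2_NDIR_BLOCKS)
		return 0;
	lbn -= EXT2_NDIR_BLOCKS;
	for (level = 1; level <= 3; level++) {
		if (lbn < span)
			return level;
		lbn -= span;
		span *= ptrs;
	}
	return -1;
}
```

With 1 KiB blocks each indirect block holds 256 pointers, so logical block 12 is the first one that needs the single-indirect pointer and block 268 is the first that needs the double-indirect one.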
ext4 replaces this with an extent tree. The same 60-byte i_block area now stores an ext4_extent_header followed by either leaf nodes (ext4_extent) or index nodes (ext4_extent_idx):
```c
struct ext4_extent_header {
	__le16 eh_magic;      /* 0xF30A */
	__le16 eh_entries;
	__le16 eh_max;
	__le16 eh_depth;      /* 0 = leaf */
	__le32 eh_generation;
};

struct ext4_extent {
	__le32 ee_block;      /* first logical block */
	__le16 ee_len;        /* length, up to 32768 blocks */
	__le16 ee_start_hi;
	__le32 ee_start_lo;
};
```
A leaf node describes a contiguous run of logical blocks mapping to a contiguous run of physical blocks. Large files with scattered allocations have deeper trees, with index nodes pointing to additional blocks that contain further extent headers and leaves. The magic number 0xF30A is how you tell a valid extent tree from a corrupted inode.
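The simplest case, a depth-0 root stored directly in the inode, can be sketched as follows. This is an assumption-laden illustration and not code from any BSD driver: the function name and return convention are hypothetical, the buffer is parsed byte by byte so the sketch does not depend on host endianness, and an ee_len above 32768 is treated as an unwritten extent that reads as a hole.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC   0xF30A
#define EXT4_ROOT_BYTES  60	/* size of the i_block area */
#define EXT4_NODE_ENTRY  12	/* header and entries are 12 bytes each */

/* Little-endian readers for on-disk fields. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical lookup in a depth-0 root node.
 * Returns 1 and fills *pbn if lbn is mapped, 0 if lbn falls in a hole
 * (or an unwritten extent), -1 if the node fails validation.
 */
static int
ext4_root_leaf_lookup(const uint8_t root[EXT4_ROOT_BYTES], uint32_t lbn,
    uint64_t *pbn)
{
	uint16_t entries, max, i;

	if (rd16(root) != EXT4_EXT_MAGIC)
		return -1;
	entries = rd16(root + 2);
	max = rd16(root + 4);
	if (rd16(root + 6) != 0)	/* eh_depth: this sketch handles leaves only */
		return -1;
	if (max > (EXT4_ROOT_BYTES - EXT4_NODE_ENTRY) / EXT4_NODE_ENTRY ||
	    entries > max)
		return -1;

	for (i = 0; i < entries; i++) {
		const uint8_t *e = root + EXT4_NODE_ENTRY * (1 + i);
		uint32_t first = rd32(e);	/* ee_block */
		uint32_t len = rd16(e + 4);	/* ee_len */

		if (len > 32768)		/* unwritten: reads as zeros */
			continue;
		if (lbn >= first && lbn - first < len) {
			uint64_t start = ((uint64_t)rd16(e + 6) << 32) | rd32(e + 8);
			*pbn = start + (lbn - first);
			return 1;
		}
	}
	return 0;
}
```

The bounds checks on eh_max and eh_entries come before any entry is touched, so a corrupted header cannot push the loop past the 60-byte area.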
When the old ext2fs driver reads an extent-mapped inode, it interprets i_block[0] through i_block[11] as direct block pointers. Those 48 bytes actually contain the 12-byte extent header and the first three 12-byte entries (the root node in the inode holds at most four). The result is that every read from an extent-mapped ext4 file returns wrong data or, at worst, panics the kernel. There is no graceful degradation.
The driver does check feature flags in the superblock before mounting. EXT4_FEATURE_INCOMPAT_EXTENTS (0x0040) is an incompatible feature flag, meaning a driver that does not understand it is required to refuse to mount the filesystem. If OpenBSD’s driver is correctly implementing that check, it will reject the mount. If the check is incomplete or missing, the mount proceeds and the data corruption begins. Either way, ext4 volumes with extents are effectively inaccessible.
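The rule for incompatible flags reduces to a mask test. The sketch below uses flag values from the ext4 on-disk format; the function names and the idea of passing a "supported" mask are illustrative assumptions, not OpenBSD's actual mount code.

```c
#include <stdint.h>

/* Incompatible feature bits (values from the ext4 on-disk format). */
#define EXT2_FEATURE_INCOMPAT_FILETYPE	0x0002
#define EXT4_FEATURE_INCOMPAT_EXTENTS	0x0040
#define EXT4_FEATURE_INCOMPAT_64BIT	0x0080
#define EXT4_FEATURE_INCOMPAT_FLEX_BG	0x0200

/*
 * Hypothetical helper: return the incompat bits set in the superblock
 * that the driver does not implement. Any nonzero result means the
 * mount must be refused.
 */
static uint32_t
incompat_unsupported(uint32_t s_feature_incompat, uint32_t supported)
{
	return s_feature_incompat & ~supported;
}

static int
mount_allowed(uint32_t s_feature_incompat, uint32_t supported)
{
	return incompat_unsupported(s_feature_incompat, supported) == 0;
}
```

Returning the offending bits instead of a bare yes or no lets the mount path log exactly which feature caused the refusal.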
What FreeBSD Did
FreeBSD’s ext2fs implementation (sys/fs/ext2fs/) received extent support over several releases, roughly FreeBSD 10 through 12. The work involved adding an ext4_ext_find_extent() function to walk the extent tree, detecting the EXT4_INODE_EXTENTS flag per inode, and dispatching the block mapping logic accordingly. The result is read support for most ext4 volumes. Write support for extent-mapped files remains incomplete and is not recommended for production use.
FreeBSD’s approach is instructive: you can add ext4 read support to a BSD ext2fs driver without rewriting everything. The block map path remains intact for ext2 and ext3 volumes. ext4 volumes get a parallel code path that handles extent tree traversal. The superblock parsing gets extended to check the 64-bit block support flag (EXT4_FEATURE_INCOMPAT_64BIT, 0x0080) and widen address arithmetic accordingly. Journal replay is skipped for read-only mounts of cleanly-unmounted filesystems, just as with ext3.
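The parallel-path idea can be sketched as a per-inode dispatch. The names below are hypothetical and this is not FreeBSD's code; the per-inode flag is written with its Linux name and value (EXT4_EXTENTS_FL, 0x00080000), which is an assumption about which identifier a port would use.

```c
#include <stdint.h>

#define EXT4_FEATURE_INCOMPAT_EXTENTS	0x0040		/* superblock flag */
#define EXT4_EXTENTS_FL			0x00080000	/* per-inode flag (Linux name) */

enum map_kind { MAP_BLOCKMAP, MAP_EXTENTS, MAP_INVALID };

/*
 * Hypothetical dispatch: decide how an inode's i_block area should be
 * interpreted. An ext4 volume may still contain block-mapped inodes,
 * so the decision is made per inode, not per filesystem.
 */
static enum map_kind
choose_mapping(uint32_t s_feature_incompat, uint32_t i_flags)
{
	if ((i_flags & EXT4_EXTENTS_FL) == 0)
		return MAP_BLOCKMAP;	/* classic ext2/ext3 pointer tree */
	if ((s_feature_incompat & EXT4_FEATURE_INCOMPAT_EXTENTS) == 0)
		return MAP_INVALID;	/* inode flag without the fs feature */
	return MAP_EXTENTS;
}
```

Keeping the decision in one small function leaves the existing block map path untouched and gives the extent path a single entry point to review.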
This is the template that any OpenBSD implementation would follow. The question is how much of that work can be done correctly by an AI generating kernel C code.
Vibe Coding and the Kernel Problem
Andrej Karpathy coined the term “vibe coding” in February 2025 to describe a development style where you describe what you want in natural language, run the output, paste errors back, and iterate without necessarily developing a deep understanding of the generated code. His examples were web apps and scripts. The method trades comprehension for speed, and it works reasonably well for code where bugs are caught quickly and the blast radius is a crashed process.
Filesystem drivers do not have those properties. A subtle error in the extent tree traversal, such as an off-by-one in the index node descent or a missing bounds check on eh_entries, does not crash a process. It reads the wrong physical block, returns wrong data, or scribbles over an unrelated buffer. If the driver has write support, a metadata error corrupts the filesystem in ways that may not be apparent until weeks later. The OpenBSD kernel has no equivalent of Linux’s filesystem fuzzing infrastructure or KASAN in-tree, which means verification depends on manual testing and code review.
This is not an argument that AI assistance cannot work for kernel code. It is an argument that the iteration loop is much slower and the correctness bar is much higher. For extent tree traversal specifically, the right mental model requires understanding that eh_depth == 0 means you are at the leaf level, that index nodes store the physical block address of the next level in split ei_leaf_lo and ei_leaf_hi fields, and that the root node in the inode holds at most four entries because the 60-byte i_block area leaves 48 bytes after the 12-byte header, while nodes stored in their own blocks are bounded by the block size. Getting any of those details wrong produces code that looks plausible and fails on edge cases.
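One step of the index descent, with the checks that are easy to get wrong, might look like the sketch below. It assumes the on-disk index entry layout (ei_block at offset 0, ei_leaf_lo at 4, ei_leaf_hi at 8, 12 bytes in total); the function name, its parameters, and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: in an index node of node_bytes bytes, find the
 * child block that covers lbn. expect_depth is the depth the caller
 * expects at this level (it must shrink by one per step down).
 * Returns 0 and fills *child on success, 1 if lbn precedes the first
 * entry (a hole), -1 if the node fails validation.
 */
static int
ext4_idx_pick_child(const uint8_t *node, size_t node_bytes,
    uint16_t expect_depth, uint32_t lbn, uint64_t *child)
{
	size_t entries, max, lo, hi;
	const uint8_t *e;

	if (node_bytes < 12 || rd16(node) != 0xF30A)
		return -1;
	entries = rd16(node + 2);
	max = rd16(node + 4);
	if (max > (node_bytes - 12) / 12 || entries > max || entries == 0)
		return -1;
	if (expect_depth == 0 || rd16(node + 6) != expect_depth)
		return -1;
	if (rd32(node + 12) > lbn)
		return 1;

	/* Binary search for the last entry whose ei_block <= lbn. */
	lo = 0;
	hi = entries;
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;
		if (rd32(node + 12 + mid * 12) <= lbn)
			lo = mid;
		else
			hi = mid;
	}
	e = node + 12 + lo * 12;
	/* 48-bit physical block: ei_leaf_hi (16 bits) over ei_leaf_lo (32 bits). */
	*child = ((uint64_t)rd16(e + 8) << 32) | rd32(e + 4);
	return 0;
}
```

A caller would read the returned block, call the same function with expect_depth reduced by one, and switch to a leaf lookup once the expected depth reaches zero. Passing the expected depth down is what prevents a corrupted or cyclic tree from sending the walk into an unbounded loop.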
The LWN piece frames the community response as cautious but not dismissive. The OpenBSD project has a well-established culture of preferring fewer, better-reviewed commits over high volume. Theo de Raadt has historically been direct about code quality expectations on the tech@ mailing list. A vibe-coded driver patch is going to face scrutiny that is independent of how it was produced, because the output is what goes into the CVS tree.
The Other ext4 Complications
Extents are the main event, but a complete ext4 implementation also has to handle several other incompatible features that modern Linux filesystems commonly enable.
64-bit block numbers (EXT4_FEATURE_INCOMPAT_64BIT) extend block group descriptors from 32 to 64 bytes. The upper half holds the high 32 bits of the block bitmap, inode bitmap, and inode table addresses, along with the high 16 bits of the free-block and free-inode counts. A driver that reads only the low 32 bits will silently truncate addresses on large volumes and read the wrong blocks.
Metadata checksums (EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) attach CRC32c checksums to block group descriptors, inodes, and directory entries. This is a read-only-compatible feature, meaning a driver without checksum support can still mount the filesystem read-only; it just will not validate the checksums. For a read-only driver this is acceptable. The cost is that corrupted or misread metadata goes undetected.
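Widening one descriptor field can be sketched as below. The offsets 0x08 and 0x28 for the low and high halves of the inode table address follow the ext4 descriptor layout; the function and macro names are hypothetical, and the descriptor is read as a raw little-endian byte buffer.

```c
#include <stdint.h>

#define EXT4_FEATURE_INCOMPAT_64BIT	0x0080
#define GD_INODE_TABLE_LO		0x08	/* bg_inode_table_lo */
#define GD_INODE_TABLE_HI		0x28	/* bg_inode_table_hi */

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical accessor: inode table start block for one group.
 * The high half exists only when the 64BIT feature is set and the
 * superblock's descriptor size is at least 64 bytes.
 */
static uint64_t
gd_inode_table(const uint8_t *gd, uint32_t s_feature_incompat,
    uint16_t s_desc_size)
{
	uint64_t blk = rd32(gd + GD_INODE_TABLE_LO);

	if ((s_feature_incompat & EXT4_FEATURE_INCOMPAT_64BIT) &&
	    s_desc_size >= 64)
		blk |= (uint64_t)rd32(gd + GD_INODE_TABLE_HI) << 32;
	return blk;
}
```

The same pattern would apply to the block bitmap and inode bitmap addresses. Reading the high half unconditionally would be a bug on 32-byte descriptors, because offset 0x28 then belongs to the next group's descriptor.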
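For a sense of what validation would involve, here is a bitwise sketch of the CRC32c primitive itself (Castagnoli polynomial, reflected form 0x82F63B78). It is deliberately slow and table-free. How ext4 seeds the checksum and which fields each checksum covers are not shown, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: feed len bytes into a running CRC32c value.
 * No initial or final inversion is applied here; the caller supplies
 * the seed and decides what to do with the result.
 */
static uint32_t
crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```

With a seed of 0xFFFFFFFF and a final inversion this reproduces the standard CRC-32C check value 0xE3069283 for the ASCII string "123456789", which is a convenient self-test before wiring the function into anything else.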
Inline data (EXT4_FEATURE_INCOMPAT_INLINE_DATA) stores small file contents directly in the inode’s i_block area instead of in separate data blocks. A driver that always tries to read data through the normal block or extent path will misread these files. This is another incompatible feature, so the driver must refuse to mount if it sees the flag and does not handle it.
Flexible block groups (EXT4_FEATURE_INCOMPAT_FLEX_BG) group multiple block groups together so that their metadata can be packed contiguously. The descriptors themselves are found the same way as before, but a group’s bitmaps and inode table may now live outside that group, so the driver has to follow the addresses in the descriptor instead of assuming the metadata sits inside its own group.
For read-only support of typical modern ext4 volumes, you need extents, 64-bit blocks, and some form of flex_bg handling at minimum. Metadata checksums are important for integrity but technically optional. Inline data support matters for small files and small directories, but only on filesystems created with the inline_data option enabled.
Where This Leaves the Project
The existence of a vibe-coded ext4 patch for OpenBSD is interesting regardless of its immediate commit prospects, because it forces a concrete conversation about what the implementation actually requires. The OpenBSD source tree has carried ext2fs code that does not handle ext4 for long enough that the gap has become background noise. A working, reviewable patch, however it was produced, makes that gap visible and gives the project something concrete to evaluate.
The more durable question is whether read-only ext4 support, with the extent tree handled correctly and 64-bit block numbers supported, can be produced at the quality level OpenBSD merges. FreeBSD’s trajectory suggests it is achievable. The work is bounded: the on-disk format is stable and well-documented in the Linux kernel documentation, the VFS integration points are already there from the existing ext2fs driver, and the scope for read-only support is narrow enough that a careful implementation does not need to touch the write path at all.
How much of that implementation can be produced by AI and how much needs to be hand-written and hand-reviewed is the real experiment being conducted here. The ext4 format is consistent and well-specified enough that an LLM can produce structurally correct code for the common cases. Whether it handles the edge cases, the error paths, and the VFS locking discipline correctly is what a code review will determine. That review is the part vibe coding cannot skip.