Ext4 on OpenBSD, Written by Vibes: What This Experiment Actually Shows

Andrej Karpathy coined the term “vibe coding” in early 2025 to describe a mode of software development where the programmer leans heavily on an AI assistant, accepting suggestions with minimal scrutiny and iterating on feel rather than full comprehension. It generated the expected range of reactions. For frontend glue code or one-off scripts, the debate is mostly about productivity. For kernel filesystem drivers, the stakes are different, and a recent experiment covered by LWN puts that gap in concrete terms: someone used AI-assisted vibe coding to implement ext4 read support for OpenBSD.

The result is worth examining not for the hype angle but for what it reveals about the technical gap between ext2 and ext4, the shape of the OpenBSD VFS interface, and the specific ways AI-generated kernel code can be subtly wrong even when it compiles and mostly works.

What OpenBSD Already Had, and Why It Wasn’t Enough

OpenBSD has carried read-only ext2 support for years, inherited from the common BSD ext2fs lineage that FreeBSD also ships. The FreeBSD ext2fs driver is a more complete reference point: it handles both ext2 and ext3, including the htree directory index (the dir_index feature, a hashed B-tree over directory entries that replaces linear scans), and it has been maintained alongside the rest of FreeBSD’s VFS layer as a production driver.

Ext2 and ext3 are, from an on-disk perspective, largely the same format. Ext3 added the JBD (Journaling Block Device) journal, but the data structures for inodes, block groups, and directory entries remained compatible. An ext2 reader that skips the journal can mount an ext3 filesystem and read it without any format-level changes.

Ext4 is a different story. The headline feature is the extents tree, and it changes the fundamental block mapping representation in ways that an ext2 reader cannot handle without explicit support.

The Extents Tree: What Changed and Why It Matters

In ext2 and ext3, an inode maps logical file blocks to physical disk blocks through a three-level indirect block tree. The inode structure has 15 block pointers: 12 direct, one indirect (pointing to a block of pointers), one double-indirect, and one triple-indirect. For large files, this results in significant metadata overhead and poor locality: reading a large sequential file requires multiple block-pointer lookups that may themselves be scattered across the disk.
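To make the lookup cost concrete, here is a minimal sketch of how a reader decides how many pointer blocks it must fetch before it can reach a given logical block. The function name and return convention are illustrative, not taken from any driver; `ptrs` is the number of 32-bit pointers that fit in one block (block size divided by 4).

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12  /* direct pointers held in the inode */

/*
 * Return the number of indirect blocks that must be read to locate
 * logical block `lblk`: 0 (direct) through 3 (triple-indirect),
 * or -1 if the block lies beyond what the scheme can address.
 */
static int
ext2_indirection_depth(uint64_t lblk, uint64_t ptrs)
{
    assert(ptrs > 0);

    if (lblk < EXT2_NDIR_BLOCKS)
        return 0;
    lblk -= EXT2_NDIR_BLOCKS;
    if (lblk < ptrs)
        return 1;
    lblk -= ptrs;
    if (lblk < ptrs * ptrs)
        return 2;
    lblk -= ptrs * ptrs;
    if (lblk < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}
```

With 4 KiB blocks (1024 pointers per block), everything past the first 48 KiB of a file costs at least one extra read, and everything past roughly 4 MiB costs at least two.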

Ext4 replaces this with an extents tree. The same 60 bytes in the inode that previously held 15 block pointers now hold an ext4_extent_header followed by up to four ext4_extent records if the tree fits in the inode, or an ext4_extent_idx pointing to external extent tree nodes if it doesn’t. A single extent record looks like this:

struct ext4_extent {
    __le32 ee_block;    /* first logical block extent covers */
    __le16 ee_len;      /* number of blocks covered */
    __le16 ee_start_hi; /* high 16 bits of physical block number */
    __le32 ee_start_lo; /* low 32 bits of physical block number */
};

A single extent can cover up to 32,768 contiguous blocks (with the high bit of ee_len reserved for unwritten extents, which are preallocated but not yet initialized). For a file written sequentially to a non-fragmented filesystem, the entire block map might fit in two or three extents inside the inode itself, with no indirect block reads at all.
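As a rough illustration of the in-inode case, the sketch below decodes extent records from the raw 60-byte area and maps a logical block when the tree has depth 0. The helper names and return codes are assumptions made for this example; the field offsets follow the 12-byte header (eh_magic, eh_entries, eh_max, eh_depth, eh_generation) and the extent record shown above. It deliberately refuses deeper trees rather than guessing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC    0xF30A  /* eh_magic */
#define EXT4_INIT_MAX_LEN 32768   /* ee_len above this marks an unwritten extent */

/* Read little-endian fields byte by byte, independent of host byte order. */
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct extent_info {
    uint32_t first_lblk;  /* ee_block */
    uint32_t len;         /* block count with the unwritten marker removed */
    uint64_t start_pblk;  /* ee_start_hi:ee_start_lo combined */
    int      unwritten;
};

static void
decode_extent(const uint8_t *raw, struct extent_info *out)
{
    uint16_t ee_len = get_le16(raw + 4);

    assert(out != NULL);
    out->first_lblk = get_le32(raw);
    out->unwritten = ee_len > EXT4_INIT_MAX_LEN;
    out->len = out->unwritten ? (uint32_t)ee_len - EXT4_INIT_MAX_LEN : ee_len;
    out->start_pblk = ((uint64_t)get_le16(raw + 6) << 32) | get_le32(raw + 8);
}

/*
 * Map `lblk` through a depth-0 tree stored in the inode's 60-byte i_block.
 * Returns 0 and sets *pblk when mapped, 1 for a hole or unwritten extent
 * (both read as zeros), -1 if the header is malformed or the tree is deeper.
 */
static int
map_in_inode(const uint8_t *i_block, uint32_t lblk, uint64_t *pblk)
{
    uint16_t entries, depth, i;

    assert(pblk != NULL);
    if (get_le16(i_block) != EXT4_EXT_MAGIC)
        return -1;
    entries = get_le16(i_block + 2);
    depth = get_le16(i_block + 6);
    if (depth != 0 || entries > 4)  /* only four records fit after the header */
        return -1;

    for (i = 0; i < entries; i++) {
        struct extent_info e;

        decode_extent(i_block + 12 + 12 * (size_t)i, &e);
        if (lblk >= e.first_lblk && lblk - e.first_lblk < e.len) {
            if (e.unwritten)
                return 1;
            *pblk = e.start_pblk + (lblk - e.first_lblk);
            return 0;
        }
    }
    return 1;
}
```

Checking the magic and the entry count before touching any record is the part that matters: the bytes come from disk and cannot be trusted.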

An ext2 reader that encounters an inode with the EXT4_EXTENTS_FL flag set and interprets the extent tree as a legacy block pointer array will produce garbage. Both layouts occupy the same 60 bytes, but the semantics are entirely different: the extent header and records get treated as block numbers. A legacy driver that mounts an ext4 filesystem anyway and then reads a file does not fail cleanly; it returns the contents of the wrong physical blocks.
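The on-disk format has a guard against exactly this: the superblock's s_feature_incompat bitmap, which a driver is expected to check at mount time and refuse anything it does not recognize. A minimal sketch of that check follows, with bit values taken from the ext4 on-disk format; the function names and the choice to report the unknown bits through an out-parameter are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* s_feature_incompat bits. */
#define INCOMPAT_FILETYPE 0x0002u
#define INCOMPAT_EXTENTS  0x0040u
#define INCOMPAT_64BIT    0x0080u
#define INCOMPAT_FLEX_BG  0x0200u

#define EXT4_EXTENTS_FL   0x00080000u  /* i_flags: inode uses an extent tree */

/*
 * Return nonzero if every incompat feature on the filesystem is in the
 * driver's `supported` mask. The unsupported bits are stored in *unknown
 * so the caller can report them.
 */
static int
mount_allowed(uint32_t incompat, uint32_t supported, uint32_t *unknown)
{
    assert(unknown != NULL);
    *unknown = incompat & ~supported;
    return *unknown == 0;
}

/* Per-inode check: which block-mapping scheme do the 60 bytes hold? */
static int
inode_uses_extents(uint32_t i_flags)
{
    return (i_flags & EXT4_EXTENTS_FL) != 0;
}
```

A driver that only understands ext2 would pass a mask such as INCOMPAT_FILETYPE and refuse a typical ext4 volume before it ever reaches an extent-mapped inode.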

JBD2 and the Journal Problem

Ext3 used JBD (Journaling Block Device) for its journaling layer. Ext4 upgraded this to JBD2, with 64-bit block numbers and support for barriers and checksums. The journal itself lives in a reserved inode (inode 8 by default) and is structured as a circular log of transaction records.
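One detail that is easy to miss when generating code by analogy with the rest of ext4: journal structures are stored big-endian, while the filesystem proper is little-endian. The sketch below parses the 12-byte header that starts every journal metadata block. The magic number and block-type values are the JBD2 ones; the function and struct names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JBD2_MAGIC         0xC03B3998u
#define JBD2_DESCRIPTOR    1u
#define JBD2_COMMIT        2u
#define JBD2_SUPERBLOCK_V1 3u
#define JBD2_SUPERBLOCK_V2 4u
#define JBD2_REVOKE        5u

struct jbd2_header {
    uint32_t blocktype;  /* one of the JBD2_* values above */
    uint32_t sequence;   /* transaction ID this block belongs to */
};

/* Journal fields are big-endian, unlike the rest of ext4. */
static uint32_t
get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 0 and fill *out if `blk` starts with a valid journal header, else -1. */
static int
jbd2_parse_header(const uint8_t *blk, size_t len, struct jbd2_header *out)
{
    uint32_t type;

    assert(blk != NULL && out != NULL);
    if (len < 12 || get_be32(blk) != JBD2_MAGIC)
        return -1;
    type = get_be32(blk + 4);
    if (type < JBD2_DESCRIPTOR || type > JBD2_REVOKE)
        return -1;
    out->blocktype = type;
    out->sequence = get_be32(blk + 8);
    return 0;
}
```

A parser that reuses the little-endian helpers here fails the magic check on every block, which is at least a loud failure; the quiet failure is a parser that skips the magic check.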

For a read-only driver, journal replay is the critical concern. If a filesystem was not cleanly unmounted, the journal may contain committed transactions whose blocks have not been written to their final locations. A read-only driver that ignores the journal will see stale data in those blocks. The correct behavior for a read-only mount is to apply committed transactions before exposing the filesystem to the VFS layer, whether by replaying them to disk once or by overlaying the journaled blocks in memory, and from then on to write nothing, including journal updates.

This is not optional. On a filesystem that crashed mid-write, the directory tree may be inconsistent in ways that only the journal can resolve. A driver that skips replay and reads anyway might return directory entries pointing to deallocated inodes, or return old file contents when the journal holds a newer version.
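The superblock records whether this situation applies: the has_journal bit in s_feature_compat says a journal exists, and the recover bit in s_feature_incompat says it holds transactions that still need to be applied. A sketch of the resulting mount-time decision is below. The enum, struct, and function names are hypothetical; only the two bit values come from the on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPAT_HAS_JOURNAL 0x0004u  /* s_feature_compat: a journal exists */
#define INCOMPAT_RECOVER   0x0004u  /* s_feature_incompat: journal needs recovery */

struct sb_features {
    uint32_t compat;
    uint32_t incompat;
};

enum ro_mount_plan {
    RO_MOUNT_DIRECT,        /* clean: metadata on disk is current */
    RO_MOUNT_REPLAY_FIRST,  /* apply committed transactions, then mount */
    RO_MOUNT_REFUSE         /* dirty journal and no replay code: fail the mount */
};

static enum ro_mount_plan
plan_ro_mount(const struct sb_features *sb, int can_replay)
{
    assert(sb != NULL);
    if (!(sb->compat & COMPAT_HAS_JOURNAL) ||
        !(sb->incompat & INCOMPAT_RECOVER))
        return RO_MOUNT_DIRECT;
    return can_replay ? RO_MOUNT_REPLAY_FIRST : RO_MOUNT_REFUSE;
}
```

The third outcome is the important design choice: a driver without replay support can still be safe if it refuses dirty filesystems instead of reading through them.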

FreeBSD’s ext2fs driver is not a template for this part: its manual page lists journalling as unsupported. Getting replay right in a new driver requires careful reading of the ext4 documentation and the Linux JBD2 source, not just a rough structural approximation.

The OpenBSD VFS Layer

OpenBSD’s VFS interface follows the standard BSD vnode model. A filesystem driver registers a vfsops structure with mount, unmount, root, and sync operations, and a vnodeops structure covering the per-vnode operations: lookup, open, close, read, write, getattr, setattr, readdir, readlink, inactive, reclaim, and several others.
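The shape of that interface can be sketched in user space as a table of function pointers. Everything below is a cut-down, hypothetical stand-in (the struct, the ext4ro_ prefix, and the vn_write dispatcher are invented for illustration and are not OpenBSD's actual declarations). What it does reflect is the BSD convention that vnode operations return 0 or a positive errno value, and that a read-only driver answers every mutating operation with EROFS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vnode;  /* opaque in this sketch */

/* Hypothetical, cut-down operations table. */
struct ro_vnode_ops {
    int (*lookup)(struct vnode *dir, const char *name, struct vnode **out);
    int (*read)(struct vnode *vp, void *buf, size_t len, long long off);
    int (*write)(struct vnode *vp, const void *buf, size_t len, long long off);
};

static int
ext4ro_lookup(struct vnode *dir, const char *name, struct vnode **out)
{
    (void)dir; (void)name; (void)out;
    return ENOENT;  /* placeholder: a real driver searches the directory */
}

static int
ext4ro_read(struct vnode *vp, void *buf, size_t len, long long off)
{
    (void)vp; (void)buf; (void)len; (void)off;
    return 0;       /* placeholder: a real driver maps blocks and copies data */
}

static int
ext4ro_write(struct vnode *vp, const void *buf, size_t len, long long off)
{
    (void)vp; (void)buf; (void)len; (void)off;
    return EROFS;   /* read-only driver: every write is rejected */
}

static const struct ro_vnode_ops ext4ro_ops = {
    .lookup = ext4ro_lookup,
    .read   = ext4ro_read,
    .write  = ext4ro_write,
};

/* What the generic layer does: dispatch through the table. */
static int
vn_write(const struct ro_vnode_ops *ops, struct vnode *vp,
    const void *buf, size_t len, long long off)
{
    assert(ops != NULL && ops->write != NULL);
    return ops->write(vp, buf, len, off);
}
```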

The lookup vnode operation is where most of the complexity lives for a read-only driver. Given a directory vnode and a component name, it must search the directory and return a vnode for the result. For ext4, this means walking the htree index if dir_index is enabled (which it is by default on any ext4 filesystem created in the last decade), or falling back to a linear scan of the directory’s data blocks if the index is absent.
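The linear fallback is the simpler half and is worth seeing first. Below is a minimal sketch of scanning one directory block for a name, using the ext4_dir_entry_2 layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file type, then the name). The function name and return codes are assumptions of this example, and it assumes block sizes below 64 KiB, where rec_len needs no special decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Search one directory block for `name`.
 * Returns 1 and sets *ino_out if found, 0 if absent, -1 if the block is corrupt.
 */
static int
dir_block_find(const uint8_t *blk, size_t blksz, const char *name,
    size_t namelen, uint32_t *ino_out)
{
    size_t off = 0;

    assert(blk != NULL && name != NULL && ino_out != NULL);
    while (off + 8 <= blksz) {
        uint32_t ino = get_le32(blk + off);
        uint16_t rec_len = get_le16(blk + off + 4);
        uint8_t name_len = blk[off + 6];

        /* Every length comes from disk, so check it before using it. */
        if (rec_len < 8 || rec_len % 4 != 0 || rec_len > blksz - off ||
            (size_t)name_len + 8 > rec_len)
            return -1;

        /* Inode 0 marks an unused slot; skip it but keep walking. */
        if (ino != 0 && name_len == namelen &&
            memcmp(blk + off + 8, name, namelen) == 0) {
            *ino_out = ino;
            return 1;
        }
        off += rec_len;
    }
    return 0;
}
```

The rec_len validation is what keeps a corrupt or hostile block from turning into an infinite loop or an out-of-bounds read.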

The htree structure is a shallow B-tree keyed on a hash of the filename, typically with one or two levels of index blocks above the leaves. The first data block of the directory holds a dx_root header followed by entries mapping hash ranges to leaf blocks. Each leaf block contains a linear array of ext4_dir_entry_2 records. The hash function is selectable (legacy, half-MD4, or TEA, each in signed and unsigned-char variants); the filesystem default and the hash seed are stored in the superblock, and the version in use is recorded in the directory's root block. Getting lookup wrong means returning ENOENT for files that exist, returning wrong results on hash collisions, or reading uninitialized or out-of-bounds memory if the tree node parsing is off by a field.
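The core of an index-block lookup is a binary search for the last entry whose hash is less than or equal to the target. A minimal sketch follows, operating on already-decoded entries. The struct and function names are illustrative. It models the on-disk quirk that the first slot has no hash of its own (that space holds the count and limit) and therefore covers everything from hash 0 upward. It does not handle collisions that continue into the following leaf, which a real lookup must.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded index entry: hashes >= `hash` live in directory block `block`. */
struct dx_entry {
    uint32_t hash;
    uint32_t block;
};

/*
 * Pick the child block for `hash`. entries[0].hash is ignored: that slot
 * implicitly starts at hash 0. Entries 1..count-1 must be sorted by hash.
 */
static uint32_t
dx_pick_block(const struct dx_entry *entries, size_t count, uint32_t hash)
{
    size_t lo = 1, hi = count;

    assert(entries != NULL && count >= 1);
    /* Find the first index in [1, count) whose hash exceeds the target. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (entries[mid].hash > hash)
            hi = mid;
        else
            lo = mid + 1;
    }
    /* The entry just before it is the last one with hash <= target. */
    return entries[lo - 1].block;
}
```

An off-by-one in either bound sends lookups for boundary hashes to the neighboring leaf, which shows up as ENOENT for a handful of files in large directories and nowhere else.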

getattr is simpler but has its own ext4 wrinkles: nanosecond timestamps, 64-bit file sizes, and the i_extra_isize field that extends the inode beyond the base 128-byte size all need to be handled correctly, or stat calls return truncated or garbage data.
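A sketch of those three details, working on fields already read from the inode: the "extra" timestamp word packs two epoch-extension bits in its low end and the nanoseconds in the upper 30 bits, the extra fields are only valid if i_extra_isize says the inode is large enough to contain them, and the file size is split across two 32-bit halves. Function and struct names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_GOOD_OLD_INODE_SIZE 128u

struct ext4_time {
    int64_t  sec;
    uint32_t nsec;
};

/* Does a field at byte offset `off` of `size` bytes fit in this inode? */
static int
field_fits(uint16_t extra_isize, unsigned off, unsigned size)
{
    return off + size <= EXT4_GOOD_OLD_INODE_SIZE + extra_isize;
}

/*
 * Combine a 32-bit seconds field with its "extra" word.
 * If the extra word is not present in the inode, pass have_extra = 0.
 */
static void
decode_time(uint32_t base, uint32_t extra, int have_extra,
    struct ext4_time *out)
{
    assert(out != NULL);
    out->sec = (int32_t)base;  /* the base field is a signed 32-bit count */
    out->nsec = 0;
    if (have_extra) {
        out->sec += (int64_t)(extra & 3u) << 32;  /* low 2 bits extend the epoch */
        out->nsec = extra >> 2;                   /* upper 30 bits: nanoseconds */
    }
}

/* i_size_lo and i_size_high combined into one 64-bit size. */
static uint64_t
file_size(uint32_t size_lo, uint32_t size_high)
{
    return ((uint64_t)size_high << 32) | size_lo;
}
```

For the modification time, the extra word sits at byte offset 0x88, so a caller would pass `field_fits(i_extra_isize, 0x88, 4)` as `have_extra`.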

What Vibe Coding Produces Here

The LWN coverage describes an implementation that was largely AI-generated and largely works for common cases. The developer iterated on it by mounting test filesystems, trying operations, and fixing crashes as they appeared. That feedback loop is faster than reading every line of the ext4 specification before writing any code.

The failure modes that this approach tends to produce are not the dramatic ones. A driver that dereferences null on mount will fail immediately and visibly. The failure modes to worry about are the subtle ones: a journal replay that is skipped on unclean mounts, an extent tree walker that handles the common case but silently reads wrong blocks when the tree is deeper than one level, a hash computation that works for ASCII filenames but produces wrong results for certain byte sequences, or a nanosecond timestamp field that is never read, so every timestamp is silently truncated to whole seconds.
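The deeper-tree case is the easiest of these to make concrete. Below is a sketch of a walker that descends through index nodes until it reaches a leaf, checking at every level that the header is sane and that the depth actually decreases by one. The in-memory disk, the function names, and the return codes are assumptions made so the example can run outside a kernel; the record layouts are the 12-byte extent and extent-index formats, and the depth cap of 5 matches the limit Linux enforces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC    0xF30A
#define EXT4_INIT_MAX_LEN 32768
#define MAX_TREE_DEPTH    5

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical in-memory "disk" standing in for buffer-cache reads. */
struct mem_disk { const uint8_t *data; size_t blksz; uint64_t nblocks; };

static const uint8_t *
mem_read(const struct mem_disk *d, uint64_t pblk)
{
    return pblk < d->nblocks ? d->data + pblk * d->blksz : NULL;
}

/* Returns 0 and sets *pblk if mapped, 1 for hole/unwritten, -1 if corrupt. */
static int
ext4_map_block(const struct mem_disk *d, const uint8_t *node,
    size_t node_size, uint32_t lblk, uint64_t *pblk)
{
    int expect_depth = -1;  /* unknown until the root header is read */

    assert(d != NULL && node != NULL && pblk != NULL);
    for (;;) {
        const uint8_t *rec = NULL;
        uint16_t entries, depth, i;

        if (node_size < 12 || get_le16(node) != EXT4_EXT_MAGIC)
            return -1;
        entries = get_le16(node + 2);
        depth = get_le16(node + 6);
        if (entries > (node_size - 12) / 12 || depth > MAX_TREE_DEPTH)
            return -1;
        if (expect_depth >= 0 && depth != expect_depth)
            return -1;  /* child must be exactly one level lower */

        /* Last record whose first logical block is <= lblk. */
        for (i = 0; i < entries; i++) {
            const uint8_t *r = node + 12 + 12 * (size_t)i;

            if (get_le32(r) > lblk)
                break;
            rec = r;
        }
        if (rec == NULL)
            return 1;

        if (depth == 0) {  /* leaf: rec is an ext4_extent */
            uint32_t first = get_le32(rec);
            uint32_t len = get_le16(rec + 4);
            int unwritten = len > EXT4_INIT_MAX_LEN;

            if (unwritten)
                len -= EXT4_INIT_MAX_LEN;
            if (unwritten || lblk - first >= len)
                return 1;
            *pblk = (((uint64_t)get_le16(rec + 6) << 32) |
                get_le32(rec + 8)) + (lblk - first);
            return 0;
        }

        /* Index node: ei_leaf_lo at +4, ei_leaf_hi at +8. */
        node = mem_read(d, ((uint64_t)get_le16(rec + 8) << 32) |
            get_le32(rec + 4));
        if (node == NULL)
            return -1;
        node_size = d->blksz;
        expect_depth = depth - 1;
    }
}
```

A walker that treats every node as a leaf produces correct results for any file small enough to fit in the inode, which is most files on a test image, and then interprets index records as extents on the first large or fragmented file it meets.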

Filesystem correctness is dominated by edge cases. The common path, a clean mount with small files and short filenames, exercises maybe 30% of the code paths that matter. The rest is exercised by large files, deeply nested directories, filenames with non-ASCII characters, filesystems with inline data, filesystems with large extended attributes, and filesystems that crashed.

The specific problem with AI-generated code for this domain is not that the AI doesn’t know ext4. Current models have read the kernel source, the ext4 documentation, and a significant fraction of the filesystem driver corpus. They produce structurally plausible code. The problem is that plausible structure and correct behavior under all conditions are not the same thing, and the AI has no way to distinguish between the two when generating. It is producing the most likely continuation given the context, not verifying that the corner case on line 847 is handled consistently with the spec text that was in its training data.

What This Tells Us

The experiment is genuinely useful as a proof of concept and as a starting point for a real driver. The code exists, it mounts filesystems, it reads files. That is a non-trivial amount of scaffolding that would have taken longer to write from scratch without AI assistance.

FreeBSD’s ext2fs driver is the sensible comparison point for anyone wanting to build on this work. It has been production-hardened and has accumulated fixes for obscure cases over years of use. An OpenBSD port of that driver, extended with JBD2 journal replay, would be a more conservative path to a correct result than iterating a vibe-coded implementation toward correctness through issue reports.

The broader point is that vibe coding’s cost scales with the correctness requirements of the domain. For a REST endpoint that returns slightly wrong data occasionally, the iteration loop is fast and the cost of a bug is low. For a read-only filesystem driver, the cost of a silent bug is a user who trusts the output of a cat command and gets wrong data without any error. For a read-write driver, the cost is filesystem corruption on the mounted volume.

That is not an argument against using AI assistance for systems programming. It is an argument for knowing what the AI is good at in this context: producing plausible structure quickly, saving time on boilerplate, and getting to a working prototype faster than starting from a blank file. The gap between prototype and correct is where the human has to do the reading.