
Reading Linux Drives from OpenBSD: What Happens When an LLM Writes the Kernel Driver

Source: lobsters

OpenBSD can already mount ext2 and ext3 volumes. What it cannot do, at least not without patching, is mount a modern Linux ext4 filesystem. The reason is a single incompatible feature flag: EXT4_FEATURE_INCOMPAT_EXTENTS. Any volume using the extents tree, which is virtually every ext4 volume formatted in the last decade, will cause OpenBSD’s existing ext2fs driver to refuse the mount entirely. That refusal is correct behavior, because the extents tree is a fundamentally different block-addressing mechanism than what the driver knows how to read.

So there is a genuine, practical gap. OpenBSD users who dual-boot with Linux or need to read Linux drives have no good answer. The LWN coverage describes an attempt to close that gap using vibe coding: describing the desired changes to a large language model and iterating on the output until the code works.

This is worth examining carefully, not as a dismissal of the approach, but because filesystem code is one of the more unforgiving environments for testing what AI-assisted development can actually produce.

What ext4 Actually Adds

The framing “ext4 is ext3 with extents” is close enough to true that it explains both why the project is tractable and why it is not trivial.

The core structural change is the extents tree. In ext2 and ext3, a file’s block locations are stored as a tree of indirect block pointers: the inode holds 12 direct block numbers, then a single-indirect pointer, a double-indirect pointer, and a triple-indirect pointer. Reading a large file requires following chains of these pointers. For a 1 GB file on a volume with 4 KiB blocks, the block map alone is spread across roughly 250 indirect blocks, each of which has to be read before the data it points to can be located.
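The cost of the indirect scheme is easy to put a number on. The sketch below counts the pointer blocks needed to map a file, assuming 32-bit block numbers (so a 4 KiB block holds 1024 pointers). The function name is made up for illustration and is not taken from any driver.

```c
#include <stdint.h>

/*
 * Count the indirect (pointer-only) blocks needed to map a file of
 * `nblocks` data blocks under the ext2/ext3 scheme.
 * `ppb` is pointers per block: block size / 4 for 32-bit block numbers.
 * Hypothetical helper for illustration only.
 */
static uint64_t
indirect_blocks_needed(uint64_t nblocks, uint64_t ppb)
{
	const uint64_t ndirect = 12;	/* direct pointers held in the inode */
	uint64_t meta = 0, remaining, in_dbl, leaves, mids;

	if (nblocks <= ndirect)
		return 0;
	remaining = nblocks - ndirect;

	/* Single-indirect: one block full of data-block pointers. */
	meta += 1;
	if (remaining <= ppb)
		return meta;
	remaining -= ppb;

	/* Double-indirect: one top block plus one leaf per `ppb` data blocks. */
	in_dbl = remaining < ppb * ppb ? remaining : ppb * ppb;
	leaves = (in_dbl + ppb - 1) / ppb;
	meta += 1 + leaves;
	remaining -= in_dbl;
	if (remaining == 0)
		return meta;

	/* Triple-indirect: top block, middle blocks, then leaves. */
	leaves = (remaining + ppb - 1) / ppb;
	mids = (leaves + ppb - 1) / ppb;
	return meta + 1 + mids + leaves;
}
```

For a 1 GiB file with 4 KiB blocks, `indirect_blocks_needed(262144, 1024)` comes to 257 pointer blocks. A contiguous ext4 file of the same size needs only a handful of extents.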

ext4 replaces this with an extent tree. Each extent describes a contiguous range of blocks with a single 12-byte structure:

struct ext4_extent {
    __le32  ee_block;    /* first logical block covered */
    __le16  ee_len;      /* number of blocks covered */
    __le16  ee_start_hi; /* high 16 bits of physical block */
    __le32  ee_start_lo; /* low 32 bits of physical block */
};

For files that are stored contiguously, a single extent can describe the entire file. The tree has a header (ext4_extent_header) that identifies whether the current node contains leaf extents or internal index nodes (ext4_extent_idx), and the whole structure is rooted directly in the inode’s i_block field, which previously held the indirect pointer tree.
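For reference, here is a sketch of the other two node structures and the helpers a reader needs for them. The field layout follows the Linux ext4 headers, but the types are plain fixed-width integers and the fields are assumed to be already converted to host byte order. The helper names are invented for this example.

```c
#include <stdint.h>

#define EXT4_EXT_MAGIC	0xF30A	/* value expected in eh_magic */

/* Starts every node of the tree, including the root inside i_block. */
struct ext4_extent_header {
	uint16_t eh_magic;	/* EXT4_EXT_MAGIC */
	uint16_t eh_entries;	/* valid entries following the header */
	uint16_t eh_max;	/* capacity of this node, in entries */
	uint16_t eh_depth;	/* 0 = entries are leaf extents */
	uint32_t eh_generation;
};

/* Entry of an internal node: points at a child node on disk. */
struct ext4_extent_idx {
	uint32_t ei_block;	/* first logical block covered by the child */
	uint32_t ei_leaf_lo;	/* low 32 bits of the child's physical block */
	uint16_t ei_leaf_hi;	/* high 16 bits of the child's physical block */
	uint16_t ei_unused;
};

/* Entry of a leaf node: maps logical blocks to physical blocks. */
struct ext4_extent {
	uint32_t ee_block;
	uint16_t ee_len;
	uint16_t ee_start_hi;
	uint32_t ee_start_lo;
};

/* Hypothetical helpers; fields must already be in host byte order. */
static int
node_is_leaf(const struct ext4_extent_header *h)
{
	return h->eh_depth == 0;
}

static uint64_t
extent_start(const struct ext4_extent *e)
{
	return ((uint64_t)e->ee_start_hi << 32) | e->ee_start_lo;
}

static uint64_t
index_child(const struct ext4_extent_idx *ix)
{
	return ((uint64_t)ix->ei_leaf_hi << 32) | ix->ei_leaf_lo;
}
```

All three structures are 12 bytes, so the 60-byte i_block area holds one header plus at most four entries. A file with more than four extents needs at least one tree node outside the inode.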

When the EXT4_INODE_EXTENTS flag is set in an inode’s i_flags, you parse this extents tree. When it is not set, you fall back to the old indirect block behavior. OpenBSD’s ext2fs driver never learned to parse the extents tree, so it correctly refuses to mount volumes where inodes might use it.
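Once a leaf node is in hand, mapping a logical block is a range lookup. The following is a minimal sketch, assuming the extents have already been converted to host byte order and copied into an array. The function and enum names are hypothetical. It also handles one detail that is easy to miss: an ee_len above 32768 marks an unwritten (preallocated) extent, which should read back as zeros rather than as whatever is on disk.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_INIT_MAX_LEN 32768U	/* ee_len above this marks an unwritten extent */

/* Host-order copy of one leaf extent. */
struct extent {
	uint32_t ee_block;	/* first logical block covered */
	uint16_t ee_len;	/* number of blocks covered */
	uint16_t ee_start_hi;	/* high 16 bits of physical block */
	uint32_t ee_start_lo;	/* low 32 bits of physical block */
};

enum map_result { MAP_HOLE, MAP_DATA, MAP_UNWRITTEN };

/*
 * Find the extent covering logical block `lblk` among `n` leaf extents.
 * On MAP_DATA, *pblk receives the physical block number.
 * A linear scan keeps the sketch short; entries are sorted on disk,
 * so a real driver could use a binary search instead.
 */
static enum map_result
map_logical_block(const struct extent *ex, size_t n, uint32_t lblk,
    uint64_t *pblk)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t len = ex[i].ee_len;
		int unwritten = len > EXT_INIT_MAX_LEN;

		if (unwritten)
			len -= EXT_INIT_MAX_LEN;
		if (lblk < ex[i].ee_block || lblk - ex[i].ee_block >= len)
			continue;	/* not covered by this extent */
		if (unwritten)
			return MAP_UNWRITTEN;	/* caller returns zeros */
		*pblk = (((uint64_t)ex[i].ee_start_hi << 32) |
		    ex[i].ee_start_lo) + (lblk - ex[i].ee_block);
		return MAP_DATA;
	}
	return MAP_HOLE;	/* sparse region: also reads as zeros */
}
```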

Beyond extents, ext4 adds several other incompatible features that a real implementation has to handle: INCOMPAT_64BIT extends block addresses from 32 to 64 bits; INCOMPAT_FLEX_BG allows block groups to be combined into flexible groups with relocated metadata; INCOMPAT_INLINE_DATA stores small file contents directly in the inode. A driver that ignores these will silently misread data on volumes that use them.
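The mount-time refusal comes down to a mask test on the superblock’s incompatible-feature word. The sketch below uses the flag values from the Linux ext4 headers. The helper name and the idea of passing the supported set as a parameter are assumptions made for illustration, not a description of how OpenBSD’s driver is organized.

```c
#include <stdint.h>

/* Incompatible-feature bits, values as defined by Linux ext4. */
#define INCOMPAT_FILETYPE	0x0002
#define INCOMPAT_EXTENTS	0x0040
#define INCOMPAT_64BIT		0x0080
#define INCOMPAT_FLEX_BG	0x0200
#define INCOMPAT_INLINE_DATA	0x8000

/*
 * Return the incompat bits set on the volume that the driver does not
 * implement. Any nonzero result means the mount must be refused.
 * Hypothetical helper for illustration only.
 */
static uint32_t
unsupported_incompat(uint32_t sb_incompat, uint32_t supported)
{
	return sb_incompat & ~supported;
}
```

A driver that understands only FILETYPE, given a volume with FILETYPE, EXTENTS, 64BIT, and FLEX_BG set, gets back 0x02C0 and refuses. Adding extents support means adding INCOMPAT_EXTENTS to the supported mask only after the parsing code exists. The other three bits still have to be either handled or rejected.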

For read-only support, the most important one is extents. Get that right and you can mount most Linux volumes. Miss it and nothing works.

OpenBSD’s Filesystem Landscape

OpenBSD’s primary filesystem is FFS, the BSD Fast File System, which shares its Unix filesystem ancestry with ext2. OpenBSD also ships FFS2, the 64-bit-capable evolution of FFS, along with FAT, ISO 9660, and NFS support.

The ext2fs driver in sys/ufs/ext2fs/ has been in OpenBSD for a long time and provides read-only access to ext2 and some ext3 volumes. It is conservative by OpenBSD standards: read-only, cautious, not trying to be clever. The driver correctly rejects volumes with unrecognized incompatible feature flags, which is why ext4 volumes fail to mount.

OpenBSD’s culture around kernel code is worth noting. The project has historically prioritized code clarity, correctness, and reviewability over feature velocity. The base system is audited. The kernel gets careful attention. Bringing in a large block of AI-generated kernel code sits awkwardly against that background, not because the code is inherently wrong, but because the project’s confidence in any given piece of code depends on humans having read and understood it.

Vibe Coding a Filesystem Driver

Andrej Karpathy’s original framing of vibe coding described a workflow where you tell a model what you want, accept the output without deep review, and iterate until it runs. Applied to a web application with good test coverage, this is a reasonable productivity shortcut. Applied to a kernel filesystem driver, the dynamics are different.

For the ext4 project, the most plausible vibe coding path is incremental modification of the existing ext2fs driver. The LLM would be shown the existing driver, told about the ext4 extents format, and asked to add extents support. The on-disk structures are well-documented in the Linux kernel ext4 documentation and in the Linux kernel source, so the model has extensive training material to draw from.

What an LLM does well here: generating the struct definitions from the specification, writing the traversal logic for a tree structure given a clear description of the node format, adapting existing block-reading functions to handle the new format. These are pattern-matching tasks that LLMs handle reasonably well.

What an LLM does poorly here: endianness. ext4 is little-endian on disk. OpenBSD runs on sparc64 among other platforms, which is big-endian. Every multi-byte field read from disk needs the appropriate le32toh() or le16toh() conversion. The existing ext2fs driver applies these carefully. An LLM generating new code alongside that driver might get most of them right while missing one, producing code that works perfectly on x86 and silently misreads data on big-endian hardware. Testing on x86 alone will never catch this; the code has to run on a big-endian machine, real or emulated.
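One way to make a missed conversion harder to write is to decode the raw bytes explicitly instead of overlaying a struct on the buffer and swapping fields afterward. This is a sketch of that alternative, not a description of how the existing driver does it, and the helper names are made up. Because it assembles each value byte by byte, it gives the same result on little-endian and big-endian hosts.

```c
#include <stdint.h>

/* Assemble little-endian values byte by byte; host order is irrelevant. */
static uint16_t
get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t
get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Host-order view of one on-disk extent. */
struct extent_host {
	uint32_t ee_block;
	uint16_t ee_len;
	uint16_t ee_start_hi;
	uint32_t ee_start_lo;
};

/* Decode the 12 on-disk bytes of an extent into host order. */
static void
decode_extent(const uint8_t raw[12], struct extent_host *out)
{
	out->ee_block = get_le32(raw + 0);
	out->ee_len = get_le16(raw + 4);
	out->ee_start_hi = get_le16(raw + 6);
	out->ee_start_lo = get_le32(raw + 8);
}
```

Since every field passes through one decode function, a reviewer has a single place to check, rather than having to hunt for a bare field access scattered through the traversal code.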

Error paths are another weak point. The ext4 extents tree can be corrupted on disk. A robust driver needs to validate the magic number in ext4_extent_header, check that depth values are sane, and handle malformed trees without panicking or looping. LLMs tend to generate the happy path fluently and treat error handling as an afterthought.
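The checks named above fit in a few lines. This is a minimal sketch, assuming the header has already been converted to host byte order. The magic value and the depth limit of 5 match the Linux ext4 headers. The function name and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_EXT_MAGIC		0xF30A
#define EXT4_MAX_EXTENT_DEPTH	5
#define EXT_ENTRY_SIZE		12	/* header and entries are all 12 bytes */

/* Host-order copy of a node header. */
struct ext4_extent_header {
	uint16_t eh_magic;
	uint16_t eh_entries;
	uint16_t eh_max;
	uint16_t eh_depth;
	uint32_t eh_generation;
};

/*
 * Validate a node header before trusting anything after it.
 * node_bytes:     size of the buffer holding the node (60 for the root
 *                 in i_block, the block size for any other node).
 * expected_depth: parent's depth minus one when descending, or -1 for
 *                 the root. Requiring depth to drop by exactly one at
 *                 each level bounds the walk even on a corrupt tree.
 * Returns 1 if the header is plausible, 0 otherwise.
 */
static int
extent_header_ok(const struct ext4_extent_header *h, size_t node_bytes,
    int expected_depth)
{
	size_t max_fit;

	if (h->eh_magic != EXT4_EXT_MAGIC)
		return 0;
	if (h->eh_depth > EXT4_MAX_EXTENT_DEPTH)
		return 0;
	if (expected_depth >= 0 && h->eh_depth != expected_depth)
		return 0;
	if (node_bytes < EXT_ENTRY_SIZE)
		return 0;
	max_fit = (node_bytes - EXT_ENTRY_SIZE) / EXT_ENTRY_SIZE;
	if (h->eh_max == 0 || h->eh_max > max_fit)
		return 0;	/* claimed capacity overruns the buffer */
	if (h->eh_entries > h->eh_max)
		return 0;
	return 1;
}
```

A full driver would also verify that child block numbers fall inside the volume and that entries are sorted and non-overlapping. Those checks are left out here for brevity.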

Read-only support does change the risk calculus. The worst outcome from a buggy read-only driver is incorrect data reads or a kernel panic; because it never writes to the disk, it cannot corrupt the Linux filesystem. That is better than write support, where a bad block allocation could silently destroy data. The project is correctly scoped to read-only first, which is where most users’ actual need lies anyway.

What This Experiment Reveals

The broader question the project raises is where filesystem implementation sits on the spectrum of tasks where AI-generated code is good enough versus tasks where it needs thorough human review before deployment.

Filesystem drivers are not like application logic. A web route handler that has an off-by-one error usually produces a wrong response that a user notices and a developer fixes. A filesystem driver that has an off-by-one error in block address calculation might silently return wrong data for specific file sizes on specific volume configurations, and you might not notice until data recovery becomes relevant.

This does not mean the vibe-coded approach is wrong. It means the bar for review is higher than for most code. The interesting outcome of this project is not whether the LLM produced working code on the first try, but whether the resulting code is reviewable enough for the OpenBSD community to accept. OpenBSD’s kernel is maintained through a patch submission process, and the project’s developers are not going to accept a thousand lines of AI-generated kernel code on the basis that it mostly works.

That review burden is the real bottleneck. An LLM can generate plausible implementation code faster than a human could write it from scratch. But the cost of careful review does not decrease just because the initial code was generated quickly. If anything, code whose provenance is uncertain requires more scrutiny, not less, precisely because the author cannot explain their reasoning about the edge cases they considered.

The experiment does demonstrate something useful: the extents tree implementation is a well-defined enough problem that an LLM can produce something structurally correct. The on-disk format is documented, the existing driver provides the scaffolding, and the test case is simple: does the volume mount and do files read correctly. That is a legitimate path to getting a first-pass implementation that a kernel developer can then audit and clean up.

Whether OpenBSD’s kernel ends up with ext4 read support because of this work depends less on whether the LLM got the extents tree right and more on whether someone takes ownership of auditing the result and shepherding it through review. The vibe coding part is the easy half.
