Why Running Windows 9x on Linux Is Harder Than Wine Makes It Look

Wine has been solving Windows compatibility on Linux for over thirty years. It handles a staggering breadth of the NT-era Windows API surface, runs games and productivity software at near-native speed, and has become a genuine infrastructure project with serious corporate backing. The one era it has never handled well is the one most people nostalgically remember: Windows 95, 98, and Millennium Edition.

Hailey’s Windows 9x Subsystem for Linux, which surfaced on Hacker News with nearly 800 points, is an attempt to close that gap. The name deliberately echoes Microsoft’s Windows Subsystem for Linux but inverts the direction: instead of Linux binaries running on Windows, this is a compatibility layer for running Win9x binaries on Linux. The gap it fills has existed for decades, and understanding why it exists leads into some genuinely deep systems territory.

The Hardware Problem Wine Never Had

Wine’s target, the NT API lineage, runs entirely in a flat 32-bit or 64-bit memory model. Every Win32 process gets a clean address space, the system call interface is well-defined, and user-mode code never needs to execute 16-bit x86 instructions in the normal application path. This is important for a reason that goes beyond software design: a 64-bit CPU running in long mode cannot enter virtual-8086 mode, the hardware mechanism that 32-bit operating systems used to run real-mode 16-bit code.

The x86-64 architecture, which describes every 64-bit x86 desktop processor in use today, does not support virtual-8086 mode (VM86) while the CPU is in long mode. When a 32-bit kernel ran on x86, it could use VM86 mode to execute 16-bit real-mode code directly on the hardware and trap exceptions cleanly. The Linux kernel exposed this through the vm86() and vm86old() syscalls on 32-bit x86 kernels.

On a 64-bit Linux kernel, those syscalls are not available, because the hardware feature they relied on cannot be used in long mode. Real-mode 16-bit code on a 64-bit host therefore requires software emulation: an interpreter, a JIT compiler, or a full hardware emulator like QEMU. (Long mode’s compatibility sub-mode can still execute 16-bit protected-mode segments, but it provides no VM86 tasks.) This is not a matter of missing implementation effort in Wine; it is a consequence of CPU architecture decisions made when AMD designed x86-64.

For Win9x compatibility this matters because Win9x’s 32-bit API is not actually fully 32-bit. The KERNEL32.DLL implementation in Win9x regularly thunks down into KRNL386.EXE, which is a 16-bit DLL, to do real work. A 32-bit application calling CreateFile travels through a chain that includes 16-bit code before reaching the actual file system layer. Any subsystem that wants to run genuine Win9x binaries, including the system DLLs themselves, needs to handle that 16-bit execution path.

Three Thunk Types and Why They All Bite You

Microsoft documented three distinct thunking mechanisms across its Win32 platforms of the era, each covering a different interoperability case between 16-bit and 32-bit code.

Universal Thunks (UT), which originated with Win32s, let 32-bit code call 16-bit DLLs. The 32-bit caller sets up parameters in 32-bit format, the thunk stub transitions to 16-bit protected mode, translates the call to the 16-bit segmented calling convention, and returns. Flat Thunks, the Win9x-specific mechanism, work in both directions and rely on stubs generated by Microsoft’s thunk compiler, with 16:16 segmented pointers translated to and from flat 32-bit addresses at the boundary. Generic Thunks let 16-bit code load and call arbitrary 32-bit DLLs, with the caller responsible for describing which parameters are pointers that need translation.

All three require the execution environment to be able to switch between 16-bit segmented mode and 32-bit flat mode at call boundaries. On hardware where VM86 mode exists this is achievable, though complex. On a 64-bit host without VM86 it means the 16-bit side of every thunk boundary must be emulated in software. A JIT approach that translates 16-bit x86 instructions to native 64-bit code has to handle all the segment register semantics, the different stack layout, and the privilege transitions correctly.
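To make the pointer side of a thunk boundary concrete, here is a minimal Python sketch of translating a 16:16 far pointer into a flat linear address through a descriptor table. The names `Descriptor`, `split_selector`, and `far_to_linear`, and the dictionary used as an LDT, are hypothetical and not taken from the project; only the selector bit layout (index, table indicator, requested privilege level) follows the x86 architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Simplified segment descriptor (hypothetical model, data segments only)."""
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset inside the segment


def split_selector(selector):
    """Split a 16-bit selector into (index, table_indicator, rpl)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def far_to_linear(ldt, far_ptr):
    """Translate a packed 16:16 far pointer into a flat linear address.

    `ldt` maps descriptor index -> Descriptor. A real implementation would
    also check segment type and privilege; this sketch checks only the limit.
    """
    selector = (far_ptr >> 16) & 0xFFFF
    offset = far_ptr & 0xFFFF
    index, table, _rpl = split_selector(selector)
    if table != 1:
        raise ValueError("selector does not refer to the LDT")
    desc = ldt.get(index)
    if desc is None:
        raise LookupError(f"no descriptor at LDT index {index}")
    if offset > desc.limit:
        raise ValueError("offset exceeds segment limit")
    return desc.base + offset


# Example: selector 0x17 is LDT index 2 with RPL 3.
example_ldt = {2: Descriptor(base=0x00420000, limit=0xFFFF)}
linear = far_to_linear(example_ldt, 0x00170010)
```

An interpreter or JIT for the 16-bit side has to perform this kind of lookup, or cache its result, for every segment register load, which is part of why segment semantics dominate the implementation effort.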

The DOSBox project, which has the most mature x86 16-bit emulation outside of QEMU, handles this through a dynamic recompiler for the DOS era but still runs slower than native. DOSBox-X has extended this into experimental Win9x territory by emulating the full hardware stack and running Win9x inside the emulator, which works but carries the full overhead of hardware emulation rather than the syscall-interception approach that makes Wine fast.

VxDs: The Ring 0 Interface Applications Called Directly

Beyond thunking, Win9x’s Virtual Device Driver (VxD) architecture represents a second structural problem. VxDs are Ring 0 drivers in Win9x, and unlike NT’s well-enforced kernel/user boundary, Win9x applications could and frequently did call into VxDs directly through software interrupts and a specific calling interface.

The canonical example is games. A significant portion of Win9x-era games communicated with DirectX components, hardware abstraction layers, and CD-ROM drivers that were implemented as VxDs. VWIN32.VXD provided Win32 services from VxD context. VXDCALL was a legitimate mechanism for applications to reach into kernel-mode code. Some games used this for timing, for DMA setup, or for memory management tricks that required kernel-level access.

Wine’s architecture intercepts at the DLL boundary and redirects API calls to POSIX equivalents. This works because NT maintains a clear separation between kernel code and the user-mode API surface. Win9x blurs that boundary by design, and software that crossed it cannot be supported purely by intercepting DLL calls. A VxD stub layer that handles common VxD interfaces and returns plausible values covers the easy cases; applications that relied on specific driver behavior require individual attention.
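A stub layer of this kind can be sketched as a dispatch table. The Python below is illustrative only: `VxDStubTable`, the device ID `0x7F00`, and the returned version value are placeholders rather than real VxD identifiers or the project's design. It assumes service IDs are packed with the device ID in the high word and the service number in the low word.

```python
class VxDStubTable:
    """Dispatches VxD service calls to stub handlers and records the misses."""

    def __init__(self, default_result=0):
        self._handlers = {}
        self._default = default_result
        self.unhandled = []  # (device_id, service) pairs nobody has stubbed yet

    def register(self, device_id, service, handler):
        self._handlers[(device_id, service)] = handler

    def call(self, service_id, *args):
        # Assumed packing: high word = device ID, low word = service number.
        device_id = (service_id >> 16) & 0xFFFF
        service = service_id & 0xFFFF
        handler = self._handlers.get((device_id, service))
        if handler is None:
            # Easy-case policy: log the miss and return a harmless default.
            self.unhandled.append((device_id, service))
            return self._default
        return handler(*args)


# Placeholder device ID and a made-up "get version" service returning 4.10.
FAKE_DEVICE_ID = 0x7F00
stubs = VxDStubTable()
stubs.register(FAKE_DEVICE_ID, 0x0000, lambda: 0x040A)

version = stubs.call(0x7F000000)   # handled by the stub
fallback = stubs.call(0x7F000001)  # unhandled, returns the default
```

The `unhandled` list is the useful part in practice: it turns "this game crashes" into a concrete list of services that need individual attention.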

The Shared Address Space Trap

The third structural difference, and possibly the most surprising to anyone who came up programming in the post-XP era, is Win9x’s memory model. Win9x’s 32-bit address space is only partly per-process. The region from roughly 4MB to 2GB is the private arena, mapped separately for each process. From 2GB to 3GB is a shared arena that holds system DLLs, memory-mapped files, and 16-bit applications, and it is visible to all running processes simultaneously. The top gigabyte is the system arena, where the VMM and VxDs live, and it is also mapped into every process.

This was not an intentional security model; it was a compromise to support 16-bit legacy applications and shared DLL state without implementing full address space isolation. The practical consequence is that any Win9x process can read and write memory in the shared regions if it has the address, including data that belongs to other processes. Some software relied on this behavior knowingly, using it as an informal IPC mechanism. Other software relied on it accidentally, written by developers who did not know or care about memory isolation because Win9x did not enforce it there.

Linux processes have fully isolated virtual address spaces. Emulating Win9x’s shared arena requires one of two things. The first is establishing shared memory regions mapped at identical addresses across all running emulated processes, which is achievable with mmap and file-backed shared memory but requires careful bookkeeping. The second is handling cross-process memory accesses through some kind of trap-and-forward mechanism. Neither approach is free, and the second requires the emulator to mediate every memory access in the shared region.
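The trap-and-forward option can be illustrated with a small Python model. This is a sketch under stated assumptions, not the project's implementation: `SharedArena` and `EmulatedProcess` are hypothetical names, the arena is a tiny 4KB window starting at 2GB rather than a full gigabyte, and "processes" are plain objects rather than real Linux processes.

```python
class SharedArena:
    """One backing store that every emulated process sees at the same address."""

    def __init__(self, start, size):
        self.start = start
        self.end = start + size
        self._bytes = bytearray(size)

    def contains(self, addr, length=1):
        return self.start <= addr and addr + length <= self.end

    def read(self, addr, length):
        off = addr - self.start
        return bytes(self._bytes[off:off + length])

    def write(self, addr, data):
        off = addr - self.start
        self._bytes[off:off + len(data)] = data


class EmulatedProcess:
    """Routes each access either to the shared arena or to private memory."""

    def __init__(self, arena):
        self._arena = arena
        self._private = {}  # sparse private memory: address -> byte value

    def write(self, addr, data):
        if self._arena.contains(addr, len(data)):
            self._arena.write(addr, data)       # forwarded: visible to everyone
        else:
            for i, value in enumerate(data):
                self._private[addr + i] = value  # stays inside this process

    def read(self, addr, length):
        if self._arena.contains(addr, length):
            return self._arena.read(addr, length)
        return bytes(self._private.get(addr + i, 0) for i in range(length))


arena = SharedArena(start=0x80000000, size=0x1000)
proc_a = EmulatedProcess(arena)
proc_b = EmulatedProcess(arena)

proc_a.write(0x80000010, b"hi")  # lands in the shared arena
proc_a.write(0x00400000, b"x")   # lands in proc_a's private memory
```

After these writes, `proc_b` can read `b"hi"` at `0x80000010` but sees nothing at `0x00400000`. The cost the text describes shows up as the `contains` check on every access, which a real emulator would have to make very cheap or avoid through fixed-address mappings.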

Where This Sits in the Preservation Landscape

The practical case for getting Win9x right is preservation. A large body of software ran only on Win9x: business applications from the late 1990s, early multimedia tools, specialized utilities, and a long tail of games that were never ported to NT or XP. Some of this software is culturally significant and some contains data in formats that are only accessible through applications that ran on Win9x.

ReactOS targets NT semantics deliberately and has no plans to support Win9x’s architecture. The Internet Archive’s software library has invested heavily in DOSBox-based browser emulation for DOS software but Win9x-era software remains underserved. The Software Preservation Network has documented the problem without a clean solution.

The performance gap between Wine’s approach and a full emulator matters here. DOSBox-X can run Win9x, but running a 1999-era game at interactive framerates inside a hardware emulator on 2026 hardware is achievable only because modern CPUs are fast enough to paper over the overhead. A subsystem-style implementation that runs the binary natively and intercepts only the OS interface would be significantly faster and would not require a full Win9x installation.

The Implementation Surface

A plausible architecture for a Win9x subsystem on Linux involves a PE loader that maps Win32 executables into process memory, an import interception layer that redirects calls from KERNEL32.DLL, USER32.DLL, GDI32.DLL, and the other core Win9x DLLs to Linux-backed implementations, a 16-bit x86 interpreter or JIT for handling thunk paths, and a VxD stub layer for common driver interfaces.
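The first thing such a loader has to decide is what kind of binary it has been handed. The Python sketch below is a hypothetical helper, `identify_executable`, that is not taken from the project. It reads the MZ header, follows the `e_lfanew` field at offset 0x3C, and distinguishes 32-bit PE files from 16-bit NE files and LE files (the format used by VxDs). It stops at identification and does no section mapping or import resolution.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C


def identify_executable(data: bytes) -> str:
    """Classify an executable image by its header signatures."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not-mz"
    # e_lfanew: file offset of the new-style header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[e_lfanew:e_lfanew + 4]
    if signature == b"PE\0\0":
        if len(data) < e_lfanew + 6:
            return "truncated"
        # The COFF Machine field follows the 4-byte PE signature.
        (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
        return "pe32-i386" if machine == IMAGE_FILE_MACHINE_I386 else "pe-other"
    if signature[:2] == b"NE":
        return "ne16"      # 16-bit New Executable: needs the 16-bit path
    if signature[:2] == b"LE":
        return "le-vxd"    # Linear Executable: typically a VxD
    return "dos-mz"        # plain DOS program with no extended header


def make_header(signature: bytes, machine: int = IMAGE_FILE_MACHINE_I386) -> bytes:
    """Build a synthetic 128-byte header for experimenting with the classifier."""
    header = bytearray(0x80)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    header[0x40:0x40 + len(signature)] = signature
    struct.pack_into("<H", header, 0x44, machine)
    return bytes(header)
```

A result of `pe32-i386` routes the file to the PE loader and import interception layer, while `ne16` immediately implies the interpreter or JIT path.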

The triage strategy matters. Pure Win32 applications that were distributed on Win9x but do not use Win9x-specific features are the easiest case; their API usage overlaps substantially with Wine’s NT target and an NT-compatible Win32 implementation covers them. Applications that use DirectX 1 through 7 form the next tier, where a DirectX-to-OpenGL or DirectX-to-Vulkan translation layer handles most cases. Applications that make direct VxD calls or depend on the shared address space layout are the hardest cases and likely require per-application investigation.
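The three tiers can be written down as a small classifier. The Python below is a sketch of the triage logic only. The function name `triage`, the tier labels, and the two boolean flags are hypothetical; it assumes some earlier analysis step has already extracted the import list and detected VxD use or shared-layout assumptions. The DLL names are the standard DirectX component names of the era.

```python
# DirectX components commonly imported by DirectX 1 through 7 titles.
DIRECTX_DLLS = {"ddraw.dll", "dsound.dll", "dinput.dll", "dplayx.dll", "d3dim.dll"}


def triage(imported_dlls, calls_vxds=False, assumes_shared_layout=False):
    """Place an application in one of three compatibility tiers.

    imported_dlls: DLL names from the import table (any letter case).
    calls_vxds / assumes_shared_layout: flags from a separate analysis pass.
    """
    names = {name.lower() for name in imported_dlls}
    if calls_vxds or assumes_shared_layout:
        return "needs-investigation"  # hardest tier: per-application work
    if names & DIRECTX_DLLS:
        return "directx"              # needs a DirectX translation layer
    return "pure-win32"               # covered by an NT-compatible Win32 layer


tier_office = triage(["KERNEL32.DLL", "USER32.DLL", "GDI32.DLL"])
tier_game = triage(["KERNEL32.dll", "DDRAW.dll", "DSOUND.dll"])
tier_hard = triage(["KERNEL32.dll", "DDRAW.dll"], calls_vxds=True)
```

Checking the hardest tier first matters: a DirectX game that also calls VxDs belongs in the investigation bucket, not the translation-layer bucket.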

What makes this project notable is not that someone wanted Win9x compatibility; that desire has existed for twenty years. It is that someone has engaged seriously with the structural constraints that make the problem hard, particularly the 16-bit execution requirement on a 64-bit host, and is building something with a subsystem architecture rather than reaching for full hardware emulation as the default answer. The unglamorous work of getting x86 segment register semantics right inside a 64-bit process is what separates a demonstration from something that actually runs software.
