Running Windows Binaries Without Windows: The Static Emulation Approach
Source: lobsters
The dominant strategy for running Windows software on non-Windows systems has been dynamic compatibility: Wine intercepts Win32 API calls at runtime, Proton layers DirectX translation on top of that, and QEMU emulates the entire processor if you need something lower-level. All of these involve a live Windows process, real API calls flying around, and a translation layer scrambling to keep up. Theseus, described by its author on neugierig.org, takes the opposite approach: analyze the binary before it runs, resolve what you can statically, and produce something that doesn’t need that runtime scaffolding at all.
That distinction is more significant than it sounds.
What Static Emulation Actually Means
A PE (Portable Executable) binary carries a lot of information that most dynamic emulators treat as runtime concerns. The import table lists every external symbol the binary expects to find, organized by DLL name. The relocations section describes every absolute address that needs patching when the image loads at a non-preferred base. The .pdata section on x64 holds a sorted table of function entries, each pointing to the unwind information for one non-leaf function (leaf functions need no entry). All of this exists in the binary before a single instruction executes.
A static emulator reads that information ahead of time. Rather than hooking LoadLibrary and GetProcAddress at runtime, it walks the import directory and resolves symbols during analysis. Rather than patching relocations when the loader maps segments into memory, it can compute final addresses during translation. The result is an output artifact, not a running process waiting for interception.
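To make the relocation step concrete, here is a minimal sketch of applying base relocations ahead of time. The function name apply_base_relocations is hypothetical, and the sketch assumes the image is already laid out as it would be mapped in memory, so an RVA can be used directly as an index. It handles only the two common fixup types and is not taken from Theseus.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, no fixup
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit absolute address
IMAGE_REL_BASED_DIR64 = 10     # 64-bit absolute address


def apply_base_relocations(image, reloc_rva, reloc_size, delta):
    """Add `delta` to every relocated address in `image`; return the fixup count.

    Assumption: `image` is a bytearray in mapped layout (RVA == index).
    """
    pos, end, count = reloc_rva, reloc_rva + reloc_size, 0
    while pos + 8 <= end:
        # Each block covers one 4 KiB page: page RVA, block size, then 16-bit entries.
        page_rva, block_size = struct.unpack_from("<II", image, pos)
        if block_size < 8:
            break  # malformed block; stop rather than loop forever
        for entry_pos in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", image, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF
            target = page_rva + offset
            if kind == IMAGE_REL_BASED_HIGHLOW:
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
                count += 1
            elif kind == IMAGE_REL_BASED_DIR64:
                (value,) = struct.unpack_from("<Q", image, target)
                struct.pack_into("<Q", image, target,
                                 (value + delta) & 0xFFFFFFFFFFFFFFFF)
                count += 1
            # Padding entries and other fixup types are ignored in this sketch.
        pos += block_size
    return count
```

Here delta is the difference between the base the translator chooses and the preferred base recorded in the optional header. A translator that also moves code around would need a per-address mapping rather than a single delta.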
Rosetta 2 on Apple Silicon is the closest widely-deployed relative. Apple’s translation daemon, oahd, converts x86_64 Mach-O binaries to ARM64 ahead of time and caches the result, so subsequent launches skip most of the JIT warmup that tools like FEX-Emu can pay on a cold start. Rosetta 2 still falls back to runtime translation for code generated on the fly. The difference is that Rosetta 2 only crosses ISA boundaries, while Theseus crosses both ISA and OS API boundaries simultaneously.
The PE Format as an Analysis Target
The Windows PE format is more structured than many people expect. At the outermost layer is the DOS MZ header, kept for historical compatibility, followed by the PE signature and the COFF header. What matters for static analysis is the optional header’s data directory, a table of RVA and size pairs (conventionally 16 entries) that locate structures like the import descriptor table, export table, base relocation table, exception directory, and TLS directory.
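The header walk described above can be sketched in a few lines. The function name read_data_directories and the short directory labels are illustrative choices. The offsets follow the published PE/COFF layout: e_lfanew at 0x3C, a 20-byte COFF header, and the directory count at offset 92 (PE32) or 108 (PE32+) in the optional header.

```python
import struct

# Labels are this sketch's own shorthand for the 16 conventional directory slots.
DIRECTORY_NAMES = [
    "export", "import", "resource", "exception", "certificate",
    "base_reloc", "debug", "architecture", "global_ptr", "tls",
    "load_config", "bound_import", "iat", "delay_import",
    "clr_runtime", "reserved",
]


def read_data_directories(data):
    """Return {name: (rva, size)} for every non-empty data directory entry."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:        # PE32
        count_offset = 92
    elif magic == 0x20B:      # PE32+
        count_offset = 108
    else:
        raise ValueError("unknown optional header magic")
    (count,) = struct.unpack_from("<I", data, opt + count_offset)
    table = opt + count_offset + 4
    result = {}
    for i in range(min(count, len(DIRECTORY_NAMES))):
        rva, size = struct.unpack_from("<II", data, table + 8 * i)
        if rva or size:
            result[DIRECTORY_NAMES[i]] = (rva, size)
    return result
```

The returned RVAs still have to be mapped to file offsets through the section table before the structures they point at can be read from the file on disk.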
The import descriptor table is particularly useful. Each entry names a DLL and points to two parallel arrays: the Import Lookup Table (ILT), which names the imported symbols, and the Import Address Table (IAT), which the loader fills with resolved function pointers at load time. A static emulator can walk the ILT during analysis and synthesize IAT entries that point to its own stub implementations, without any runtime loader involvement.
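As a rough illustration of that walk, the sketch below enumerates imports from a PE32+ image and writes stub addresses into the IAT. The names walk_imports and bind_stubs, and the stubs dictionary keyed by (dll, symbol), are hypothetical. The sketch assumes a memory-layout image where an RVA is a direct index, and 64-bit lookup entries.

```python
import struct

ORDINAL_FLAG64 = 1 << 63  # high bit set means "import by ordinal" in PE32+


def _cstring(image, rva):
    """Read a NUL-terminated ASCII string starting at `rva`."""
    end = image.index(b"\0", rva)
    return bytes(image[rva:end]).decode("ascii")


def walk_imports(image, import_rva):
    """Yield (dll, symbol, iat_slot_rva) for each import in a PE32+ image."""
    desc = import_rva
    while True:
        # Descriptor: ILT RVA, timestamp, forwarder chain, name RVA, IAT RVA.
        ilt_rva, _, _, name_rva, iat_rva = struct.unpack_from("<5I", image, desc)
        if ilt_rva == 0 and name_rva == 0 and iat_rva == 0:
            break  # an all-zero descriptor terminates the table
        dll = _cstring(image, name_rva)
        lookup = ilt_rva or iat_rva  # some linkers omit the ILT
        slot = 0
        while True:
            (entry,) = struct.unpack_from("<Q", image, lookup + 8 * slot)
            if entry == 0:
                break
            if entry & ORDINAL_FLAG64:
                symbol = "#%d" % (entry & 0xFFFF)
            else:
                # Hint/name entry: 2-byte hint, then the symbol name.
                symbol = _cstring(image, (entry & 0x7FFFFFFF) + 2)
            yield dll, symbol, iat_rva + 8 * slot
            slot += 1
        desc += 20


def bind_stubs(image, import_rva, stubs):
    """Fill IAT slots from `stubs`; return the imports that had no stub."""
    missing = []
    # Materialize first so writes cannot disturb the walk when the ILT is absent.
    for dll, symbol, slot_rva in list(walk_imports(image, import_rva)):
        address = stubs.get((dll.lower(), symbol))
        if address is None:
            missing.append((dll, symbol))
        else:
            struct.pack_into("<Q", image, slot_rva, address)
    return missing
```

The list returned by bind_stubs is the useful part at translation time: it tells the tool up front which APIs it has no implementation for, instead of discovering that when the call happens.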
This is cleaner than Wine’s approach. Wine implements a full Windows loader in ntdll.dll.so, which maps PE sections, processes relocations, and resolves imports using Wine’s reimplemented DLL set. That loader runs every time an application starts. A static translator does equivalent work once and bakes the results in.
Where Static Analysis Hits Its Limits
The hard problems are self-modifying code, indirect control flow, and anything that conflates code with data. x86 imposes no instruction alignment, instructions are variable-length, and nothing prevents a program from writing bytes into a code page and jumping into them. A static pass cannot translate code that does not exist until the program runs, and it cannot always tell where instructions begin in the bytes that do exist.
Windows exception handling compounds this on x64. Structured Exception Handling on 32-bit Windows used a linked list threaded through the stack via FS:[0], which a static tool can track fairly mechanically. On 64-bit Windows, exception dispatch is table-driven: the runtime walks .pdata to find RUNTIME_FUNCTION entries, follows UNWIND_INFO structures to understand each function’s frame layout, and invokes registered handlers. A static emulator that changes function layouts or addresses has to update all of this or synthesize equivalent structures for the target platform’s unwinder.
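The table lookup at the heart of x64 exception dispatch is small enough to sketch. Each RUNTIME_FUNCTION entry is three 32-bit RVAs (begin, end, unwind info), and the table is sorted by begin address, so a binary search finds the entry covering a given instruction. The function names below are illustrative.

```python
import bisect
import struct


def parse_pdata(pdata):
    """Split raw .pdata bytes into (begin_rva, end_rva, unwind_info_rva) tuples."""
    return [struct.unpack_from("<III", pdata, off)
            for off in range(0, len(pdata) - 11, 12)]


def find_runtime_function(entries, rva):
    """Return the entry whose [begin, end) range contains `rva`, or None.

    Assumes `entries` is sorted by begin address, as the format requires.
    """
    begins = [entry[0] for entry in entries]
    i = bisect.bisect_right(begins, rva) - 1
    if i >= 0 and entries[i][0] <= rva < entries[i][1]:
        return entries[i]
    return None  # a leaf function, padding, or an address outside any function
```

A translator that moves or resizes functions would have to rewrite every tuple in this table and keep it sorted, which is the bookkeeping cost the paragraph above describes.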
Thread Local Storage adds another wrinkle. Windows TLS uses a directory in the PE image pointing to callbacks that fire on thread creation and destruction, plus an index into the process’s TLS slot array maintained by the OS. Emulating TLS correctly means either preserving the Windows slot model or rewriting all TLS accesses during translation.
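For reference, the 64-bit TLS directory can be read statically with a sketch like the one below. The function name read_tls_directory is hypothetical, and the image is assumed to be in mapped layout. Note that the fields in this directory are full virtual addresses rather than RVAs, so the preferred image base has to be subtracted.

```python
import struct


def read_tls_directory(image, tls_rva, image_base):
    """Decode a 64-bit TLS directory and its callback list.

    Assumption: `image` is in mapped layout, so (VA - image_base) is an index.
    """
    # Four 64-bit VAs, then SizeOfZeroFill and Characteristics (32 bits each).
    start_va, end_va, index_va, callbacks_va, zero_fill, _characteristics = \
        struct.unpack_from("<4Q2I", image, tls_rva)
    callbacks = []
    if callbacks_va:
        pos = callbacks_va - image_base
        while True:
            (callback_va,) = struct.unpack_from("<Q", image, pos)
            if callback_va == 0:
                break  # the callback array is NULL-terminated
            callbacks.append(callback_va - image_base)
            pos += 8
    return {
        "template": (start_va - image_base, end_va - image_base),
        "index_rva": index_va - image_base,
        "zero_fill": zero_fill,
        "callbacks": callbacks,
    }
```

Knowing the template range and callback RVAs ahead of time is what would let a translator choose between the two strategies above: keep a slot array and run the callbacks itself, or rewrite accesses to use the target platform's thread-local mechanism.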
COM and window message dispatch are in a different category entirely: they involve runtime-registered objects, late binding through vtables populated at runtime, and message loops that block waiting for external events. Static analysis can identify vtable layouts and COM interface pointer patterns, but cannot fully resolve what concrete implementations will be registered at runtime.
The Advantage Worth Having
Despite those limitations, the things static analysis handles well are exactly the things that cause the most friction in dynamic approaches.
Wine’s main operational cost is that every Win32 API call crosses a translation boundary. CreateFile becomes a thunk through Wine’s kernel32.dll.so, which calls into ntdll.dll.so, which translates the call to a Linux open(2) or openat(2). For an application making thousands of small API calls per frame, that overhead accumulates. A static approach that identifies CreateFile imports at translation time can, in principle, bind each call site directly to a POSIX-backed implementation and remove a layer of indirection. The semantic translation itself (access flags, creation disposition, sharing modes, path conventions) still has to happen somewhere.
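To show what "the POSIX equivalent" involves even in the simplest case, here is a sketch of translating two CreateFile arguments into open(2) flags. The constant values are the documented Win32 ones. The function name createfile_to_open_flags is hypothetical, and the sketch deliberately ignores sharing modes, file attributes, security descriptors, and path translation.

```python
import os

# Win32 access rights (dwDesiredAccess)
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000

# Win32 creation dispositions (dwCreationDisposition)
CREATE_NEW = 1
CREATE_ALWAYS = 2
OPEN_EXISTING = 3
OPEN_ALWAYS = 4
TRUNCATE_EXISTING = 5

_DISPOSITION_FLAGS = {
    CREATE_NEW: os.O_CREAT | os.O_EXCL,      # fail if the file already exists
    CREATE_ALWAYS: os.O_CREAT | os.O_TRUNC,  # create, or overwrite if present
    OPEN_EXISTING: 0,                        # fail if the file is missing
    OPEN_ALWAYS: os.O_CREAT,                 # open, creating if needed
    TRUNCATE_EXISTING: os.O_TRUNC,           # must exist; truncate to zero
}


def createfile_to_open_flags(desired_access, creation_disposition):
    """Map a subset of CreateFile arguments to flags for os.open."""
    can_read = bool(desired_access & GENERIC_READ)
    can_write = bool(desired_access & GENERIC_WRITE)
    if can_read and can_write:
        mode = os.O_RDWR
    elif can_write:
        mode = os.O_WRONLY
    else:
        mode = os.O_RDONLY
    return mode | _DISPOSITION_FLAGS[creation_disposition]
```

When a call site passes constant arguments, a translator could evaluate a mapping like this once at translation time and emit the resulting flags directly. When the arguments are computed at runtime, the mapping has to ship with the output as a small helper.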
Security analysis is another domain where static wins. Wine has to run the binary to observe its behavior; a static emulator can extract the full import graph, identify which Win32 subsystems the application touches, and flag suspicious patterns before execution. Malware analysis tools like Cutter and Ghidra already do this kind of static PE analysis, though they stop short of full emulation.
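A pre-execution check of this kind can be very simple once the import table has been extracted. The sketch below takes a mapping of DLL name to imported symbols and reports which DLLs are touched and which rule sets match. The rules are illustrative examples chosen for this sketch, not a vetted detection list, and summarize_imports is a hypothetical name.

```python
# Illustrative rules only: each label fires when all listed symbols are imported.
SUSPICIOUS_COMBINATIONS = {
    "remote process injection": {
        "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    },
    "runtime API resolution": {"LoadLibraryA", "GetProcAddress"},
}


def summarize_imports(imports):
    """Summarize an import table given as {dll_name: [symbol, ...]}.

    Returns the DLLs touched (lowercased, sorted) and the labels of every
    rule whose symbols are all present somewhere in the import table.
    """
    symbols = {name for names in imports.values() for name in names}
    flags = sorted(
        label
        for label, needed in SUSPICIOUS_COMBINATIONS.items()
        if needed <= symbols  # subset test: every required symbol is imported
    )
    return {"dlls": sorted(dll.lower() for dll in imports), "flags": flags}
```

The "runtime API resolution" rule also marks a limit of the approach: a binary that resolves functions through LoadLibrary and GetProcAddress can call APIs that never appear in its import table.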
Comparison with x86 Lifting Projects
McSema, RetDec, and llvm-mctoll all attempt to lift x86 binaries to LLVM IR, from which you can target any LLVM backend. The approach works for code whose control flow graph can be recovered statically. The failure modes are the same ones Theseus faces: indirect jumps through computed addresses, overlapping instructions, and code that is also data.
What distinguishes a Windows-specific static emulator from general binary lifting is the OS knowledge layer. Knowing that a binary imports VirtualAlloc means the emulator can model page-level memory reservation and commitment rather than treating the call as an opaque side effect. Knowing that CreateThread imports appear means the translation must handle the thread-startup TLS callbacks and get the calling convention right for the thread function pointer. That domain knowledge turns what would be a partially-lifted binary into something that can actually run.
What Theseus Represents
The project sits in an interesting position. Full dynamic compatibility, the Wine approach, handles the widest range of applications but carries permanent runtime overhead and requires ongoing maintenance as Windows evolves. Full static translation handles a narrower range but can produce cleaner, faster output for the applications it does handle.
The name is fitting in a quiet way. The Ship of Theseus has every plank replaced until nothing original remains, yet the ship keeps sailing. A static Windows emulator does something similar to a PE binary: it takes apart the Windows-specific structure, replaces each piece with an equivalent from the target platform, and produces something that runs without needing Windows at all. Whether the result is still “the same program” is a question for philosophers. Whether it runs correctly is a question for the test suite.