Open Worlds on 4 Megabytes: The Engineering Constraints of N64 Engine Development

Someone built an open-world engine for the Nintendo 64, and the video documenting it surfaced on Lobsters recently. If you have any interest in low-level graphics programming or constrained hardware, it is worth watching. But the video is a starting point, not a full explanation of why this is hard or what tradeoffs the author had to navigate. The N64’s architecture is specific enough that the interesting parts of this project are invisible without some background.

The Hardware You Are Actually Working With

The N64’s CPU is a NEC VR4300, a MIPS R4300i-compatible chip running at 93.75 MHz. It is a 64-bit core internally, but its external system bus is only 32 bits wide and all RAM access goes through the graphics coprocessor, which is a common source of confusion. The system ships with 4MB of RDRAM, expandable to 8MB with the Expansion Pak. 4MB. That is the entire memory budget for your engine, your game state, your audio, your framebuffers, and every texture you render.
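To make that budget concrete, here is a rough sketch of how much of the 4MB is gone before any game data is loaded. The 320x240 resolution, double buffering, and 16-bit color and depth formats are assumptions chosen for illustration, not figures from the video, and the function names are hypothetical.

```python
# Rough RDRAM budget for a 4MB N64 (no Expansion Pak).
# Resolution, buffer count, and pixel depths are illustrative assumptions.
RDRAM_BYTES = 4 * 1024 * 1024


def buffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one screen-sized buffer."""
    return width * height * bytes_per_pixel


def remaining_after_video(width: int, height: int, color_buffers: int = 2,
                          color_bpp: int = 2, depth_bpp: int = 2) -> int:
    """Bytes left once the color buffers and one Z-buffer are reserved."""
    color = color_buffers * buffer_bytes(width, height, color_bpp)
    depth = buffer_bytes(width, height, depth_bpp)
    return RDRAM_BYTES - color - depth


left = remaining_after_video(320, 240)
print(f"{left} bytes left for code, audio, and world data")  # 3733504 bytes
```

Under these assumptions roughly 450KB is spent on video buffers alone, which is more than a tenth of the machine before the engine has loaded a single vertex.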

The graphics pipeline is split across two units inside a single custom chip, the Reality Coprocessor (RCP): the Reality Signal Processor (RSP) and the Reality Display Processor (RDP). The RSP is a MIPS-derived scalar core paired with a vector unit; it handles geometry transformation, clipping, and lighting through a programmable microcode layer. The RDP is a fixed-function rasterizer that receives commands from the RSP and writes pixels to the framebuffer. The RSP feeds the RDP through a command buffer, and the CPU drives the whole system by writing display lists to main RAM, then signaling the RSP to process them.

The RDP has its own 4KB texture cache. Not 4MB. 4KB. Strictly speaking it is texture memory (TMEM) that the program fills explicitly, not a transparent cache. Every texture the RDP samples must first fit into that 4KB window, which means large textures get split into tiles, and texture bandwidth becomes a constant design concern. The RDP can theoretically fill around 100 million pixels per second under ideal conditions, but frequent texture reloads degrade that substantially.
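The arithmetic behind that 4KB limit is easy to check. The sketch below computes how many bytes a texture tile needs and how many tile loads a larger texture would take. It assumes each texture row is padded to an 8-byte boundary and ignores the extra space that palettes take for color-indexed formats; the function names are hypothetical.

```python
TMEM_BYTES = 4096  # the 4KB texture memory described above


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def tmem_bytes_needed(width: int, height: int, bits_per_texel: int) -> int:
    """Bytes one tile occupies, assuming rows are padded to 8 bytes."""
    row_bytes = ceil_div(width * bits_per_texel, 8)
    padded_row = ceil_div(row_bytes, 8) * 8
    return padded_row * height


def fits_in_tmem(width: int, height: int, bits_per_texel: int,
                 budget: int = TMEM_BYTES) -> bool:
    """True if a single tile of this size fits in the budget."""
    return tmem_bytes_needed(width, height, bits_per_texel) <= budget


def tile_loads(tex_w: int, tex_h: int, tile_w: int, tile_h: int) -> int:
    """Number of tile loads needed to cover a tex_w x tex_h texture."""
    return ceil_div(tex_w, tile_w) * ceil_div(tex_h, tile_h)


print(fits_in_tmem(32, 32, 16))       # True: 2048 bytes
print(fits_in_tmem(64, 64, 16))       # False: 8192 bytes
print(tile_loads(256, 256, 64, 32))   # 32 loads of 64x32 16-bit tiles
```

A 256x256 16-bit texture, trivial by modern standards, becomes 32 separate loads under these assumptions, which is why N64 art tends toward small, heavily reused textures.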

Why Open World Is a Different Problem Here

A typical N64 game handles its world in chunks: discrete rooms, levels, or arenas loaded in full before play begins. The camera never sees more than what the designer decided to put in that zone. Open-world design breaks this assumption by requiring a continuous, navigable space where the camera can face any direction and the engine must decide what to render in real time.

The immediate problem is memory. If your world is larger than 4MB (and any meaningful open world is), you cannot hold it all in RAM simultaneously. You need streaming: loading geometry and textures from the ROM cartridge as the player moves, and evicting data that is no longer needed. N64 cartridges can hold up to 64MB, so the data exists; the problem is getting it into RAM fast enough and managing what is resident at any given moment. The cartridge interface is far slower than RDRAM (figures in the range of 5 to 50MB/s are commonly cited, depending on ROM timing), and every transfer lands in the same RDRAM that the CPU and graphics hardware are using, so streaming competes with everything else the game is doing.
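A quick way to reason about streaming is to turn bandwidth into a per-frame byte budget. The sketch below does that; the bandwidth, frame rate, and the share of the bus given to streaming are all placeholder parameters rather than measured values, MB is taken as 10^6 bytes, and the function names are hypothetical.

```python
def bytes_per_frame(bandwidth_mb_s: float, fps: int, share: float = 1.0) -> int:
    """Bytes that can be streamed in one frame at the given bandwidth.

    `share` is the assumed fraction of transfer time streaming is allowed
    to use; the rest is left for everything else the game is doing.
    """
    if fps <= 0 or not 0.0 < share <= 1.0:
        raise ValueError("fps must be positive and share must be in (0, 1]")
    return int(bandwidth_mb_s * 1_000_000 * share / fps)


def frames_to_load(chunk_bytes: int, bandwidth_mb_s: float, fps: int,
                   share: float = 1.0) -> int:
    """Whole frames needed to stream one chunk of `chunk_bytes`."""
    per_frame = bytes_per_frame(bandwidth_mb_s, fps, share)
    if per_frame == 0:
        raise ValueError("bandwidth too low to make progress")
    return -(-chunk_bytes // per_frame)  # ceiling division


# Assumed, deliberately pessimistic placeholder: 5 MB/s, 30 fps, half the bus.
print(bytes_per_frame(5, 30, 0.5))              # 83333
print(frames_to_load(256 * 1024, 5, 30, 0.5))   # 4
```

The useful output is not the exact number but the relationship: chunk size divided by per-frame budget tells you how far ahead of the player the loader has to run.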

The second problem is visibility. Rendering everything in a 500-meter radius is not possible, so the engine needs to know what to skip. Modern games use GPU-accelerated occlusion queries and hierarchical Z-buffers. The N64’s RDP has no occlusion query mechanism. The RSP handles transformation in software microcode, so most culling has to happen before commands enter the display list. BSP trees, portal graphs, and distance-based culling are all CPU-side operations on a 93.75 MHz chip that is also running game logic and audio.
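As an illustration of the cheapest kind of CPU-side culling, the sketch below filters terrain chunks with a distance test and a conservative horizontal view-cone test before anything would be added to a display list. It works in the XZ plane only, treats each chunk as a circle, and uses floating point for readability; a real N64 implementation would more likely use fixed-point math and its own data layout. All names and parameters are hypothetical.

```python
import math


def visible_chunks(chunk_centers, cam_pos, cam_dir, max_dist,
                   half_fov_deg, chunk_radius):
    """Return the chunk centers that survive distance and view-cone culling.

    chunk_centers: iterable of (x, z) tuples
    cam_pos, cam_dir: (x, z) position and (non-zero) facing direction
    """
    length = math.hypot(cam_dir[0], cam_dir[1])
    if length == 0.0:
        raise ValueError("cam_dir must be non-zero")
    dx, dz = cam_dir[0] / length, cam_dir[1] / length
    half_fov = math.radians(half_fov_deg)

    kept = []
    for cx, cz in chunk_centers:
        vx, vz = cx - cam_pos[0], cz - cam_pos[1]
        dist = math.hypot(vx, vz)
        if dist - chunk_radius > max_dist:
            continue  # entirely beyond the draw (or fog) distance
        if dist <= chunk_radius:
            kept.append((cx, cz))  # camera is inside this chunk's bounds
            continue
        # Angle between the view direction and the direction to the chunk.
        cos_angle = max(-1.0, min(1.0, (vx * dx + vz * dz) / dist))
        angle = math.acos(cos_angle)
        # Widen the cone by the chunk's angular radius so the test is conservative.
        if angle <= half_fov + math.asin(chunk_radius / dist):
            kept.append((cx, cz))
    return kept


chunks = [(0, 50), (0, -50), (0, 200), (5, 0), (50, 0)]
print(visible_chunks(chunks, (0, 0), (0, 1), 100, 45, 10))
# [(0, 50), (5, 0)]
```

A test like this runs once per chunk rather than once per triangle, which is what makes it affordable alongside game logic and audio.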

Fog is the standard answer to the draw distance problem on N64. Nearly every large-world N64 game uses heavy fog to make pop-in invisible. Hyrule Field in Ocarina of Time fogs out at a relatively short distance specifically because rendering a full unfogged horizon is not feasible. Body Harvest by DMA Design, one of the few genuine open-world games on the platform, uses island-based level segmentation alongside fog to bound how much geometry can be visible at once. The islands are large, but they are discrete units, and the water between them functions as a natural visibility boundary.
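The reason fog doubles as a culling tool can be shown with a generic linear fog model: once the fog factor reaches 1, geometry is indistinguishable from the fog color and can be dropped without visible pop-in. This is a textbook formula used for illustration, not the exact computation the N64 microcode performs, and the names and distances are hypothetical.

```python
def fog_factor(distance: float, fog_near: float, fog_far: float) -> float:
    """Linear fog: 0.0 means no fog, 1.0 means fully fogged."""
    if fog_far <= fog_near:
        raise ValueError("fog_far must be greater than fog_near")
    t = (distance - fog_near) / (fog_far - fog_near)
    return max(0.0, min(1.0, t))


def can_skip(distance: float, fog_far: float) -> bool:
    """Anything at or beyond fog_far is fully fogged and need not be drawn."""
    return distance >= fog_far


print(fog_factor(200.0, 100.0, 300.0))  # 0.5
print(can_skip(320.0, 300.0))           # True
```

Pulling `fog_far` closer shrinks the set of chunks that must be resident and drawn, which is why fog distance ends up being tuned alongside the streaming budget.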

Microcode and the RSP Programming Model

What makes the N64’s rendering pipeline genuinely unusual is that the RSP is programmable at the microcode level. Nintendo shipped several official microcode implementations with different capability tradeoffs: Fast3D (F3D) for standard use, F3DEX and F3DEX2 with larger vertex buffers and better performance, Turbo3D for raw speed at the cost of accuracy, and S2DEX for 2D sprite work. Third-party developers could, in principle, write their own microcode, though the tools and documentation were difficult to obtain and few studios did.

Microcode programming targets the RSP’s vector unit, which has 32 vector registers of 128 bits, each holding eight 16-bit lanes. Geometry transformation in F3DEX2 processes vertices in batches; the vertex buffer holds 32 vertices at a time. If a mesh references more than 32 vertices, you issue multiple load commands. The implication for open-world rendering is that large terrain meshes need to be designed with the vertex buffer size in mind, or you pay for repeated buffer loads.
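One way to respect the 32-vertex buffer is to split a mesh at build time into batches that each reference at most 32 distinct vertices. The sketch below uses a simple greedy pass over an indexed triangle list; it is an illustration of the constraint, not how any particular microcode or tool does it, and the function name and return layout are hypothetical.

```python
def batch_triangles(triangles, buffer_size=32):
    """Greedily split indexed triangles into vertex-buffer-sized batches.

    triangles: iterable of (a, b, c) global vertex indices
    Returns a list of (vertex_indices, local_triangles) pairs, where
    vertex_indices are the global vertices to load for that batch and
    local_triangles index into that loaded buffer.
    """
    if buffer_size < 3:
        raise ValueError("buffer must hold at least one triangle")
    batches = []
    slots = {}       # global vertex index -> slot in the vertex buffer
    local_tris = []
    for tri in triangles:
        new = [v for v in dict.fromkeys(tri) if v not in slots]
        if len(slots) + len(new) > buffer_size:
            # Buffer would overflow: close this batch and start a new one.
            batches.append((list(slots), local_tris))
            slots, local_tris = {}, []
            new = list(dict.fromkeys(tri))
        for v in new:
            slots[v] = len(slots)
        local_tris.append(tuple(slots[v] for v in tri))
    if local_tris:
        batches.append((list(slots), local_tris))
    return batches


# A 40-vertex strip expressed as 38 triangles needs two buffer loads.
strip = [(i, i + 1, i + 2) for i in range(38)]
for verts, tris in batch_triangles(strip):
    print(len(verts), "vertices,", len(tris), "triangles")
# 32 vertices, 30 triangles
# 10 vertices, 8 triangles
```

Note that the second batch reloads vertices 30 and 31, which it shares with the first. Ordering triangles so that shared vertices stay within a batch is what keeps the number of reloads down.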

Libdragon, the open-source homebrew SDK for N64 development, has matured significantly and now includes an OpenGL 1.1 implementation that runs on its own RSP microcode. Libdragon is what most contemporary homebrew projects use, including the open-world engine from the video. It handles display list generation, RSP/RDP synchronization, and the texture loading pipeline, which removes a large class of low-level errors but does not change the underlying hardware constraints.

Terrain Streaming Architecture

For an open-world engine specifically, terrain streaming is the central problem. A practical approach divides the world into chunks: fixed-size tiles in the XZ plane, each containing a height map, texture assignments, and any static props. The engine maintains a ring of resident chunks around the player’s current position, loading new chunks from ROM as the player moves and evicting distant ones.
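The resident ring can be expressed as set arithmetic on chunk coordinates. The sketch below computes which chunks to load and which to evict when the player moves; the chunk size, the ring radius, and the square ring shape are assumptions for illustration, and the function names are hypothetical.

```python
import math


def chunk_of(x: float, z: float, chunk_size: float):
    """Chunk grid coordinate containing world position (x, z)."""
    return (math.floor(x / chunk_size), math.floor(z / chunk_size))


def ring_around(center, radius: int):
    """All chunk coordinates within `radius` chunks of `center` (a square)."""
    cx, cz = center
    return {(cx + dx, cz + dz)
            for dx in range(-radius, radius + 1)
            for dz in range(-radius, radius + 1)}


def plan_streaming(resident, player_xz, chunk_size: float, radius: int):
    """Return (to_load, to_evict) given the currently resident chunk set."""
    wanted = ring_around(chunk_of(player_xz[0], player_xz[1], chunk_size), radius)
    return wanted - resident, resident - wanted


resident = ring_around(chunk_of(10.0, 10.0, 64.0), 1)   # 3x3 around (0, 0)
to_load, to_evict = plan_streaming(resident, (70.0, 10.0), 64.0, 1)
print(sorted(to_load))    # [(2, -1), (2, 0), (2, 1)]
print(sorted(to_evict))   # [(-1, -1), (-1, 0), (-1, 1)]
```

Crossing one chunk boundary costs one row or column of loads, so the ring radius and chunk size together determine the worst-case streaming burst.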

On N64, the texture budget per chunk is the tightest constraint. Terrain commonly uses a splatmap approach where a small number of base textures are blended by a weight map. But texture blending in software requires RSP cycles, and the 4KB RDP texture cache means you want your textures small and your palette limited. A 64x64 texture at 16 bits per texel is 8KB and does not fit, so the realistic working zone is a tile set of 32x32 textures, or 64x64 at 4 or 8 bits per texel, with careful re-use across chunks.

Geometry LOD is equally necessary. A distant chunk needs fewer polygons than one the player is standing in. Computing LOD transitions on the CPU and regenerating display lists at runtime is expensive, so most implementations pre-generate two or three LOD meshes per chunk at build time and select among them based on camera distance. This inflates ROM size but moves work off the CPU at runtime.
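With the meshes pre-generated, the runtime side of LOD reduces to a threshold lookup. The sketch below picks an LOD index from camera distance; the two distance thresholds are made-up values, and the function names are hypothetical.

```python
import bisect
import math

# Hypothetical switch distances: below 96 units use LOD 0 (full detail),
# below 256 use LOD 1, otherwise LOD 2 (coarsest).
LOD_DISTANCES = (96.0, 256.0)


def select_lod(distance: float, thresholds=LOD_DISTANCES) -> int:
    """Index of the pre-generated LOD mesh to draw at this distance."""
    return bisect.bisect_right(thresholds, distance)


def lod_for_chunk(cam_xz, chunk_center_xz, thresholds=LOD_DISTANCES) -> int:
    """LOD index for a chunk, using XZ distance from camera to chunk center."""
    dist = math.hypot(chunk_center_xz[0] - cam_xz[0],
                      chunk_center_xz[1] - cam_xz[1])
    return select_lod(dist, thresholds)


print(lod_for_chunk((0.0, 0.0), (30.0, 40.0)))     # 0 (distance 50)
print(lod_for_chunk((0.0, 0.0), (300.0, 400.0)))   # 2 (distance 500)
```

Because the per-chunk cost is one distance computation and one lookup, the CPU work stays negligible; the cost of this design shows up in ROM size instead.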

What Commercial Developers Learned

Body Harvest (1998) is the most instructive commercial example of large-scale world design on the N64. DMA Design, the studio that would later ship Grand Theft Auto III, structured each of the game’s five stages (one per time period and location) as a collection of large contiguous regions separated by non-playable transitions. Within a region, the engine streams geometry from the cartridge as the player moves through it on foot or in a vehicle. The draw distance is modest and fog is heavy, but the navigable area within a region is genuinely large for the hardware.

GoldenEye 007 used a more traditional BSP approach for its indoor levels, with portals between rooms to limit rendering. Perfect Dark extended this with a larger vertex budget and more complex multi-level geometry. Both games avoided the streaming problem by keeping level scope bounded. Rare’s Banjo-Kazooie and its sequel used a single-room-at-a-time model similar to Mario 64, where each world is small enough to fit in RAM entirely.

The honest conclusion from surveying the commercial library is that true open-world streaming on N64 was almost never shipped. DMA came closest, and the constraints visible in Body Harvest’s design choices (the island segmentation, the fog, the relatively sparse geometry) reflect exactly the same problems a modern homebrew developer faces.

Why This Is Worth Studying

Building a constrained-hardware engine forces clarity. When you have 4MB of RAM and a 4KB texture cache, every architectural decision has a measurable cost. You cannot rely on the GPU to hide a poorly designed culling strategy. You cannot add more RAM when the streaming math does not work. The constraints eliminate ambiguity in a way that building for modern hardware does not.

The techniques that solve these problems (chunked streaming, pre-generated LOD, aggressive texture atlasing, portal-based visibility, fog as a first-class design tool) are not antiquated. They appear in modern open-world engines in more sophisticated forms. Understanding why they exist requires understanding the hardware that forced them into existence.

The N64 project documented in the video is a working demonstration that these problems are solvable with contemporary tooling like libdragon. It does not make them easy. It makes them legible.