Four Megabytes and a DMA Bus: What an N64 Open-World Engine Reveals About Engine Design
Source: lobsters
Building an open-world engine for the Nintendo 64 sounds like a contradiction. The console ships with 4 MB of RDRAM, streams data from cartridge at roughly 10 MB/s, and renders through a graphics processor with 4 KB of working texture memory. Open worlds, in the modern understanding, are defined by streaming: the world is larger than memory, and the engine continuously loads what the player is about to see while freeing what they left behind. Everything about the N64 seems to resist this.
The video linked from Lobsters covers exactly how someone built one anyway, and the techniques involved are not exotic hacks — they are the same architectural patterns that every open-world engine uses, implemented at the limits of 1996-era hardware.
The Hardware in Concrete Terms
The N64’s CPU is a NEC VR4300, a MIPS R4300i derivative running at 93.75 MHz with a 32-bit external data bus. Most commercial games ran it in 32-bit mode since the 64-bit path offered marginal benefit at additional cost. The FPU is technically 64-bit but slow enough that most geometry math ran on the RSP instead.
The RSP (Reality Signal Processor) is the interesting part. It is a MIPS R4000 scalar core paired with an 8-lane 16-bit SIMD vector unit, running at 62.5 MHz. What makes it unusual is that it runs microcode: Nintendo and third parties shipped custom RSP programs (called ucodes) that defined the rendering pipeline for each title. F3DEX2, the most common ucode, implements transform, lighting, clipping, and triangle setup. There is nothing preventing you from writing a different ucode that implements a completely different pipeline, and several N64 homebrew developers have done exactly that. The RSP has 4 KB of instruction memory and 4 KB of data memory, total. Whatever your pipeline does, it has to fit in 8 KB.
The RDP (Reality Display Processor) handles rasterization and texturing. It has 4 KB of texture memory (TMEM), usually described as a texture cache even though software fills it explicitly. That number is not a typo. To texture a surface, you load a tile of texture data into TMEM and draw triangles against it. Large textures are split into multiple tiles and loaded in passes. Textures in N64 games are typically 16x16 to 64x64 pixels, partly because of artistic conventions and partly because larger textures require multiple loads per surface.
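To make the 4 KB limit concrete, here is a minimal sketch of the arithmetic for how many loads a texture needs. The function names are hypothetical, and the model is deliberately simplified: it ignores row padding and the space that palettes take for color-indexed formats, so real limits are somewhat tighter.

```c
#include <stdint.h>

#define TMEM_BYTES 4096u /* RDP texture memory size, as stated in the text */

/* Raw size in bytes of a width x height texture at the given bit depth
 * (4, 8, 16, or 32 bits per texel). Assumption: no row padding. */
uint32_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_texel)
{
    return (width * height * bits_per_texel + 7u) / 8u;
}

/* Minimum number of separate loads needed to push the whole texture
 * through texture memory, rounding up. */
uint32_t tile_loads_needed(uint32_t width, uint32_t height, uint32_t bits_per_texel)
{
    uint32_t bytes = texture_bytes(width, height, bits_per_texel);
    return (bytes + TMEM_BYTES - 1u) / TMEM_BYTES;
}
```

Under these assumptions a 32x32 texture at 16 bits per texel fits in a single load, while a 64x64 texture at the same depth needs at least two.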
With the Expansion Pak installed, you get 8 MB of RAM. After framebuffers (roughly 150 KB each for 320x240 at 16-bit), the Z-buffer (another ~150 KB), audio buffers, and game code, a typical open-world engine on the Expansion Pak might have 5-6 MB available for world data. Cart DMA over the PI (Peripheral Interface) delivers roughly 8-12 MB/s effective throughput. At 30 fps, you have 33 ms per frame, so a single large DMA transfer can consume several frames of streaming budget if you are not careful with scheduling.
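The memory budget can be checked with a few lines of arithmetic. This is a rough sketch with hypothetical function names. The size passed for code and audio is an assumed figure supplied by the caller, not a number from the video. Surface size depends on resolution and pixel depth: 320x240 comes to 153,600 bytes at 2 bytes per pixel and 307,200 bytes at 4.

```c
#include <stdint.h>

/* Size in bytes of one framebuffer or Z-buffer surface. */
uint32_t surface_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* RAM left for world data after fixed costs.
 * Assumption: the Z-buffer has the same dimensions as the framebuffers
 * and uses 2 bytes per pixel; code_and_audio_bytes is a caller-supplied
 * estimate covering game code, audio buffers, and similar fixed costs. */
uint32_t world_budget_bytes(uint32_t total_ram,
                            uint32_t num_framebuffers,
                            uint32_t fb_bytes_per_pixel,
                            uint32_t code_and_audio_bytes)
{
    uint32_t fixed = num_framebuffers * surface_bytes(320, 240, fb_bytes_per_pixel)
                   + surface_bytes(320, 240, 2)
                   + code_and_audio_bytes;
    return total_ram > fixed ? total_ram - fixed : 0u;
}
```

With 8 MB of RAM, two 16-bit framebuffers, and an assumed 2 MB for code and audio, this leaves 5,830,656 bytes, which sits inside the 5-6 MB range quoted above.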
The Cell Streaming Pattern
The approach demonstrated in the video is a cell-based streaming system: the world is divided into a grid of cells, each sized to a predictable memory budget. At any given time, the engine keeps the player’s current cell and a ring of adjacent cells in RAM. When the player moves into a new cell, the engine queues a DMA request for the newly adjacent cells on the far edge of the ring and frees the cells that fell off the back.
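The ring update described here can be sketched as a set difference between the old and new neighborhoods. This is an illustration only: the type and function names are hypothetical, and the video's engine may track residency differently.

```c
#include <stdlib.h>

typedef struct { int x, y; } CellCoord;

/* True if cell c lies within `radius` cells of `center` on both axes. */
int cell_in_ring(CellCoord c, CellCoord center, int radius)
{
    return abs(c.x - center.x) <= radius && abs(c.y - center.y) <= radius;
}

/* Writes the cells that are inside the neighborhood of `to` but were not
 * inside the neighborhood of `from` into out[], up to max_out entries.
 * Returns the number of cells written. */
int cells_entering(CellCoord from, CellCoord to, int radius,
                   CellCoord *out, int max_out)
{
    int n = 0;
    for (int y = to.y - radius; y <= to.y + radius; y++) {
        for (int x = to.x - radius; x <= to.x + radius; x++) {
            CellCoord c = { x, y };
            if (!cell_in_ring(c, from, radius) && n < max_out)
                out[n++] = c;
        }
    }
    return n;
}
```

Calling `cells_entering(old, new, ...)` yields the cells to queue for loading, and swapping the first two arguments yields the cells that fell off the back and can be freed. For a one-cell step with radius 1, each list has three entries.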
The key property is that N64 DMA is asynchronous. You issue a transfer to the PI controller and it runs in the background while the CPU and RSP continue working on the current frame. A well-designed streaming system keeps several frames of DMA requests queued up, so that by the time the player reaches a cell boundary, the next cell’s data has already arrived. At 10 MB/s, a 64 KB cell takes about 6.5 ms to transfer, roughly a fifth of a 33 ms frame at 30 fps, so several cell transfers fit within a single frame. Smaller cells allow finer-grained scheduling and a deeper lookahead buffer.
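One way to reason about the queue is to model it as a ring buffer of pending transfers that is drained against a per-frame byte budget. The sketch below is a host-side simulation with hypothetical names. It does not touch the PI hardware, and the budget is whatever share of bandwidth the caller decides streaming may use.

```c
#include <stdint.h>

#define QUEUE_CAP 16u

typedef struct {
    uint32_t bytes_left[QUEUE_CAP]; /* remaining bytes per pending cell */
    uint32_t head;                  /* index of the oldest pending cell */
    uint32_t count;                 /* number of pending cells */
} StreamQueue;

/* Queue a cell of `bytes` bytes. Returns 0 if the queue is full. */
int stream_push(StreamQueue *q, uint32_t bytes)
{
    if (q->count == QUEUE_CAP)
        return 0;
    q->bytes_left[(q->head + q->count) % QUEUE_CAP] = bytes;
    q->count++;
    return 1;
}

/* Spend up to frame_budget bytes on pending cells, oldest first.
 * Returns how many cells finished transferring this frame. */
uint32_t stream_tick(StreamQueue *q, uint32_t frame_budget)
{
    uint32_t done = 0;
    while (q->count > 0 && frame_budget > 0) {
        uint32_t *left = &q->bytes_left[q->head];
        uint32_t step = *left < frame_budget ? *left : frame_budget;
        *left -= step;
        frame_budget -= step;
        if (*left == 0) {
            q->head = (q->head + 1) % QUEUE_CAP;
            q->count--;
            done++;
        }
    }
    return done;
}

/* Microseconds needed to move `bytes` at `bytes_per_sec`, rounded down. */
uint32_t transfer_us(uint32_t bytes, uint32_t bytes_per_sec)
{
    return (uint32_t)(((uint64_t)bytes * 1000000u) / bytes_per_sec);
}
```

`transfer_us(65536, 10000000)` gives 6553 microseconds, which matches the roughly 6.5 ms figure for a 64 KB cell at 10 MB/s.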
With 6 MB of available world RAM and 64 KB cells, you can hold about 96 cells in memory simultaneously — enough for a 9x9 cell neighborhood around the player. At typical N64 game scale, that provides a view distance of several hundred world units before additional culling is needed.
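The capacity figures follow directly from integer division. A small sketch, with hypothetical function names, that derives the cell count and the largest square neighborhood centered on the player:

```c
#include <stdint.h>

/* Number of whole cells that fit in the world-data budget. */
uint32_t cells_that_fit(uint32_t world_bytes, uint32_t cell_bytes)
{
    return cell_bytes ? world_bytes / cell_bytes : 0u;
}

/* Largest odd side length n such that an n x n block of cells fits in
 * `capacity` cells. Odd sides keep the player's cell in the center. */
uint32_t max_neighborhood_side(uint32_t capacity)
{
    uint32_t side = 0;
    for (uint32_t n = 1; (uint64_t)n * n <= capacity; n += 2)
        side = n;
    return side;
}
```

For 6 MB and 64 KB cells this gives 96 cells and a side of 9, leaving 15 cells of slack beyond the 81 in the 9x9 block for transfers that are still in flight.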
How Commercial Games Solved Adjacent Problems
Commercial N64 games solved related problems without fully committing to continuous world streaming. Ocarina of Time’s overworld (Hyrule Field) is one continuous mesh, but it achieves this by being small and sparse. The field contains roughly 200-400 triangles for the ground geometry; distant mountains are painted into the skybox and do not exist as geometry at all. OoT’s larger structure uses a scene-room hierarchy where rooms are loaded and unloaded via DMA as the player passes through doorways. At most two or three rooms reside in RAM at once. This is portal-based visibility management with hard transition points — effective, but not truly continuous streaming.
Banjo-Kazooie took a simpler approach: each level is a self-contained asset that fits entirely in available RAM after code and audio. There is no streaming within a level, just discrete loading screens between worlds. Dinosaur Planet, the unreleased N64 precursor to Star Fox Adventures, had a more ambitious sector-based loading system that came closer to genuine open-world streaming, but the project moved to GameCube and never shipped on N64.
The homebrew project in the video goes further than any commercial N64 game did: continuous streaming without hard transitions, driven by the cell system above. That is a meaningful engineering achievement given the hardware budget.
LOD and Fog as Engineering Tools
At the polygon budgets the N64 supports — typically 500 to 2,500 visible triangles per frame for a complex scene — LOD (level of detail) is not optional. The standard approach is distance-based mesh swapping: maintain two or three versions of each mesh and select based on projected screen size or distance. Beyond a certain threshold, geometry is replaced with a billboard sprite, a single textured quad that always faces the camera. The RDP renders a textured quad far faster than a multi-polygon mesh, and at distance the visual difference disappears.
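Distance-based mesh swapping reduces to a handful of comparisons. The following is a minimal sketch with hypothetical names and thresholds. It compares squared distances so that no square root is needed, which is a common choice on hardware with a slow FPU, though the video's engine may select LODs differently.

```c
typedef enum { LOD_HIGH, LOD_MEDIUM, LOD_LOW, LOD_BILLBOARD } LodLevel;

/* Distances (in world units) at which each coarser level takes over.
 * The values are tuning parameters chosen per object class. */
typedef struct {
    float medium_dist;
    float low_dist;
    float billboard_dist;
} LodThresholds;

/* Pick a level from the squared camera-to-object distance. */
LodLevel select_lod(float dist_sq, const LodThresholds *t)
{
    if (dist_sq < t->medium_dist * t->medium_dist)
        return LOD_HIGH;
    if (dist_sq < t->low_dist * t->low_dist)
        return LOD_MEDIUM;
    if (dist_sq < t->billboard_dist * t->billboard_dist)
        return LOD_LOW;
    return LOD_BILLBOARD;
}
```

Selecting by projected screen size instead of raw distance would only change the quantity fed into the comparisons.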
N64’s built-in fog support provides a culling mechanism that commercial games exploited heavily. Setting a short fog distance hides geometry pop-in behind a gradual fade, which allows the engine to cull distant cells without a visible seam. For an open-world engine on constrained hardware, this is not just an aesthetic effect — it is a functional part of streaming budget management. The fog distance is effectively your view distance, and tuning it is a direct trade-off between visual quality and frame budget.
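The link between fog distance and culling can be sketched as two small functions. This assumes a simple linear fog ramp and uses hypothetical names. On hardware the fog blend is configured through the rendering pipeline rather than computed per object on the CPU, so this only illustrates the distance math behind the trade-off.

```c
/* Linear fog blend: 0.0 at fog_start or nearer (no fog),
 * 1.0 at fog_end or farther (fully fogged). */
float fog_factor(float dist, float fog_start, float fog_end)
{
    if (dist <= fog_start)
        return 0.0f;
    if (dist >= fog_end)
        return 1.0f;
    return (dist - fog_start) / (fog_end - fog_start);
}

/* A cell or object whose nearest point is at or beyond fog_end is
 * fully hidden, so it can be skipped without a visible seam. */
int hidden_by_fog(float nearest_dist, float fog_end)
{
    return nearest_dist >= fog_end;
}
```

Pulling `fog_end` closer shrinks the set of cells that must be resident and drawn, which is the budget lever described above.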
Frustum culling handles the rest. For each cell or object, you test its axis-aligned bounding box against the six frustum planes. On N64, this runs on the VR4300 CPU rather than the RSP, since it is scalar math and the RSP’s vector unit is better spent on triangle throughput. Frustum culling for 200-300 objects takes roughly 1-2 ms on the VR4300, well within a 33 ms frame at 30 fps.
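The box-against-planes test is short enough to show in full. This sketch uses the common "positive vertex" form: for each plane, only the box corner farthest along the plane normal is tested. The types and the plane convention (normals pointing into the frustum) are assumptions for illustration. The test is conservative, so it can occasionally keep a box that lies just outside a frustum corner.

```c
typedef struct { float x, y, z; } Vec3;

/* Plane in the form dot(n, p) + d = 0, with n pointing into the frustum.
 * A point p is on the inside when dot(n, p) + d >= 0. */
typedef struct { Vec3 n; float d; } Plane;

typedef struct { Vec3 min, max; } Aabb;

/* Returns 0 if the box is entirely outside at least one plane,
 * 1 if it is inside or intersecting the frustum. */
int aabb_in_frustum(const Aabb *box, const Plane planes[6])
{
    for (int i = 0; i < 6; i++) {
        const Plane *p = &planes[i];
        /* Corner of the box farthest along the plane normal. */
        Vec3 v;
        v.x = p->n.x >= 0.0f ? box->max.x : box->min.x;
        v.y = p->n.y >= 0.0f ? box->max.y : box->min.y;
        v.z = p->n.z >= 0.0f ? box->max.z : box->min.z;
        if (p->n.x * v.x + p->n.y * v.y + p->n.z * v.z + p->d < 0.0f)
            return 0; /* even the farthest corner is outside this plane */
    }
    return 1;
}
```

The cost is at most six dot products per box, and boxes that fail an early plane exit sooner.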
The libdragon Toolchain
Building this today is more accessible than it was in 1996. libdragon is the modern open-source N64 SDK, distributed as a Docker container with a mips64-elf-gcc cross-compilation toolchain. It includes an rdpq (RDP queue) API for direct hardware access, an OpenGL 1.1 compatibility layer that translates GL calls to RSP microcode, asset conversion tools for sprites and fonts, and audio support for WAV, XM, and YM formats. The n64brew community wiki documents the hardware down to the register level, including timing behavior.
For testing, the Ares emulator provides high-accuracy N64 emulation suitable for catching timing and DMA behavior that less accurate emulators miss. Deploying to hardware uses an EverDrive-64 flash cart, which loads ROMs from SD card with DMA characteristics close enough to real cartridge behavior that code tested on EverDrive generally behaves identically on original hardware.
The sm64 decompilation project is the most useful reference for understanding how commercial N64 games structured memory, display lists, and actor management. Reading the decomp source alongside the Ultra64 SDK documentation archive gives a complete picture of the baseline that any homebrew open-world engine is building on top of.
What the Constraints Make Visible
The N64 does something useful for understanding engine architecture: it makes every design decision legible. When you have 6 MB for world data and 10 MB/s of streaming bandwidth, the cost of every choice is concrete. A cell that is 10 KB too large costs roughly an extra millisecond of transfer time, and that cost repeats for every cell in the ring. A texture that does not fit in the RDP's 4 KB of texture memory forces an extra texture load per draw call. The polygon budget leaves no room for geometry the player cannot see.
Modern open-world engines face structurally identical problems with larger numbers. Every current open-world title runs some form of cell or chunk streaming with async IO, LOD mesh hierarchies, frustum and occlusion culling, and billboard impostors at distance. The engineering in the N64 project is the same engineering; the hardware just makes the costs visible in a way that a modern engine, with its terabytes of storage and gigabytes of VRAM, can afford to obscure.
Building on N64 is a useful discipline for that reason. The relationship between geometry budgets, memory bandwidth, and LOD thresholds becomes something you cannot hand-wave. Every open-world engine designer is solving the same resource allocation problem — the N64 just refuses to let you pretend otherwise.