Building an Open World on 4MB: The N64's Constraints as a Design Teacher
Source: lobsters
There is a particular kind of clarity that comes from working on hardware that cannot forgive you. The Nintendo 64 ships with 4MB of RDRAM, expandable to 8MB with the Expansion Pak, and a memory bus architecture that punishes naive access patterns. Building an open-world engine on top of that is not a curiosity project; it is a masterclass in every technique that modern engines abstract away.
A YouTube deep-dive shared on Lobsters covers exactly that: someone built a working open-world engine for N64 hardware. The video is worth watching in full, but what I want to do here is unpack the underlying systems problems this project has to solve, and why each solution maps directly to constraints in the hardware.
The Hardware You Are Working With
The N64’s CPU is a MIPS R4300i running at 93.75 MHz. That number is deceptive because the CPU is almost a passenger in the rendering pipeline. The real workhorse is the Reality Co-Processor (RCP), a custom chip split into two functional units: the RSP (Reality Signal Processor) and the RDP (Reality Display Processor).
The RSP is a 62.5 MHz vector unit that runs small programs called microcode. When you render geometry, you do not call a graphics API; you write a display list in main RAM, DMA it to the RSP, and the RSP processes it. The RSP then feeds the RDP, which does rasterization, texture mapping, and blending. The RDP has 4KB of texture memory (TMEM), which is the real constraint for any serious texture work.
The memory architecture is Rambus DRAM, which has high peak bandwidth (around 562 MB/s) but significant latency penalties for random access. This pushes every good N64 engine toward sequential, predictable memory access patterns wherever possible. Random reads from cartridge ROM are also slow; burst reads are much more efficient.
Triangle throughput is roughly 100,000 triangles per second under reasonable conditions, and fill rate sits around 15 million pixels per second with a single texture. At 320x240, a frame is 76,800 pixels, so at 30fps you need about 2.3 million pixels per second just to touch every pixel once. That sounds generous until you account for overdraw.
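To make the budget concrete, here is a minimal sketch of the arithmetic in C. The constants are the rough figures quoted above, not measurements, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rough figures quoted in the text; treat them as estimates, not measurements. */
enum {
    SCREEN_W = 320,
    SCREEN_H = 240,
    TARGET_FPS = 30,
    FILL_RATE_PPS = 15000000 /* ~15 million textured pixels per second */
};

/* Pixels in one frame buffer. */
static uint32_t pixels_per_frame(void) {
    return (uint32_t)SCREEN_W * (uint32_t)SCREEN_H;
}

/* Pixels the rasterizer can fill during one frame at the target frame rate. */
static uint32_t fill_budget_per_frame(void) {
    return (uint32_t)FILL_RATE_PPS / (uint32_t)TARGET_FPS;
}

/* Average overdraw the budget allows, in tenths (65 means 6.5x). */
static uint32_t max_overdraw_tenths(void) {
    assert(pixels_per_frame() > 0);
    return fill_budget_per_frame() * 10u / pixels_per_frame();
}
```

With these numbers a frame is 76,800 pixels, the per-frame fill budget is 500,000 pixels, and the best case works out to about 6.5 full-screen layers per frame. Terrain, sky, characters, and transparent effects can consume that quickly.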
The Open World Problem at 4MB
A conventional level-based N64 game loads an entire zone into RAM and keeps it there. Ocarina of Time uses this model; each dungeon and overworld segment is a distinct memory domain. The seams between areas are loading zones, which exist precisely because the hardware cannot hold two full environments simultaneously.
An open world removes those seams. The player can walk in any direction continuously, which means the world must stream in and out of RAM as the player moves. On modern hardware this is a solved problem with well-documented solutions: virtual texture systems, asset streaming threads, background IO with double-buffering. On the N64, you are doing this with no operating system, no threads in any meaningful sense, and a DMA controller you must schedule yourself.
The streaming architecture has to answer several questions simultaneously: what region of the world is the player currently in, what adjacent regions need to be resident in RAM, what can be evicted, and how do you perform the DMA transfer from cartridge to RAM without stalling the main game loop.
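One way to keep transfers from stalling the game loop is a small request queue that is polled once per frame. The sketch below is an assumption about how such a scheduler could look, not the project's actual code. `DmaHooks`, `Streamer`, and the `sim_*` functions are hypothetical names; the `sim_*` functions stand in for the real hardware so the logic can be exercised on a desktop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAP 8

typedef struct {
    uint32_t rom_offset; /* where the data lives in the ROM image */
    uint32_t size;       /* bytes to transfer */
    void *dest;          /* destination buffer in RAM */
} LoadRequest;

/* Hypothetical hardware hooks; on real hardware these would wrap the DMA controller. */
typedef struct {
    void (*start)(const LoadRequest *req); /* kick off an asynchronous transfer */
    bool (*busy)(void);                    /* true while a transfer is in flight */
} DmaHooks;

typedef struct {
    LoadRequest pending[QUEUE_CAP]; /* ring buffer of queued requests */
    uint32_t head, count;
    bool in_flight;
    uint32_t completed;
} Streamer;

/* Queue a request; returns false if the queue is full. */
static bool streamer_push(Streamer *s, LoadRequest req) {
    if (s->count == QUEUE_CAP) return false;
    s->pending[(s->head + s->count) % QUEUE_CAP] = req;
    s->count++;
    return true;
}

/* Call once per frame. It polls and returns immediately; it never waits. */
static void streamer_tick(Streamer *s, const DmaHooks *dma) {
    assert(s != 0 && dma != 0);
    if (s->in_flight) {
        if (dma->busy()) return; /* still transferring; try again next frame */
        s->in_flight = false;
        s->completed++;
    }
    if (s->count > 0) {
        dma->start(&s->pending[s->head]);
        s->head = (s->head + 1) % QUEUE_CAP;
        s->count--;
        s->in_flight = true;
    }
}

/* Desktop stand-in for the hardware: each transfer stays busy for one poll. */
static int sim_polls_left;
static void sim_start(const LoadRequest *req) { (void)req; sim_polls_left = 2; }
static bool sim_busy(void) {
    if (sim_polls_left > 0) sim_polls_left--;
    return sim_polls_left > 0;
}
```

Only one transfer is in flight at a time here, which keeps the sketch simple. A real engine would likely also prioritize requests by distance to the player.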
The answer N64 developers historically reached for is a grid or chunk-based world partition. The world is divided into fixed-size cells. At any point, a small window of cells around the player is resident. When the player crosses a cell boundary, the engine begins loading the new cells on the far edge and evicting the cells that are now out of range. The cell size is tuned so that at maximum player movement speed, a cell always loads before the player reaches it.
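The resident window can be sketched in a few lines of C. This is an illustrative sketch under assumed values: `CELL_SIZE`, `WINDOW_RADIUS`, and all function names are hypothetical and not taken from the project in the video.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CELL_SIZE 256   /* world units per cell edge (hypothetical) */
#define WINDOW_RADIUS 1 /* 3x3 resident window (hypothetical) */

typedef struct { int32_t x, z; } CellCoord;

/* Floor division, so negative world coordinates map to the correct cell. */
static int32_t floor_div(int32_t a, int32_t b) {
    assert(b > 0);
    int32_t q = a / b;
    if ((a % b) != 0 && a < 0) q--;
    return q;
}

/* Cell containing a world-space position. */
static CellCoord cell_of(int32_t world_x, int32_t world_z) {
    CellCoord c;
    c.x = floor_div(world_x, CELL_SIZE);
    c.z = floor_div(world_z, CELL_SIZE);
    return c;
}

/* A cell is resident when it lies within WINDOW_RADIUS cells of the center. */
static bool in_window(CellCoord center, CellCoord c) {
    return abs(c.x - center.x) <= WINDOW_RADIUS &&
           abs(c.z - center.z) <= WINDOW_RADIUS;
}

/* Cells inside the window around `to` that were not in the window around `from`.
   `out` must hold (2 * WINDOW_RADIUS + 1)^2 entries. Returns the number written. */
static int cells_entering(CellCoord from, CellCoord to, CellCoord *out) {
    int n = 0;
    for (int32_t dz = -WINDOW_RADIUS; dz <= WINDOW_RADIUS; dz++) {
        for (int32_t dx = -WINDOW_RADIUS; dx <= WINDOW_RADIUS; dx++) {
            CellCoord c;
            c.x = to.x + dx;
            c.z = to.z + dz;
            if (!in_window(from, c)) out[n++] = c;
        }
    }
    return n;
}
```

Calling `cells_entering(old_center, new_center, out)` lists the cells to load, and swapping the two arguments lists the cells to evict. For the tuning rule above, the cell edge has to be longer than the maximum player speed multiplied by the worst-case load time for one row of cells.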
This requires knowing, at load time, exactly where in the ROM each cell’s data lives, and how large it is. The ROM layout becomes a first-class concern of the build pipeline. No dynamic allocation, no file system; just a table of offsets and sizes baked into the binary at compile time.
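A baked table of this kind might look like the following C sketch. The offsets, sizes, world dimensions, and the `CELL_SLOT_BYTES` limit are made-up values for illustration; in a real project the asset pipeline would generate the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORLD_CELLS_X 2
#define WORLD_CELLS_Z 2
#define CELL_SLOT_BYTES 0x8000u /* fixed RAM slot per resident cell (hypothetical) */

typedef struct {
    uint32_t rom_offset; /* byte offset of the cell's data within the ROM image */
    uint32_t size;       /* size of the cell's data in bytes */
} CellRomEntry;

/* Illustrative values only; a build step would emit this table. */
static const CellRomEntry cell_table[WORLD_CELLS_X * WORLD_CELLS_Z] = {
    { 0x100000u, 0x8000u },
    { 0x108000u, 0x6000u },
    { 0x10E000u, 0x8000u },
    { 0x116000u, 0x4000u },
};

/* Look up a cell's ROM location; returns NULL outside the world bounds. */
static const CellRomEntry *cell_rom_entry(int32_t cx, int32_t cz) {
    if (cx < 0 || cx >= WORLD_CELLS_X || cz < 0 || cz >= WORLD_CELLS_Z) return NULL;
    return &cell_table[cz * WORLD_CELLS_X + cx];
}

/* Build-time style sanity check: every cell fits its RAM slot and no entries overlap. */
static bool cell_table_is_valid(void) {
    uint32_t prev_end = 0;
    for (size_t i = 0; i < (size_t)(WORLD_CELLS_X * WORLD_CELLS_Z); i++) {
        const CellRomEntry *e = &cell_table[i];
        if (e->size == 0 || e->size > CELL_SLOT_BYTES) return false;
        if (e->rom_offset < prev_end) return false;
        prev_end = e->rom_offset + e->size;
    }
    return true;
}
```

Because every cell has a fixed maximum size, the RAM side can be a handful of fixed slots, so no allocator is needed at runtime.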
Spatial Partitioning and Culling
With a constrained polygon budget, the engine cannot afford to submit geometry that is not visible. Frustum culling, the process of rejecting objects outside the camera’s view frustum, is not an optimization on N64; it is a requirement.
For an open world, the spatial data structure that supports culling also doubles as the structure that drives streaming. An octree or a uniform grid over the world lets you query which cells intersect the view frustum efficiently. Cells that do not intersect can be skipped entirely in the display list construction phase. This matters because constructing display lists is CPU work, and the CPU has other responsibilities: physics, AI, input handling, audio.
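As a rough illustration of grid-based culling, the sketch below tests cells against a top-down (XZ plane) frustum built from four half-planes. It is a simplified assumption of how such a test could work, not the engine's actual method: all names are hypothetical, the caller supplies a unit forward vector plus the sine and cosine of the half field of view, and each cell is approximated by its bounding circle.

```c
#include <assert.h>
#include <stdbool.h>

#define CELL_SIZE 256.0f                      /* world units per cell edge (hypothetical) */
#define CELL_RADIUS (0.70710678f * CELL_SIZE) /* half the cell diagonal */

typedef struct { float x, z; } Vec2;
typedef struct { Vec2 n; float d; } Plane2;    /* p is inside when dot(n, p) + d >= 0 */
typedef struct { Plane2 planes[4]; } Frustum2; /* left, right, near, far */

static float dot2(Vec2 a, Vec2 b) { return a.x * b.x + a.z * b.z; }

/* Half-plane with inward normal n passing through `point`. */
static Plane2 plane_through(Vec2 n, Vec2 point) {
    Plane2 p;
    p.n = n;
    p.d = -dot2(n, point);
    return p;
}

/* fwd must be unit length; sin_h and cos_h describe half the horizontal field of view. */
static Frustum2 make_frustum(Vec2 pos, Vec2 fwd, float sin_h, float cos_h,
                             float near_d, float far_d) {
    Vec2 right = { fwd.z, -fwd.x };
    Vec2 n_left  = {  right.x * cos_h + fwd.x * sin_h,  right.z * cos_h + fwd.z * sin_h };
    Vec2 n_right = { -right.x * cos_h + fwd.x * sin_h, -right.z * cos_h + fwd.z * sin_h };
    Vec2 n_far   = { -fwd.x, -fwd.z };
    Vec2 near_pt = { pos.x + fwd.x * near_d, pos.z + fwd.z * near_d };
    Vec2 far_pt  = { pos.x + fwd.x * far_d,  pos.z + fwd.z * far_d };
    Frustum2 f;
    f.planes[0] = plane_through(n_left, pos);
    f.planes[1] = plane_through(n_right, pos);
    f.planes[2] = plane_through(fwd, near_pt);
    f.planes[3] = plane_through(n_far, far_pt);
    return f;
}

/* Conservative test: false only when the cell's bounding circle is fully outside a plane. */
static bool cell_maybe_visible(const Frustum2 *f, int cx, int cz) {
    assert(f != 0);
    Vec2 c = { ((float)cx + 0.5f) * CELL_SIZE, ((float)cz + 0.5f) * CELL_SIZE };
    for (int i = 0; i < 4; i++) {
        if (dot2(f->planes[i].n, c) + f->planes[i].d < -CELL_RADIUS) return false;
    }
    return true;
}
```

The test can keep a few cells that sit just outside the frustum corners, which costs some wasted display list work but never drops visible geometry. Only cells that pass need any display list construction.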
N64 games that handled large open areas well, most notably Body Harvest by DMA Design (later Rockstar North), used aggressive fog to reduce effective draw distance. Fog is not only an aesthetic choice; it is a culling budget multiplier. If the fog plane is close enough, you never have to render geometry behind it, which means distant cells can simply not exist in the display list. Body Harvest had large drivable environments with vehicles, buildings, and enemies precisely because it controlled what the player could see at any given moment.
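The budget effect of fog can be shown with simple arithmetic. This sketch assumes a square resident window sized so that nothing inside the fog distance is ever missing; the function names and the example distances are illustrative, not taken from Body Harvest.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cells needed beyond the player's cell, in each direction,
   so that everything closer than fog_end is resident. */
static uint32_t window_radius_for_fog(uint32_t fog_end, uint32_t cell_size) {
    assert(cell_size > 0);
    return (fog_end + cell_size - 1u) / cell_size; /* ceiling division */
}

/* Cells in a square window of the given radius. */
static uint32_t resident_cells(uint32_t radius) {
    uint32_t side = 2u * radius + 1u;
    return side * side;
}
```

With 256-unit cells, a fog distance of 512 units needs a radius of 2 and therefore 25 resident cells, while pulling the fog in to 256 units needs a radius of 1 and only 9 cells. Halving the view distance cuts the resident set by well over half.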
Custom Microcode
The RSP runs whatever microcode you give it. Nintendo shipped several standard microcode variants: Fast3D, Turbo3D, and others. Fast3D is the standard renderer with perspective-correct texture mapping and full clipping. Turbo3D is a stripped-down variant that trades clipping accuracy for throughput.
For an open-world engine, writing custom microcode is one of the highest-leverage things you can do. The RSP has 4KB of instruction memory and 4KB of data memory. Within those limits, you can implement specialized rendering modes that are impossible with the stock microcode. Some N64 homebrew projects have implemented custom microcode for things like heightmap rendering, where the terrain is represented as a grid and the RSP generates the triangles on the fly from vertex height data rather than storing explicit triangle lists.
This matters for open-world terrain specifically. Explicit triangle meshes for large terrain patches consume RAM quickly. A heightmap is far more compact: a 64x64 grid of height values at one byte per sample is 4KB. The equivalent explicit, unindexed triangle mesh (two triangles per quad between samples, three vertices per triangle, 16 bytes per vertex) is about 372KB for the 63x63 quads, over 90 times larger. Custom microcode can bridge that gap.
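The size comparison, and the expansion step itself, can be sketched on the CPU in C. This is a reference sketch only: real RSP microcode would be written in RSP assembly, and `TerrainPos`, `expand_quad`, the spacing, and the height scale are hypothetical. Counting the quads between 64x64 samples gives 63x63, so the computed mesh size lands slightly under a round 400KB.

```c
#include <assert.h>
#include <stdint.h>

#define VERTEX_BYTES 16u /* per-vertex size quoted in the text */

/* One byte per height sample. */
static uint32_t heightmap_bytes(uint32_t n) { return n * n; }

/* Unindexed mesh: two triangles per quad, three vertices per triangle. */
static uint32_t explicit_mesh_bytes(uint32_t n) {
    assert(n >= 2);
    uint32_t quads = (n - 1u) * (n - 1u);
    return quads * 2u * 3u * VERTEX_BYTES;
}

typedef struct { int16_t x, y, z; } TerrainPos;

/* Position of the sample at grid coordinate (gx, gz). */
static TerrainPos sample_pos(const uint8_t *h, uint32_t n, uint32_t gx, uint32_t gz,
                             int16_t spacing, int16_t height_scale) {
    TerrainPos p;
    p.x = (int16_t)(gx * (uint32_t)spacing);
    p.y = (int16_t)(h[gz * n + gx] * height_scale);
    p.z = (int16_t)(gz * (uint32_t)spacing);
    return p;
}

/* Expand quad (qx, qz) into two triangles, written as six positions.
   The winding order here is arbitrary; a real renderer must match its culling mode. */
static void expand_quad(const uint8_t *h, uint32_t n, uint32_t qx, uint32_t qz,
                        int16_t spacing, int16_t height_scale, TerrainPos out[6]) {
    assert(qx + 1u < n && qz + 1u < n);
    TerrainPos p00 = sample_pos(h, n, qx,      qz,      spacing, height_scale);
    TerrainPos p10 = sample_pos(h, n, qx + 1u, qz,      spacing, height_scale);
    TerrainPos p01 = sample_pos(h, n, qx,      qz + 1u, spacing, height_scale);
    TerrainPos p11 = sample_pos(h, n, qx + 1u, qz + 1u, spacing, height_scale);
    out[0] = p00; out[1] = p10; out[2] = p11; /* first triangle */
    out[3] = p00; out[4] = p11; out[5] = p01; /* second triangle */
}
```

Only the heights need to live in RAM; the X and Z coordinates fall out of the grid index, which is where the saving comes from.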
Modern Tooling
Anyone working on N64 homebrew today is not starting from scratch. libdragon is the modern open-source SDK for N64 development. It provides a C toolchain, peripheral drivers, and graphics abstractions without requiring the original Nintendo SDK. Recent versions include hardware-accelerated 3D rendering support through a new triangle rasterization interface that exposes the RDP directly.
The libdragon project has put real work into making the development loop reasonable: there is an emulator integration mode, a proper build system, and enough documentation to understand what the hardware is actually doing. This is in sharp contrast to the original N64 SDK, which was proprietary, Japanese-primary, and shipped on SGI workstations.
For cartridge development, flash cartridges like the EverDrive 64 allow running ROM images on real hardware without manufacturing a cartridge. This closes the feedback loop between writing code and testing it on actual silicon, which matters when you are tuning DMA timing or trying to reproduce a glitch that only appears on hardware and not in emulators.
What This Kind of Project Teaches
Every abstraction in a modern game engine exists because someone, at some point, hit a hardware limit and had to build the abstraction to survive it. Memory streaming, background loading, LOD systems, spatial culling structures, draw call batching: all of these have a direct ancestor in the techniques that N64 developers and demo coders worked out on constrained hardware in the late 1990s.
Building an open-world engine on N64 in 2025 or 2026 is not an exercise in relevance. It is an exercise in understanding why the abstractions exist, stripped of the comfort of having a system that will mostly do the right thing if you do not think too carefully. When you have 4MB and no second chances, the design decisions surface immediately.
The project documented in the video is the kind of work that makes the constraints visible again. Not every developer needs to write RSP microcode or hand-schedule DMA transfers. But understanding what those systems are doing underneath the API is the difference between guessing at performance and actually knowing it.