
Forty Kilobytes, Six Worlds: The Engineering Inside The Last Ninja

Source: hackernews

A tweet that recently circulated on Hacker News offered a simple observation: System 3’s 1987 C64 game The Last Ninja was 40 kilobytes. The reaction split predictably between nostalgia and skepticism. The skepticism is fair, because the 40KB figure needs some unpacking before it means anything. The nostalgia, though, is completely earned.

The number refers to the game’s in-RAM footprint per level, not the total disk image. The Last Ninja streamed its six worlds from floppy disk at level transitions, so only one level’s worth of data needed to live in memory at a time. The full disk image across all levels is larger, probably somewhere in the 100–120KB range. But that detail doesn’t diminish the achievement. The working memory envelope was still roughly 40KB, and understanding what had to fit inside it requires understanding what the Commodore 64 actually gave you to work with.

The RAM Budget

The C64’s 64KB of RAM was never fully yours. Three regions of the address space were overlaid by other hardware: the BASIC ROM (8KB at $A000–$BFFF), the KERNAL ROM (8KB at $E000–$FFFF), and a 4KB I/O window at $D000–$DFFF holding the VIC-II, SID, and CIA registers plus color RAM. The 4KB character ROM could be switched into that same $D000 window in place of the I/O registers. Games could bank out the BASIC and KERNAL ROMs by setting bits in the processor port at address $01, exposing the RAM underneath for code and data. The I/O window usually stayed mapped in, because banking it out cut the CPU off from the graphics and sound registers until it was switched back.

After all that accounting, a game running on bare metal had somewhere between about 38KB (the figure BASIC reports free at power-on) and about 60KB of usable RAM (with both ROMs banked out), depending on how aggressively it reclaimed the ROM areas. Screen memory, character sets, and sprite data all came out of that same pool. The VIC-II added another wrinkle: it could only see a 16KB window at a time, one of four possible banks, so sprite and character data had to be positioned precisely within whichever bank the chip was pointed at.

Forty kilobytes wasn’t a generous budget. It was most of what the machine had to give.
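The banking described above is controlled by the low three bits of the processor port at $01 (LORAM, HIRAM, CHAREN). A minimal Python model of that selection logic, assuming no cartridge is plugged in (cartridges add their own rows to the real table), shows how much RAM the CPU can address directly in the common configurations. The function names are illustrative, not from any real tool.

```python
# Sketch: what the 6510 sees in the three switchable regions for a given $01 value.
# Assumes no cartridge is fitted; cartridge lines change the real mapping.

REGION_SIZE_KB = {0xA000: 8, 0xD000: 4, 0xE000: 8}

def visible_map(port):
    """Return what the CPU reads at $A000, $D000 and $E000 for a processor-port value."""
    loram = port & 0b001
    hiram = port & 0b010
    charen = port & 0b100
    basic = "BASIC ROM" if (loram and hiram) else "RAM"
    kernal = "KERNAL ROM" if hiram else "RAM"
    if not (loram or hiram):
        d000 = "RAM"          # LORAM and HIRAM both clear: everything is RAM
    elif charen:
        d000 = "I/O"          # VIC-II, SID, CIAs, color RAM
    else:
        d000 = "CHAR ROM"
    return {0xA000: basic, 0xD000: d000, 0xE000: kernal}

def visible_ram_kb(port):
    """KB of RAM the CPU can read directly: 64KB minus whatever is overlaid."""
    mapping = visible_map(port)
    overlaid = sum(size for start, size in REGION_SIZE_KB.items() if mapping[start] != "RAM")
    return 64 - overlaid

for port in (0x37, 0x36, 0x35, 0x34):
    print(f"${port:02X}: {visible_ram_kb(port)}KB", visible_map(port))
```

$37 is the power-on default; $35 is the usual choice for a game that keeps I/O but drops both ROMs. Writes to a ROM area always land in the RAM underneath, which is how code can be copied there before the ROM is switched out.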

How the Isometric Engine Worked

The Last Ninja was one of the first C64 games to make isometric 3D graphics feel convincing rather than gimmicky. The engine had to solve a specific set of problems: how do you represent a three-dimensional-looking world on hardware with a fixed 320×200 display and no floating-point unit?

The answer was a tile dictionary plus the painter’s algorithm. Instead of storing pixel art for every screen, the engine maintained a small set of reusable tile graphics (floor segments, wall faces, pillars, water, foliage) packed as VIC-II character data. Each character cell is 8×8 pixels stored as 8 bytes; in multicolor mode the same 8 bytes describe 4×8 double-width pixels at 2 bits per pixel. The map itself was then an array of tile indices, one byte per reference into the shared tile dictionary. A 20×20-tile map costs 400 bytes of map data plus the tile graphics, far less than storing pixel art for every unique screen configuration.
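The savings are easy to check with a little arithmetic. The sketch below models a map as one-byte indices into a shared list of 8-byte cells and compares it with raw 320×200 bitmaps; the 128-tile dictionary is an arbitrary figure for illustration, not The Last Ninja’s actual tile count, and the helper names are hypothetical.

```python
# Sketch: tile dictionary + index map versus raw bitmaps (sizes are illustrative).
TILE_BYTES = 8  # one 8x8 character cell = 8 bytes, in hi-res or multicolor mode

def map_cost(map_w, map_h, unique_tiles):
    """Bytes for a map of one-byte tile indices plus the shared tile graphics."""
    if unique_tiles > 256:
        raise ValueError("a one-byte index addresses at most 256 tiles")
    return map_w * map_h + unique_tiles * TILE_BYTES

def bitmap_cost(screens):
    """Bytes of raw pixel data for full 320x200 screens (8,000 each, color data extra)."""
    return screens * 320 * 200 // 8

def row_bytes(tile_map, tiles, map_w, cell_row, line):
    """Pixel bytes for one line (0-7) across a row of cells, read through the dictionary."""
    start = cell_row * map_w
    return bytes(tiles[tile_map[start + cx]][line] for cx in range(map_w))

print(map_cost(20, 20, 128), "bytes vs", bitmap_cost(1), "bytes for one raw screen")
```

The dictionary pays for itself as soon as tiles repeat: every extra screen that reuses the same cells costs only its index bytes.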

Drawing order was handled with the painter’s algorithm: tiles and sprites rendered back-to-front based on their isometric coordinates. The engine used the standard 2:1 isometric ratio (two horizontal pixels for every vertical pixel), which keeps tile edges as clean pixel staircases. Because the viewing angle never changes, depth sorting reduces to a simple comparison of world coordinates: with the usual axis convention, an object with a smaller x + y sum sits farther back and is drawn first.
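A minimal sketch of both halves, assuming a hypothetical 32×16-pixel tile footprint (only the 2:1 ratio matters) and objects given as (x, y, z) world coordinates; it illustrates the general technique, not the game’s own routines.

```python
# Sketch: 2:1 isometric projection plus painter's-algorithm ordering.
TILE_W, TILE_H = 32, 16  # hypothetical tile footprint; the 2:1 ratio is what matters

def to_screen(x, y, z=0):
    """Project world grid coordinates to screen pixels; z lifts the object upward."""
    sx = (x - y) * TILE_W // 2
    sy = (x + y) * TILE_H // 2 - z
    return sx, sy

def painter_order(objects):
    """Back-to-front order: smaller x + y is farther away; ties broken by height z."""
    return sorted(objects, key=lambda o: (o[0] + o[1], o[2]))
```

Sorting on x + y works cleanly for objects that each occupy a single grid cell; large objects that straddle several cells usually have to be split, or the sort key chosen more carefully, to avoid overlap errors.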

Background tiles rendered almost for free because the VIC-II did the work in hardware. In character mode, the CPU wrote tile indices into a 1,000-byte screen matrix (40×25 cells) and pointed the VIC-II at the character data in RAM; the chip then fetched and drew the grid on every raster scan. Apart from updating the matrix when the view changed, the CPU was free to run game logic while the display hardware handled rendering.
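The “pointing” happens through two registers: the low two bits of CIA 2’s port A at $DD00 select the 16KB VIC bank (inverted, so %11 means the bottom bank), and $D018 gives the screen-matrix and character-set offsets within that bank. A small sketch that computes those values; the function names and example addresses are chosen only for illustration.

```python
# Sketch: register values that point the VIC-II at a screen matrix and a character set.

def dd00_bank_bits(vic_base):
    """Low two bits of $DD00 selecting a 16KB VIC bank (the bits are inverted)."""
    if vic_base % 0x4000 or not 0 <= vic_base < 0x10000:
        raise ValueError("VIC bank must start on a 16KB boundary")
    return 3 - vic_base // 0x4000

def d018_value(screen_offset, charset_offset):
    """$D018: bits 4-7 = screen matrix offset / 1KB, bits 1-3 = charset offset / 2KB.

    Both offsets are relative to the start of the current VIC bank.
    """
    if screen_offset % 0x400 or not 0 <= screen_offset < 0x4000:
        raise ValueError("screen matrix must sit on a 1KB boundary inside the bank")
    if charset_offset % 0x800 or not 0 <= charset_offset < 0x4000:
        raise ValueError("character set must sit on a 2KB boundary inside the bank")
    return (screen_offset // 0x400) << 4 | (charset_offset // 0x800) << 1

print(hex(d018_value(0x0400, 0x1000)))  # 0x14
```

On real hardware the $DD00 write is a read-modify-write that preserves the upper bits, since those drive the serial bus to the disk drive.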

Sprite Multiplexing and Hardware Sprites

Moving objects — the ninja, guards, weapons, pickups — used the VIC-II’s eight hardware sprites. Each sprite is 24×21 pixels in high-res mode, 63 bytes of pixel data plus a few registers for position, color, and flags. Eight sprites sounds limiting, and it was, except that C64 programmers discovered they could recycle sprites mid-frame.
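Each hardware sprite finds its pixel data through a one-byte pointer stored in the last eight bytes of the current screen matrix; the pointer counts 64-byte blocks (63 bytes of data plus one spare) from the start of the VIC bank. Because of that indirection, switching a sprite to its next animation frame is a single byte write. A sketch of the address arithmetic, with hypothetical helper names and arbitrary example addresses:

```python
# Sketch: sprite pointer arithmetic inside the current 16KB VIC bank.
SPRITE_BLOCK = 64  # each sprite's 63 data bytes occupy one 64-byte block

def sprite_pointer(data_addr, vic_base):
    """Pointer value selecting the 64-byte block at data_addr within the VIC bank."""
    offset = data_addr - vic_base
    if not 0 <= offset < 0x4000 or offset % SPRITE_BLOCK:
        raise ValueError("sprite data must be 64-byte aligned inside the VIC bank")
    return offset // SPRITE_BLOCK

def pointer_address(screen_addr, sprite):
    """Address of sprite 0-7's pointer: the last 8 bytes of the 1KB screen matrix."""
    return screen_addr + 0x3F8 + sprite
```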

The technique is called sprite multiplexing. The VIC-II’s raster beam moves top-to-bottom through the display, and you can fire a CPU interrupt at any scanline. Once the raster has passed below a sprite’s current on-screen position, you rewrite that sprite’s data pointer and position registers for a new object lower on the screen, and the VIC-II draws the same hardware sprite again as a different object further down the same frame. With careful timing, eight physical sprites could represent far more than eight visual objects. The Last Ninja used this to populate its screens with guards, items, and environmental details simultaneously.
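The scheduling half of a multiplexer can be sketched in a few lines: sort the virtual objects top to bottom, the way the beam meets them, and hand each one to a hardware sprite whose previous occupant has already finished drawing. The 2-line reuse gap is a hypothetical allowance for the interrupt to rewrite registers; real values depend on the routine’s timing.

```python
# Sketch: assigning virtual sprites to the VIC-II's 8 hardware sprites for one frame.
SPRITE_HEIGHT = 21   # unexpanded sprite height in raster lines
REUSE_GAP = 2        # hypothetical slack for the raster interrupt to reprogram a slot

def multiplex(tops, slots=8):
    """Assign virtual sprites (given by top raster line) to hardware slots.

    Returns (assignments, dropped): assignments is a list of (virtual_index, slot);
    dropped lists virtual sprites that could not be shown this frame.
    """
    order = sorted(range(len(tops)), key=lambda i: tops[i])  # top-to-bottom, like the beam
    free_from = [0] * slots  # first raster line at which each slot can take a new object
    assignments, dropped = [], []
    for i in order:
        for slot in range(slots):
            if free_from[slot] <= tops[i]:
                free_from[slot] = tops[i] + SPRITE_HEIGHT + REUSE_GAP
                assignments.append((i, slot))
                break
        else:
            dropped.append(i)  # more than 8 objects overlap this band of lines
    return assignments, dropped
```

Real multiplexers re-sort every frame, split the register updates across several raster interrupts, and often rotate which object gets dropped, so that overload shows up as flicker rather than a permanently missing object.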

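Mirroring was a software job. The VIC-II has no horizontal flip bit, so a game that wanted both facing directions either stored mirrored copies of its frames or generated them by reversing the bit order of each sprite row (bit pairs, in multicolor mode). Generating the mirrored frames from one stored direction halves the stored sprite data, at the cost of CPU time and a RAM buffer.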
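Mirroring in software means reversing each 24-pixel row of the 63-byte sprite. In hi-res mode every bit is a pixel; in multicolor mode each pixel is a bit pair that must stay intact. A minimal sketch of both cases (function names are hypothetical):

```python
# Sketch: horizontally mirroring C64 sprite data in software.

def reverse_units(value, unit_bits, total_bits=24):
    """Reverse the order of unit_bits-wide fields within a total_bits-wide integer."""
    mask = (1 << unit_bits) - 1
    out = 0
    for _ in range(total_bits // unit_bits):
        out = (out << unit_bits) | (value & mask)
        value >>= unit_bits
    return out

def mirror_sprite(data, multicolor=False):
    """Mirror 63 bytes of sprite data (21 rows of 3 bytes) left to right."""
    if len(data) != 63:
        raise ValueError("expected 63 bytes of sprite data")
    unit = 2 if multicolor else 1  # multicolor pixels are bit pairs
    out = bytearray()
    for row in range(21):
        bits = int.from_bytes(data[row * 3:row * 3 + 3], "big")
        out += reverse_units(bits, unit).to_bytes(3, "big")
    return bytes(out)
```

On the 6510 itself, a common approach is a 256-entry table of pre-reversed bytes (a bit-pair-reversed table for multicolor): look up each byte, then swap the first and third bytes of the row.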

The SID Chip’s Role

C64 music is a real category of artistic achievement, and Ben Daglish’s soundtrack for The Last Ninja is considered one of its finest pieces. The SID chip, the MOS 6581, had three independent voices, each with selectable waveforms (triangle, sawtooth, variable-width pulse, and noise) and a programmable ADSR envelope, plus ring modulation and hard sync between voices and a multi-mode analog filter with adjustable cutoff and resonance. That last detail matters: the filter was analog circuitry on the die working with two external capacitors, and its response depended on manufacturing tolerances, so individual chips sounded different from each other. That variation is part of why SID music has such a distinctive character that’s hard to emulate perfectly.
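Every one of those features is a field in the SID’s register map: seven registers per voice starting at $D400, then the filter and volume registers at $D415–$D418. The sketch below builds the writes for one note and one filter setting; the helper names are hypothetical, and the frequency argument is the raw 16-bit register value rather than Hz.

```python
# Sketch: encoding a SID voice "patch" and filter setting as register writes.
SID = 0xD400
WAVE = {"triangle": 0x10, "sawtooth": 0x20, "pulse": 0x40, "noise": 0x80}
GATE = 0x01

def voice_writes(voice, freq, wave, attack, decay, sustain, release, pulse_width=0x800):
    """Register writes that start a note on SID voice 0-2 (values are 4-bit ADSR)."""
    if not 0 <= voice <= 2 or not all(0 <= v <= 15 for v in (attack, decay, sustain, release)):
        raise ValueError("voice must be 0-2 and ADSR values 0-15")
    base = SID + 7 * voice
    return {
        base + 0: freq & 0xFF,              # frequency, low byte
        base + 1: freq >> 8,                # frequency, high byte
        base + 2: pulse_width & 0xFF,       # pulse width, low byte
        base + 3: pulse_width >> 8 & 0x0F,  # pulse width, high nibble (12-bit value)
        base + 5: attack << 4 | decay,
        base + 6: sustain << 4 | release,
        base + 4: WAVE[wave] | GATE,        # control register last: gate on starts the envelope
    }

def filter_writes(cutoff, resonance, voices_mask, mode, volume):
    """11-bit cutoff, 4-bit resonance, voice routing bits, filter mode, master volume."""
    modes = {"lowpass": 0x10, "bandpass": 0x20, "highpass": 0x40}
    return {
        0xD415: cutoff & 0x07,         # cutoff, low 3 bits
        0xD416: cutoff >> 3 & 0xFF,    # cutoff, high 8 bits
        0xD417: resonance << 4 | voices_mask,
        0xD418: modes[mode] | volume,
    }
```

The dictionaries are ordered so the control register comes last, mirroring the usual practice of setting up a voice before opening the gate.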

Music on the C64 followed a consistent pattern. A small interrupt-driven player routine, typically 200–500 bytes of 6510 assembly, ran once per frame, usually from a raster interrupt or a timer set to the frame rate: 50 times per second on PAL hardware, 60 on NTSC. Each tick, it read the next event from a compact music sequence stored as patterns (short repeating phrases) referenced by a song table. This is essentially a tracker-style format implemented in software. The player wrote frequency values, waveform selections, ADSR parameters, and filter settings directly to the SID’s memory-mapped registers, whose writable range is $D400–$D418.
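A toy version of that loop makes the structure concrete. The song format below is hypothetical (nothing here reflects The Last Ninja’s actual player), and a dictionary stands in for the SID registers. The frequency conversion uses the SID’s documented relation Fout = register × clock / 2^24, with the PAL clock of 985,248 Hz.

```python
# Sketch: a one-voice, tracker-style player ticked once per frame.
PAL_CLOCK = 985_248  # PAL C64 system clock, Hz

def sid_freq(hz, clock=PAL_CLOCK):
    """16-bit SID frequency register value for a pitch in Hz."""
    return round(hz * (1 << 24) / clock)

# Hypothetical song data: each pattern is a list of (hz, or None for a rest; duration in ticks);
# the song table is an ordered list of pattern numbers that loops.
PATTERNS = [
    [(440.0, 2), (None, 1)],      # pattern 0: A4 for two ticks, then a one-tick rest
    [(523.25, 1), (659.25, 1)],   # pattern 1: C5, E5
]
SONG = [0, 1, 0]

class Player:
    """Minimal player; tick() is meant to run once per frame (50 Hz on PAL)."""

    def __init__(self, song, patterns):
        self.song, self.patterns = song, patterns
        self.pos = 0    # position in the song table
        self.row = 0    # row within the current pattern
        self.wait = 0   # ticks left before the next row
        self.regs = {}  # stands in for the SID's write-only registers

    def tick(self):
        if self.wait == 0:
            pattern = self.patterns[self.song[self.pos]]
            hz, ticks = pattern[self.row]
            if hz is None:
                self.regs[0xD404] = 0x20       # sawtooth, gate off: envelope releases
            else:
                f = sid_freq(hz)
                self.regs[0xD400] = f & 0xFF   # frequency, low byte
                self.regs[0xD401] = f >> 8     # frequency, high byte
                self.regs[0xD404] = 0x21       # sawtooth + gate on
            self.wait = ticks
            self.row += 1
            if self.row == len(pattern):       # pattern finished: advance the song table
                self.row = 0
                self.pos = (self.pos + 1) % len(self.song)
        self.wait -= 1

player = Player(SONG, PATTERNS)
for frame in range(3):
    player.tick()
```

A real player drives all three voices, briefly drops the gate before each new note so the envelope retriggers, and layers effects such as vibrato and filter sweeps on top of the same per-frame loop.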

A typical C64 SID tune, player code and all data included, occupied 1–4KB of RAM. The entire Last Ninja soundtrack (title theme, six level themes, multiple ambient cues) likely sat in the 4–8KB range. The High Voltage SID Collection (HVSC), the canonical archive of C64 music, preserves these tunes as .sid files: the extracted 6510 machine code and data behind a small header, playable today by emulating the CPU and SID together.
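The header is what makes a ripped tune playable: it records where the code loads and which addresses to call, init once (with the tune number in the accumulator) and play once per frame. A sketch of reading the version-1 PSID/RSID header, whose fields are big-endian; later versions append extra fields that this hypothetical parser ignores, and the Latin-1 decoding of the text fields is an assumption.

```python
import struct

# PSID/RSID v1 header: magic, version, dataOffset, load, init, play, songs,
# startSong, speed flags, then three 32-byte text fields (all big-endian).
HEADER = struct.Struct(">4sHHHHHHHI32s32s32s")

def _text(raw):
    """Null-terminated text field (Latin-1 decoding is an assumption)."""
    return raw.split(b"\0", 1)[0].decode("latin-1")

def parse_sid(data):
    (magic, version, data_offset, load, init, play,
     songs, start_song, speed, name, author, released) = HEADER.unpack_from(data)
    if magic not in (b"PSID", b"RSID"):
        raise ValueError("not a SID file")
    payload = data[data_offset:]
    if load == 0:  # load address stored in the first two payload bytes, low byte first
        load = payload[0] | payload[1] << 8
        payload = payload[2:]
    return {
        "version": version, "load": load, "init": init, "play": play,
        "songs": songs, "start_song": start_song, "speed": speed,
        "name": _text(name), "author": _text(author), "released": _text(released),
        "code": payload,
    }
```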

Six-Thousand Hours of Context, Zero Abstractions

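Every layer of the C64 game stack was hand-written 6510 assembly. The 6510 is a variant of the MOS 6502 running at roughly 1MHz (0.985MHz on PAL machines, 1.023MHz on NTSC). There was no operating system to call, no standard library, no runtime, no allocator. A .PRG file is two bytes of load address (low byte first) followed by raw binary. It loads directly into RAM at that address; starting it then takes a RUN through a small BASIC stub or a SYS to the entry point, unless the loader has been arranged to autostart.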
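The format is simple enough to build and load by hand. This sketch prepends the widely used one-line BASIC stub “10 SYS2061”, so that RUN jumps to machine code starting at $080D, right after the stub. The helper names are hypothetical, and a bytearray stands in for the 64KB of RAM.

```python
import struct

# Tokenized BASIC line "10 SYS2061" at $0801: link to next line ($080B), line number 10,
# SYS token $9E, the digits, end-of-line 0, then a $0000 link marking the end of the program.
BASIC_STUB = bytes([0x0B, 0x08, 0x0A, 0x00, 0x9E]) + b"2061" + bytes([0x00, 0x00, 0x00])

def make_prg(load_addr, body):
    """A .PRG file: 2-byte little-endian load address, then the raw bytes."""
    return struct.pack("<H", load_addr) + body

def load_prg(memory, prg):
    """Copy a .PRG into memory at its load address and return that address."""
    (addr,) = struct.unpack_from("<H", prg)
    body = prg[2:]
    if addr + len(body) > len(memory):
        raise ValueError("program does not fit in 64KB")
    memory[addr:addr + len(body)] = body
    return addr

code = bytes([0xEE, 0x20, 0xD0, 0x60])  # INC $D020 (change border color) ; RTS
memory = bytearray(0x10000)
load_prg(memory, make_prg(0x0801, BASIC_STUB + code))
```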

This meant programmers talked directly to the hardware at every level. Setting sprite 0’s x-coordinate meant a store to $D000, plus a bit in $D010 for positions past 255. Playing a note meant writing a frequency value to $D400–$D401 and setting the gate bit in $D404. Changing the background color meant writing to $D021. Self-modifying code was routine: a program would rewrite an instruction’s operand at runtime to avoid the overhead of an indirect address lookup.
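Even a sprite’s horizontal position shows how close to the metal this was: the VIC-II’s x-coordinate is 9 bits wide, so the low byte goes to the sprite’s own register ($D000 + 2 × n) while the ninth bits of all eight sprites share $D010. A sketch of the resulting writes (the function name is hypothetical; y positions live in $D001 + 2 × n and need no extra bit):

```python
# Sketch: register writes that place sprite n at a 9-bit horizontal position.

def sprite_x_writes(sprite, x, d010):
    """Writes for sprite 0-7 at x (0-511), given the current value of $D010."""
    if not (0 <= sprite <= 7 and 0 <= x <= 511):
        raise ValueError("sprite must be 0-7 and x 0-511")
    msb_mask = 1 << sprite
    new_d010 = d010 | msb_mask if x > 255 else d010 & ~msb_mask & 0xFF
    return {0xD000 + 2 * sprite: x & 0xFF, 0xD010: new_d010}
```

In 6510 code the $D010 update is the same read-modify-write with ORA or AND, which is why sprites crossing the 255 boundary were a classic source of bugs.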

The constraint produced experts in the full vertical stack. A C64 programmer in 1987 understood the VIC-II raster timing well enough to schedule sprite moves between scanlines. They understood the SID filter’s behavior well enough to design patches around its quirks. They understood the 1541 floppy drive’s timing well enough to write fast loaders that bypassed the slow built-in routines. The demo scene that grew out of this era still produces 4KB and 64KB programs that generate real-time 3D graphics and synthesized audio, continuing the tradition of treating hardware limits as a design brief.

What 40KB Contained

For completeness, here is what passed through that 40KB window across the game’s six levels: six distinct isometric worlds with unique tilesets, a full combat system covering punching, kicking, throwing stars, and melee weapons, enemy AI with patrol routes and combat behaviors, item collection and inventory mechanics, environmental puzzles, a full musical score rendered in real time by a three-voice synthesizer chip, sound effects layered over that music, flip-screen movement across connected screens, and an ending sequence.

In 2026 terms, that list reads like a small indie game’s feature set. Fitting it into a 40KB working set was an extraordinary compression of design and engineering at once.

The Modern Contrast

The comparison that makes this feel striking is not really about developer skill or effort. A single JPEG from a modern phone camera runs 3–8MB. A minimal React application bundled for production comes in at 40–130KB before any application code, depending on whether you count the gzipped or the minified size. The lodash utility library minified is around 73KB. A blank .docx file is about 12KB. Modern software operates at a scale that makes 40KB feel like a rounding error.

The contrast is worth sitting with not as a complaint but as a calibration. The constraints of 1987 were not merely limitations — they were a forcing function that required every byte to justify its existence. Tile-sharing to compress map data, interrupt-driven music players measured in hundreds of bytes, sprite multiplexing to exceed hardware limits, level streaming to extend a 40KB envelope across six distinct worlds. Each technique solved a specific problem with the minimum necessary mechanism.

Modern software complexity is not simply waste. Supporting accessibility, internationalization, security models, operating system interfaces, network stacks, and user expectations that have grown by orders of magnitude over four decades genuinely costs bytes and abstractions. But 40KB fitting six worlds and a legendary SID soundtrack is a useful data point about what’s possible when the budget is absolute and the programmer cannot defer to a library.

Elite, the 1984 space trading game, fit eight procedurally generated galaxies of 256 star systems each (2,048 in all) into 22KB on the BBC Micro. The C64 version sat around 40KB. These numbers circulate because they represent a genuinely different relationship between programmer and machine, one where the constraint was not a budget line item but a physical fact of the hardware.

The Last Ninja’s 40KB is remarkable for the same reason a ship in a bottle is remarkable: not because ships shouldn’t be large, but because the craft required to get it through the opening is worth acknowledging on its own terms.
