Geir Isene recently published a login shell written in assembly, and the project deserves more than a quick appreciation for its compactness. Writing a shell in assembly is the kind of exercise that forces you to answer questions you normally defer to libc and POSIX: what exactly does a login shell do, what does the kernel actually provide, and how thin is the membrane between a working shell session and the raw process execution model?
What Makes Something a Login Shell
By long-standing Bourne shell convention, a login shell differs from a non-login shell in one behavior: it reads /etc/profile and then ~/.profile before presenting a prompt. POSIX itself does not name these files, but every sh-compatible shell in common use follows the convention. The distinction matters because login shells are responsible for establishing the environment that every subsequent process inherits. They set HOME, USER, SHELL, PATH, and any site-local variables. A non-login interactive shell opened in a terminal emulator inherits an already-configured environment; it does not need to re-read profile files because the login shell already ran.
Bash and zsh blur this with additional files (/etc/bash.bashrc, ~/.bashrc, ~/.zshrc), but the minimal traditional definition is clean: read two files, set up the environment, present a prompt. Everything else is features layered on top.
This matters for the assembly implementation because sourcing a profile file turns out to be a non-trivial operation without libc. You need to open a file, read it line by line, parse environment variable assignments, handle export, skip comments, and fork and execve any commands found. Each step requires system calls and manual string handling with no help from the C runtime.
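The per-line decision described above can be sketched as a small classifier. This is written in C rather than assembly purely for readability, and it is an illustration, not the project's code: the names `classify_line`, `line_kind`, and `is_name_char` are hypothetical, quoting inside values is ignored, and `export NAME=value` is treated the same as a plain `NAME=value`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum line_kind { LINE_SKIP, LINE_ASSIGN, LINE_COMMAND };

/* Shell variable names: letters, digits, underscore; no leading digit. */
static int is_name_char(char c, int first)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return 1;
    return !first && c >= '0' && c <= '9';
}

/*
 * Classify one profile line. For LINE_ASSIGN the line is modified in place:
 * *name and *value point at NUL-terminated pieces inside `line`.
 * Simplification: values are taken verbatim, with no quote handling.
 */
enum line_kind classify_line(char *line, char **name, char **value)
{
    char *p = line;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0' || *p == '\n' || *p == '#')
        return LINE_SKIP;               /* blank line or comment */

    /* Skip an optional leading "export" keyword. */
    if (strncmp(p, "export", 6) == 0 && (p[6] == ' ' || p[6] == '\t')) {
        p += 6;
        while (*p == ' ' || *p == '\t')
            p++;
    }

    char *start = p;
    if (!is_name_char(*p, 1))
        return LINE_COMMAND;
    while (is_name_char(*p, 0))
        p++;
    if (*p != '=')
        return LINE_COMMAND;            /* a word not followed by '=' */

    *p = '\0';                          /* terminate the name */
    *name = start;
    *value = p + 1;

    char *nl = strchr(*value, '\n');    /* drop the trailing newline */
    if (nl != NULL)
        *nl = '\0';
    return LINE_ASSIGN;
}
```

In an assembly version the `strncmp` and `strchr` calls become short byte-comparison loops, but the control flow is the same.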
The Syscall Surface
On Linux x86_64, every kernel service is accessed via the syscall instruction. The syscall number goes in rax and the arguments go in rdi, rsi, rdx, r10, r8, and r9, in that order. The instruction itself clobbers rcx and r11. The return value comes back in rax; failure is signaled by a negated errno value in the range -4095 to -1.
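Without libc there is no errno variable, so every call site has to decode the raw return value itself. A minimal sketch of that check, in C for readability; the helper names `raw_is_error` and `raw_errno` are made up for this example:

```c
#include <stdbool.h>

/* Largest errno value the Linux kernel encodes in a syscall return. */
#define MAX_ERRNO 4095L

/* True if a raw syscall return value (as left in rax) encodes an error. */
bool raw_is_error(long ret)
{
    return ret < 0 && ret >= -MAX_ERRNO;
}

/* Recover the positive errno from a raw error return, or 0 on success. */
int raw_errno(long ret)
{
    return raw_is_error(ret) ? (int)-ret : 0;
}
```

In assembly the same test is usually a single unsigned comparison of rax against -4095 followed by a conditional jump.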
A minimal login shell needs a short list of syscalls:
| Syscall | Number | Purpose |
|---|---|---|
| read | 0 | Read input from stdin or a file |
| write | 1 | Print the prompt and output |
| open | 2 | Open profile files and executables |
| close | 3 | Close file descriptors |
| stat | 4 | Check if a path exists before exec |
| fork | 57 | Spawn a child to execute commands |
| execve | 59 | Replace a process with a command |
| wait4 | 61 | Wait for a child process to exit |
| exit | 60 | Terminate the shell |
| getcwd | 79 | Get the current directory for the prompt |
| chdir | 80 | Implement cd |
| getuid | 102 | Determine if running as root |
The twelve syscalls in the table cover the core behavior of a working login shell. Pipeline support adds pipe (22) and dup2 (33), and signal handling adds rt_sigaction (13).
The NASM skeleton for reading a line from stdin looks like this:
    section .bss
    buf     resb 256            ; 256-byte line buffer
    section .text
    read_line:
        mov rax, 0              ; sys_read
        mov rdi, 0              ; fd = stdin
        mov rsi, buf            ; buffer address
        mov rdx, 256            ; max bytes
        syscall
        ; rax holds bytes read, or negative errno
        ret
The pattern here is mechanical. The complexity begins when you try to do something useful with the bytes that come back.
The Parts That Are Genuinely Hard
PATH searching. When the user types ls, the shell needs to search each directory in $PATH in order, appending /ls to each entry and calling stat or access to check existence before calling execve. In C, execvp handles this automatically. In assembly, you split the PATH string on :, iterate each component, concatenate the command name, check the resulting path, and execute. Concatenation means careful register and memory management, since there is no strcat or sprintf. You maintain source pointers, destination pointers, and length counters manually.
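The search loop can be sketched as follows, in C for readability. The function name `find_in_path` and its signature are inventions for this example, and `access` with `X_OK` stands in for whichever existence check the real shell uses. An empty PATH component is treated as the current directory, which is the conventional behavior.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Search the colon-separated list `path` for `cmd`.
 * On success writes "dir/cmd" into `out` (capacity `cap`) and returns 0;
 * returns -1 if no candidate passes the access() check.
 */
int find_in_path(const char *path, const char *cmd, char *out, size_t cap)
{
    size_t cmd_len = strlen(cmd);
    const char *p = path;

    for (;;) {
        /* Find the end of the current PATH component. */
        const char *end = p;
        while (*end != '\0' && *end != ':')
            end++;
        size_t dir_len = (size_t)(end - p);

        /* An empty component conventionally means the current directory. */
        const char *dir = p;
        if (dir_len == 0) {
            dir = ".";
            dir_len = 1;
        }

        /* dir + '/' + cmd + NUL must fit; oversized candidates are skipped. */
        if (dir_len + 1 + cmd_len + 1 <= cap) {
            memcpy(out, dir, dir_len);
            out[dir_len] = '/';
            memcpy(out + dir_len + 1, cmd, cmd_len + 1); /* includes the NUL */
            if (access(out, X_OK) == 0)
                return 0;
        }

        if (*end == '\0')
            return -1;                  /* ran out of components */
        p = end + 1;
    }
}
```

Each `strlen` and `memcpy` here corresponds to a hand-written loop in the assembly version, with the source pointer, destination pointer, and remaining length held in registers.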
Environment variable expansion. Substituting $HOME or $PATH inside a command string requires scanning for $, extracting the variable name, searching the environment array for a matching KEY=value string, and inserting the value into the output buffer. Without libc there is no environ global, so the array has to be located on the initial stack, whose layout the System V AMD64 ABI fixes at process startup: [rsp] holds argc, the argv pointer array begins at rsp+8 and ends with a null pointer, and the environment pointer array starts immediately after that null, at rsp + 8*(argc+2). The auxiliary vector follows the environment array's own null terminator.
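Both halves of that paragraph, finding the environment array and searching it, fit in a few lines. The sketch below is C for readability; `envp_from_stack` and `env_lookup` are hypothetical names, and the first function models the initial stack as an array of pointer-sized slots, which is what it looks like from the entry point.

```c
#include <stddef.h>

/*
 * Given the initial stack pointer of a fresh process, return envp.
 * Slot layout: sp[0] = argc, sp[1..argc] = argv, sp[argc+1] = NULL,
 * and the environment pointers start at sp[argc+2].
 */
char **envp_from_stack(void **sp)
{
    long argc = (long)sp[0];
    return (char **)(sp + argc + 2);
}

/*
 * Look up a variable name of length `n` (not necessarily NUL-terminated,
 * since it usually points into the command line being expanded).
 * Returns a pointer to the value, or NULL if the variable is not set.
 */
const char *env_lookup(char **envp, const char *name, size_t n)
{
    for (; *envp != NULL; envp++) {
        const char *e = *envp;
        size_t i = 0;
        while (i < n && e[i] != '\0' && e[i] == name[i])
            i++;
        /* A match requires the whole name followed directly by '='. */
        if (i == n && e[i] == '=')
            return e + n + 1;
    }
    return NULL;
}
```

The check for `=` after the name is what keeps a lookup of `HOM` from matching `HOME=/root`.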
Argument splitting. A command like grep -r "foo bar" . requires splitting on whitespace while respecting quotes. The state machine has at minimum four states: unquoted, single-quoted, double-quoted, and backslash-escaped. Writing that in assembly is verbose but finite, and it is one of those implementations where bugs become immediately visible because there is no error handling infrastructure to obscure the failure.
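Here is one way to lay out that four-state machine, sketched in C for readability. The function name `split_args` and the state names are made up for this example. It splits in place, so no allocation is needed, and it simplifies real shell rules in one respect: inside double quotes a backslash escapes any character, not only the handful POSIX lists.

```c
#include <stddef.h>

enum state { UNQUOTED, IN_SINGLE, IN_DOUBLE, ESCAPED };

/*
 * Split `line` in place into at most `max - 1` arguments, then NULL.
 * Returns the argument count, or -1 on an unterminated quote, a trailing
 * backslash, or too many arguments.
 */
int split_args(char *line, char **argv, int max)
{
    enum state st = UNQUOTED, prev = UNQUOTED;
    int argc = 0, in_word = 0;
    char *dst = line;   /* write cursor; never passes the read cursor */

    for (char *src = line; *src != '\0'; src++) {
        char c = *src;
        switch (st) {
        case UNQUOTED:
            if (c == ' ' || c == '\t' || c == '\n') {
                if (in_word) { *dst++ = '\0'; in_word = 0; }
                break;
            }
            if (!in_word) {             /* first byte of a new argument */
                if (argc >= max - 1)
                    return -1;
                argv[argc++] = dst;
                in_word = 1;
            }
            if (c == '\'')      st = IN_SINGLE;
            else if (c == '"')  st = IN_DOUBLE;
            else if (c == '\\') { prev = UNQUOTED; st = ESCAPED; }
            else                *dst++ = c;
            break;
        case IN_SINGLE:                 /* everything is literal until ' */
            if (c == '\'') st = UNQUOTED;
            else           *dst++ = c;
            break;
        case IN_DOUBLE:
            if (c == '"')       st = UNQUOTED;
            else if (c == '\\') { prev = IN_DOUBLE; st = ESCAPED; }
            else                *dst++ = c;
            break;
        case ESCAPED:                   /* copy one byte, then go back */
            *dst++ = c;
            st = prev;
            break;
        }
    }
    if (st != UNQUOTED)
        return -1;
    if (in_word)
        *dst = '\0';
    argv[argc] = NULL;
    return argc;
}
```

Running it on the example from the text, `grep -r "foo bar" .`, yields four arguments with `foo bar` as a single entry. The resulting `argv` is already in the NULL-terminated form execve expects.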
Profile sourcing. Executing /etc/profile means running each line as a command in the current shell’s environment, not in a child process. Statements like export VAR=value must modify the environment of the live shell process, which means maintaining a growable array of string pointers. Without malloc, this is typically a fixed-size static buffer or a simple bump allocator in .bss. The bump allocator approach works well for a login shell because the number of environment variables set during profile sourcing is bounded and predictable.
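A bump allocator backing a fixed-size environment table might look like the following. It is a C sketch for readability, not the project's code: the sizes `ARENA_SIZE` and `ENV_MAX` and the names `bump_alloc`, `env_set`, and `env_vector` are all assumptions made for illustration. The static arrays play the role of reservations in .bss.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096   /* assumed capacity of the string arena */
#define ENV_MAX    64     /* assumed upper bound on variables */

static char   arena[ARENA_SIZE];
static size_t arena_used;
static char  *env_table[ENV_MAX + 1];   /* kept NULL-terminated for execve */
static size_t env_count;

/* Bump allocation: hand out the next n bytes; nothing is ever freed. */
static char *bump_alloc(size_t n)
{
    if (n > ARENA_SIZE - arena_used)
        return NULL;
    char *p = arena + arena_used;
    arena_used += n;
    return p;
}

/* Set NAME=value, replacing any existing entry. Returns 0, or -1 if full. */
int env_set(const char *name, const char *value)
{
    size_t nlen = strlen(name), vlen = strlen(value);
    size_t slot = env_count;

    for (size_t i = 0; i < env_count; i++) {
        if (strncmp(env_table[i], name, nlen) == 0 &&
            env_table[i][nlen] == '=') {
            slot = i;                   /* overwrite the existing variable */
            break;
        }
    }
    if (slot == env_count && env_count == ENV_MAX)
        return -1;

    char *entry = bump_alloc(nlen + 1 + vlen + 1);
    if (entry == NULL)
        return -1;
    memcpy(entry, name, nlen);
    entry[nlen] = '=';
    memcpy(entry + nlen + 1, value, vlen + 1);  /* includes the NUL */

    env_table[slot] = entry;            /* the old string, if any, is leaked */
    if (slot == env_count)
        env_table[++env_count] = NULL;
    return 0;
}

/* The table in the form execve expects as its third argument. */
char **env_vector(void)
{
    return env_table;
}
```

Replacing a variable leaks its old string inside the arena. For a shell that sources two profile files and then mostly execs other programs, that waste stays small, which is the trade-off the bump allocator is making.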
Why It Stays Small
The binary size of an assembly shell is small because there is no linked runtime. Bash on a typical x86_64 Linux system is about 1.1MB stripped. Dash, the minimal POSIX sh used as /bin/sh on Debian and Ubuntu, comes in around 150KB stripped. A shell written in assembly with no external dependencies can fit well under 20KB; a truly minimal implementation with no pipeline support and no job control sits closer to 8-12KB.
That difference matters in concrete contexts. Embedded systems with read-only firmware partitions count kilobytes. Container base images benefit from small init processes. Initramfs images benefit from shells that are fast to load and do not pull in shared library resolution overhead. The BusyBox project has long occupied this space by combining dozens of Unix utilities into a single statically linked binary around 1MB total, but even BusyBox links against a C runtime. A pure assembly shell eliminates the runtime entirely.
The closest comparison is the size-coding tradition in the demoscene, where programmers hand-craft x86_64 ELF binaries under 512 bytes for specific tasks. A shell is orders of magnitude more complex than a demoscene intro, but the same technique of eliminating all padding and runtime overhead applies at the structural level.
What the Exercise Clarifies
A login shell written in assembly will be missing readline, tab completion, job control signal forwarding, POSIX parameter expansion, and arithmetic evaluation. For daily use, you would reach for dash within seconds. The output binary is not the point.
What the exercise produces is a precise understanding of which layer is doing what work. The kernel provides: process spawning via fork and execve, file I/O via descriptors, signal delivery, and memory via anonymous mappings. The C standard library provides: string functions, formatted I/O, environment manipulation, and dynamic memory allocation. The shell itself provides: parsing, variable expansion, job control, and interactive features.
When you write in assembly, you personally implement everything in the second and third categories. There is no confusion about which layer handles which concern, because you have written all of it. Reading the POSIX shell specification alongside an assembly implementation makes the spec concrete in a way that reading it against a bash source tree does not, simply because bash has too many layers between the spec and the machine.
Geir’s project is a concrete artifact of someone who wanted to know exactly what runs at login, with no components they had not personally built. The binary is small, the scope is intentionally narrow, and the understanding it produces is not something you can get from reading documentation alone.