Writing a Login Shell in Assembly Reveals What a Shell Actually Costs

The login shell is one of the most taken-for-granted programs on a Unix system. It is the first program that runs on your behalf when you log in at a console or over SSH: it sets up your environment from startup files such as /etc/profile and ~/.profile, and then sits in a read-eval loop waiting for commands. Bash does it. Zsh does it. Fish does it. All of them carry parsers, job control, history management, completion frameworks, and thousands of lines of C.

Geir Isene decided to write one in assembly, and the result is a precise demonstration of what a shell actually is at the kernel level.

What POSIX Actually Requires

POSIX itself requires surprisingly little here. The sh specification does not define a login shell at all; the only startup file it describes is the one named by the ENV variable, which an interactive shell reads. The rest is convention inherited from the Bourne shell: a shell treats itself as a login shell when argv[0] begins with a hyphen (which is how login(1) invokes it) or, in bash and several other shells, when it is started with a flag such as --login. In that case, it reads /etc/profile and then ~/.profile, in that order, before doing anything else. That is the entire conventional core. Everything beyond it, such as the ~/.bash_profile fallback chain and the /etc/profile.d/ snippet directory, is layered on top by individual shell implementations and Linux distributions.

The minimum viable login shell therefore needs to:

Detect that it was invoked as a login shell by inspecting argv[0][0]
Open and interpret /etc/profile and ~/.profile
Display a prompt
Read a line of input
Parse and execute commands
Repeat until EOF

If you are willing to trade full script interpretation for a simpler execution model, the scope contracts considerably.

The Syscall Surface

A shell’s entire interface with the Linux kernel is narrow. Here are the system calls a minimal interactive shell needs on x86-64:

sys_read   (0)  -- accept a line from the terminal
sys_write  (1)  -- display the prompt
sys_open   (2)  -- open profile files
sys_close  (3)  -- close file descriptors
sys_stat   (4)  -- check if a profile file exists
sys_execve (59) -- replace the child image with a command
sys_fork   (57) -- create a child process
sys_wait4  (61) -- reap the child after it finishes
sys_exit   (60) -- terminate
sys_chdir  (80) -- implement the cd builtin

Add sys_pipe (22) and sys_dup2 (33) for pipeline support. That is the list. No dynamic memory allocation, no signal disposition setup unless you want job control, no file descriptor table management beyond what the kernel already handles. The kernel does more of the work here than most people assume.
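The fork, execve, and wait4 entries in that list always combine in the same pattern. As a rough sketch of that pattern, here it is rendered in C rather than assembly, purely because the control flow is easier to see. The function name run_command is hypothetical, the command must be given as a path because there is no PATH search, and the libc wrappers stand in for the raw syscalls (glibc's fork() is implemented with clone, and waitpid() ends up in wait4 on Linux, but the behavior is the same for this purpose).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given environment and wait for it.
 * argv[0] must be a path to the program; no PATH search is attempted.
 * Returns the child's exit status, or -1 if fork/wait failed or the
 * child was killed by a signal. */
int run_command(char *const argv[], char *const envp[])
{
    pid_t pid = fork();                 /* sys_fork: duplicate the shell */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace this process image with the command. */
        execve(argv[0], argv, envp);    /* sys_execve: returns only on failure */
        _exit(127);                     /* conventional "could not execute" status */
    }

    /* Parent: block until the child terminates, then reap it. */
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* backed by sys_wait4 on Linux */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argv of {"/bin/sh", "-c", "exit 3", NULL} and an empty environment, this would return 3. An assembly version performs the same three steps with the numbers from the table in rax.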

The core loop in x86-64 NASM looks like this:

section .data
    prompt  db "$ "            ; no trailing NUL: write() takes an explicit length
    plen    equ $ - prompt     ; 2 bytes, so no stray NUL reaches the terminal

section .bss
    buf     resb 4096
    argv    resq 64

section .text
    global _start

_start:
.loop:
    ; write prompt to stdout
    mov rax, 1
    mov rdi, 1
    lea rsi, [rel prompt]
    mov rdx, plen
    syscall

    ; read a line from stdin
    mov rax, 0
    mov rdi, 0
    lea rsi, [rel buf]
    mov rdx, 4096
    syscall
    test rax, rax
    jle .exit          ; EOF or error

    ; tokenize buf into argv[], fork, execve, wait4...

    jmp .loop

.exit:
    mov rax, 60
    xor rdi, rdi
    syscall

The interesting work is in the parts elided above. Tokenizing the input buffer means walking bytes, replacing spaces with null bytes, and building a pointer array. Handling the envp argument to execve means reading the environment pointer array that the kernel placed on the stack at program startup. On entry to _start, before anything has been pushed, [rsp] holds argc, argv[0] is at [rsp + 8], and envp begins just past argv's terminating null pointer, at [rsp + 8 + 8*(argc+1)]. A shell that propagates environment variables to children can do so by passing that original envp pointer directly, without parsing or copying anything.
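The tokenizing step is small enough to sketch. The version below is written in C for readability and is only one plausible shape for it, not the project's actual code: the name tokenize and the max_args parameter are assumptions, and the caller is assumed to have written a terminating null byte after the bytes that read() returned, since read() does not add one.

```c
#include <stddef.h>

/* Hypothetical helper: split buf in place on spaces, tabs, and newlines.
 * Separators are overwritten with '\0' and argv[] receives pointers into buf.
 * max_args is the capacity of argv[], including the slot for the final NULL.
 * Returns the number of tokens stored (extra tokens are dropped). */
size_t tokenize(char *buf, char *argv[], size_t max_args)
{
    size_t argc = 0;
    char *p = buf;

    if (max_args == 0)
        return 0;

    while (*p != '\0') {
        /* Turn each separator into a terminator for the previous token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            *p++ = '\0';
        if (*p == '\0')
            break;
        if (argc == max_args - 1)       /* keep one slot for the NULL */
            break;

        argv[argc++] = p;               /* start of the next token */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }

    argv[argc] = NULL;                  /* execve expects a NULL-terminated array */
    return argc;
}
```

Nothing is copied or allocated: the tokens stay in the input buffer, which is why the assembly version can get by with a fixed buf and a fixed argv array. Quoting and escapes are deliberately not handled.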

History Repeating

This is not a new idea. The CP/M CCP (Console Command Processor) was written in 8080 assembly and fits in roughly 2KB. It displayed a prompt, accepted commands, parsed them, and dispatched programs via BDOS calls, which are CP/M’s equivalent of syscalls. DOS’s COMMAND.COM followed the same pattern in 8086 real-mode assembly, handling internal commands like DIR and COPY directly and loading .COM and .EXE files via INT 21h for everything else.

The shell was originally an assembly program. The C implementation came later, as a convenience, and brought with it all the features that eventually required C to express cleanly. Writing a login shell in assembly is not an eccentric experiment; it is a return to the original form.

The Hard Parts

String parsing is tedious in assembly in proportion to how much flexibility you want. A minimal tokenizer that splits on spaces and newlines is perhaps 40 lines of NASM. A tokenizer that handles single and double quotes, escape sequences, variable expansion, and here-documents is closer to 400 lines and becomes its own project. This is where most assembly shell experiments stop: they handle simple commands but not real-world shell syntax, because real-world shell syntax carries decades of accumulated edge cases.

Sourcing profile files adds another layer. To find ~/.profile, the shell needs the user’s home directory. In C, this is a call to getpwuid(getuid()). In pure assembly without libc, that means calling sys_getuid (syscall 102), then opening /etc/passwd, reading it line by line, parsing the colon-separated fields, matching on the UID, and extracting the home directory field. The more practical approach is to walk the envp pointer array looking for a string that begins with HOME=, since login(1) and PAM set it reliably before the shell starts. That works in about 20 lines of assembly.
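The envp walk is a linear scan with a prefix comparison. A minimal sketch in C follows, with the function name env_lookup chosen here purely for illustration; an assembly version would compare the five bytes of HOME= directly instead of calling strncmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find NAME in a NULL-terminated envp array of
 * "NAME=value" strings. Returns a pointer to the value inside the
 * matching entry, or NULL if the variable is not set. */
const char *env_lookup(char *const envp[], const char *name)
{
    size_t n = strlen(name);

    for (size_t i = 0; envp[i] != NULL; i++) {
        /* The '=' check stops "HOME" from matching "HOMEDIR=...". */
        if (strncmp(envp[i], name, n) == 0 && envp[i][n] == '=')
            return envp[i] + n + 1;
    }
    return NULL;
}
```

With an environment containing HOME=/home/user, env_lookup(envp, "HOME") returns a pointer to "/home/user", to which the shell can append "/.profile".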

Detecting login shell mode is straightforward:

; on entry to _start (before any push), [rsp] = argc and [rsp + 8] = argv[0]
mov rsi, [rsp + 8]   ; argv[0]
mov al, [rsi]
cmp al, '-'
je  .login_shell

Then opening and reading /etc/profile:

; open("/etc/profile", O_RDONLY, 0)
mov rax, 2
lea rdi, [rel etc_profile]
xor rsi, rsi           ; O_RDONLY = 0
xor rdx, rdx
syscall
; rax = fd, then loop read() into buffer

The execution from there is: read the file into a buffer, walk line by line, skip comments, and for lines that look like export FOO=bar, update the environment. Doing that without a working variable store or string interpolation is where a pure assembly login shell runs into fundamental scope questions about what it actually wants to be.
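To make the scope question concrete, here is a sketch in C of the simplest possible line handler: it recognizes only the literal form export NAME=value and ignores everything else. The name parse_export is hypothetical, and the limits are deliberate assumptions. Quotes are not stripped, $VAR references are not expanded, and forms such as export A B or a bare NAME=value are not recognized.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse one profile line of the form
 * "export NAME=value". The line is modified in place.
 * On success returns 1 and points *name and *value into the line;
 * returns 0 for comments, blank lines, and anything else. */
int parse_export(char *line, char **name, char **value)
{
    while (*line == ' ' || *line == '\t')   /* skip leading whitespace */
        line++;
    if (*line == '#' || *line == '\0' || *line == '\n')
        return 0;                           /* comment or empty line */

    if (strncmp(line, "export ", 7) != 0)
        return 0;                           /* not an export statement */
    line += 7;
    while (*line == ' ' || *line == '\t')
        line++;

    char *eq = strchr(line, '=');
    if (eq == NULL || eq == line)
        return 0;                           /* no '=' or empty name */

    *eq = '\0';                             /* terminate the name */
    *name = line;
    *value = eq + 1;
    (*value)[strcspn(*value, "\n")] = '\0'; /* drop the trailing newline */
    return 1;
}
```

A real /etc/profile is full of conditionals, loops, and command substitutions that this handler silently skips, which is exactly the gap between reading a profile and interpreting one.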

Compared to Minimal C Implementations

The xv6 shell, used in MIT’s operating systems course, implements a working shell in under 500 lines of C. It handles pipes, I/O redirection, and command sequences. It is the canonical reference for how small a shell can be while remaining genuinely useful.

dash, the Debian Almquist Shell and the default /bin/sh on Ubuntu and Debian, is about 8,000 lines of C across multiple files. It supports the full POSIX shell language, login shell initialization, and job control. BusyBox’s ash is around 13,000 lines in a single file.

The gap between those numbers tells you what the POSIX shell language costs once you commit to supporting it fully. An assembly shell that only runs external programs and implements cd and exit as builtins sits below the xv6 level in complexity, and that is exactly where projects like Isene’s live. They are not replacements for dash. They are proof that the kernel interface for a shell is much smaller than the shell language built on top of it.

Why This Is Worth Doing

There is a practical argument for a minimal login shell in environments where you control the entire stack: embedded systems, container base images, recovery partitions, or network appliances where every kilobyte matters and a dependency on libc is genuinely inconvenient. A shell written in assembly with no dynamic dependencies and no runtime overhead is a legitimate tool in those contexts, in the same way that the CP/M CCP was the right tool for a machine with 64KB of total RAM.

There is also the educational argument, which is probably what motivated this project. Writing a shell in C, following Stephen Brennan’s well-known guide or reading through the xv6 source, teaches you the fork/exec/wait model clearly. Writing the same thing in assembly adds one more layer of clarity: you see that the system call interface is the actual API, and C is a layer of notation over it. When the assembly shell calls sys_execve, there is nowhere else to look. The kernel does what it does, and you see exactly what you handed it.

Most programs hide this behind enough abstraction that the kernel feels distant. A login shell in assembly makes it visible: the shell is a loop over ten system calls, and everything else is a choice.