A Shell Is Just Fork and Exec Until It Isn't

Andrew Healey’s walkthrough of building a shell in Python covers the essential arc: read a line, split it into tokens, fork a child, exec the command, wait for it to finish. That loop, maybe 50 lines of code, is a working shell. It is also the point where most educational treatments stop, just before things get genuinely hard.

The basic loop is elegant because it maps almost directly onto three system calls: fork(2), execve(2) (usually reached through a wrapper such as execvp(3)), and wait(2). The kernel does the heavy lifting. What the tutorial version tends to gloss over is the contract the shell has to maintain with the kernel to make job control, pipes, and signal handling work correctly. That contract has subtleties that will silently break your shell in ways that take hours to debug.
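As a rough illustration of that loop, here is a minimal Python sketch. The names run_command and repl are hypothetical, not taken from Healey's article, and the sketch assumes a Unix system and Python 3.9 or later (for os.waitstatus_to_exitcode).

```python
import os
import shlex


def run_command(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError as err:
            os.write(2, f"{argv[0]}: {err.strerror}\n".encode())
        finally:
            # Only reached if exec failed; skip the parent's cleanup handlers.
            os._exit(127)
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def repl():
    """Read a line, split it into tokens, run it, repeat until EOF."""
    while True:
        try:
            line = input("$ ")
        except EOFError:
            break
        try:
            argv = shlex.split(line)
        except ValueError as err:  # e.g. an unterminated quote
            print(f"parse error: {err}")
            continue
        if argv:
            run_command(argv)
```

Calling repl() gives a prompt that can run external commands and nothing else: no pipes, no builtins, no job control.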

The Pipe File Descriptor Leak

Implementing ls | grep foo | wc -l feels straightforward once you understand dup2. Create a pipe, fork, wire stdout of the first child to the write end, wire stdin of the second child to the read end. But there is a trap most implementations fall into: the parent process holds open file descriptors to every pipe it creates.

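int pipe_fds[2];
pipe(pipe_fds);  // pipe_fds[0] = read end, pipe_fds[1] = write end

pid_t pid = fork();
if (pid == 0) {
    // child: wire stdout to write end
    dup2(pipe_fds[1], STDOUT_FILENO);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    execvp(argv[0], argv);
    // execvp returns only on failure; without this exit the child
    // would fall through and keep running the parent's code
    perror(argv[0]);
    _exit(127);
}
// ... fork the consumer here, wiring pipe_fds[0] to its stdin ...

// Parent MUST close its copies once every child that needs them exists
close(pipe_fds[0]);
close(pipe_fds[1]);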

If the parent does not close pipe_fds[1], the read end of the pipe will never receive EOF. A read on a pipe reports EOF only once every descriptor referring to the write end, in every process, has been closed. The consumer process blocks forever, waiting for data that will never arrive, because the shell itself holds the write end open. This is one of those bugs that only manifests when you try to pipe into wc or cat and the command just hangs. The kernel’s reference counting on the pipe’s open ends is the mechanism, and you have to respect it even in the parent.

For pipelines with three or more stages, the pattern generalizes: create all pipes upfront, have each child close every pipe end it does not use after its dup2 calls, and, after forking all children, close every pipe end in the parent.
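The generalized pattern might look like the following Python sketch. run_pipeline and its stdout_fd parameter are hypothetical names introduced for illustration. One Python-specific detail: descriptors returned by os.pipe() are non-inheritable by default, so they would be closed at exec anyway; the explicit closes in the child are kept to mirror what a C shell has to do. The closes in the parent are required in either language.

```python
import os


def run_pipeline(commands, stdout_fd=1):
    """Run commands as a pipeline; return the exit code of the last stage."""
    n = len(commands)
    # One pipe between each adjacent pair of stages: (read_end, write_end).
    pipes = [os.pipe() for _ in range(n - 1)]
    pids = []
    for i, argv in enumerate(commands):
        pid = os.fork()
        if pid == 0:
            try:
                if i > 0:
                    os.dup2(pipes[i - 1][0], 0)  # stdin from previous pipe
                if i < n - 1:
                    os.dup2(pipes[i][1], 1)      # stdout to next pipe
                elif stdout_fd != 1:
                    os.dup2(stdout_fd, 1)        # last stage: caller's stdout
                # Drop every inherited pipe end; only fds 0 and 1 stay wired.
                for r, w in pipes:
                    os.close(r)
                    os.close(w)
                os.execvp(argv[0], argv)
            except OSError as err:
                os.write(2, f"{argv[0]}: {err.strerror}\n".encode())
            finally:
                os._exit(127)
        pids.append(pid)
    # Parent: close every pipe end, or downstream readers never see EOF.
    for r, w in pipes:
        os.close(r)
        os.close(w)
    codes = [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
    return codes[-1]
```

Commenting out the parent's closing loop and running a pipeline that ends in wc -l is a quick way to reproduce the hang described above.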

`exit()` vs. `_exit()` in the Child

This distinction is subtle enough that experienced C programmers get it wrong. After fork(), the child process has a full copy of the parent’s memory, including buffered I/O state from the C standard library. If the child calls exit(), the C runtime flushes those buffers and runs all atexit() handlers registered by the parent. In a shell with readline or any prompt-rendering code, this can produce duplicate output or corrupted terminal state.

The correct call in the child, after a failed exec, is _exit(). It terminates the process immediately without running cleanup:

execvp(argv[0], argv);
// execvp only returns on failure
perror(argv[0]);
_exit(127);  // 127 is the conventional "command not found" exit code

The Python equivalent is os._exit() rather than sys.exit() or raise SystemExit. sys.exit() raises SystemExit, which unwinds through any try/finally blocks on the stack the child inherited, runs atexit handlers, and flushes buffered output copied from the parent. So the principle holds in Python as well: after fork, before exec, do not let the child run any cleanup meant for the parent.
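A small experiment can make the duplicated output visible. The sketch below runs a throwaway script in a subprocess whose stdout is a pipe, so the prompt text stays in a userspace buffer at the moment of the fork. SCRIPT and copies_of_prompt are hypothetical names, and the sketch assumes the default block buffering Python applies when stdout is not a terminal.

```python
import os
import subprocess
import sys

# A tiny program that buffers a prompt, forks, and ends the child as told.
SCRIPT = """
import os, sys
sys.stdout.write("prompt> ")   # stdout is a pipe here, so this stays buffered
pid = os.fork()
if pid == 0:
    {child_exit}
os.waitpid(pid, 0)
"""


def copies_of_prompt(child_exit):
    """Run SCRIPT with the given child exit call; count prompts in its output."""
    # Unbuffered mode would hide the effect, so drop that setting if present.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT.format(child_exit=child_exit)],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout.count("prompt> ")
```

Under those assumptions, copies_of_prompt("sys.exit(0)") should report two copies, because the child flushes the inherited buffer on its way out, while copies_of_prompt("os._exit(0)") should report one.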

Why `cd` Cannot Be a Child Process

The set of commands that must run inside the shell process rather than in a child is not arbitrary. Every builtin exists because it needs to mutate state that is local to a process. chdir(2) changes the working directory of the calling process. If you fork and call chdir in the child, the parent’s working directory is untouched. The user types cd /tmp and nothing happens.

The same logic applies to export and unset, which modify the environment passed to future execve calls; source, which executes a script in the current shell context so its variable assignments persist; exec, which replaces the shell process entirely; and ulimit and umask. These are not performance shortcuts. They are correctness requirements enforced by the Unix process model.

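This is why lookup order matters. Aliases are substituted while the command line is being parsed; after that, POSIX resolves a command name as a special builtin, then a function, then a regular builtin, then $PATH (bash outside its POSIX mode lets functions shadow special builtins). Reaching $PATH means you are going to fork. Matching a builtin means you are calling a function directly in the shell process.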
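A minimal sketch of that dispatch in Python, with only two builtins and no aliases or functions. BUILTINS, builtin_cd, builtin_export, and dispatch are hypothetical names, and this export handles only the NAME=value form.

```python
import os


def builtin_cd(args):
    """Change the shell's own working directory; default to $HOME."""
    target = args[0] if args else os.environ.get("HOME", "/")
    try:
        os.chdir(target)
    except OSError as err:
        os.write(2, f"cd: {target}: {err.strerror}\n".encode())
        return 1
    return 0


def builtin_export(args):
    """Set NAME=value pairs in the environment future children inherit."""
    for item in args:
        name, sep, value = item.partition("=")
        if sep:
            os.environ[name] = value
    return 0


BUILTINS = {"cd": builtin_cd, "export": builtin_export}


def dispatch(argv):
    """Run a builtin in-process, or fork and exec an external command."""
    handler = BUILTINS.get(argv[0])
    if handler is not None:
        return handler(argv[1:])  # mutates the shell process itself
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # searches $PATH
        except OSError as err:
            os.write(2, f"{argv[0]}: {err.strerror}\n".encode())
        finally:
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Running dispatch(["sh", "-c", "cd /"]) leaves the caller's working directory unchanged, while dispatch(["cd", "/"]) changes it, which is the whole argument for builtins in two lines.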

The POSIX Expansion Pipeline

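Most shell tutorials implement variable expansion as a simple string substitution. POSIX specifies seven transformations in a fixed order. The first four (tilde, parameter, command, and arithmetic expansion) are performed together in a single left-to-right pass over the word; word splitting, pathname expansion, and quote removal then follow as separate steps, and that order is not interchangeable: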

Tilde expansion (~ becomes $HOME)
Parameter expansion ($VAR, ${VAR:-default}, ${#VAR}, etc.)
Command substitution ($(cmd) or the backtick form)
Arithmetic expansion ($((expr)))
Word splitting (split on $IFS characters)
Pathname expansion, or globbing: *, ?, [abc]
Quote removal

The reason order matters is that word splitting happens after parameter and command expansion, but before globbing. If $VAR expands to foo bar, word splitting turns it into two words. To prevent that, you quote the variable: "$VAR". Quote removal then strips the quotes at the end. Implementing this correctly requires threading quote state through the entire expansion pipeline, not just the tokenizer.
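To see why quote state has to survive past the tokenizer, consider this toy Python sketch. It covers a deliberately tiny subset: $NAME parameters only, double quotes only when they wrap the whole word, whitespace as the only IFS, and globbing against a supplied list of names rather than the filesystem. expand_word is a hypothetical helper, not a POSIX-complete implementation.

```python
import fnmatch
import re


def expand_word(word, env, filenames):
    """Expand one shell word into a list of fields (toy subset of POSIX)."""
    quoted = len(word) >= 2 and word[0] == word[-1] == '"'
    body = word[1:-1] if quoted else word

    # 1. Parameter expansion: replace $NAME with its value (empty if unset).
    expanded = re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)",
                      lambda m: env.get(m.group(1), ""), body)

    # 2. Word splitting: only unquoted results are split on whitespace.
    fields = [expanded] if quoted else expanded.split()

    # 3. Pathname expansion: only unquoted fields are treated as patterns.
    if not quoted:
        globbed = []
        for field in fields:
            matches = []
            if any(ch in field for ch in "*?["):
                matches = sorted(fnmatch.filter(filenames, field))
            globbed.extend(matches or [field])  # no match: keep the pattern
        fields = globbed

    # 4. Quote removal: already done when `body` dropped the outer quotes.
    return fields
```

With env = {"VAR": "foo bar"}, the word $VAR yields two fields and "$VAR" yields one. The quoted flag is consulted in steps 2 and 3, long after tokenizing, which is the threading of quote state the paragraph above describes.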

Reading the POSIX grammar is a useful exercise in understanding why bash is around 130,000 lines of C while a tutorial shell is 300.

Job Control Is Where Toy Shells Stop

Job control, the ability to run processes in the background with &, stop them with Ctrl+Z, and bring them forward with fg, requires the shell to maintain a table of jobs and actively manage process groups and terminal ownership.

On startup, a proper interactive shell puts itself in its own process group and takes control of the terminal:

shell_pgid = getpid();
setpgid(shell_pgid, shell_pgid);         // become leader of our own process group
tcsetpgrp(STDIN_FILENO, shell_pgid);     // make that group the terminal's foreground group
tcgetattr(STDIN_FILENO, &shell_tmodes);  // save terminal state

When launching a foreground job, the shell creates a new process group for it, transfers terminal control with tcsetpgrp, and waits using waitpid(2) with the WUNTRACED flag so it can detect if the job is stopped rather than finished. When the job finishes or stops, the shell reclaims the terminal and restores its saved terminal attributes. When launching a background job, it skips the tcsetpgrp call entirely and collects the child later via SIGCHLD.

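The shell itself must ignore SIGINT, SIGQUIT, SIGTSTP, SIGTTIN, and SIGTTOU. The terminal delivers keyboard signals to whichever process group is in the foreground, so if the shell does not ignore them, pressing Ctrl+C while the shell owns the terminal (at the prompt, for example) kills the shell itself. Children inherit these ignored dispositions, and ignored signals stay ignored across exec, so each child must reset them to SIG_DFL before calling exec for the programs they run to behave normally.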

There is also a race condition in setpgid worth noting: both the parent and the child call setpgid(child_pid, child_pgid) after the fork, because either might run first. The call is idempotent when the process group already exists, so the redundancy is intentional.
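The foreground launch sequence, including the doubled setpgid call, could be sketched in Python as follows. launch_foreground and JOB_SIGNALS are hypothetical names. The terminal handoff only runs when a tty_fd is passed, and in that case the sketch assumes the caller is an interactive shell that already ignores the job-control signals, since calling tcsetpgrp from a background group would otherwise stop the caller with SIGTTOU.

```python
import os
import signal

JOB_SIGNALS = (signal.SIGINT, signal.SIGQUIT, signal.SIGTSTP,
               signal.SIGTTIN, signal.SIGTTOU)


def launch_foreground(argv, tty_fd=None):
    """Run argv in its own process group; return (state, value, pgid)."""
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)                 # child side of the race
            if tty_fd is not None:
                os.tcsetpgrp(tty_fd, os.getpid())
            for sig in JOB_SIGNALS:          # undo inherited SIG_IGN
                signal.signal(sig, signal.SIG_DFL)
            os.execvp(argv[0], argv)
        except OSError as err:
            os.write(2, f"{argv[0]}: {err.strerror}\n".encode())
        finally:
            os._exit(127)
    try:
        os.setpgid(pid, pid)                 # parent side of the race
    except OSError:
        pass                                 # child already exec'd or exited
    if tty_fd is not None:
        os.tcsetpgrp(tty_fd, pid)            # hand the terminal to the job
    # WUNTRACED makes waitpid return when the job stops, not only when it ends.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if tty_fd is not None:
        os.tcsetpgrp(tty_fd, os.getpgrp())   # reclaim the terminal
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status), pid)
    return ("exited", os.waitstatus_to_exitcode(status), pid)
```

A real shell would record a "stopped" result in its job table so that fg can later hand the terminal back and send SIGCONT to the group. Saving and restoring terminal attributes is omitted here.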

Performance: Where `fork` Costs You

For short-lived scripts that invoke many subcommands, shell startup time and fork cost matter. The dash shell starts in roughly a millisecond on modern hardware; bash takes around five milliseconds with a typical configuration. The difference compounds in configure scripts or test harnesses that spawn thousands of subshells.

Dash uses vfork() in places where the child will immediately call exec without mutating shared state. vfork suspends the parent and lets the child share its memory space temporarily, avoiding the copy-on-write page table setup that fork requires. It is risky to use incorrectly but measurably faster for simple command execution. posix_spawn() provides a safer high-level interface that maps to vfork-style semantics on platforms where fork is expensive.
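Python exposes the same interface as os.posix_spawnp (Python 3.8 and later), which makes for a compact comparison with the fork-and-exec path. Whether the underlying C library implements it with vfork-style semantics depends on the platform, so treat any speedup as something to measure rather than assume. spawn_and_wait is a hypothetical helper name.

```python
import os


def spawn_and_wait(argv):
    """Start argv with posix_spawnp instead of fork+exec; return its exit code."""
    # The libc creates the child and execs in one call; no Python code
    # runs in the child, so there is no inherited cleanup to worry about.
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The trade-off is flexibility: anything the shell would normally do between fork and exec, such as redirections or setpgid, has to be expressed through the optional posix_spawn arguments instead of arbitrary code.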

Bash and dash both maintain a hash table of previously resolved command paths to avoid repeated $PATH searches. The hash builtin exposes this table. In a tight loop calling the same external command repeatedly, the difference between a hash hit and a full PATH scan is small but adds up.
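The idea behind that table fits in a few lines of Python. This is only an illustration of the caching behavior, not how bash or dash implement it; CommandCache is a hypothetical class, and shutil.which stands in for the shell's own $PATH scan.

```python
import shutil


class CommandCache:
    """Remember where each command name resolved, like a shell's hash table."""

    def __init__(self):
        self.paths = {}
        self.misses = 0  # number of full $PATH scans performed

    def resolve(self, name):
        if "/" in name:
            return name  # explicit paths bypass the $PATH search
        path = self.paths.get(name)
        if path is None:
            self.misses += 1
            path = shutil.which(name)  # full $PATH scan
            if path is not None:
                self.paths[name] = path
        return path

    def forget(self):
        """Analogue of `hash -r`: drop all remembered locations."""
        self.paths.clear()
```

A cache like this goes stale if a command is installed earlier in $PATH after its first lookup, which is why shells offer hash -r to reset it.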

What the Exercise Teaches

Building a shell, even a toy one in Python as in Healey’s article, forces you to reason about the Unix process model at a level that most application programming never requires. You learn why fork semantics exist, what file descriptors are, and why the terminal is a shared resource with an owner.

Going further, toward job control and correct POSIX expansion, exposes the accumulated design decisions of decades of Unix shell history. The dash shell implements most of POSIX in around 15,000 lines of C. Bash is closer to 130,000 lines. The distance between them is the feature surface, and the distance between either and a tutorial shell is the difference between a contract and a sketch.

The sketch is still worth writing.