Seven Roots, Infinite Branches: The Ancestral Languages Behind Every Language You Know

Most programmers work inside a single lineage their entire career without realizing it. They learn Python, then JavaScript, then maybe Go or Rust, and each transition feels like growth. But these languages share deep structural DNA. They are all, in the taxonomy Fred Ross laid out in 2022, dialects of ALGOL.

Ross’s central claim is that every programming language descends from one of seven ur-languages: ALGOL, Lisp, ML, SNOBOL, APL, Prolog, and Forth. Not seven families in the way taxonomists group species by shared traits, but seven genuinely distinct computational worldviews, each one a different answer to the question of what a program fundamentally is.

The claim holds up better than it first appears, and the interesting part is not the taxonomy itself but what it implies: most working programmers have only ever inhabited one of these worldviews, and the remaining six are not just syntax variations but different theories of computation.

ALGOL: The Sea We Swim In

ALGOL 60 gave us the template that most programmers recognize as “programming”: named variables, assignment, conditionals, loops, and procedures organized into blocks. The language itself is almost never used today, but its descendants include C, Pascal, Ada, Java, Python, Ruby, Go, Rust, Kotlin, and Swift. The ALGOL worldview is so dominant that it tends to be invisible. When programmers say a language “feels natural,” they usually mean it resembles ALGOL.

The core model is the von Neumann machine made textual. A program is a sequence of state mutations. Variables name locations in memory. Loops and branches control which mutations happen. Procedures abstract over sequences of mutations and accept arguments. This model maps so cleanly to real hardware that it became the default, and the default became the assumption.

Rust is instructive here. It introduces ownership, borrowing, and lifetimes, which are genuinely novel additions to the type system, but its execution model is still deeply ALGOL. You write statements, assign to variables, loop with for and while, and call functions. The innovations are in the type system sitting above the ALGOL skeleton, not in a replacement of the skeleton itself.

Lisp: Code as Data, Data as Code

Lisp, designed by John McCarthy in 1958, starts from a different premise. A program is not a sequence of statements; it is a symbolic expression that can be manipulated by other expressions. The syntax of Lisp is its data format. A list (+ 1 2) is simultaneously a call to + and a data structure you can construct, inspect, and transform at runtime.

This property, homoiconicity, makes Lisp macros qualitatively different from C preprocessor macros or Python decorators. A Lisp macro receives unevaluated syntax as data and returns new syntax. You can write a while loop in Lisp as a library function because you can accept the condition and body as unevaluated expressions, wrap them in the recursive machinery you want, and return code the compiler then sees. Paul Graham’s essays on Lisp from the early 2000s made this point forcefully: features that other languages add as syntax can be added to Lisp as libraries.

The Lisp family today includes Common Lisp, Scheme, Racket, Clojure, and various Lisp-flavored languages embedded in other systems. Clojure in particular brought the ideas into practical production use on the JVM, and its persistent data structures influenced languages far outside the Lisp family. JavaScript’s prototype system and dynamic evaluation were influenced by Scheme, and the eval function that JavaScript inherited reflects the Lisp worldview’s attitude toward the line between code and data.

ML: Types as Theorems

ML emerged from the Meta Language of Robin Milner’s LCF theorem prover in the early 1970s. Its central contribution is the Hindley-Milner type system, which allows a compiler to infer most types without explicit annotation while still guaranteeing type safety. The programmer writes let f x = x + 1 and the compiler knows f has type int -> int without being told.

But the deeper contribution is algebraic data types and exhaustive pattern matching. In ML, you define a type by listing its possible shapes:

type shape =
  | Circle of float
  | Rectangle of float * float
  | Triangle of float * float * float

and then pattern match over it:

let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Triangle (a, b, c) ->
    let s = (a +. b +. c) /. 2.0 in
    sqrt (s *. (s -. a) *. (s -. b) *. (s -. c))

The compiler verifies that every case is handled. Add a new variant to shape and every pattern match in your codebase becomes a compile error until you handle it. This is not a convenience feature; it changes the contract between programmer and type system.

ML’s descendants include Haskell, OCaml, F#, and, in a significant sense, Rust. Rust borrowed algebraic data types and exhaustive pattern matching almost directly from ML, which is why Rust feels more like OCaml than like C despite being a systems language. The Result and Option types, the match expression, even the way Rust handles enums, all trace back to ML’s type theory. Swift made similar choices, which is why Swift’s enum with associated values looks familiar to anyone who has written OCaml.

APL: The Array as the Unit of Computation

Kenneth Iverson designed APL in the early 1960s, initially as a mathematical notation before it became a programming language. The central idea is that operations should apply to entire arrays by default, with rank polymorphism letting the same function work on scalars, vectors, matrices, and higher-dimensional arrays without the programmer writing explicit loops.

APL uses a dense symbolic notation that looks impenetrable to outsiders. The expression +⌿÷≢ computes the mean of an array: sum along the first axis, then divide by the count. Modern array languages like J, K, BQN, and Q carry this tradition forward.

The APL worldview has leaked into mainstream programming more than most programmers realize. NumPy’s broadcasting rules are a restricted form of APL’s rank polymorphism. When you write array * 2 in NumPy and it multiplies every element without a loop, you are using an APL idea. The same is true of R’s vectorized operations and pandas’ apply-style computation. The entire data science ecosystem runs on APL concepts mediated through friendlier syntax.

Where APL diverges most sharply from the ALGOL worldview is in its attitude toward loops. In ALGOL, iterating over a collection requires explicit control flow. In APL, the loop is implicit in the operation itself; you operate on the whole structure and the runtime handles the iteration. This is not just syntactic sugar. It reflects a different theory of what the primitive unit of computation should be.

SNOBOL: Pattern as Control Flow

SNOBOL, developed at Bell Labs in the 1960s, treats pattern matching not as a test within a larger program but as the primary mechanism of execution. A SNOBOL program consists of pattern-directed rules; matching a pattern against a string can succeed or fail, and success or failure determines which rules execute next.

The descendant most working programmers have encountered is AWK, which organizes a program as a set of pattern-action pairs applied sequentially to input lines. Perl drew heavily from SNOBOL’s approach to string processing, as did Icon, a more principled successor that introduced generators and goal-directed evaluation.

The SNOBOL worldview appears in modern code whenever someone uses a parser combinator library. Libraries like nom in Rust or parsec in Haskell are essentially SNOBOL’s pattern-directed computation idea expressed in a typed functional language. The mental model is the same: define patterns that match or fail, compose them to build larger patterns, and let the matching process drive computation.

Regular expressions are a thin slice of this worldview available in almost every language, but regexes are deliberately limited. SNOBOL patterns are Turing-complete in a way that regexes are not, and the full pattern-directed execution model supports recursive descent parsing and backtracking in ways that regular expressions do not.

Prolog: Computation as Proof Search

Prolog, developed by Alain Colmerauer and Robert Kowalski around 1972, treats a program as a set of facts and rules in predicate logic. Computation is the process of proving a query by searching through the space of derivations using unification and backtracking.

parent(tom, bob).
parent(tom, liz).
parent(bob, ann).

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

Asking grandparent(tom, ann) causes the system to search for a binding of intermediate variables that satisfies the definition. This is not a procedure call; it is a logical proof. The programmer declares what is true, not how to compute it.

Prolog’s influence is most visible in Datalog, a restricted logic programming language that has found serious application in program analysis, database query optimization, and security policy specification. Soufflé, a Datalog variant, is used in production static analysis tools. Datomic uses Datalog as its query language. The language Mercury applies ML-style typing to the Prolog execution model.

SQL is arguably a dialect of the Prolog worldview: you declare what result you want, and the query planner figures out how to produce it. The WHERE clause is a conjunction of constraints, JOIN is relational composition, and the optimizer performs a kind of proof search over possible execution plans. Most programmers who say they have never used Prolog use its ideas every time they write a nontrivial SQL query.

Forth: Composition as the Primitive

Forth, created by Charles Moore in the late 1960s, builds on a stack machine with a minimal vocabulary of words. A program is a sequence of words; each word pops values from the stack, operates on them, and pushes results back. New words are defined in terms of existing words. That is the entire language.

: square  dup * ;
: cube    dup square * ;
: sum-of-squares  square swap square + ;

The Forth worldview sees computation as the composition of small transformations. The stack is the universal communication mechanism between words; you never name intermediate results. Modern concatenative languages like Factor and Joy formalize this into a theory where programs are functions from stacks to stacks and composition of programs is just concatenation of their text.

Forth’s minimal footprint made it the language of choice for embedded systems where memory was measured in kilobytes. The GForth implementation runs in roughly 200KB. PostScript, the page description language, is a Forth-like stack language. WebAssembly’s execution model is a typed stack machine, which is not a coincidence; stack machines are easy to verify and optimize.

The concatenative paradigm also describes something fundamental about Unix pipes. cat file | grep pattern | sort | uniq -c is a composition of stack-like transformations where each stage consumes the output of the previous one. The data flows implicitly, named by its position in the pipeline rather than by a variable.

What the Map Reveals

Laying out these seven families next to each other clarifies something about the current state of mainstream programming. The ALGOL family is dominant, ML ideas have been steadily absorbed into mainstream languages over the past decade, and Lisp ideas arrive filtered through functional programming trends. But APL, Prolog, and Forth remain largely outside the mainstream, visited occasionally by specialists and then set aside.

This is not because those paradigms are weaker. Array programming in the APL tradition routinely outperforms loop-based code by factors that matter: vectorized operations exploit SIMD instructions and cache locality in ways that explicit loops cannot match without heroic optimization effort. Logic programming makes certain kinds of constraint satisfaction problems trivially expressible. Concatenative programming produces code where every intermediate step is testable and composable in a precise formal sense.

The practical lesson from this taxonomy is not that you should rewrite your services in J or Prolog. It is that spending time with one language from each family changes the questions you ask when writing in any language. After using APL seriously, you start noticing when you are iterating explicitly over a structure that could be transformed wholesale. After using Prolog, relational schemas and SQL queries read differently. After using Forth, you start decomposing functions more aggressively.

The seven ur-languages are not competitors. They are different instruments. Most of us have only learned to play one of them.