Markdown Is Technically Broken and Practically Unbeatable

Source: hackernews

A post published on April 3rd is making the rounds on Hacker News this week, asking a question that resurfaces every few years in developer circles: why are we still writing Markdown? The 280-comment thread has the predictable energy of these debates, with people defending plaintext simplicity on one side and citing legitimate parsing nightmares on the other. Both sides are right. The interesting part is what that tells us.

What Markdown Actually Was

John Gruber published Markdown in March 2004. The original description was direct about its scope: a text-to-HTML conversion tool for web writers. The implementation was a Perl script. The spec was the README.

That last point matters. There was no formal grammar, no test suite, no definition of what to do when the syntax was ambiguous. Gruber wanted a lightweight way to write HTML without angle brackets everywhere, and he got one. What he did not produce was a portable, interoperable document format.

For years, that was fine. Most people used Markdown for blog posts in a single tool, and the edge cases only showed up when they used a different renderer. The real trouble started when the format spread.

The Spec Problem

By 2012, Markdown was everywhere: GitHub READMEs, Stack Overflow posts, Reddit comments, Jekyll sites. Every platform had implemented its own parser, and every parser made different decisions at the ambiguous edges. Nested lists inside blockquotes inside lists produced different output depending on whether you were using Python-Markdown, Discount, or marked.js. Emphasis parsing alone, which CommonMark would later formalize through its rules about left-flanking and right-flanking delimiter runs, was implemented inconsistently across dozens of libraries.

In 2012, Jeff Atwood wrote a post calling for a canonical Markdown spec. The effort eventually became CommonMark, launched in 2014 with a specification written primarily by John MacFarlane, the same person who built Pandoc. CommonMark was rigorous. It defined behavior for every ambiguous case, embedded hundreds of examples (more than 600 in recent versions) that double as a conformance test suite, and specified exactly what a conforming implementation must do.

CommonMark did not fix Markdown. What it did was document exactly how broken the existing behavior was while picking a canonical answer for each ambiguity. The resulting spec is a monument to backward compatibility: a set of rules that no one would design from scratch, but which reflect what various parsers had independently decided to do with Gruber’s original Perl script.

In 2017, GitHub published a formal spec for GitHub Flavored Markdown (GFM) as a CommonMark superset, adding tables, task list items, strikethrough, and autolinks. Every other major platform added its own extensions. Today, when someone says “Markdown,” they might mean any of: original Gruber Markdown, CommonMark, GFM, MDX, Obsidian’s wikilink dialect, Hugo’s Goldmark configuration, or one of a dozen others. The word describes a family of incompatible dialects loosely related by shared syntax for bold, italics, and headers.

The Specific Technical Failures

Some of Markdown’s problems are well-known. Others are genuinely subtle.

The most notorious is emphasis parsing. Consider *foo*bar*. Two of those three asterisks delimit emphasis, but which two? CommonMark has explicit rules about opening and closing delimiter runs based on surrounding whitespace and punctuation, but the rules are complex enough that most developers cannot predict the output without checking. The CommonMark spec section on emphasis and strong emphasis runs to over 80 examples illustrating corner cases.
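To make the flanking rules concrete, here is a minimal Python sketch that classifies each run of asterisks as a possible opener, a possible closer, or both. It follows the CommonMark definitions of left-flanking and right-flanking delimiter runs, but it is an illustration rather than a conforming parser: the name classify_star_runs is invented for this example, underscores and backslash escapes are ignored, and "punctuation" is approximated with Unicode general categories.

```python
import unicodedata


def is_space(ch: str) -> bool:
    # The start and end of the text count as whitespace for flanking purposes.
    return ch == "" or ch.isspace()


def is_punct(ch: str) -> bool:
    # Approximation (an assumption of this sketch): Unicode punctuation
    # and symbol categories both count as punctuation.
    return ch != "" and unicodedata.category(ch)[0] in ("P", "S")


def classify_star_runs(text: str) -> list[tuple[int, bool, bool]]:
    """Return (start_index, can_open, can_close) for each run of '*'."""
    results = []
    i = 0
    while i < len(text):
        if text[i] != "*":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == "*":
            j += 1  # consume the whole delimiter run
        before = text[i - 1] if i > 0 else ""
        after = text[j] if j < len(text) else ""
        # Left-flanking: not followed by whitespace, and either not followed
        # by punctuation or preceded by whitespace/punctuation.
        left = not is_space(after) and (
            not is_punct(after) or is_space(before) or is_punct(before)
        )
        # Right-flanking: the mirror image of the rule above.
        right = not is_space(before) and (
            not is_punct(before) or is_space(after) or is_punct(after)
        )
        # For '*', a left-flanking run may open and a right-flanking run may close.
        results.append((i, left, right))
        i = j
    return results


print(classify_star_runs("*foo*bar*"))
```

For *foo*bar* the sketch reports the middle asterisk as both a possible opener and a possible closer. Classification alone does not settle which asterisks pair up; that takes the rest of the spec's emphasis algorithm.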

List continuation is another persistent source of confusion. Whether a list is loose or tight (rendered with or without paragraph tags around each item) depends on whether blank lines separate its items. Whether continuation text belongs to a list item or to the surrounding paragraph depends on indentation measured in spaces, with special handling for tabs. These are not exotic edge cases; they are patterns that appear in normal technical writing.
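The tight-versus-loose distinction is easy to show with a toy renderer. The Python sketch below is deliberately narrow and rests on stated assumptions: it handles only a flat list of single-line "- " items, the name render_flat_list is hypothetical, and real parsers must also deal with nesting, continuation lines, and tabs.

```python
import html


def render_flat_list(source: str) -> str:
    """Render a flat '- ' bullet list, wrapping items in <p> if the list is loose."""
    lines = source.strip("\n").split("\n")
    # A blank line between items makes the whole list loose.
    loose = any(not line.strip() for line in lines)
    items = [html.escape(line[2:].strip()) for line in lines if line.startswith("- ")]
    rendered = []
    for item in items:
        content = f"<p>{item}</p>" if loose else item
        rendered.append(f"<li>{content}</li>")
    return "<ul>" + "".join(rendered) + "</ul>"


print(render_flat_list("- a\n- b"))
print(render_flat_list("- a\n\n- b"))
```

A single blank line changes the markup of every item in the list, not just the two it separates, which is why a small edit can noticeably change how a whole list is spaced on the page.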

Inline HTML passthrough is both a feature and a recurring security problem. Because Markdown allows raw HTML, any renderer that processes untrusted input without sanitizing its output is a potential XSS vector. Most renderers have added sanitization as an afterthought, creating yet another dimension of behavioral divergence.
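The risk is easy to demonstrate with a toy renderer. This Python sketch is not taken from any real Markdown library; render_paragraph and its allow_raw_html flag are hypothetical, and escaping everything is the bluntest possible defense (production sanitizers usually allow a vetted subset of tags instead).

```python
import html


def render_paragraph(text: str, allow_raw_html: bool) -> str:
    """Wrap text in <p>, either passing HTML through or escaping it."""
    body = text if allow_raw_html else html.escape(text)
    return f"<p>{body}</p>"


# A classic payload: an image whose error handler runs script.
payload = 'Hi <img src=x onerror="alert(1)">'

print(render_paragraph(payload, allow_raw_html=True))   # tag reaches the browser intact
print(render_paragraph(payload, allow_raw_html=False))  # tag is shown as literal text
```

Two renderers that differ only in this one decision produce different output for the same input, which is the divergence the paragraph above describes.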

The extension story is perhaps the worst part. Markdown has no native extension mechanism. There is no way to define a new block type or a new inline element within the format itself. Every tool that needs footnotes, definition lists, callout boxes, or math notation invents a syntax and ships a plugin. The result is documents that render correctly in one tool and break silently in another.

Djot: What a Real Fix Looks Like

In 2022, John MacFarlane, the person who wrote the CommonMark spec and who understands Markdown’s failure modes better than almost anyone alive, released Djot. He had spent years working with Markdown’s constraints while building Pandoc and writing the CommonMark specification, and Djot represents his answer to the question of what a clean redesign would look like.

Djot’s approach to emphasis shows the difference in design philosophy. Plain _emphasis_ still works, under much simpler rules than CommonMark’s flanking logic, and when a case is ambiguous the writer can mark a delimiter explicitly as an opener or closer with { and }. {_emphasis_} is always emphasis; there is no need to reason about surrounding characters. For extensions, Djot has a generic attribute syntax in the same curly-brace notation that can be attached to any block or inline element, so new kinds of content can be added without inventing new syntax per extension.
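The value of a generic attribute syntax is that one small parser serves every extension. The Python sketch below reads a simplified attribute line such as {.warning #intro title="Read me"}. It is an assumption-laden illustration, not Djot's reference grammar: parse_attributes is a made-up name, and escapes inside quoted values and comments are not handled.

```python
import re

# One alternative per attribute form: #id, .class, key="quoted" or key=bare.
TOKEN = re.compile(
    r'#(?P<id>[\w-]+)'
    r'|\.(?P<cls>[\w-]+)'
    r'|(?P<key>[\w-]+)=(?:"(?P<quoted>[^"]*)"|(?P<bare>[\w-]+))'
)


def parse_attributes(spec: str) -> dict:
    """Parse a simplified '{...}' attribute line into a dictionary."""
    inner = spec.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attributes must be wrapped in curly braces")
    attrs = {"class": []}
    for match in TOKEN.finditer(inner[1:-1]):
        if match.group("id"):
            attrs["id"] = match.group("id")
        elif match.group("cls"):
            attrs["class"].append(match.group("cls"))
        else:
            quoted = match.group("quoted")
            attrs[match.group("key")] = quoted if quoted is not None else match.group("bare")
    return attrs


print(parse_attributes('{.warning #intro title="Read me"}'))
```

A tool that wants callout boxes can key off the class, and one that wants cross-references can key off the id, without either of them touching the parser.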

Djot also removes HTML passthrough entirely. If you need raw HTML in output, you mark it explicitly as a raw block. This makes Djot safer by default and easier to reason about, since the document model is not contaminated with inline HTML that parsers must somehow interpret.

The parsing algorithm is simpler. MacFarlane designed Djot’s block structure to be parseable in a single linear pass without lookahead, unlike CommonMark’s more complex processing model. The goal is a format that is easier to implement correctly and consistently across different parsers.

Djot has implementations in Haskell, Lua, JavaScript, and Python. It has a clean spec. It solves the problems that Markdown has. Adoption has been modest.

Why Markdown Won Anyway

The failure of technically superior alternatives to displace Markdown is not a recent phenomenon. reStructuredText predates Markdown and is more expressive; the Python ecosystem adopted it for Sphinx documentation precisely because it handles complex technical documents better. AsciiDoc is used by O’Reilly technical books and by Red Hat for its entire documentation corpus, because it handles the kinds of structured content that real technical manuals require. Both formats have proper specs, extension mechanisms, and unambiguous parsers.

Neither ever came close to Markdown’s general adoption.

The inflection point for Markdown was GitHub’s decision in 2008-2009 to render Markdown in READMEs. Every open-source project that wanted a readable front page on GitHub learned Markdown. Every developer who learned it on GitHub carried it to their blog, their documentation, their notes app. The installed base compounded through network effects: more writers meant more readers meant more tool support meant more writers.

The LLM era has reinforced this. Language models are trained on enormous volumes of GitHub repositories, Hacker News discussions, Stack Overflow answers, and technical blog posts, the vast majority of which are written in some Markdown dialect. Models now generate Markdown instinctively. When you ask for a structured response, you get headers and bullet points in Markdown syntax. This is not a deliberate design choice; it is the natural output of models that have seen more Markdown than any other structured text format. Chat interfaces, notebook tools, and documentation generators all consume this output and render it, adding another layer to the ecosystem.

The Actual Answer to the Question

The bgslabs retrospective frames the persistence of Markdown as a puzzle worth solving. The answer is less interesting than the question suggests. Markdown won because it was embedded into the most important social layer of software development (GitHub) before better alternatives had comparable tooling, and now the switching cost is real enough that no individual actor has sufficient incentive to absorb it.

This is not unique to markup formats. It is the standard story of technical incumbency: QWERTY keyboards, the x86 instruction set, JavaScript. The technically inferior option wins a critical distribution moment, tooling accretes around it, and the format becomes the format not because it is good but because everything already supports it.

What makes the Markdown case instructive is the clarity of the documentation. MacFarlane wrote the most rigorous possible specification of Markdown’s behavior (CommonMark), identified its irreducible problems, and then built a clean replacement (Djot). The replacement is better by any technical measure. The incumbent keeps winning.

If you are building a new tool with no legacy Markdown content to support, Djot is worth a serious look, as is AsciiDoc for documentation-heavy projects. But if you are asking whether Markdown will still be the default format developers reach for in five years, the honest answer is almost certainly yes, for the same reason it is the answer today.
