
When the Kotlin Creator Turns to Specs for LLMs, It Is Worth Paying Attention

Source: hackernews

Andrey Breslav spent roughly a decade shaping Kotlin into one of the more widely adopted languages of the past twenty years. His instincts there were consistently pragmatic: fix the real pain points in Java development, keep interop tight, bake null safety in at the type system level, and avoid the kind of theoretical purity that makes a language interesting to academics but difficult to ship production software in. So when Codespeak landed on Hacker News in March 2026 with 271 points and 236 comments, it was not the novelty of the idea that made people pay attention. It was who was building it.

The core claim is stated in the tagline: talk to LLMs in specs, not English. That framing deserves unpacking, because it sits at the intersection of a genuine engineering problem and a space that has already generated several serious attempts at a solution.

The Actual Problem With Prompts as Prose

Every developer who has spent meaningful time building LLM-backed applications runs into the same friction. You write a prompt in English, it works for a while, then some edge case produces output that does not match what you intended. You refine the prompt, reword a sentence, shuffle the order of instructions. The system gets more fragile as it gets more specific. There is no static analysis of an English string. There is no type system catching the mismatch between what you described and what the model understood. The whole thing lives in a directory of .txt files with no better tooling than grep.

This is not a complaint about LLM capability. Models have gotten considerably better at following instructions. The problem is structural: natural language is ambiguous by design, and that ambiguity is a feature when humans are having a conversation but a liability when you are trying to specify an interface contract.

The gap between what you said and what you meant is always there in English prose. In a formal specification, that gap does not exist. The spec either captures the constraint or it does not, and a type checker can tell you which.
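To make that concrete, here is a minimal sketch (not Codespeak syntax, just illustrative Python) of the difference: the English instruction lives in an opaque string, while the same intent expressed as explicit constraints can be mechanically checked.

```python
# The prose version of the intent: nothing can check this string.
PROMPT = "Return a sentiment with a confidence score between 0 and 1."

# The same intent as explicit, machine-checkable constraints.
CONSTRAINTS = {
    "sentiment": lambda v: v in {"positive", "negative", "neutral"},
    "score": lambda v: isinstance(v, (int, float)) and 0.0 <= v <= 1.0,
}

def violations(output: dict) -> list[str]:
    """Report every field that is missing or fails its constraint."""
    return [
        field
        for field, check in CONSTRAINTS.items()
        if field not in output or not check(output[field])
    ]

# An output that "looks right" in prose terms but breaks the contract.
print(violations({"sentiment": "positive", "score": 1.7}))  # ['score']
print(violations({"sentiment": "neutral", "score": 0.42}))  # []
```

The point is not the ten lines of validation code; it is that the constraint is now an artifact a tool can inspect, rather than a sentence a model may or may not have internalized.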

Prior Art That Got There First

Breslav is not the first person to notice this. The space of structured LLM interaction has been active for several years.

TypeChat, released by Microsoft Research in 2023, took a direct approach: instead of writing an English prompt to describe what structure you want from an LLM, you write a TypeScript type definition. The model is given the type and told to produce a value that conforms to it. The TypeScript type checker validates the output. That is genuinely useful, and the ergonomics are good if you already live in TypeScript.

type SentimentResponse = {
  sentiment: 'positive' | 'negative' | 'neutral';
  score: number; // 0.0 to 1.0
  explanation: string;
};

DSPy from Stanford, under heavy development through 2024, takes a different angle: you define typed signatures for LLM modules, then the framework optimizes the actual prompts automatically based on training examples. The idea is that the developer should express intent through signatures, not implementation details in prose. DSPy separates the what from the how at a level that raw prompt engineering never could.

Instructor wraps OpenAI and other APIs with Pydantic validation, so the model must return JSON matching a schema you define in Python. Outlines goes further down into the decoding layer and constrains token generation directly using JSON Schema or regular expressions, making it structurally impossible for the model to produce output that violates the schema. LMQL from ETH Zurich introduced a SQL-like query language for LLMs that supports typed variables and constrained decoding.
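The retry-and-validate pattern at the heart of the Instructor approach is simple enough to sketch without the library. This is a stdlib-only illustration of the pattern, not Instructor's actual API; `fake_model` stands in for a real API call.

```python
import json

# Schema the model's JSON reply must satisfy: field name -> expected type.
REQUIRED = {"sentiment": str, "score": float}

def validate(raw: str) -> dict:
    data = json.loads(raw)  # raises on malformed JSON
    for field, typ in REQUIRED.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} must be {typ.__name__}")
    return data

def fake_model(prompt: str, attempt: int) -> str:
    # Illustrative stub: the first reply violates the schema; the "model"
    # corrects itself once the validation error is echoed back.
    if attempt == 0:
        return '{"sentiment": "positive", "score": "high"}'
    return '{"sentiment": "positive", "score": 0.9}'

def call_with_retries(prompt: str, max_attempts: int = 3) -> dict:
    for attempt in range(max_attempts):
        raw = fake_model(prompt, attempt)
        try:
            return validate(raw)
        except (ValueError, json.JSONDecodeError) as err:
            # Feed the failure back so the next attempt can self-correct.
            prompt += f"\nPrevious reply was invalid: {err}. Return valid JSON."
    raise RuntimeError("model never produced schema-conforming output")

result = call_with_retries("Classify: 'Great product!'")
print(result)  # {'sentiment': 'positive', 'score': 0.9}
```

Instructor layers Pydantic's much richer validation on top of exactly this loop; Outlines removes the loop entirely by making the invalid output unrepresentable at decode time.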

All of these are libraries. Some are frameworks. What Codespeak appears to be is a language, and that distinction matters more than it might seem.

What a Language Gets You That a Library Cannot

A library gives you an abstraction layer. A language gives you a compilation target, a formal grammar, a static analysis surface, and purpose-built tooling. When you write a Pydantic model in Python to constrain LLM output, you are repurposing a tool built for data validation. The semantics of your intent live in documentation and convention, not in the language itself.

A dedicated language can have semantics that are specifically designed for the domain. Preconditions and postconditions over model behavior. Typed slot definitions that compile to whatever the target model expects, whether that is a JSON Schema, a function calling spec, a structured output schema, or a raw prompt. IDE support that understands the semantics well enough to catch errors before you run anything. The compiler becomes the first reviewer of your specifications.
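What preconditions and postconditions over model behavior might buy you can be sketched in plain Python. Nothing here is Codespeak syntax; the `contract` decorator and the `summarize` stub are invented for illustration, and a dedicated language could check some of this statically rather than at runtime.

```python
def contract(pre, post):
    """Wrap a model-backed function with runtime pre/postcondition checks."""
    def decorate(fn):
        def wrapped(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise ValueError("precondition violated")
            result = fn(*args, **kwargs)
            if not post(result):
                raise ValueError("postcondition violated")
            return result
        return wrapped
    return decorate

@contract(
    pre=lambda text, max_words: len(text.split()) > 0 and max_words > 0,
    post=lambda summary: isinstance(summary, str) and len(summary) > 0,
)
def summarize(text: str, max_words: int) -> str:
    # Stand-in for an LLM call; a real implementation would hit a model API.
    return " ".join(text.split()[:max_words])

print(summarize("LLM specs deserve real tooling", 3))  # 'LLM specs deserve'
```

In a library, those conditions are conventions the wrapper happens to enforce. In a language, they are semantics the compiler and the IDE both understand.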

This is exactly where Breslav’s background is relevant. Kotlin’s type system was not invented fresh; it borrowed from Scala, Groovy, and others. But it made specific choices, particularly around null safety and the handling of platform types from Java, that reflected a coherent view of what mattered in practice. The same design instincts applied to a language for LLM specifications would produce something considerably different from a library, even a very good one.

The Probabilistic Tension

There is an honest skepticism worth naming. LLMs are probabilistic systems. A specification is a deterministic contract. You can enforce output schema compliance at the decoding level (as Outlines does) or through retry-and-validate loops (as Instructor does), but you cannot make a probabilistic model deterministically conform to a spec in the way that a type checker enforces types in compiled code. The spec constrains the form; it cannot guarantee the substance.
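Enforcement at the decoding level is worth seeing in miniature. This is a toy sketch of the mechanism, not Outlines' implementation (which compiles JSON Schema and regexes to automata over the real vocabulary): at each step, only tokens that keep the partial output on a path to a valid value are allowed.

```python
VALID = {"positive", "negative", "neutral"}
VOCAB = ["pos", "neg", "neu", "itive", "ative", "tral", "great", "!"]

def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep the partial output a prefix of some valid label."""
    return [
        tok for tok in VOCAB
        if any(label.startswith(prefix + tok) for label in VALID)
    ]

def constrained_decode(pick) -> str:
    """Greedy decode loop; `pick` stands in for sampling from the model's
    next-token distribution, restricted to the allowed set."""
    out = ""
    while out not in VALID:
        out += pick(allowed_tokens(out))
    return out

# Even a "model" that always grabs the first option can only ever
# produce a valid label; the mask makes invalid output unrepresentable.
print(constrained_decode(lambda choices: choices[0]))  # positive
```

This is exactly the sense in which the form is guaranteed while the substance is not: the mask decides what the model may say, and the model's distribution still decides which valid thing it says.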

That said, the gap between a spec-constrained output and a prose-instructed output is real and measurable. Structured output modes in the major APIs have improved substantially. OpenAI’s JSON mode and structured outputs, Anthropic’s tool use with typed parameters, and constrained decoding libraries all demonstrate that you can get reliable schema conformance for a wide range of tasks. The model’s judgment still determines the semantic content, but the structural contract is enforceable. That is a meaningful improvement over hoping your prose instructions were precise enough.

A language like Codespeak would, in practice, sit on top of those mechanisms. The compiler transforms your spec into whatever the model API needs. The abstraction lets you write once and target multiple models, which has practical value as the model landscape continues to shift.
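The write-once, target-many idea is also easy to sketch. The spec format below is invented for illustration, and the tool-definition shape is a simplified version of the OpenAI-style function-calling format; the point is that one typed definition can compile to several concrete targets.

```python
# One hypothetical typed slot spec (not Codespeak syntax).
SPEC = {
    "name": "classify_sentiment",
    "slots": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "score": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
}

def to_json_schema(spec: dict) -> dict:
    """Target 1: a bare JSON Schema for structured-output modes."""
    return {
        "type": "object",
        "properties": spec["slots"],
        "required": list(spec["slots"]),
        "additionalProperties": False,
    }

def to_openai_tool(spec: dict) -> dict:
    """Target 2: a simplified OpenAI-style tool definition."""
    return {
        "type": "function",
        "function": {"name": spec["name"], "parameters": to_json_schema(spec)},
    }

print(sorted(to_json_schema(SPEC)["required"]))   # ['score', 'sentiment']
print(to_openai_tool(SPEC)["function"]["name"])   # classify_sentiment
```

Swap in a new backend and only the compiler changes; the specs you have already written stay put.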

What the Kotlin Lineage Suggests

There is a pattern in how JetBrains-lineage tools succeed. They do not try to replace the existing ecosystem wholesale; they make it easier to work with it. Kotlin ran on the JVM and interoperated with Java before it did anything else. IntelliJ IDEA became the foundation for a suite of IDEs across multiple languages. The tooling is treated as a first-class deliverable.

If Codespeak follows that pattern, it will probably have IDE integration early, will target existing model APIs rather than requiring a specific runtime, and will make the migration path from raw prompts gradual rather than requiring a full rewrite. The Hacker News comment thread for the announcement drew comparisons to IDL (Interface Definition Language), to OpenAPI, and to the structured output work already in the ecosystem, which is roughly the right set of reference points.

The question is whether a dedicated language crosses the adoption threshold that several well-regarded libraries have not. DSPy is powerful but has a learning curve that discourages casual adoption. TypeChat is elegant but narrow in scope. The track record for new programming languages, even good ones, is humbling. Most do not make it.

But Breslav has shipped a language that tens of millions of developers use. The idea behind Codespeak is sound, the timing is right as LLM integration moves from experiments to production systems, and the prior art has validated the core thesis without landing a definitive solution. That is a reasonable combination of conditions for something worth watching closely.
