· 6 min read ·

Constraint Decay: Why LLM Agents Forget the Rules Halfway Through a Backend

Source: hackernews

There is a particular flavor of frustration you get when an LLM agent writes a backend for you. The first endpoint is clean. The second mostly follows the conventions you specified. By the time it gets to the fifth, the auth middleware has quietly disappeared, the error envelope has mutated, and the ORM you told it to use has been silently swapped for raw SQL in two files. The code compiles. The tests it wrote pass. And the constraints you spent your prompt carefully enumerating have evaporated.

A paper making the rounds on Hacker News gives this phenomenon a name: constraint decay. The authors study LLM agents on multi-file backend generation tasks and show that constraint adherence degrades monotonically as the number of generation steps grows, even when the constraints are repeated in every turn. The effect is robust across frontier models and across agent scaffolds. I want to dig into why this happens, what it implies for how we build with agents, and where it fits in the broader literature on long-horizon LLM behavior.

What the paper actually measures

The setup is straightforward. The authors give an agent a backend specification that includes both functional requirements (endpoints, schemas, behaviors) and non-functional constraints (logging format, error envelope, ORM choice, auth pattern, file layout). They then measure two things at each step: did the agent satisfy the functional requirement, and did the new code respect every previously stated constraint?

The headline result is that functional success stays roughly flat across steps, while constraint adherence falls off a cliff. By step ten, agents are violating around 40 percent of the non-functional constraints they followed perfectly at step one. Re-injecting the full constraint list into every prompt slows the decay but does not stop it. The authors call this asymmetry the core of the fragility: the model keeps shipping working code while quietly abandoning the rules that make the code fit the project.

This is consistent with what the Lost in the Middle line of work has been showing for years. Transformers attend unevenly across long contexts, with a strong recency bias and a U-shaped retrieval curve. When your constraint list sits at position 200 and the agent is generating at position 12,000, the constraints are mechanically less salient than the file it just wrote. Anthropic’s own long-context engineering guidance basically concedes this: context is a finite resource, and what you put in it competes for attention.

Why repeating the constraints does not save you

The intuitive fix is to repeat the rules in every system prompt. The paper shows this helps but does not solve the problem, and I think the reason is interesting.

When an agent generates file number five, the dominant signal in its context is files one through four. Those files are concrete, specific, and recently written. The constraint list is abstract and was written by a different author (you). If file three happens to contain a small deviation from the constraints, that deviation becomes part of the implicit style the model is now imitating. The model is doing in-context learning on its own prior output, and its prior output is drifting. Each step amplifies the drift of the previous step. This is the same dynamic that Shumailov et al. described as model collapse in the training-data context, but happening inside a single agent session.

There is a second mechanism worth naming. Constraints are usually negative: do not use raw SQL, do not skip the auth decorator, do not return errors outside the envelope. Negative instructions are systematically harder for LLMs than positive ones. Recent work on negation finds that models often process do not X as semantically closer to X than to do Y. When you tell an agent ten things not to do, you have given it ten subtle pointers toward doing them.

The scaffolds in the wild

This is where the paper lands on top of an interesting moment. Almost every serious agent harness has converged on a small set of mitigations that, viewed through the constraint-decay lens, are all attempts to fight the same problem.

Claude Code leans heavily on CLAUDE.md files committed to the repo. The trick is that these files are reloaded on every session and live closer to the working files in retrieval order, so the constraints are anchored to the code rather than to a system prompt that decays. Cursor’s rules work the same way. Aider’s conventions file is explicit about this: put your constraints in a file the model rereads.

Subagents are the other half of the response. The Agent SDK pattern of spawning a fresh subagent for a bounded task is, mechanically, a constraint-decay reset. The subagent’s context is short, its constraints are at the top, and it returns a summary rather than dragging its full transcript back into the parent. Anthropic’s multi-agent research post frames this as a token-efficiency win, but the constraint-adherence win might matter more.

The other emerging mitigation is post-hoc enforcement: linters, type checkers, custom validators, and hooks that run after every edit. If the model cannot be trusted to remember the rule, you can at least make the rule machine-checkable and replay the failure into the next turn. This is the philosophy behind DSPy’s assertions and the broader move toward programmatic constraints. The harness becomes the memory the model lacks.

Where the paper underclaims

My main reservation reading the paper is that it treats constraint decay as a property of LLM agents in the abstract, but the magnitude depends heavily on how the constraints are encoded. A constraint expressed as a TypeScript type, a Pydantic model, or a lint rule decays much less, because every generation step exposes the model to the encoded form via the files it reads. A constraint expressed as English in a system prompt is exactly the worst case the paper measures.

This suggests the real lesson is less agents are fragile and more English constraints are the wrong substrate for long-running agents. The constraints that survive are the ones embedded in the artifacts the model touches: a base class it has to extend, an interface it has to implement, a test it has to pass, a hook that rejects its output. Everything else is wishful thinking dressed up as a system prompt.

The paper’s experiments also stop at around ten steps. Real backend work runs to hundreds. I would bet the decay curve flattens at some floor rather than going to zero, because some constraints are reinforced by the existing code (you cannot easily rename a column the agent just wrote queries against). The interesting question, which the paper does not answer, is which constraints have natural reinforcement in the artifact and which need scaffolding.

What I am changing in my own bots

I run Discord bots and a few small backend services where agents do a meaningful chunk of the work. After reading this, three things are worth pulling forward.

First, every non-functional constraint I care about gets a check. A style.json is worthless if the agent silently drifts away from it; a pre-commit hook that diffs against a schema and fails loudly is not. The cost of writing the check is usually less than the cost of one bad merge.

Second, I am more aggressive about decomposing into subagents with narrow briefs. The temptation to let one agent carry a long task is real because the context feels useful, but the constraint-decay numbers say that context is also poison. A fresh agent with three constraints will outperform a stale agent with thirty, even if the stale one knows more.

Third, I am rewriting my conventions docs to assume the model rereads them sporadically, not continuously. Short, declarative, positively phrased rules at the top. Examples of the target pattern, not the anti-pattern. The negation result alone is reason enough to flip every do not X into a prefer Y.

Constraint decay is not a new phenomenon, but having a name for it is useful, and having a paper that quantifies the slope is more useful still. The takeaway I land on is that LLM agents are not unreliable in some mystical sense; they are unreliable in a specific, measurable way that the harness can compensate for if you stop pretending the system prompt is durable memory.

Was this interesting?