
The Enforcement Layer: Why Declarative Context Instructions Fall Short

Source: martinfowler

The Martin Fowler article from February 2026 on context engineering for coding agents works well as a taxonomy of options for enriching what a coding agent knows. Reread a few months on, though, its more practically significant story is about constraint rather than expansion.

The Structural Limitation of Declarative Instructions

CLAUDE.md is genuinely useful. Dropped at the root of a project, it tells the agent about build commands, naming conventions, libraries to prefer or avoid, directories it shouldn’t touch. Claude Code reads it at session start and treats its contents as authoritative. Cursor’s .cursorrules, GitHub Copilot’s .github/copilot-instructions.md, Aider’s configuration layer: all of these serve the same function, encoding standing instructions into the context window before any conversation begins.

The limitation is not that these files don’t work. They do. The limitation is that they work through natural language, and natural language has edge cases that no amount of careful writing eliminates. “Do not modify files in src/generated/” reads as clear instruction. It will be followed correctly in most situations. In a long session where the agent has accumulated significant context, or when following one instruction seems to conflict with another, or when the edge case looks genuinely ambiguous, the instruction can fail. A well-phrased prohibition in a markdown file is not a hard constraint; it is a strong suggestion with occasional exceptions.

The consequences are practical. The promise of an autonomous coding agent is consistent behavior across an entire session, without constant supervision. An instruction that holds in routine cases but breaks under pressure is exactly the kind of inconsistency that makes agents unreliable for production work. Nor is this a problem that better-written instructions can solve: research on LLM reliability shows that models perform measurably worse on information in the middle of long contexts, which means a constraint stated clearly at session start may have lost effective weight by the fifteenth tool call.

Hooks as the Programmatic Answer

Claude Code’s hooks system addresses this directly. Hooks fire at specific lifecycle events in an agent session: PreToolUse (before any tool call), PostToolUse (after any tool call), Notification (when Claude Code sends a notification, such as a permission request), and Stop (when the agent finishes responding). They receive structured JSON via stdin describing what happened, and they can return output or, by exiting with code 2, block the tool call and feed their stderr back to the model.
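As a point of reference, the JSON a PreToolUse hook reads from stdin looks roughly like the following. The field names follow Claude Code's hook documentation; the placeholder values are illustrative, and the exact field set varies by event:

```json
{
  "session_id": "abc123",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": { "command": "npm test" }
}
```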

The critical difference from CLAUDE.md instructions is that hooks are code: they execute deterministically against structured input and cannot misinterpret an edge case.

Consider a representative example. The CLAUDE.md instruction “run ESLint after every file change” works most of the time. A PostToolUse hook that fires after every write to a .ts or .js file, runs ESLint on the written file, and injects the results back into context enforces this completely. The agent sees lint output as a tool result, which it treats with higher fidelity than a standing instruction from a configuration file.

#!/bin/bash
# PostToolUse hook: runs eslint after any JS/TS file write
# Receives the tool call description as JSON on stdin
# (fields: tool_name, tool_input)

INPUT=$(cat)
TOOL=$(echo "$INPUT" | jq -r '.tool_name')
FILE=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

if [[ "$TOOL" == "Write" ]] && [[ "$FILE" =~ \.(ts|js|tsx|jsx)$ ]]; then
  RESULT=$(npx eslint "$FILE" --format compact 2>&1)
  if [[ -n "$RESULT" ]]; then
    # Exit code 2 is the blocking error: stderr is fed back to the model
    echo "$RESULT" >&2
    exit 2
  fi
fi

The hook receives the write event, runs the linter synchronously, and, if there are errors, writes them to stderr and exits with the blocking code. Claude Code feeds that output back as if it were an error from the tool call itself, so the lint feedback enters context automatically, at exactly the right moment, without relying on the model to remember an instruction.
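Because hooks are ordinary scripts reading stdin, they can also be exercised outside any agent session by piping a hand-written event at them. A minimal sketch of the parsing step, assuming the payload field names tool_name and tool_input:

```shell
# Simulate the field extraction a hook performs, using a sample event
EVENT='{"tool_name":"Write","tool_input":{"file_path":"src/app.ts"}}'
TOOL=$(echo "$EVENT" | jq -r '.tool_name')
FILE=$(echo "$EVENT" | jq -r '.tool_input.file_path // empty')
echo "tool=$TOOL file=$FILE"
```

Piping a few such fixtures through the real hook script is the cheapest way to verify its matching logic before an agent ever triggers it.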

The PreToolUse hook enables enforcement in the other direction: blocking actions before they execute. An allowlist of safe shell commands, validated before any execution happens, replaces the natural language instruction “only run test commands” with a hard gate. The instruction can be forgotten or overridden by plausible-sounding reasoning mid-conversation; the hook cannot.

#!/bin/bash
# PreToolUse hook: restricts shell execution to an approved set

INPUT=$(cat)
TOOL=$(echo "$INPUT" | jq -r '.tool_name')
CMD=$(echo "$INPUT" | jq -r '.tool_input.command // empty')

if [[ "$TOOL" == "Bash" ]]; then
  SAFE_PATTERNS=("^npm test" "^pnpm test" "^just test" "^cargo test" "^go test")
  for PATTERN in "${SAFE_PATTERNS[@]}"; do
    [[ "$CMD" =~ $PATTERN ]] && exit 0
  done
  # Exit code 2 blocks the call; stderr explains why
  echo "Command not in approved list: $CMD" >&2
  exit 2
fi

Hooks are configured in .claude/settings.json alongside other project settings, which means they live in version control, get reviewed in PRs, and can be scoped per project.
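A sketch of that wiring, assuming the two scripts above are saved under .claude/hooks/ (the filenames are hypothetical; the matcher is a pattern over tool names):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": ".claude/hooks/lint.sh" }]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": ".claude/hooks/allowlist.sh" }]
      }
    ]
  }
}
```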

The Allowed Tools List as Negative Space

A related and underused lever is restricting which tools the agent can call at all. Claude Code’s project settings expose an allowedTools configuration that, scoped to a specific task, reduces the agent’s surface area significantly. A read-only audit workflow has no legitimate use for Write or Bash. Removing those tools means the agent cannot call them regardless of what any part of the conversation instructs.

{
  "allowedTools": ["Read", "Glob", "Grep"],
  "disallowedTools": ["Write", "Bash", "Edit"]
}

This is context engineering moving in an unusual direction: rather than enriching what the model sees, you narrow what it can do. For tightly defined tasks, this predictability is more valuable than flexibility. A bug investigation workflow that can only read files and search output is faster to reason about, easier to audit, and impossible to misconfigure into making unintended changes.

The pattern generalizes. Claude Code’s non-interactive print mode (piping input to claude -p "prompt") combined with a restricted tool set enables scripting the agent as a pipeline component: read code, analyze it, produce structured output, done. The agent cannot wander into unrelated files or run cleanup scripts, because the tool set makes wandering impossible.
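A sketch of such a pipeline step, with a hypothetical prompt and output file; --allowedTools and --output-format are Claude Code CLI flags for print mode:

```shell
# Read-only audit step: analyze one file, emit structured output
cat src/payment.ts | claude -p "List any functions missing input validation" \
  --allowedTools "Read" "Grep" "Glob" \
  --output-format json > audit.json
```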

MCP and the Expanding Surface

The Model Context Protocol represents the most significant recent expansion of the context engineering surface. The standard defines three primitives that MCP servers can expose to any compatible client: tools (callable functions), resources (readable data), and prompts (reusable templates). An agent fixing a bug can query the GitHub issue describing it, the CI run that caught it, and the live database schema the affected code operates on, all within the same tool-call framework it uses to read source files.

A typical MCP configuration, checked into the project root as .mcp.json, looks like:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_..." }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}

The tradeoff is that a larger context surface is also a larger attack surface. Prompt injection through tool results is a concrete risk when agents consume content from external systems. A GitHub issue with a crafted body, a database record with embedded instructions, a documentation page that has been quietly modified: all of these become potential vectors when an agent reads them without treating external content as untrusted. The risk is not theoretical; security researchers have demonstrated injection attacks that redirect agent behavior by embedding instructions in files the agent is asked to summarize.

MCP expansion and hook-based constraint are therefore complementary, not competing. As MCP servers broaden what the agent can access at runtime, hooks and tool restrictions need to enforce what it can do with that information. A PreToolUse hook that validates arguments before write operations, combined with an explicit allowlist of MCP tools the workflow is permitted to call, creates a meaningful defense-in-depth posture even when the agent is consuming untrusted external content.
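The write-validation half can be sketched as a PreToolUse check that keeps file paths inside an approved root. The snippet below is a standalone demo against a sample event (the allowed root is a hypothetical choice); a real hook would read the event with INPUT=$(cat), print the refusal to stderr, and exit 2:

```shell
# Decide whether a Write event targets the approved project subtree
EVENT='{"tool_name":"Write","tool_input":{"file_path":"/etc/passwd"}}'  # sample payload
TOOL=$(echo "$EVENT" | jq -r '.tool_name')
FILE=$(echo "$EVENT" | jq -r '.tool_input.file_path // empty')
ALLOWED_ROOT="$PWD/src"
if [[ "$TOOL" == "Write" && "$FILE" != "$ALLOWED_ROOT"/* ]]; then
  DECISION="block"   # real hook: echo reason >&2; exit 2
else
  DECISION="allow"
fi
echo "$DECISION $FILE"
```

The same shape works for any argument the event carries: command strings, URLs, or MCP tool parameters can all be validated before the call is allowed to proceed.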

Context Engineering as Infrastructure

The discipline the Martin Fowler article frames as “becoming a necessity” looks, from several months on, more like a permanent operational concern. Teams working with coding agents at any scale have begun treating the configuration environment with the same seriousness they give CI pipelines: CLAUDE.md in version control with normal PR review, hook scripts maintained alongside build tooling, allowed tool sets defined per workflow rather than improvised per session.

The layering has become clearer with practice. Static files encode standing knowledge: conventions, prohibited patterns, build commands, the information that is always relevant and expensive to re-derive. Dynamic retrieval through tool calls and MCP servers expands what the agent can access at runtime. Hooks and tool restrictions enforce constraints that natural language cannot reliably maintain.

Each layer addresses what the previous one cannot. CLAUDE.md instructs; hooks enforce; tool restrictions eliminate entire failure modes from consideration. Treating only the first layer as serious context engineering, while leaving the others to chance, produces agents that perform well under observation and fail unpredictably in production. That is the same failure mode that motivated CI pipelines in the first place, and the solution has the same shape: automate the enforcement rather than relying on consistent human vigilance.
