
Claude Code Hooks: The Enforcement Layer That CLAUDE.md Can't Be

Source: martinfowler

The Martin Fowler piece on context engineering from February 2026 catalogs the expanding toolkit for configuring coding agents: CLAUDE.md files, MCP servers, dynamic retrieval, conversation compaction. What it touches on but does not develop is a distinction that becomes important once you have been running a coding agent on real tasks for a while. There is a difference between instructions the agent tries to follow and guarantees that hold regardless of what the agent decides to do. Claude Code’s hooks system is the mechanism for the latter.

Why CLAUDE.md Is Advisory, Not Mandatory

A CLAUDE.md file is a set of natural language instructions the model reads at session start and tries to follow. It conveys intent, context, and preference well. It is not a reliable policy enforcement mechanism.

The reasons are structural. Language models predict tokens based on context. They follow instructions because following instructions is what well-trained models do in those contexts, not because anything prevents deviation. A model told “always run the linter after editing a file” will usually run the linter, but in a long agentic task, after context compaction, in an edge case, or under an unusual prompt structure, it might not. The instruction is advisory in the same way that a comment in code is advisory: informative but not machine-enforced.

This is not a criticism of any particular model; it describes what natural language instructions fundamentally are. The same model that reliably follows a lint instruction in a short conversation may skip it at step 28 of a 40-step agentic task, when its attention is distributed across a different sub-goal. The “lost in the middle” research from Stanford and UC Berkeley gives the structural explanation: models have measurably worse recall for information positioned in the middle of a long context. As a session grows, earlier instructions drift toward the middle, and their effective weight decreases accordingly.

The Hook Architecture

Claude Code’s hooks are executables that fire in response to agent lifecycle events. The four core event types are:

  • PreToolUse: fires before a tool call executes
  • PostToolUse: fires after a tool call completes
  • Notification: fires when the agent sends a notification message
  • Stop: fires when the main agent finishes responding
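
Hooks are registered in Claude Code’s settings file. A minimal sketch, assuming the validation script lives at .claude/hooks/validate.sh (verify field names against the current hooks documentation):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/validate.sh"
          }
        ]
      }
    ]
  }
}
```

The matcher field scopes the hook to specific tool names, so a Bash-command validator never runs on file edits.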

Each hook receives structured JSON over stdin. For a PreToolUse event on a Bash tool call, the input looks like this:

{
  "session_id": "abc123",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {
    "command": "rm -rf ./dist",
    "description": "Clean build artifacts"
  }
}

A PreToolUse hook can block the tool call in two ways: by exiting with status 2, in which case whatever it wrote to stderr is fed back to the agent, or by printing a structured JSON decision to stdout and exiting zero. This is interposition: the hook sits between the model’s intention and the tool’s execution, and the model cannot bypass it by choosing different wording or reasoning differently about the task.

Here is a minimal command validation hook that blocks destructive patterns:

#!/usr/bin/env bash
input=$(cat)
tool=$(echo "$input" | jq -r '.tool_name')

if [ "$tool" != "Bash" ]; then
  exit 0
fi

command=$(echo "$input" | jq -r '.tool_input.command')

if echo "$command" | grep -qE 'rm\s+-rf\s+/[^.]'; then
  # Emit a structured decision on stdout and exit zero so it gets parsed.
  echo '{"decision":"block","reason":"refusing rm -rf on absolute paths outside working directory"}'
  exit 0
fi

exit 0

The hook can return JSON with a reason field. That reason gets injected into the agent’s context, so the agent understands why the action was blocked and can adapt rather than loop on the same blocked request.
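
Because a hook is just an executable reading JSON on stdin, you can smoke-test it outside Claude Code entirely by piping a sample event into it. A sketch, where the inline hook is a stripped-down stand-in for the real script (it uses sed instead of jq so the demo has no dependencies):

```shell
# Smoke-test a PreToolUse hook by piping a sample event into it.
hook=$(mktemp)
cat > "$hook" <<'EOF'
input=$(cat)
cmd=$(printf '%s' "$input" | sed -n 's/.*"command":"\([^"]*\)".*/\1/p')
case "$cmd" in
  "rm -rf /"*)
    echo '{"decision":"block","reason":"refusing rm -rf on absolute paths"}'
    ;;
esac
exit 0
EOF

printf '%s' '{"tool_name":"Bash","tool_input":{"command":"rm -rf /etc"}}' | bash "$hook"
# → {"decision":"block","reason":"refusing rm -rf on absolute paths"}
```

A harmless command produces no output, which makes this an easy base for a regression test over your blocklist.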

PostToolUse for Automatic Context Injection

The PostToolUse hook enables a more important pattern for context engineering. Rather than waiting for the agent to remember to validate its own work, a PostToolUse hook fires after every file write and can check the output immediately, injecting results into context before the agent proceeds to the next step.

#!/usr/bin/env bash
input=$(cat)
tool=$(echo "$input" | jq -r '.tool_name')

if [ "$tool" != "Write" ] && [ "$tool" != "Edit" ]; then
  exit 0
fi

file=$(echo "$input" | jq -r '.tool_input.file_path // .tool_input.path // empty')

if [ -z "$file" ] || ! echo "$file" | grep -qE '\.(js|ts|jsx|tsx)$'; then
  exit 0
fi

# A non-zero eslint exit means problems were found; exit code 2 routes
# the hook's stderr back into the agent's context.
if ! lint_output=$(npx eslint "$file" --format compact 2>&1); then
  {
    echo "ESLint found issues in $file that need to be addressed:"
    echo "$lint_output"
  } >&2
  exit 2
fi

exit 0

Without this hook, the flow depends on the agent’s decision-making: write file, possibly run lint if it remains salient, continue. With the hook, the flow is deterministic: write file, lint automatically, lint output appears in context before the next step. The lint check is no longer an action the model chooses to take. It is a property of every file write.

This distinction matters at scale. A ten-step task with three file writes where the model consistently remembers to lint is fine. A forty-step task with fifteen file writes, some occurring while the model is mid-reasoning about an unrelated sub-problem, is where the difference between advisory and mandatory shows up in the actual output.

The Two-Tier Architecture

The productive framing is a two-tier system: CLAUDE.md provides context and guidance, expressing what you want and why. Hooks enforce the invariants that must hold regardless of what the agent decides in the moment.

This pattern appears throughout systems programming under different names. In web frameworks, middleware sits between the request and the handler, enforcing authentication and rate limiting regardless of what any individual handler implements. In version control, pre-commit hooks validate changes before they enter the repository, independent of whether the developer remembered the checklist. In operating systems, capability-based security uses hardware enforcement for boundaries that should not depend on software cooperation. The hook is always the same idea: policy that executes at a layer the component being constrained cannot reach.

For coding agents, the cases that hooks should cover typically fall into a few categories:

Post-edit validation: running the relevant linter, type checker, or test subset after file modifications. These feedback loops should not require the agent to remember to close them on every file write across a long session.

Command filtering: blocking categories of shell commands that carry unacceptable risk in the project context. The list differs per project, but common patterns include preventing writes outside the working directory, refusing to install dependencies without explicit confirmation, and blocking operations on production environment variables.

Resource protection: preventing writes to generated files, vendor directories, or migration files that should never be hand-edited. A hook that inspects the destination path and blocks writes to src/generated/ removes an entire class of mistakes regardless of whether CLAUDE.md mentions it.
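
The path inspection in that kind of hook can be a plain shell case match. A sketch, where check_path is a hypothetical helper and the protected prefixes are examples to tailor per project, not anything Claude Code prescribes:

```shell
# check_path: hypothetical helper deciding whether a write target is protected.
# The prefixes are illustrative; adjust them to the project.
check_path() {
  case "$1" in
    src/generated/*|vendor/*|migrations/*)
      echo "block" ;;
    *)
      echo "allow" ;;
  esac
}

check_path "src/generated/client.ts"   # prints "block"
check_path "src/app/main.ts"           # prints "allow"
```

In a real PreToolUse hook the path would come from the tool_input JSON, and a "block" result would translate into the blocking exit path shown for the command validator.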

Session documentation: writing a structured log when a session ends. A Stop hook that appends a brief record of what changed and why provides continuity across sessions without depending on the model to remember to document its own work.
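
A Stop hook along those lines only needs to append one line per session. A sketch, where the log path, the record fields, and the record_session helper are all assumptions rather than Claude Code conventions; a real hook would pull the session_id and a summary from the stdin JSON:

```shell
# record_session: hypothetical helper appending a one-line session record.
LOG="${CLAUDE_SESSION_LOG:-/tmp/claude-sessions.log}"

record_session() {
  # $1: session id, $2: one-line summary of what changed and why
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$LOG"
}

record_session "abc123" "tightened eslint hook; refactored auth middleware"
tail -n 1 "$LOG"
```

Because the log is append-only and machine-written, the next session can read it back as context without depending on the previous session having remembered to document itself.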

Where Hooks Fit in the Broader Stack

The Fowler article frames context engineering as expansion: more ways to get the right information into the model’s context window at the right time. Hooks are the complementary operation, constraining what the model can do based on the policies that context has established.

Together, CLAUDE.md and hooks form a feedback architecture. The instruction file tells the agent what good behavior looks like. The hooks verify that output conforms to it, injecting corrective feedback when it does not. Lint errors, type check failures, and blocked command explanations all appear as context, and the agent adjusts subsequent behavior based on that feedback rather than operating on unchecked assumptions about whether the prior step succeeded.

MCP servers extend this architecture further. Where hooks operate at tool-call boundaries, MCP servers expand the space of tools the agent can call, including reads from live external systems. The resulting setup has three distinct layers: persistent instructions in CLAUDE.md, dynamic external context through MCP, and machine-enforced policy through hooks. Each layer operates in a space the others do not cover, and the composition is more robust than any single layer alone.

The Structural Insight

The hooks system applies the same principle that motivates input validation at system boundaries, TLS termination at the transport layer, and write-ahead logging in databases: enforcement mechanisms should operate at the level of the execution environment, not inside the component whose behavior they are constraining.

A language model is not a reliable policy enforcement component. It is a capable reasoning component that generally follows instructions and can be guided by good context. The failure mode is not that models actively ignore instructions; it is that the guarantee is probabilistic rather than categorical. Long autonomous tasks, context compaction, and the inherent variability of token prediction mean that “the model will usually do this” and “this will always happen” are not equivalent claims.

Hooks provide the categorical layer. Claude Code’s implementation is relatively early; the hook scripts are bare shell, there is no structured policy language, and observability into what hooks are doing requires your own logging. But the structural insight is correct, and it represents the part of context engineering that receives the least attention relative to its practical importance. As coding agents take on longer tasks with more tool integrations and higher autonomy, that gap between probabilistic and categorical enforcement will matter more than it does today.
