
The Agent Library You Build for Codex Is Infrastructure, Not Configuration

Source: simonwillison

Simon Willison noted the addition of subagent support and custom agent definitions to Codex, OpenAI’s open-source CLI tool. The mechanics are covered well at this point: an orchestrating model can spawn specialist agents with isolated context windows, and you can define your own agents with custom instructions, tool access, and input schemas. The part that gets less attention is what it means to maintain a set of custom agents over the lifetime of a project as the codebase evolves and the team’s conventions shift.

Custom agents in Codex are not throwaway configuration. They are encoded knowledge about your codebase, expressed in natural language, that guides how an AI assistant approaches domain-specific tasks. A migration-writer agent encodes your team’s conventions for database migrations. A security-reviewer agent encodes your security posture and the specific risks relevant to your stack. A docs-sync agent encodes the relationship between your code structure and your documentation hierarchy.

That encoded knowledge has a shelf life, and when it goes stale, it fails quietly.

What a custom agent actually encodes

A custom agent definition in Codex consists of several components, but the ones that matter for maintenance are the instructions (what the agent is supposed to do and how) and the description (the text the orchestrating model reads when deciding whether to invoke it). Both encode assumptions about the codebase.

Consider a concrete example. You are working on a backend service using PostgreSQL, and you define a migration-writer agent:

{
  "name": "migration-writer",
  "description": "Generates and validates SQL migration scripts. Use when PostgreSQL schema changes are required.",
  "instructions": "Generate up/down migration files following Flyway naming conventions: V{version}__{description}.sql. All migrations should be idempotent. Use CREATE TABLE IF NOT EXISTS and ALTER TABLE ADD COLUMN IF NOT EXISTS. Run flyway validate to check the migration before returning.",
  "capabilities": ["file-read", "file-write", "shell-exec"],
  "input_schema": {
    "task": "string",
    "current_schema": "string"
  }
}

Six months later, the project migrates to Liquibase. The instructions still reference Flyway conventions. The validation command is wrong. The orchestrating model still routes migration tasks to this agent because the description remains accurate, but the agent generates Flyway-style files in a Liquibase project and the validation step either fails silently or produces misleading results.

This is not a dramatic failure. The agent still runs. It might even produce plausible-looking output. The failure is that the specialized knowledge the agent was supposed to embody has drifted from reality, and there is no obvious signal that this happened.
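One way to make drift visible is a CI check that flags agent definitions mentioning tools absent from the project's dependency manifests. The sketch below uses only the standard library; the directory layout, file paths, and tool list are assumptions for illustration, not anything Codex prescribes.

```python
import json
import re
from pathlib import Path

# Tools worth cross-checking against the dependency manifests.
# This list is a hypothetical example, extend it for your stack.
KNOWN_TOOLS = ["flyway", "liquibase", "alembic"]

def find_stale_references(agent_dir: Path, manifest_paths: list[Path]) -> list[str]:
    """Return warnings for agent definitions that name a tool
    no dependency manifest mentions."""
    manifest_text = " ".join(
        p.read_text().lower() for p in manifest_paths if p.exists()
    )
    warnings = []
    for agent_file in agent_dir.glob("*.json"):
        definition = json.loads(agent_file.read_text())
        text = (definition.get("instructions", "") + " "
                + definition.get("description", "")).lower()
        for tool in KNOWN_TOOLS:
            if re.search(rf"\b{tool}\b", text) and tool not in manifest_text:
                warnings.append(
                    f"{agent_file.name} references '{tool}', "
                    f"which no manifest mentions"
                )
    return warnings
```

Run against the Flyway-to-Liquibase scenario above, this would flag the migration-writer definition the moment the Flyway dependency disappears from the build file, instead of leaving the mismatch to surface at generation time.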

The sync problem

Every codebase has conventions that are partly implicit: naming patterns, migration tooling, test organization, deployment assumptions. Custom agents turn those conventions into explicit, runnable specifications. That is one of their best properties. It is also what makes them require maintenance.

The sync problem is specific: when a convention changes in the code, the corresponding agent definition needs to change too. But agent definitions are typically not co-located with the code they serve. A migration-writer agent definition lives in some config directory, not next to the migration files. A code reviewer that knows your security conventions is not in the same file as your authentication middleware.

Co-locating agent definitions with the code they serve is one mitigation. If agent definitions live in version control alongside the codebase, they are at least visible during code review. A pull request that changes the migration framework can include a corresponding update to the migration-writer definition. This is not a guarantee that updates happen, but it makes the connection between code changes and agent definition changes explicit rather than depending on someone remembering to check a separate config.
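The same connection can be enforced mechanically. A minimal sketch of a CI guard, assuming a hypothetical repo layout where migrations live under db/migrations/ and the agent definition under agents/:

```python
# Hypothetical CI guard: given the files changed in a pull request,
# warn when migration code changed but the migration-writer
# definition did not. Paths are assumptions about one possible layout.
AGENT_DEFINITION = "agents/migration-writer.json"
WATCHED_PREFIX = "db/migrations/"

def needs_agent_review(changed_files: list[str]) -> bool:
    """True when watched code changed without a matching
    agent-definition change."""
    touched_migrations = any(f.startswith(WATCHED_PREFIX) for f in changed_files)
    touched_agent = AGENT_DEFINITION in changed_files
    return touched_migrations and not touched_agent
```

A warning, not a hard failure, is probably the right severity: most migration changes do not alter conventions, and the goal is only to prompt the question during review.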

The deeper question is ownership. On a small team, keeping agent definitions current is like keeping runbooks up to date: it gets neglected unless someone explicitly treats it as an infrastructure responsibility rather than documentation hygiene.

Writing instructions that age well

Some parts of an agent definition go stale faster than others. Instructions that reference specific tool names, file paths, version numbers, or external service details are high-maintenance. Instructions that describe intent and general approach are more durable.

A fragile instruction:

Run `flyway migrate --check` to validate the migration file before returning.

A more durable instruction:

Validate the migration file is syntactically correct and follows the project's existing migration conventions before returning. Check the migrations directory for examples of the expected format and naming pattern.

The second version tells the agent to look at the codebase for guidance rather than encoding a specific command. It degrades gracefully when conventions change because the agent examines the actual migrations and adapts to whatever format it finds. The cost is slightly less precision at definition time. The benefit is an agent that stays useful across framework migrations, tooling upgrades, and convention changes.

The same principle applies to description text. A description that encodes specific implementation details (“Use when Flyway SQL migrations need to be generated”) is narrower and more fragile than one that describes the problem domain (“Use when database schema changes need to be expressed as versioned migration scripts”).
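Applying both principles to the earlier definition yields something like the following. This is a sketch of the shape, not a canonical format:

```json
{
  "name": "migration-writer",
  "description": "Use when database schema changes need to be expressed as versioned migration scripts.",
  "instructions": "Generate up/down migration files. Check the migrations directory for examples of the expected naming pattern and format, and follow what you find there. Prefer idempotent statements where the project's existing migrations do. Validate the result against whatever migration tooling the project uses before returning.",
  "capabilities": ["file-read", "file-write", "shell-exec"],
  "input_schema": {
    "task": "string",
    "current_schema": "string"
  }
}
```

Nothing in this version mentions Flyway. When the project moves to Liquibase, the agent examines the new migration files and adapts, and the definition needs no edit at all.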

The OpenAI Agents SDK tracing facilities can surface when routing has gone wrong. If your migration-writer is being invoked for tasks it should not handle, or is being skipped for tasks it should handle, the span data makes that observable. Treating invocation patterns as a monitoring signal, rather than an implementation detail, turns agent maintenance from reactive to proactive.

from agents import Runner, trace

# Note: by default the Agents SDK exports spans to a configured trace
# processor rather than exposing them on the trace object. Reading
# spans in-process, as sketched below, assumes a processor that
# collects them locally.
with trace(workflow_name="schema-update") as t:
    result = Runner.run_sync(orchestrator, "Add audit columns to the orders table")

for span in t.spans:
    print(span.agent_name, span.input[:120])

If that output shows the migration-writer being skipped in favor of a general-purpose agent, the description is probably too narrow. If it shows invocations for tasks that have nothing to do with migrations, it is too broad.
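Once span records are exported somewhere you can query them, a small aggregation turns that eyeball check into a report. The sketch below assumes spans arrive as plain dicts with agent_name and input keys; the actual shape of exported spans depends on your tracing processor.

```python
from collections import Counter

def routing_report(spans: list[dict], agent: str, keyword: str) -> dict:
    """Summarize how often `agent` was invoked, and which
    keyword-matching tasks were routed somewhere else."""
    invocations = Counter(s["agent_name"] for s in spans)
    # Tasks that mention the keyword but went to a different agent:
    # a rough proxy for "the description is too narrow".
    missed = [
        s["input"] for s in spans
        if keyword in s["input"].lower() and s["agent_name"] != agent
    ]
    return {
        "invocations": invocations.get(agent, 0),
        "missed_tasks": missed,
    }
```

Keyword matching is crude, but as a trend signal over hundreds of sessions it is enough to notice that migration tasks have started bypassing the migration-writer.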

What the open-source angle enables

Codex is open-source, which has an interesting consequence for custom agents: you can ship agent definitions with a project and allow any contributor to benefit from them. A well-maintained open-source project could include a set of custom agent definitions that encode the project’s conventions for contribution: how to write tests, how to format commit messages, what the security considerations are for the specific domain.

A new contributor cloning the repository gets a Codex setup that already knows the project. They do not have to discover that unit tests go in one directory and integration tests go in another. They do not have to learn the PR title format from a contributing guide they may not have read. The agent definitions encode that institutional knowledge and apply it during actual coding sessions.
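A repo-shipped definition encoding those conventions might look something like this. The name, paths, and rules are hypothetical, illustrating the shape rather than any particular project:

```json
{
  "name": "contribution-reviewer",
  "description": "Use when preparing a change for review in this repository.",
  "instructions": "Check that unit tests live under tests/unit and integration tests under tests/integration, that the PR title follows the pattern described in CONTRIBUTING.md, and that any new public API has a docstring. Point out deviations rather than silently fixing them.",
  "capabilities": ["file-read"]
}
```

Because the definition lives in the repository, it goes through the same review process as code, which is exactly the co-location discipline discussed above.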

This shifts custom agents from a power-user feature to a contributor experience feature. The team’s most experienced members can encode their understanding of the codebase into agent definitions that help everyone work consistently with the project’s conventions. The knowledge becomes operational rather than just documented.

The maintenance cost does not go away. But for projects where contributor onboarding is a significant overhead, maintained agent definitions are qualitatively different from documentation: the documentation tells contributors what to do, and the agents help them do it.

The comparison with fixed agent types

Claude Code ships with a fixed set of built-in agent types (Explore, Plan, and general-purpose), defined and maintained by Anthropic. Users invoke these types but do not define new ones. The capabilities are predetermined, and the maintenance responsibility stays with Anthropic.

Codex’s custom agents push that responsibility to teams. You get flexibility, project-specific specialization, and the ability to encode domain knowledge that a general-purpose tool could not anticipate. You also take on the maintenance burden that comes with it.

For most projects, the right initial approach is probably fewer agents defined more broadly, rather than many narrow agents each requiring independent upkeep. A single well-maintained code-reviewer that knows your security and style conventions will serve better over time than six narrowly defined reviewers each going stale independently. Specialization can be achieved through instructions within a single agent rather than through agent proliferation.

As the project matures and the patterns become clearer, specific agents for specific workflows make more sense. A migration-writer and a test-writer emerge naturally from repeated use and observable friction with the general-purpose agent, rather than being designed upfront before you know which workflows will actually benefit from delegation.

Custom agents are a genuinely useful capability. The operational discipline they require is just less visible in the announcement than the feature itself, and teams that treat them as one-time configuration rather than ongoing infrastructure tend to discover this the hard way.
