· 6 min read ·

OpenAI's Model Spec Is a Permission System, and That Framing Matters

Source: openai

OpenAI published its Model Spec as a public framework for how their models should behave, covering everything from how to handle conflicting instructions to where the model should sit on a spectrum between full obedience and full autonomy. It is a thoughtful document, and worth reading carefully. But the framing that makes it most legible, at least to me, is not ethics or safety in the abstract. It is permission systems.

If you have built anything on top of Discord’s API, or designed a multi-tenant application, the structure of the Model Spec will feel familiar immediately. There is a principal hierarchy: OpenAI at the top, operators beneath them, users beneath operators, and the model itself as a kind of bounded agent within those constraints. Each level can grant or restrict capabilities within the limits set by the level above. Operators access the API to build products; users interact with those products. An operator can expand certain defaults for users (enabling explicit content on an appropriate platform, for example) or restrict them (preventing the model from discussing topics outside a narrow domain). Users can further adjust within whatever bounds the operator has set.

This maps almost exactly to how Discord structures permissions. A server owner sets the outer bounds. Roles and channels carve out sub-permissions. Individual users operate within what they have been granted. The analogy is not perfect, but it is close enough to make the architecture feel less like philosophy and more like API design, which is useful.

Hardcoded and Softcoded: The On/Off Matrix

The most concrete part of the spec is its distinction between hardcoded and softcoded behaviors. Hardcoded behaviors are absolute: things the model will always do or never do regardless of what any principal instructs. These are the bright lines. The spec gives examples like never providing serious technical assistance toward weapons capable of mass casualties, never generating sexual content involving minors. These are not defaults that can be overridden with the right operator configuration. They are fixed.

Softcoded behaviors are everything else: defaults that can be adjusted. Some are on by default and can be turned off (following safe messaging guidelines around suicide when a medical provider needs clinical directness). Some are off by default and can be turned on (generating explicit sexual content for adult content platforms). The spec breaks these down by who can adjust them: some are operator-only, some can be delegated to users.

For developers building on the API, this is the layer that matters most day to day. The system prompt is not just a way to give the model a persona; it is the mechanism for asserting your operator-level configuration. When you write a system prompt, you are making choices about which defaults you want to override and which you want to inherit. Most developers do this intuitively without thinking of it as permission management, but that is what it is.

The Corrigibility Dial

The most philosophically interesting part of the spec is what it calls the corrigibility-autonomy spectrum. At one extreme, a fully corrigible model does whatever it is told by whoever is at the top of the principal hierarchy. At the other, a fully autonomous model acts purely on its own values and judgment. Both extremes are explicitly described as dangerous.

A fully corrigible model is dangerous because it outsources all ethical weight to whoever controls it. If OpenAI’s incentives ever drift toward something harmful, a fully corrigible model provides no friction. A fully autonomous model is dangerous for the opposite reason: it requires trusting that the model’s values and capabilities are good enough to warrant that independence, which is not a bet the spec thinks is warranted yet.

The spec positions current OpenAI models closer to the corrigible end, but with a floor. There are things the model should refuse even if instructed by OpenAI itself. This is a meaningful claim if it holds in practice. The spec is explicit that this positioning is intentional and temporary: as trust is established and alignment research matures, the expectation is that models will be granted more autonomy.

Anthropics’s published model spec for Claude uses similar framing, though with different vocabulary. The Constitutional AI approach uses a set of principles to guide model self-critique during training. Both approaches are trying to solve the same problem: encoding not just rules but a meta-level disposition toward following rules and exercising judgment. The difference is partly in how training-time versus inference-time intervention is weighted.

The Priority Stack

The spec establishes an explicit priority ordering when values conflict. Broadly safe behavior comes first: supporting human oversight of AI systems during this period of development. Broadly ethical behavior comes second: having good values, being honest, avoiding unnecessary harm. Adherence to OpenAI’s guidelines comes third. Being genuinely helpful comes fourth.

This ordering is deliberate and has real implications. If following a specific OpenAI policy would require acting unethically, the spec says the model should recognize that OpenAI would prefer the ethical action, since the policies are meant to be grounded in ethical considerations anyway. The model should not treat the policy layer as a shield for behavior that violates the ethical layer below it.

The practical consequence is that the spec does not treat helpfulness as a core personality trait. It explicitly cautions against models that are excessively “assistant-brained,” focused on completing requests without moral agency. The danger is that a model optimized purely for user satisfaction will produce harm through mundane helpfulness, following instructions without the independent judgment that would flag problems.

The Accountability Gap

Here is where the spec runs into the limits of documentation as a mechanism. Publishing the Model Spec is a meaningful act of transparency. It tells developers what they are building on and tells users what to expect from the system. It creates a public record against which behavior can be evaluated. That is all genuinely useful.

But the spec is not enforced by any external body. There is no audit, no regulator checking whether GPT-4o’s outputs are consistent with what the document says. The “thoughtful senior OpenAI employee” test used throughout the spec, a heuristic asking whether a hypothetical reasonable employee would be uncomfortable with a given output, is internally defined and internally applied. The accountability loop is closed within OpenAI.

This is a real limitation, and it is not unique to OpenAI. Anthropic’s Claude specification has the same structure. Google’s model governance documents share the same characteristic. They represent sincere internal commitments written down, which is better than nothing, but they are not the same as external accountability mechanisms.

For developers building products on these models, the practical question is what happens when the model deviates from spec. The answer is that you file a bug report. The spec is not a contract in any legally enforceable sense.

What It Means to Build on Top of This

For anyone building applications with these models, the Model Spec is worth understanding as infrastructure documentation, not just corporate policy. When you write a system prompt, you are not just configuring personality; you are interacting with a layered permission system that has been trained into the model weights. The model’s behavior in edge cases will be shaped by how it has internalized this hierarchy.

This has a few concrete implications. Operator-level instructions are given meaningful weight over user instructions by default, but a user claiming special context or permissions they have not actually been granted should generally get less deference than a system prompt setting clear policy. Understanding this helps when debugging unexpected model behavior: the model is not just doing inference on your text; it is navigating a trust hierarchy that the spec defines.

The softcoded defaults also matter for product decisions. If you do not explicitly configure something in your system prompt, you are inheriting defaults that were designed for general-purpose use, not your specific context. Medical platforms, legal tools, security research environments: each of these probably needs explicit operator-level configuration to get the behavior that actually serves users well, rather than the cautious middle-ground defaults intended for unknown contexts.

The Model Spec is a serious document. It engages with genuinely hard problems: how to encode values rather than just rules, how to handle conflicts between what users want and what serves them, how to think about AI agency during a period when trust has not yet been established. Reading it as a permission system architecture does not diminish any of that. It just makes it more tractable to reason about from a developer’s perspective.

Was this interesting?