OpenAI announced this week that it is launching Codex Labs and partnering with Accenture, PwC, Infosys, and other major IT services firms to deploy Codex across enterprise software development lifecycles. The milestone number attached to the announcement is 4 million weekly active users.
The product milestone matters. But the partner list is the more interesting story.
What Codex Actually Is Now
The name Codex has a history worth unpacking. The original OpenAI Codex model, released in August 2021, was a GPT-3 derivative fine-tuned on billions of lines of public code. It is what powered the early versions of GitHub Copilot, and it worked primarily as an autocomplete engine: you write a comment or a partial function signature, the model fills in the rest, you review and accept.
The Codex that OpenAI is now scaling to enterprises is a fundamentally different thing. Launched in mid-2025, it is an agentic system built on codex-1, a variant of the o3 reasoning model optimized for software engineering tasks. It does not wait for you to position your cursor. You give it a task description, it reads your repository, writes code, runs tests, and opens a pull request, all inside an isolated cloud sandbox while you work on something else.
The architectural shift is significant. When Copilot suggests a line, a human reviews it before it lands in the codebase. Codex operating as an agent produces a diff you review after the fact. The trust model is inverted. You are reviewing finished output rather than guiding generation, and that demands substantially more judgment from the human in the loop.
The cloud sandbox design is deliberate: Codex instances run with no outbound internet access by default, with filesystem access scoped to the repository you provide. This addresses one of the core enterprise concerns around agentic tools, which is that an agent with broad access can cause damage that a suggestion tool cannot. A contained environment with a pull request as the only output artifact is a reasonable first approximation of a safe blast radius.
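To make the containment idea concrete, here is a minimal sketch of a deny-by-default policy check. It is an illustration only: `SandboxPolicy`, its fields, and the `/workspace/repo` path are hypothetical names, not part of any OpenAI API, and a real sandbox enforces these rules at the container or network layer rather than in application code.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object, for illustration only."""

    repo_root: str
    # An empty allowlist means no outbound network access at all.
    allowed_hosts: frozenset = frozenset()

    def allows_path(self, path: str) -> bool:
        # Relative paths are resolved against the repository root, and
        # ".." segments are collapsed before the containment check.
        # Symlinks are NOT resolved here; a real sandbox must handle them.
        root = posixpath.normpath(self.repo_root)
        resolved = posixpath.normpath(posixpath.join(root, path))
        return posixpath.commonpath([root, resolved]) == root

    def allows_host(self, host: str) -> bool:
        return host in self.allowed_hosts


policy = SandboxPolicy(repo_root="/workspace/repo")
print(policy.allows_path("src/app.py"))      # inside the repository
print(policy.allows_path("../secrets.env"))  # escapes the repository
print(policy.allows_host("example.com"))     # no hosts are allowlisted
```

The design choice worth noticing is that everything is denied unless explicitly permitted, which is what keeps the pull request the only way out of the sandbox.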
The 4M WAU Number in Context
Four million weekly active users is not a small number. For comparison, GitHub Copilot reported around 1.3 million paid subscribers in early 2024, roughly two years after its launch, and had grown to somewhere north of 1.8 million by the end of that year. Copilot is now embedded across IDEs used by tens of millions of developers globally, so direct comparisons are not clean, but 4M WAU for Codex, accessed primarily through ChatGPT’s interface and the API, in its first year of availability, is a credible signal of developer appetite for agentic coding tools.
The metric also reflects something about how people are actually using Codex. WAU captures weekly engagement rather than installed base. Developers are returning to it repeatedly, which suggests the tool is doing work they find worth repeating, not just satisfying curiosity.
The Partner List Is the Strategy
Accenture, PwC, and Infosys are not software companies in the product sense. They are IT services firms. Together they employ well over a million people, a large fraction of whom are software engineers working on delivery projects for enterprise clients.
Those firms have an obvious concern with AI coding tools: if the tools are good enough, fewer human engineers are needed to deliver the same output. The straightforward framing of this is that OpenAI is entering a market by partnering with incumbents whose business model it threatens. That framing is mostly right, but it undersells the complexity of how this actually plays out.
Large enterprises do not buy SaaS tools the way individual developers do. They have procurement processes, security reviews, vendor risk assessments, compliance requirements, and multi-year contracts. The path from “this model is impressive in a demo” to “this is running on our codebase in production” passes through legal, InfoSec, and probably a quarterly business review with a VP. Accenture, Infosys, and PwC have existing seats at those tables. OpenAI does not, yet.
For the IT services firms, the calculus is different. The risk is that if they do not integrate Codex into their delivery model, a competitor will, and will bid projects at lower cost by using the productivity gains. The safer bet is to be the firm that deploys Codex well and captures margin from doing so, rather than the firm that ignored it and lost contracts. Co-opting potential disruption into a service offering is a pattern these firms have run before.
Codex Labs is the formal program enabling this. It gives enterprises a structured path to deployment, presumably with dedicated support, integration tooling, and contract terms that address IP concerns.
What Enterprise Deployment Actually Requires
Deploying an agentic coding tool across a software development lifecycle is harder than it sounds because the SDLC is not one thing. There is code writing, but also code review, testing, documentation, dependency management, incident investigation, and refactoring. Codex as described is currently strongest at the code writing and testing phases. The connective tissue between those phases and the rest of how enterprises ship software involves JIRA tickets, GitHub Actions, deployment pipelines, and a lot of institutional knowledge about which parts of the codebase are load-bearing and which are safe to touch.
The SWE-bench benchmark gives a useful reference point. The codex-1 model reportedly scores well on SWE-bench Verified, which measures the ability to resolve GitHub issues against real-world open source repositories. SWE-bench Verified is a human-validated subset of the original benchmark: annotators screened out tasks whose issue descriptions were underspecified or whose tests would reject valid fixes, so a passing result is a more trustworthy signal that a proposed fix is actually correct. Scoring well there is meaningful, because the task structure mirrors what enterprise deployment actually requires: read a codebase, understand a problem description, produce a fix that works.
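For readers unfamiliar with how a SWE-bench task is scored, the rule can be sketched in a few lines. A task counts as resolved only if the tests that failed before the patch now pass and the tests that passed before still pass. This is a simplified illustration of that rule, not the actual evaluation harness; the function name and the test names are made up.

```python
def is_resolved(
    results: dict[str, bool],
    fail_to_pass: list[str],
    pass_to_pass: list[str],
) -> bool:
    """Return True if a patched repository counts as resolving the task.

    results maps a test name to whether it passed after the patch.
    A test missing from results is treated as a failure.
    """
    # The tests that reproduce the issue must now pass.
    fixed = all(results.get(name, False) for name in fail_to_pass)
    # Previously passing tests must not have regressed.
    unbroken = all(results.get(name, False) for name in pass_to_pass)
    return fixed and unbroken


# Hypothetical test names, for illustration only.
results = {"test_parses_empty_input": True, "test_existing_roundtrip": True}
resolved = is_resolved(
    results,
    fail_to_pass=["test_parses_empty_input"],
    pass_to_pass=["test_existing_roundtrip"],
)
print(resolved)
```

Note what the rule does not check: style, architecture, or anything else a test cannot express.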
But SWE-bench measures individual task completion. Enterprise deployment introduces questions that the benchmark does not. How does the agent handle a task that requires understanding an internal naming convention documented in a Confluence page it cannot access? What happens when the fix it produces is technically correct but violates an architectural principle the team has not written down? Who is responsible for a security regression introduced by AI-generated code that passed CI?
These are not arguments against deployment. They are the integration problems that Accenture and Infosys are presumably being paid to solve.
The Audit Trail Problem
One underappreciated requirement for enterprise AI coding tools is auditability. In regulated industries, software changes need to be traceable: who made this change, when, and why. When the author is an AI agent, the traceability question becomes more complex. Pull requests opened by Codex will have a clear attribution in the git history, but the decision chain that led to the task being assigned, the reasoning the model used to produce the specific implementation, and the verification steps it took are all ephemeral unless the system explicitly surfaces and stores them.
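One way to picture what "explicitly surfaces and stores" could mean is a per-change record that captures the decision chain alongside the pull request. The sketch below is an assumption about what such a record might contain. `AgentChangeRecord` and every field name are hypothetical, the sample values are placeholders, and none of it reflects an actual Codex export format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentChangeRecord:
    """Hypothetical audit record for one agent-authored change."""

    requested_by: str         # who assigned the task
    requested_at: str         # ISO 8601 timestamp of the assignment
    task_description: str     # why the change was requested
    pull_request: str         # where the resulting diff lives
    reasoning_summary: str    # the agent's stated rationale
    verification_steps: tuple = ()  # e.g. test commands the agent ran

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for hashing.
        return json.dumps(asdict(self), sort_keys=True)

    def digest(self) -> str:
        # A content hash lets an auditor detect later edits to the record.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


record = AgentChangeRecord(
    requested_by="engineer@example.com",
    requested_at="2025-01-01T09:00:00Z",
    task_description="Fix null handling in the invoice parser",
    pull_request="example-org/billing#0",  # placeholder identifier
    reasoning_summary="Parser assumed a non-empty line-item list.",
    verification_steps=("pytest tests/test_parser.py",),
)
print(record.digest())
```

Storing the digest somewhere the agent cannot write to, such as an append-only log, is what would make the record useful as evidence rather than just documentation.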
OpenAI’s Codex interface does show the agent’s reasoning during task execution. Whether enterprises can export and retain that reasoning in a form that satisfies compliance requirements is a practical question that will shape how broadly Codex can be deployed in finance, healthcare, and similar verticals.
A Tool That Works vs. Infrastructure That Ships
The original Codex model in 2021 was a capability demonstration. GitHub Copilot turned it into a product. What OpenAI is now attempting is the next step: turning Codex into infrastructure, something that enterprises embed into their delivery processes rather than something individual developers opt into.
That transition is harder than building the capability. It requires sales motion, support organizations, integration work, SLAs, and trust built over time through demonstrated reliability. The Accenture and Infosys partnerships are a bet that partnering with firms that already have those relationships is faster than building them from scratch.
Whether the 4M WAU figure grows into the kind of enterprise penetration OpenAI is targeting depends less on model capability at this point and more on whether the integration story holds up against the full complexity of real development organizations. The model is good enough to do the work. The question is whether the surrounding infrastructure is good enough to be trusted with it.