OpenAI Codex Grows Up: What Computer Use, Memory, and Plugins Mean for Developer Workflows
Source: openai
OpenAI’s Codex has had several lives. It started in 2021 as a descendant of GPT-3, fine-tuned on code collected from roughly 54 million public GitHub repositories and released via API to developers building code-completion products. GitHub Copilot was its most visible consumer. Then came the Codex CLI in 2025, an open-source terminal agent that could read your codebase, plan changes, and execute shell commands with your approval. Now there’s a native desktop application for macOS and Windows that adds computer use, in-app browsing, image generation, memory, and a plugin system.
Each of those additions is individually significant. Together they represent a meaningful architectural shift in what a coding assistant is expected to be.
Computer Use Changes the Unit of Work
Computer use, in the context of AI systems, means the model can observe the screen, move the cursor, click buttons, type into fields, and interact with applications directly rather than just producing text that a human must then act on. Anthropic shipped this capability in Claude in late 2024. OpenAI followed with similar functionality in its Operator product. The Codex app’s inclusion of computer use brings it into the desktop application context, which is where most developers actually spend their time.
For coding workflows specifically, computer use matters because many of the friction points in software development are not about writing code. They are about navigating between code, test output, documentation, issue trackers, CI dashboards, and browser-based tooling. A model that can only emit text still requires the developer to be the connective tissue between all those surfaces. A model with computer use can close that loop.
Consider a common scenario: you are chasing a failing test. With a text-only assistant, you copy the stack trace, paste it into the chat, get a hypothesis, try the fix, copy the new output, paste again. With computer use, the agent can observe the test runner output directly, propose a fix, apply it to the file, re-run the tests, and report back, without any manual copy-paste. The developer supervises rather than relays.
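To make the shape of that loop concrete, here is a minimal sketch in Python. It is not how Codex is implemented; the function and parameter names (supervised_fix_loop, run_tests, propose_fix, apply_fix) are hypothetical stand-ins for "run the test suite", "ask the model for a patch", and "write the patch to disk". The point is only that the agent, not the developer, carries the output from one step to the next, and that the loop stops after a bounded number of attempts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    passed: bool
    output: str  # whatever the test runner printed


def supervised_fix_loop(
    run_tests: Callable[[], RunResult],
    propose_fix: Callable[[str], str],   # hypothetical: model turns output into a patch
    apply_fix: Callable[[str], None],    # hypothetical: patch is written to the file
    max_attempts: int = 3,
) -> List[str]:
    """Observe test output, propose and apply a fix, re-run, and report back."""
    log: List[str] = []
    result = run_tests()
    for attempt in range(1, max_attempts + 1):
        if result.passed:
            break
        patch = propose_fix(result.output)
        apply_fix(patch)
        log.append(f"attempt {attempt}: applied {patch}")
        result = run_tests()  # the agent re-runs; nobody copies output by hand
    log.append("tests pass" if result.passed else "still failing; needs a human")
    return log
```

The returned log is what the supervising developer would review. The attempt cap is a design choice in this sketch: it keeps an agent from looping indefinitely on a fix it cannot find.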
This is not a small UX improvement. It changes what kinds of tasks are worth delegating.
In-App Browsing as a Documentation Layer
Documentation lookup is one of the most common interruptions in a development session. You hit an unfamiliar API, wonder about a flag in a library you haven’t used in six months, or need to check whether a package supports the version of Node you’re on. Every one of those lookups is a context switch.
In-app browsing lets the Codex agent perform that lookup as part of the task rather than requiring the developer to do it separately and feed the result back. This is more than a convenience feature. It means the agent can work with current information rather than relying entirely on its training data, which has a cutoff date and may not reflect the latest library versions, deprecation notices, or changelog entries.
The combination of computer use and browsing is particularly interesting. An agent that can observe the browser, navigate to a documentation page, and extract relevant information is doing something qualitatively different from an agent that answers questions based on memorized training data. It can verify. It can check. It can follow a link that a Stack Overflow answer referenced.
Memory Across Sessions
The memory feature addresses one of the persistent frustrations with AI coding assistants: they forget everything when the session ends. Every new conversation starts cold. You re-explain your project structure, your conventions, your preferences, why you made certain architectural decisions six months ago.
OpenAI has been iterating on memory in ChatGPT since early 2024, allowing the model to retain facts across conversations. Bringing that capability into Codex means the agent can accumulate knowledge about a specific codebase over time. It can remember that your project uses a custom error type, that you prefer explicit over implicit returns, that the db package has a known quirk around transaction rollbacks that you’ve worked around in a specific way.
This is distinct from in-context memory, where you might load a CLAUDE.md or an AGENTS.md file to give the agent project-specific context at the start of each session. File-based context loading is a good pattern and has been adopted widely, but it requires the developer to maintain those files. Automatic memory that persists from actual interactions is lower friction.
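For comparison, file-based context loading can be sketched in a few lines of Python. This is an illustration under assumptions, not any particular tool's lookup rule: the helper names (find_context_file, build_prompt) are hypothetical, and the choice to walk upward from the working directory and take the nearest file is an assumption made for the example.

```python
from pathlib import Path
from typing import Optional


def find_context_file(start: Path, filename: str = "AGENTS.md") -> Optional[Path]:
    """Return the nearest context file at or above `start`, or None."""
    current = start.resolve()
    for directory in [current, *current.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def build_prompt(task: str, start: Path) -> str:
    """Prepend project context (if any) to the task the agent is given."""
    context_file = find_context_file(start)
    context = context_file.read_text(encoding="utf-8") if context_file else ""
    return f"{context}\n\nTask: {task}".strip()
```

Everything the agent knows about the project here comes from a file someone has to keep current, which is exactly the maintenance cost described above.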
The open question is what Codex actually retains and how it surfaces or edits those memories. Memory systems for AI assistants involve real tradeoffs: stale memories can mislead, over-eager retention can surface irrelevant noise, and opaque memory stores make debugging confusing behavior difficult. How well the Codex memory system handles those tradeoffs will determine whether developers find it useful or frustrating.
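One way to reason about those tradeoffs is to sketch what a more transparent memory store might look like. The Python below is purely illustrative and says nothing about how Codex stores memories: the class names, the max_age policy, and the inspect and forget methods are all assumptions chosen to show staleness filtering and user-visible editing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Memory:
    text: str
    last_confirmed: datetime  # when the fact was last seen to be true


class MemoryStore:
    """Hypothetical store: memories expire unless re-confirmed, and can be audited."""

    def __init__(self, max_age: timedelta) -> None:
        self.max_age = max_age
        self._items: Dict[str, Memory] = {}

    def remember(self, key: str, text: str, now: datetime) -> None:
        # Re-remembering an existing key refreshes its timestamp.
        self._items[key] = Memory(text, now)

    def forget(self, key: str) -> None:
        self._items.pop(key, None)

    def inspect(self) -> Dict[str, str]:
        # Lets the developer see everything stored, stale or not.
        return {key: memory.text for key, memory in self._items.items()}

    def recall(self, now: datetime) -> List[str]:
        # Only memories confirmed within max_age are surfaced to the agent.
        return [
            memory.text
            for memory in self._items.values()
            if now - memory.last_confirmed <= self.max_age
        ]
```

An age cutoff addresses stale memories only crudely, and it does nothing about relevance. Separating inspect from recall is the part aimed at the opacity problem: a developer debugging odd behavior can at least see what the agent believes.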
Plugins and the Ecosystem Question
The plugin system is the feature with the longest potential tail. OpenAI launched ChatGPT plugins in early 2023, then deprecated the original system in favor of GPTs and the broader tool-calling API patterns that have become standard. A plugin system in Codex would allow third-party tools to integrate directly into the coding agent workflow.
For developers, the obvious candidates are the tools already central to the development loop: GitHub for repository and pull request management, Jira or Linear for issue tracking, CI systems like GitHub Actions or CircleCI, observability platforms, and internal developer portals. An agent that can file a pull request, link it to a ticket, watch the CI run, and surface failures without the developer leaving the Codex interface would collapse a significant amount of workflow friction.
The architecture of that plugin system matters. If Codex plugins are built on OpenAI’s standard function calling and tool use patterns, then building integrations will be straightforward for developers already familiar with the API. If there’s a proprietary plugin format, the ecosystem will grow more slowly and depend more heavily on OpenAI’s own first-party integrations.
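To show what the first scenario would look like, here is a sketch of a tool described in the JSON Schema shape that OpenAI's Chat Completions function calling uses, paired with a local dispatcher. Whether Codex plugins actually use this format is not known; the tool name create_pull_request, its parameters, and the stub handler are hypothetical, and nothing here calls a real API.

```python
import json
from typing import Any, Callable, Dict, Optional

# Tool description in the function-calling style: a name, a description,
# and a JSON Schema for the arguments. The tool itself is hypothetical.
CREATE_PR_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "create_pull_request",
        "description": "Open a pull request and optionally link it to a ticket.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "branch": {"type": "string"},
                "ticket_id": {"type": "string"},
            },
            "required": ["title", "branch"],
        },
    },
}


def create_pull_request(title: str, branch: str, ticket_id: Optional[str] = None) -> Dict[str, Any]:
    # Stub: a real integration would call the code host's API here.
    return {"status": "drafted", "title": title, "branch": branch, "ticket_id": ticket_id}


HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "create_pull_request": create_pull_request,
}


def dispatch(name: str, arguments_json: str) -> str:
    """Route a model-issued tool call (name plus JSON arguments) to local code."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**json.loads(arguments_json)))
```

A developer who has written one integration in this style has the whole pattern: describe the tool, receive a name and a JSON argument string, run local code, return a JSON result.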
How This Fits Against the Current Landscape
The competitive space here is crowded and moving fast. Cursor built a substantial following by integrating AI deeply into a VS Code fork, with context-aware completion and an agent mode that can edit multiple files. Windsurf (formerly Codeium) followed a similar path. GitHub Copilot has been expanding from inline completion toward Copilot Workspace, which supports multi-step agentic tasks within the browser. Claude Code from Anthropic operates in the terminal with deep filesystem and shell access.
What distinguishes the Codex app’s approach is the bundling of modalities. Most of the competitors focus on code editing as the primary surface and treat browsing, image generation, and memory as secondary concerns or integration points. Codex is explicitly positioning itself as a broader development environment where those capabilities are first-class.
Image generation is the most unusual inclusion. For most backend or systems work, it’s beside the point. But for developers building interfaces, working on design systems, or producing documentation with diagrams, having image generation in the same tool is genuinely useful. A developer prototyping a UI could generate a rough mockup, use it as a reference, and implement against it without leaving the agent environment.
The Tension in ‘Almost Everything’
The hedged phrasing in the product name is worth taking seriously. ‘Almost everything’ implies scope without claiming completeness, and that framing is appropriate for where these tools actually are.
Computer use remains imperfect. Models making direct OS interactions introduce new categories of failure that are harder to anticipate and harder to reverse than text output. Memory systems can surface incorrect or outdated information with the same confidence as accurate information. Plugins introduce integration surface area and, by extension, new failure modes. In-app browsing adds latency to tasks that the model would otherwise answer directly from its training data.
None of that makes the features bad, but it does mean the developer still needs to be in the loop, reviewing what the agent is doing rather than simply delegating entire task chains. The Codex CLI from 2025 already operated this way, with approval prompts before destructive shell operations. The desktop application’s approach to human oversight will matter as much as the raw capability.
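An approval gate of that kind can be sketched as follows. This is not the Codex CLI's actual policy: the DESTRUCTIVE set, the git subcommand list, and the function names are assumptions for illustration, and the check is deliberately naive (it looks only at the first tokens, so it would miss sudo prefixes, pipes, and shell substitutions).

```python
import shlex
from typing import Callable

# Hypothetical rule set; a real tool would need a far more careful policy.
DESTRUCTIVE = {"rm", "mv", "dd", "chmod", "chown", "truncate"}
RISKY_GIT_SUBCOMMANDS = {"push", "reset", "clean"}


def needs_approval(command: str) -> bool:
    """Flag commands whose leading tokens look destructive."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if tokens[0] in DESTRUCTIVE:
        return True
    if tokens[0] == "git" and len(tokens) > 1 and tokens[1] in RISKY_GIT_SUBCOMMANDS:
        return True
    return False


def gate(command: str, approve: Callable[[str], bool]) -> str:
    """Return "allowed" or "blocked"; execution is left to the caller."""
    if needs_approval(command) and not approve(command):
        return "blocked"
    return "allowed"
```

In an interactive tool, approve would prompt the developer. Passing it in as a callable keeps the policy separate from the prompt, so the same gate could sit in front of a terminal or a desktop dialog.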
Developers who have been using AI coding tools long enough have generally learned to treat them as fast, occasionally wrong collaborators rather than infallible automation. The expanded capabilities in the updated Codex app make that collaborator more capable across more surfaces. Whether they make it more reliable is a separate question, and one that will take time and real use to answer.