
GPT-5.4 Lands: A Million Tokens and Actual Computer Use

Source: OpenAI

OpenAI just dropped GPT-5.4, and the headline numbers are hard to ignore: a million-token context window, computer use, tool search, and what they’re calling their best coding performance yet. For those of us who spend our days wiring up APIs and chasing down race conditions in async code, this is worth paying attention to.

The Context Window Is the Real Story

A million tokens is not a gimmick. That’s roughly 750,000 words, or the entire source of a medium-sized codebase plus documentation in a single prompt. I’ve hit context limits mid-refactor more times than I’d like to admit — you’re mid-conversation with a model that finally understands your architecture, and then it forgets the first file you showed it. GPT-5.4 makes that significantly less likely to happen on large projects.

For Discord bot development specifically, this opens up interesting possibilities for passing full conversation histories, guild configs, and skill definitions without chopping things up. Whether the model actually uses all that context effectively is a separate question — but having the headroom matters.
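As a rough sketch of what that headroom buys: you can pack whole files into a prompt against a token budget instead of chunking them. The 4-characters-per-token ratio below is a crude heuristic, not a real tokenizer count, and the budget constants are assumptions.

```python
CONTEXT_BUDGET = 1_000_000      # advertised window, in tokens
RESERVED_FOR_OUTPUT = 16_000    # leave headroom for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English and code."""
    return len(text) // 4

def pack_context(files: dict[str, str]) -> str:
    """Concatenate whole files until the token budget is exhausted."""
    budget = CONTEXT_BUDGET - RESERVED_FOR_OUTPUT
    parts, used = [], 0
    for path, source in files.items():
        cost = estimate_tokens(source)
        if used + cost > budget:
            break  # stop cleanly rather than truncating mid-file
        parts.append(f"### {path}\n{source}")
        used += cost
    return "\n\n".join(parts)
```

A real implementation would use an actual tokenizer, but the shape of the problem is the same: whole-file packing only works when the budget is this large.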

Computer Use Is Here, Quietly

The computer use capability is getting undersold in the announcement framing. The ability for a model to actually operate a UI — clicking, typing, navigating — is a genuine shift in what you can automate. It’s not new to the industry (Anthropic has had it for a while), but OpenAI bringing it into GPT-5.4 as a first-party feature with their infrastructure behind it means it’s going to show up in a lot more places fast.

The obvious concern is reliability. Computer use demos well but can be brittle in practice. I’d want to see how it handles unexpected UI states before building anything production-critical on top of it.
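If I were experimenting with it today, I'd wrap every UI action in a verify-and-retry guard rather than trusting a single pass. This is a generic sketch, not OpenAI's API — `action` and `verify` stand in for whatever calls the computer-use interface actually exposes.

```python
import time

def with_ui_retry(action, *, attempts=3, verify=lambda: True, backoff=1.0):
    """Run a UI action, check the expected state, retry if it didn't take.

    `action` performs the click/type step; `verify` confirms the UI
    actually reached the state we wanted. Both are placeholders.
    """
    for attempt in range(1, attempts + 1):
        action()
        if verify():
            return True
        time.sleep(backoff * attempt)  # give the UI time to settle
    return False
```

The point is the guard loop: brittle automation becomes tolerable when every step is checked against observed state instead of assumed to succeed.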

Coding Benchmarks and What They Actually Mean

OpenAI claims state-of-the-art coding performance. Benchmark claims are always worth being skeptical about — they’re optimized for, not generalized from. That said, the trend is real: these models are getting meaningfully better at the kind of code tasks that actually come up in daily work. Not just generating boilerplate, but understanding why something is broken, suggesting idiomatic fixes, and reasoning through multi-file dependencies.

If the coding improvements hold up in practice, the delta between using a model as a search engine versus using it as a genuine pair programmer continues to close.

Tool Search and the Agent Picture

Tool search is the feature to watch for agentic use cases. It implies the model can discover and select from available tools dynamically rather than relying on a fixed, pre-declared set. That’s a meaningful step toward agents that can compose workflows without a human pre-wiring everything.

For bot development, this could mean more flexible command routing and less hand-holding in skill definitions. Still early days for agentic reliability, but the direction is right.
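The underlying idea is simple enough to sketch: keep a registry of tool descriptions and surface only the tools relevant to the incoming request, instead of declaring all of them up front. The registry, tool names, and word-overlap scoring here are all hypothetical; a real system would use embeddings or the provider's own retrieval.

```python
# Hypothetical registry: tool name -> short description.
TOOL_REGISTRY = {
    "ban_user":     "moderation ban or kick a member from the guild",
    "play_track":   "music queue and play an audio track in voice",
    "set_reminder": "utility schedule a reminder message",
}

def search_tools(query: str, registry: dict[str, str], k: int = 2) -> list[str]:
    """Score tools by word overlap with the query; return the top k names."""
    q = set(query.lower().split())
    scored = sorted(
        registry,
        key=lambda name: -len(q & set(registry[name].lower().split())),
    )
    return scored[:k]
```

For a Discord bot with dozens of skills, this is the difference between cramming every command schema into the prompt and handing the model only the two or three it plausibly needs.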

The Efficiency Angle

OpenAI is also positioning this as an efficient model, not just a capable one. Frontier efficiency matters if you’re running high-volume workloads or building something that needs to stay within a cost budget. The best model in the world is useless if the API bill makes the product unviable.
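A back-of-the-envelope cost check is the first thing I run for any new model. The per-token prices below are placeholders, not GPT-5.4's actual rates; swap in the real pricing before trusting the numbers.

```python
INPUT_PER_MTOK = 2.00    # assumed $ per million input tokens (placeholder)
OUTPUT_PER_MTOK = 8.00   # assumed $ per million output tokens (placeholder)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the assumed rates."""
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

def monthly_cost(requests_per_day: int, input_tokens: int,
                 output_tokens: int) -> float:
    """Projected 30-day bill for a steady workload."""
    return 30 * requests_per_day * request_cost(input_tokens, output_tokens)
```

Note how fast a large context window compounds: filling even a quarter of that million-token window on every request dominates the bill long before output tokens matter.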

Overall, GPT-5.4 looks like a serious release rather than an incremental bump. The million-token context and computer use are the features I’ll actually test first. The proof is always in the production run.
