Chatbot No Longer
OpenAI throws down the gauntlet to Anthropic
Between March 3 and March 10, OpenAI released GPT-5.4, ChatGPT for Excel, the Codex desktop app, Codex Security, write actions for Google and Microsoft apps, and interactive visual learning modules. Six product launches in eight days is unusual even by the current pace of the industry, and the coverage has understandably treated each as a separate story.
Taken together, though, the releases share a structural trait that individual coverage tends to obscure. In each case, ChatGPT is no longer generating text that requires a human to carry out the next step. The model is operating inside another application, building formulas in a workbook, validating vulnerabilities in a sandboxed codebase, drafting an email in Outlook, updating a graph as a student adjusts a variable. The output is not a message but a change in the state of the application itself.
The common thread among these releases suggests that OpenAI is rebuilding ChatGPT as an execution layer, a system that takes action rather than simply describing what should be done. The ambition is legible in the underlying model, in the productivity and developer tools built on top of it, and in the education features that extend the same logic to the free tier.
The model foundation
On March 5, OpenAI released GPT-5.4 across ChatGPT, Codex, and its API. The model is the first general-purpose release from the company with native computer-use capabilities: it can see screens, move cursors, click interface elements, and type within desktop applications. Previous models could generate code that interacted with software programmatically, but GPT-5.4 can also issue mouse and keyboard commands in response to screenshots, operating applications as a human would. The model scored 75% on OSWorld-Verified, a benchmark for autonomous computer interaction, beating the 72.4% that human testers typically achieve. Everything OpenAI subsequently shipped is built on these new capabilities.
The second enabling feature is tool search. Until GPT-5.4, developers building on OpenAI’s API had to prepare a detailed list of every tool that their application could use and include that list in each request. GPT-5.4 introduces a search mechanism that allows the model to discover relevant tools on its own, reducing prompt sizes and inference costs in the process. The practical consequence is that an agent built on GPT-5.4 can navigate an unfamiliar software environment without having been told in advance what is available to it. Combined with computer use, this means that the model can both find the right tool and operate it, a pairing that closes the gap between receiving a task and completing it without human intermediation.
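The economics of that change can be made concrete with a toy model. The sketch below is purely illustrative: the tool names, the catalog, and the keyword-overlap scoring are all invented for this example, and OpenAI has not published how GPT-5.4's tool search actually works. The point is only to show why discovering one relevant tool per request is cheaper than serializing the whole catalog into every prompt.

```python
# Illustrative contrast between enumerating every tool in each request
# and discovering tools on demand. All names and the scoring heuristic
# are hypothetical; this is not OpenAI's implementation.

from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    description: str


# Without tool search, the full catalog travels with every request.
CATALOG = [
    Tool("create_spreadsheet", "Create a new spreadsheet in the workspace"),
    Tool("send_email", "Draft and send an email on the user's behalf"),
    Tool("schedule_meeting", "Add a meeting to the user's calendar"),
    Tool("query_market_data", "Pull quotes and filings from a data feed"),
]


def prompt_size(tools: list[Tool]) -> int:
    """Characters a serialized tool list would add to the prompt."""
    return sum(len(t.name) + len(t.description) for t in tools)


def discover_tools(query: str, tools: list[Tool], top_k: int = 1) -> list[Tool]:
    """Naive keyword overlap standing in for the model's search step."""
    words = set(query.lower().split())
    scored = sorted(
        tools,
        key=lambda t: len(words & set(t.description.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


full_cost = prompt_size(CATALOG)
found = discover_tools("schedule a meeting with the team", CATALOG)
search_cost = prompt_size(found)

print(found[0].name)            # only the relevant tool is surfaced
print(search_cost < full_cost)  # a fraction of the full catalog's size
```

In a real deployment the search step would run against thousands of tools rather than four, which is where the prompt-size and inference-cost savings the release notes describe would actually come from.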
These capabilities arrive alongside efficiency gains that make sustained execution sessions economically viable. OpenAI reports that GPT-5.4 uses 47% fewer tokens on some tasks than its predecessors and supports up to one million tokens of context in Codex and the API. The model is also more factual, producing individual claims that are 33% less likely to be false and full responses that are 18% less likely to contain errors compared to GPT-5.2. The individual gains are incremental, but they compound. A model that operates inside applications for extended periods accumulates cost with every action and compounds harm with every hallucination. The execution-layer strategy only works if the model can sustain long sessions cheaply and reliably, and the efficiency and accuracy gains in GPT-5.4 exist to make that possible.
Embedded in the workflow
ChatGPT for Excel, launched March 5 in beta, is the most literal expression of this approach. The add-in embeds GPT-5.4 directly inside workbooks, where it can build financial models from plain-language descriptions, trace formula logic across sheets, and run scenario analysis. The model preserves the native structure of the workbook throughout, operating within Excel’s own formulas and assumptions rather than extracting data into a separate conversation. When a user inherits an unfamiliar spreadsheet, the model can explain how assumptions flow through the workbook and identify where outputs changed, work that previously required either the original author or hours of manual tracing. The beta is available to Plus, Pro, Business, Enterprise, and Edu subscribers in the United States, Canada, and Australia, with Google Sheets support listed as coming soon.
The same week, OpenAI enabled write actions for Google and Microsoft apps connected to ChatGPT. The feature allows the model to draft emails in Outlook, create documents and spreadsheets in Google Workspace, and schedule meetings through the respective calendar apps. Earlier integrations allowed ChatGPT to read information from these services, but write actions let it create and modify objects within them. These features are accordingly disabled by default, requiring workspace administrators to enable each app individually and, in some Microsoft environments, to provide Entra admin approval before new users can connect. The layered opt-in structure treats generating text and taking action on behalf of a user as different categories of risk.
The financial data integrations announced alongside Excel reveal the commercial logic beneath the product strategy. OpenAI launched connections to FactSet, Moody’s, MSCI, Dow Jones Factiva, and several other providers, allowing ChatGPT to pull market data, company filings, and research directly into workflows. A model that can build a financial model in Excel is useful, but a model that can build one while drawing on proprietary data feeds that analysts already pay for is considerably harder to replace. These integrations bind ChatGPT to the data infrastructure of specific industries, creating switching costs that a better model alone cannot overcome.
The robotic repository
The Codex desktop app, which launched for macOS in late February and arrived on Windows on March 4, changes the relationship between a developer and an AI coding tool. Previous iterations of Codex operated as a pair programmer, with one human working alongside one agent on a single task. The desktop app allows a single human developer to supervise multiple agents working in parallel on different parts of the same project. One agent can build a feature while another writes tests and a third conducts code review. OpenAI reports that more than a million developers used Codex in the month preceding the launch and that overall usage has doubled since GPT-5.2-Codex shipped in mid-December. The adoption numbers suggest that developers are finding value not just in faster code generation but in the ability to distribute their judgment across several concurrent workstreams.
Codex Security, released March 6 in research preview, applies the execution-layer logic to application security. Traditional scanning tools match code against known vulnerability patterns and report everything that resembles a problem, leaving security teams to sort signal from noise. Codex Security begins by analyzing a repository to build a project-specific threat model that captures what the system does, what it trusts, and where it is most exposed, and then uses that model as context when searching for vulnerabilities. When it finds a candidate, the agent validates the finding in a sandboxed environment before surfacing it, and proposes a patch that accounts for the surrounding codebase. During beta, false positives fell by more than 50% and over-reported severity findings dropped by more than 90%. The agent has already been credited with 14 CVEs across widely used open-source projects including OpenSSH, GnuTLS, and Chromium.
Both products reward sustained use in a way that earlier ChatGPT interactions did not. The Codex app manages long-running agent sessions in which context accumulates across tasks rather than resetting with each prompt. Codex Security maintains an editable threat model that refines itself as teams adjust the criticality of findings, improving precision on every subsequent scan. The longer these tools run, the more useful they become, which is a retention mechanism as well as a technical achievement.
Informative implementations
On March 10, OpenAI introduced dynamic visual explanations for more than 70 math and science concepts in ChatGPT. When a user asks about the Pythagorean theorem or Ohm’s law, the model responds with both a text explanation and an interactive module in which variables can be adjusted and graphs updated in real time. A student exploring the ideal gas law can change temperature and volume and watch the relationship resolve itself visually, forming an intuition that a written explanation alone is unlikely to produce. The feature is available to all logged-in users, including those on the free tier, and OpenAI reports that 140 million people already use ChatGPT weekly for math and science help.
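The kind of exploration such a module supports can be mimicked in a few lines. This sketch uses the ideal gas law mentioned above with illustrative values chosen for this example; it is not drawn from OpenAI's modules, only meant to show the manipulate-and-observe loop they make interactive.

```python
# Sketch of the variable-adjustment loop an interactive module supports,
# using the ideal gas law PV = nRT. All values are illustrative.

R = 8.314  # molar gas constant, J/(mol·K)


def pressure(n_mol: float, temp_k: float, volume_m3: float) -> float:
    """Pressure in pascals from the ideal gas law."""
    return n_mol * R * temp_k / volume_m3


# A student raising temperature while holding amount and volume fixed:
baseline = pressure(1.0, 300.0, 0.025)
warmer = pressure(1.0, 330.0, 0.025)

# Doubling the volume at fixed temperature:
expanded = pressure(1.0, 300.0, 0.050)

print(round(warmer / baseline, 2))    # 1.1: pressure scales linearly with T
print(round(baseline / expanded, 2))  # 2.0: pressure is inversely proportional to V
```

Watching those two ratios update live as sliders move is precisely the intuition-forming experience the static text version of the explanation cannot deliver.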
The learning modules are the clearest illustration of what the rest of the March releases are doing at larger scale. Where users previously received written explanations in response to their questions, they can now manipulate variables directly and watch the entire system respond. OpenAI cites research suggesting that interaction-based learning leads to stronger conceptual understanding than static instruction, and that claim is reflected in the design of these modules. Rather than quizzing users or walking them through a solution step by step, they give learners direct control over the inputs and let the relationships among them emerge from experimentation.
The learning tools also reveal how far OpenAI’s execution-layer strategy extends beyond the enterprise use cases dominating the rest of the March releases. ChatGPT for Excel targets paying professionals, while Codex Security and the Codex app serve developers on subscription plans. By contrast, the interactive learning modules are available to every logged-in user, including those who pay nothing. If the execution layer were confined to premium tiers, it would be just another product strategy. Its presence on the free plan suggests something closer to a platform shift, in which the default mode of interacting with ChatGPT is no longer conversation but direct manipulation of the thing being discussed.
Silicon showdown
The execution-layer strategy carries risks that a text-generation tool does not. While a bad paragraph can be rewritten, a sent email or a modified financial model cannot easily be recalled. The governance features threaded through these launches suggest that OpenAI is alert to the distinction, even if the safeguards are still maturing alongside the capabilities they are meant to constrain.
OpenAI is not the first company to pursue this strategy. Anthropic released Claude Code over a year ago, and its Cowork platform has been available since late January, complete with workflow automation plugins and add-ons for Excel and PowerPoint. OpenAI's March blitz, covering the same ground in a single week, demonstrates a clear intent to challenge Anthropic's dominance in the agentic workspace.
Anthropic wagered months ago that the advantage in this market would belong to the company most deeply embedded in the applications where work already happens. OpenAI's recent flurry of activity suggests that it has arrived at the same conclusion. The window in which that kind of integration can be achieved is narrow, and both companies appear to have caught on.


