Date: 2026-02-24

There was a time, not long ago, when choosing an AI model meant choosing a single fixed level of intelligence. The choice was GPT-4 or Claude or Gemini, and the models delivered whatever quality of reasoning they could muster. The model thought exactly as hard about “what’s the capital of France” as it did about a graduate-level physics problem, because it had no mechanism to do otherwise.

That era is functionally over. Over the past several months, every major AI lab has converged on the same architectural insight: reasoning depth should be a variable, not a constant. The result is what amounts to a “thinking dial,” a user-facing control that lets both developers and end users specify how much computational effort a model should invest before answering. And the most recent releases suggest this is no longer an experimental toggle but the foundational interface for how AI inference will work going forward.

What changed, and when

The concept traces back to OpenAI’s o1 model, released in late 2024, which introduced “test-time compute” as a distinct paradigm. Rather than relying solely on the knowledge baked into the model during training, o1 could spend additional processing time reasoning through a problem before responding. The o1 family proved, sometimes in dramatic fashion, that the concept worked, particularly on math, coding, and scientific reasoning tasks.

But o1 was a separate model. Anyone who wanted reasoning had to switch to the reasoning model; anyone who wanted speed and fluency had to switch back to GPT-4o. This created a clumsy workflow and a conceptual divide: “thinking” models over here, “regular” models over there.

Anthropic broke from this pattern in February 2025 with Claude 3.7 Sonnet, which introduced extended thinking as a toggle within a single model rather than a separate product. The same model could answer instantly or deliberate at length, and developers could set a “thinking budget” in tokens to control the tradeoff. Anthropic’s framing was explicit: just as humans use one brain for both snap judgments and careful analysis, reasoning should be an integrated capability rather than a separate system.

That philosophical position has since become industry consensus. OpenAI’s GPT-5 family, released mid-2025, shipped with what OpenAI calls a “real-time router” that automatically decides whether a query needs quick processing or deeper reasoning, effectively folding the o-series capability into the mainline model. The GPT-5.2 update in December went further, exposing five discrete reasoning levels through the API: none, low, medium, high, and extra-high. ChatGPT users got a thinking-level toggle in September 2025 that OpenAI has been quietly adjusting ever since, tweaking default reasoning times based on usage data.

Then, this month, Google made the convergence explicit. On February 12, Google released a major upgrade to Gemini 3 Deep Think, its specialized reasoning mode that had previously existed as a separate, slower capability available only to Ultra subscribers. The upgraded Deep Think scored 48.4% on Humanity’s Last Exam without tools, hit 84.6% on the ARC-AGI-2 benchmark, and, in collaboration with researchers, helped resolve 18 previously unsolved problems across mathematics, physics, and computer science, including disproving a decade-old mathematical conjecture.

A week later, Google followed with Gemini 3.1 Pro, which introduced three-tier adjustable thinking levels: low, medium, and high. As VentureBeat reported, the high setting effectively turns 3.1 Pro into a lightweight version of Deep Think, making this the first time Google has offered graduated reasoning intensity within a single production model. The shift from “pick a model” to “pick a reasoning intensity” is now complete across all three major providers.

Anthropic’s own Claude Opus 4.6, released February 5, pushes the concept further still with what Anthropic calls “adaptive thinking,” where the model itself decides how deeply to reason based on the difficulty of the task and the surrounding context. Developers retain override control through four explicit effort levels (low, medium, high, and max), but the default behavior is now autonomous: the model reads the room and allocates its own cognitive budget.

Why it matters beyond benchmarks

The benchmark numbers are impressive in isolation, but the real significance of the thinking dial is economic and architectural.

On the economic side, reasoning costs compute. When a model “thinks,” it generates internal tokens that consume processing power and time, even though the user never sees them. A query that takes 200 milliseconds at low reasoning effort might take 30 seconds or more at maximum effort, with proportionally higher costs. OpenAI’s GPT-5.2 Pro, the tier designed for maximum reliability, costs $21 per million input tokens, roughly 12 times the price of the standard Thinking tier. Google’s Deep Think responses take minutes rather than seconds to generate. The thinking dial, in other words, is also a cost dial, and giving developers explicit control over it transforms inference economics from a fixed expense into something one can tune per query, per endpoint, and per use case.
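To make the “cost dial” concrete, here is a minimal sketch of the arithmetic. It assumes that hidden reasoning tokens are billed at the same rate as visible output tokens. The prices and token counts are placeholders chosen for illustration, not any provider’s actual rates, and the names `Pricing` and `query_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD. Placeholder values, not real list prices."""
    input_per_million: float
    output_per_million: float


def query_cost(pricing: Pricing, input_tokens: int,
               visible_output_tokens: int, reasoning_tokens: int) -> float:
    """Estimate the cost of one query in USD.

    Assumption: hidden reasoning tokens are billed at the output rate,
    even though the user never sees them.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Hypothetical prices: $2 per million input tokens, $10 per million output tokens.
PLACEHOLDER = Pricing(input_per_million=2.0, output_per_million=10.0)

# Same prompt and same visible answer; only the reasoning budget differs.
low_effort = query_cost(PLACEHOLDER, 1_000, 300, reasoning_tokens=0)
high_effort = query_cost(PLACEHOLDER, 1_000, 300, reasoning_tokens=20_000)

print(f"low effort:  ${low_effort:.4f}")
print(f"high effort: ${high_effort:.4f}")
print(f"ratio:       {high_effort / low_effort:.0f}x")
```

Under these made-up numbers the same question costs about half a cent at low effort and about twenty cents at high effort, a gap of roughly 40 times that comes entirely from tokens the user never reads.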

This is not a trivial shift. For enterprises running AI at production scale, the difference between routing every query through maximum reasoning and intelligently matching reasoning depth to task complexity can mean orders-of-magnitude variation in monthly compute bills. The architectural pattern that is emerging, where a lightweight router or the model itself triages incoming queries by difficulty before allocating reasoning resources, may prove as consequential for AI deployment economics as the models’ raw capabilities.
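The triage pattern described above can be sketched in a few lines. This is a deliberately crude illustration, not how any provider’s router works: the keyword list, the length threshold, and the function names are all assumptions, and a production router would more plausibly use a small classifier model than string matching.

```python
# Effort levels ordered from cheapest to most expensive (hypothetical names).
EFFORT_LEVELS = ("low", "medium", "high")

# Assumed signals that a query needs deliberate reasoning.
HARD_HINTS = ("prove", "derive", "debug", "optimize", "step by step")


def estimate_difficulty(query: str) -> int:
    """Crude difficulty score: one point per hint found, plus one for long queries."""
    text = query.lower()
    score = sum(1 for hint in HARD_HINTS if hint in text)
    if len(text.split()) > 50:
        score += 1
    return score


def choose_effort(query: str) -> str:
    """Map the difficulty score onto an effort level, capped at the highest level."""
    score = estimate_difficulty(query)
    return EFFORT_LEVELS[min(score, len(EFFORT_LEVELS) - 1)]


print(choose_effort("What's the capital of France?"))         # low
print(choose_effort("Debug this function for me"))            # medium
print(choose_effort("Prove the bound and derive the constant"))  # high
```

The design point is that the triage step is cheap compared with the reasoning it gates, so even a rough heuristic can pay for itself if it keeps easy queries off the expensive path.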

On the architectural side, adjustable reasoning changes what a single model deployment can do. Instead of maintaining separate model endpoints for different task types (a fast model for chat, a powerful model for analysis, a specialized model for code), a single model can increasingly be served with reasoning effort as a parameter. This simplifies infrastructure, reduces the surface area for deployment errors, and makes it easier to build applications that gracefully handle a wide range of task complexity without requiring the application layer to choose between models.
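As a rough illustration of that consolidation, the sketch below replaces three model endpoints with one model identifier and a per-task effort setting. The model name, the task categories, and the `InferenceRequest` shape are hypothetical and do not correspond to any real provider’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRequest:
    """A provider-agnostic request shape (hypothetical)."""
    model: str
    effort: str
    prompt: str


# One deployment for every task type (placeholder identifier).
MODEL_ID = "example-model"

# Effort is configuration, not a choice between separate models.
EFFORT_BY_TASK = {
    "chat": "low",
    "code": "medium",
    "analysis": "high",
}


def build_request(task_type: str, prompt: str) -> InferenceRequest:
    """Build a request for the single deployment, falling back to medium effort."""
    effort = EFFORT_BY_TASK.get(task_type, "medium")
    return InferenceRequest(model=MODEL_ID, effort=effort, prompt=prompt)


chat = build_request("chat", "Say hello")
analysis = build_request("analysis", "Summarize the quarterly risks")
print(chat.model == analysis.model, chat.effort, analysis.effort)
```

Changing how hard the model works on a task type then becomes a one-line configuration change rather than a migration between endpoints.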

The convergence, and the remaining differences

Despite the broad convergence toward adjustable reasoning, the implementations differ in ways that matter.

OpenAI’s approach leans most heavily on explicit user and developer control. Five API reasoning levels, separate model tiers (Instant, Thinking, Pro) within the GPT-5.2 family, and a consumer-facing toggle all put the human in charge of the tradeoff. The router in GPT-5 automates some of this, but the overall design philosophy emphasizes user agency.

Anthropic’s adaptive thinking in Opus 4.6 takes the opposite bet: the model should figure out when to think hard, with human overrides available but not required by default. This reflects Anthropic’s broader design philosophy around agentic AI, where models that can manage their own cognitive resources are better suited for long-running, multi-step tasks where a human cannot reasonably supervise every reasoning decision.

Google’s approach, particularly with Deep Think, carves out a distinct niche by positioning maximum reasoning as a specialized capability for scientific and research applications rather than a general-purpose dial. The gated API access and the emphasis on peer-reviewed research problems suggest Google sees the highest reasoning tiers as tools for domain experts rather than everyday users, at least for now.

These are genuine philosophical differences about who should control the thinking dial and when it should be turned up, not just branding distinctions. And they will likely shape which provider different categories of users and developers gravitate toward as the technology continues to mature.

What to watch

The thinking dial is still new enough that several important questions remain open. Faithfulness is among them: Anthropic has been transparent that a model’s visible chain of thought may not accurately represent its actual internal reasoning process, which complicates any attempt to use thinking traces for safety monitoring or auditing. Cost predictability is another, since variable reasoning effort means variable costs, and applications that rely on tight per-query budgets will need robust controls to prevent reasoning from spiraling.
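One possible shape for such a control is a guard that lowers the requested effort until the worst-case reasoning cost fits a per-query budget. The sketch below is an assumption-laden illustration: the token ceilings per effort level, the output price, and the function names are invented for the example, and real providers may expose quite different limits.

```python
from typing import Optional

# Hypothetical ceilings on hidden reasoning tokens for each effort level.
MAX_REASONING_TOKENS = {"low": 1_000, "medium": 8_000, "high": 32_000}

# Levels ordered from most to least expensive, so the guard can step down.
STEP_DOWN_ORDER = ("high", "medium", "low")


def worst_case_cost(effort: str, price_per_million_output: float) -> float:
    """Upper bound on reasoning cost in USD if the model uses its full ceiling."""
    return MAX_REASONING_TOKENS[effort] * price_per_million_output / 1_000_000


def clamp_effort(requested: str, budget_usd: float,
                 price_per_million_output: float = 10.0) -> Optional[str]:
    """Return the requested effort, or the next lower level that fits the budget.

    Returns None when even the lowest level could exceed the budget.
    """
    start = STEP_DOWN_ORDER.index(requested)
    for effort in STEP_DOWN_ORDER[start:]:
        if worst_case_cost(effort, price_per_million_output) <= budget_usd:
            return effort
    return None


print(clamp_effort("high", budget_usd=0.50))    # high
print(clamp_effort("high", budget_usd=0.10))    # medium
print(clamp_effort("medium", budget_usd=0.005)) # None
```

Budgeting against the ceiling rather than the average is conservative, but it is what makes the per-query cost predictable.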

Perhaps most interesting is the question of whether reasoning effort will eventually become fully autonomous. Anthropic’s adaptive thinking and OpenAI’s automatic router both gesture toward a future where the model silently decides how hard to think about each query, with the user never knowing or needing to know. That future is convenient, but it also makes AI behavior less predictable and harder to audit. The tension between autonomous intelligence and human oversight, a theme that runs through essentially every AI policy debate, turns out to have a very concrete expression in the design of the thinking dial.

For now, the practical takeaway is straightforward: the AI model is no longer a fixed capability. It is a variable one, and the variable is how much it thinks. Learning to work with that variable, whether as a developer setting API parameters or an end user choosing between “quick” and “deep” in a chat interface, is becoming a core competency for anyone who uses these tools seriously.