Gemini 2.5 Flash With ‘Thinking Budget’ Rolling Out To Devs, Gemini App


After briefly detailing it last week, Google is rolling out Gemini 2.5 Flash in preview today. A “thinking budget” lets developers control how much reasoning the model does, depending on the prompt and use case.

All models in the Gemini 2.5 family are reasoning models that think “through their thoughts before responding” for “enhanced performance and improved accuracy.” This is ideal for prompts that require multi-step reasoning, like math problems and analyzing research questions.

Instead of immediately generating an output, the model can perform a “thinking” process to better understand the query, break down complex tasks, and plan its response.

For developers

Gemini’s Flash models are known for their speed and lower cost. That’s not changing with 2.5 Flash, which adds reasoning capabilities while letting developers “set thinking budgets to control cost vs quality.”

Key specifications for Gemini 2.5 Flash in preview (gemini-2.5-flash-preview-04-17):

Rate Limits (RPM = requests per minute, RPD = requests per day): 1,000 RPM / 10,000 RPD (Paid Tier), 10 RPM / 500 RPD (Free Tier)
Knowledge Cutoff: January 2025
Input Modalities: Text, Images, Video, Audio
Output Modalities: Text
Context Window: 1 million tokens
Max Output Length: 64K tokens
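Each tier pairs a per-minute cap with a per-day cap, so for sustained use the daily quota is the tighter limit. The short Python check below uses only the preview limits listed above; the `TierLimits` class and helper name are illustrative, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    """Preview rate limits for gemini-2.5-flash-preview-04-17, as listed above."""
    name: str
    rpm: int  # requests per minute
    rpd: int  # requests per day


def minutes_to_exhaust_daily_quota(limits: TierLimits) -> float:
    """Minutes of sending at the full per-minute rate before the daily cap is reached."""
    return limits.rpd / limits.rpm


TIERS = [
    TierLimits("Paid", rpm=1000, rpd=10_000),
    TierLimits("Free", rpm=10, rpd=500),
]

for tier in TIERS:
    minutes = minutes_to_exhaust_daily_quota(tier)
    print(f"{tier.name}: daily quota used up after {minutes:.0f} minutes at max RPM")
```

A paid-tier client running flat out would hit the 10,000-request daily cap in about 10 minutes (free tier: about 50 minutes). Spread evenly over 24 hours, the paid daily quota works out to roughly 7 requests per minute.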

Specifically, developers control the “number of tokens a model can generate while thinking” from 0 to 24,576 tokens. There’s a slider in Google AI Studio and Vertex AI, as well as an API parameter. In the graphs below, you can see how reasoning quality improves as the budget increases.
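For the API parameter, here is a minimal Python sketch that builds a Gemini REST `generateContent` request body without sending it. It assumes the budget is passed as `generationConfig.thinkingConfig.thinkingBudget` and that the endpoint has the shape shown in `ENDPOINT`. Both match Google's Gemini API docs at the time of writing, but check them against the current reference. The helper `build_request_body` is hypothetical. It enforces the 0 to 24,576 range quoted above. Leaving the budget unset omits the field entirely, which is the case where the model picks its own budget.

```python
import json
from typing import Any, Optional

# Assumed REST endpoint shape; the model name goes in the URL path.
MODEL = "gemini-2.5-flash-preview-04-17"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24_576  # range quoted for the preview


def build_request_body(prompt: str, thinking_budget: Optional[int] = None) -> dict[str, Any]:
    """Return a generateContent JSON body (field names assumed; verify in the API reference)."""
    body: dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is None:
        # No budget given: leave thinkingConfig out so the model decides how much to think.
        return body
    if not MIN_THINKING_BUDGET <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be in [{MIN_THINKING_BUDGET}, {MAX_THINKING_BUDGET}], "
            f"got {thinking_budget}"
        )
    body["generationConfig"] = {"thinkingConfig": {"thinkingBudget": thinking_budget}}
    return body


# Budget of 0: no thinking tokens allowed.
print(json.dumps(build_request_body("How many provinces does Canada have?", 0)))
# A larger budget for a prompt that needs more reasoning.
print(json.dumps(build_request_body("You roll two dice. What's the probability they add up to 7?", 8192)))
```

The body would then be POSTed to `ENDPOINT` with an API key. In recent versions of Google's `google-genai` Python SDK, the same setting appears as `types.ThinkingConfig(thinking_budget=...)`.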

If the thinking budget is set to zero, this new model will match 2.0 Flash’s cost & latency.

If a budget isn’t specified, Gemini 2.5 Flash “automatically decides how much to think based on the perceived task complexity.” Google provides examples of minimal, medium, and high reasoning:

Prompts with minimal reasoning:

“Thank you” in Spanish
How many provinces does Canada have?

Prompts with medium reasoning:

You roll two dice. What’s the probability they add up to 7?
My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.

Prompts with high reasoning:

In the context of agents, another example is how quick summaries would involve a low thinking budget, while detailed analysis requires a higher one.
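An agent could apply this by choosing a budget per step based on the kind of work. The Python sketch below is hypothetical throughout. The task kinds and budget numbers are made up for illustration and are not Google recommendations. Only the 24,576-token ceiling comes from the article. Each budget is treated as a nominal per-call ceiling, which gives an upper bound on thinking tokens for a multi-step plan.

```python
from enum import Enum

MAX_THINKING_BUDGET = 24_576  # preview upper bound from the article


class TaskKind(Enum):
    QUICK_SUMMARY = "quick_summary"
    DETAILED_ANALYSIS = "detailed_analysis"


# Hypothetical budgets: illustrative values only.
BUDGET_BY_TASK = {
    TaskKind.QUICK_SUMMARY: 512,
    TaskKind.DETAILED_ANALYSIS: 16_384,
}


def thinking_budget_for(task: TaskKind) -> int:
    """Pick a thinking budget for one agent step, capped at the preview maximum."""
    return min(BUDGET_BY_TASK[task], MAX_THINKING_BUDGET)


def max_thinking_tokens(plan: list[TaskKind]) -> int:
    """Nominal upper bound on thinking tokens across a sequence of agent steps."""
    return sum(thinking_budget_for(step) for step in plan)


plan = [TaskKind.QUICK_SUMMARY, TaskKind.QUICK_SUMMARY, TaskKind.DETAILED_ANALYSIS]
print(max_thinking_tokens(plan))  # 17408
```

Keeping the mapping in one table makes the cost vs. quality trade-off easy to tune without touching the agent logic.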

Gemini 2.5 Flash is available to preview for developers in Google AI Studio and Vertex AI. Google says it will “continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.”

Gemini app

2.5 Flash (experimental) is also coming to the Gemini app with the ability to automatically adjust how much reasoning occurs based on the prompt’s complexity. End users don’t get any sort of manual adjustment in the app.

At launch, the model supports the Gemini app’s various capabilities, like apps/Extensions and file upload. It replaces 2.0 Flash Thinking (experimental), which was last updated in March.