© 2026 CometAPI · All rights reserved

Gemini 2.5 Flash

Input: $0.3/M
Output: $7/M
Context: 1M
Max output: 65K
Gemini 2.5 Flash is an AI model developed by Google, designed to provide fast and cost-effective solutions for developers, especially for applications requiring enhanced inference capabilities. According to the Gemini 2.5 Flash preview announcement, the model was released in preview on April 17, 2025, supports multimodal input, and has a context window of 1 million tokens. The model can generate up to 65,536 output tokens per response.

Gemini 2.5 Flash is engineered to deliver rapid responses without compromising on the quality of output. It supports multimodal inputs, including text, images, audio, and video, making it suitable for diverse applications. The model is accessible through platforms like Google AI Studio and Vertex AI, providing developers with the tools necessary for seamless integration into various systems.


Basic Information (Features)

Gemini 2.5 Flash introduces several stand-out features that distinguish it within the Gemini 2.5 family:

  • Hybrid Reasoning: Developers can set a thinking_budget parameter to finely control how many tokens the model dedicates to internal reasoning before producing output.
  • Pareto Frontier: Positioned at the optimal cost-performance point, Flash offers the best price-to-intelligence ratio among the 2.5 models.
  • Multimodal Support: Processes text, images, video, and audio natively, enabling richer conversational and analytical capabilities.
  • 1 Million-Token Context: The long context window allows deep analysis and long-document understanding in a single request.
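To make the multimodal-support point concrete, the sketch below assembles a generateContent request body that mixes a text prompt with an inline image. The field names (parts, inline_data, mime_type) follow the public Gemini REST API; the helper function itself is an illustrative assumption, not part of any SDK.

```python
import base64

# Sketch: build a generateContent payload combining text and image input.
# The JSON field names follow the public Gemini REST API; the helper
# itself is illustrative.

def build_multimodal_request(prompt: str, image_bytes: bytes,
                             mime_type: str = "image/png") -> dict:
    """Combine a text prompt and an inline image into one request body."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # Inline media is sent base64-encoded in the JSON body.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
```

The same parts list can carry audio or video chunks as well, which is how a single request mixes modalities.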

Model Versioning

Gemini 2.5 Flash has moved through the following key versions:

  • Preview 04-17: Early-access release with "thinking" capability, available via gemini-2.5-flash-preview-04-17.
  • Stable General Availability (GA): As of June 17, 2025, the stable endpoint gemini-2.5-flash replaces the preview, providing production-grade reliability with no API changes from the May 20 preview (gemini-2.5-flash-preview-05-20).
  • Deprecation of previews: Preview endpoints were shut down on July 15, 2025. Preview pricing continued until that date, after which users of gemini-2.5-flash-preview-04-17 and -05-20 had to migrate to the generally available gemini-2.5-flash endpoint.
  • gemini-2.5-flash-lite-preview-09-2025: Improved tool use and performance on complex, multi-step tasks, raising SWE-Bench Verified scores from 48.9% to 54%, and improved efficiency: with reasoning enabled, higher-quality output is produced with fewer tokens, reducing latency and cost.

Faster, cheaper, smarter:

  • Design goals: low latency, high throughput, and low cost;
  • Overall speedup across reasoning, multimodal processing, and long-text tasks;
  • Token usage reduced by 20–30%, significantly cutting reasoning costs.

Technical Specifications

Input Context Window: Up to 1 million tokens, allowing for extensive context retention.

Output Tokens: Capable of generating up to 65,536 tokens per response.

Modalities Supported: Text, images, audio, and video.

Integration Platforms: Available through Google AI Studio and Vertex AI.

Pricing: Competitive token-based pricing model, facilitating cost-effective deployment.


Technical Details

Under the hood, Gemini 2.5 Flash is a transformer-based large language model trained on a mixture of web, code, image, and video data. Key technical specifications include:

Multimodal Training: Trained to align multiple modalities, Flash can seamlessly mix text with images, video, or audio, which is useful for tasks like video summarization or audio captioning.

Dynamic Thinking Process: Implements an internal reasoning loop in which the model plans and breaks down complex prompts before producing final output.

Configurable Thinking Budgets: The thinking_budget can be set from 0 (no reasoning) up to 24,576 tokens, allowing trade-offs between latency and answer quality.

Tool Integration: Supports Grounding with Google Search, Code Execution, URL Context, and Function Calling, enabling real-world actions directly from natural-language prompts.
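As a rough illustration of how a thinking budget is passed, the sketch below builds a generateContent request body that caps internal reasoning via generationConfig.thinkingConfig.thinkingBudget, the field used by the public Gemini REST API. The helper function and the example budget values are illustrative, not a definitive client.

```python
# Sketch: build a generateContent request body with a capped thinking
# budget. Field names follow the public Gemini REST API; the helper
# itself is illustrative.

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Assemble a generateContent payload with a bounded thinking budget."""
    if not 0 <= thinking_budget <= 24576:
        raise ValueError("thinking_budget must be between 0 and 24,576 tokens")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

# A budget of 0 disables internal reasoning entirely (lowest latency);
# larger budgets trade latency for answer quality.
fast = build_request("Summarize this contract.", 0)
deep = build_request("Summarize this contract.", 8192)
```

Choosing the budget per request is the practical lever for the latency/quality trade-off described above.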


Benchmark Performance

In rigorous evaluations, Gemini 2.5 Flash demonstrates industry-leading performance:

  • LMArena Hard Prompts: Scored second only to 2.5 Pro on the challenging Hard Prompts benchmark, showcasing strong multi-step reasoning capabilities.
  • MMLU Score of 0.809: Exceeds average model performance with 0.809 MMLU accuracy, reflecting broad domain knowledge and reasoning prowess.
  • Latency and Throughput: Achieves 271.4 tokens/sec decoding speed with a 0.29 s time-to-first-token, making it ideal for latency-sensitive workloads.
  • Price-to-Performance Leader: At $0.26 per 1M tokens, Flash undercuts many competitors while matching or surpassing them on key benchmarks.

These results indicate Gemini 2.5 Flash's competitive edge in reasoning, scientific understanding, mathematical problem-solving, coding, visual interpretation, and multilingual capabilities:

[Benchmark chart: Gemini 2.5 Flash]


Limitations

While powerful, Gemini 2.5 Flash carries certain limitations:

  • Safety and Reliability Risks: The model can exhibit a "preachy" tone and may produce plausible-sounding but incorrect or biased outputs (hallucinations), particularly on edge-case queries, so rigorous human oversight remains essential.
  • Rate Limits: API usage is constrained by rate limits (10 RPM, 250,000 TPM, 250 RPD on default tiers), which can impact batch processing or high-volume applications.
  • Intelligence Floor: While exceptionally capable for a Flash model, it remains less accurate than 2.5 Pro on the most demanding agentic tasks, such as advanced coding or multi-agent coordination.
  • Cost Trade-Offs: Although offering the best price-performance, extensive use of the thinking mode increases overall token consumption, raising costs for deeply reasoning prompts.
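Given those rate limits, client-side retry logic helps absorb transient 429-style rejections. The sketch below shows a minimal exponential-backoff wrapper; the retry policy, exception type, and helper name are illustrative assumptions, not part of any CometAPI or Google SDK.

```python
import time

# Sketch: retry a rate-limited call with exponential backoff.
# `call` stands in for any function that raises RuntimeError on a
# 429-style rejection; the policy values are illustrative.

def call_with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Invoke `call`, sleeping base_delay * 2**attempt between retries."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # budget exhausted: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))
```

In a real client you would catch the HTTP library's specific rate-limit error and honor any Retry-After header the server returns.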

Features of Gemini 2.5 Flash

Explore the key features of Gemini 2.5 Flash, designed to boost performance and usability. Discover how these capabilities can benefit your projects and improve the user experience.

Pricing for Gemini 2.5 Flash

Explore competitive pricing for Gemini 2.5 Flash, designed to fit a range of budgets and usage needs. Our flexible plans ensure you pay only for what you use, making it easy to scale as your requirements grow. Discover how Gemini 2.5 Flash can enhance your projects while keeping costs reasonable.

Comet price (USD / M tokens): Input $0.3/M · Output $7/M
Official price (USD / M tokens): Input $0.375/M · Output $8.75/M
Discount: -20%
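Using the Comet prices listed above ($0.3 per 1M input tokens, $7 per 1M output tokens), a request's cost can be estimated with simple arithmetic. The helper below is an illustrative sketch, not part of any SDK.

```python
# Sketch: estimate a request's cost from the per-million-token prices
# listed above ($0.3/M input, $7/M output via CometAPI). The helper
# itself is illustrative.

INPUT_PRICE_PER_M = 0.3   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 7.0  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a long-context request: 800k input tokens, 2k output tokens
cost = estimate_cost(800_000, 2_000)
```

Note that output tokens dominate the bill, which is why capping the thinking budget matters for cost control.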

Sample code and API for Gemini 2.5 Flash

The Gemini 2.5 Flash API exposes Google's latest multimodal AI model, designed for high-speed, cost-efficient tasks with controllable reasoning, allowing developers to toggle the "thinking" feature on or off via the Gemini API.

Python Code Example

from google import genai
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": BASE_URL},
    api_key=COMETAPI_KEY,
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Tell me a three sentence bedtime story about a unicorn.",
)

print(response.text)

JavaScript Code Example

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY;
const base_url = "https://api.cometapi.com/v1beta";
const model = "gemini-2.5-flash";
const operator = "generateContent";

async function main() {
  const response = await fetch(`${base_url}/models/${model}:${operator}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: api_key,
    },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: "Tell me a three sentence bedtime story about a unicorn." },
          ],
        },
      ],
    }),
  });

  const data = await response.json();
  console.log(data.candidates[0].content.parts[0].text);
}

await main();

Curl Code Example

curl "https://api.cometapi.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "Authorization: $COMETAPI_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Tell me a three sentence bedtime story about a unicorn."
          }
        ]
      }
    ]
  }'

Gemini 2.5 Flash model versions

Gemini 2.5 Flash maintains multiple snapshots for several reasons: output differences after updates may require older snapshots for consistency, developers need a transition period to adapt and migrate, and different snapshots may correspond to global or regional endpoints to optimize the user experience. For the detailed differences between versions, see the official documentation.

Version: gemini-2.5-flash-image

More models

GPT-5.2 Chat
Input: $1.75/M · Output: $14/M
gpt-5.2-chat-latest is the chat-optimized snapshot of OpenAI's GPT-5.2 family (branded in ChatGPT as GPT-5.2 Instant). It is the model for interactive/chat use cases that need a blend of speed, long-context handling, multimodal inputs, and reliable conversational behaviour.

GPT-5.2
Input: $1.75/M · Output: $14/M
GPT-5.2 is a multi-flavored model suite (Instant, Thinking, Pro) engineered for better long-context understanding, stronger coding and tool use, and materially higher performance on professional "knowledge-work" benchmarks.

GPT-5.1 Chat
Input: $1.25/M · Output: $10/M
GPT-5.1 Chat is an instruction-tuned conversational language model for general-purpose chat, reasoning, and writing. It supports multi-turn dialogue, summarization, drafting, knowledge-base QA, and lightweight code assistance for in-app assistants, support automation, and workflow copilots. Technical highlights include chat-optimized alignment, controllable and structured outputs, and integration paths for tool invocation and retrieval workflows when available.

GPT-5.1
Input: $1.25/M · Output: $10/M
GPT-5.1 is a general-purpose instruction-tuned language model focused on text generation and reasoning across product workflows. It supports multi-turn dialogue, structured output formatting, and code-oriented tasks such as drafting, refactoring, and explanation. Typical uses include chat assistants, retrieval-augmented QA, data transformation, and agent-style automation with tools or APIs when supported. Technical highlights include text-centric modality, instruction following, JSON-style outputs, and compatibility with function calling in common orchestration frameworks.

Gemini 2.5 Pro DeepSearch
Input: $10/M · Output: $80/M
A deep-search model with enhanced deep-search and information-retrieval capabilities, an ideal choice for complex knowledge integration and analysis.

Gemini 2.5 Pro (All)
Input: $1.25/M · Output: $2.5/M
Gemini 2.5 Pro (All) is a multimodal model for text and media understanding, designed for general-purpose assistants and grounded reasoning. It handles instruction following, analytical writing, code comprehension, and image/audio understanding with reliable tool/function calling and RAG-friendly behavior. Typical uses include enterprise chat agents, document and UI analysis, visual question answering, and workflow automation. Technical highlights include unified image-text-audio inputs, long-context support, structured JSON output, streaming responses, and system-instruction control.