
Try Gemini 2.5 Flash Lite Preview in the Workbench

Run this model interactively, tune parameters, and compare outputs.
Model ID: gemini-2-5-flash-lite-preview-09-2025

Gemini 2.5 Flash Lite Preview is a multimodal LLM designed for cost-efficient, high-volume, and latency-sensitive tasks. It excels at rapid processing of large contexts, such as entire codebases or extensive document collections, at a significantly reduced cost compared to other models, while maintaining strong performance in coding, math, science, logic, and high-throughput enterprise operations. Other noteworthy features include native support for text, code, image, audio, and video inputs, and a 1 million-token context window, making it suitable for tasks like translation, classification, intelligent routing, and real-time multimodal analysis.
Metric               Value
Parameter Count      Unknown
Mixture of Experts   Unknown
Context Length       1,000,000 tokens
Multilingual         Yes
Quantized*           Yes
Precision*           FP8

*Quantization is specific to the inference provider; other providers may offer this model at different quantization levels.

Example request

Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
  "model": "gemini-2-5-flash-lite-preview-09-2025",
  "messages": [
    {
      "role": "user",
      "content": "Hello, what can you do?"
    }
  ]
}'
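The same request can be made from Python with only the standard library. This is a sketch of the cURL call above, assuming the endpoint accepts the identical JSON body and bearer-token header; the exact call for your configured parameters is available in the Workbench's API tab.

```python
import json
import os
import urllib.request

# Endpoint and payload mirror the cURL example above.
API_URL = "https://hub.oxen.ai/api/ai/chat/completions"

payload = {
    "model": "gemini-2-5-flash-lite-preview-09-2025",
    "messages": [
        {"role": "user", "content": "Hello, what can you do?"}
    ],
}

def chat_completion(payload: dict) -> dict:
    """POST the payload to the chat completions endpoint and return parsed JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OXEN_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires OXEN_API_KEY to be set in your environment before calling:
# response = chat_completion(payload)
```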

Fetch model details

The models endpoint returns the full model object, including its json_request_schema.
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/gemini-2-5-flash-lite-preview-09-2025
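In Python, the same GET request can be used to inspect the returned model object. A minimal sketch, assuming the response is JSON with a json_request_schema field as described above; any other field names are assumptions.

```python
import json
import os
import urllib.request

# The models endpoint from the cURL example above.
MODEL_URL = (
    "https://hub.oxen.ai/api/ai/models/gemini-2-5-flash-lite-preview-09-2025"
)

def fetch_model() -> dict:
    """GET the model object, including its json_request_schema."""
    req = urllib.request.Request(
        MODEL_URL,
        headers={"Authorization": f"Bearer {os.environ['OXEN_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Requires OXEN_API_KEY to be set in your environment:
# model = fetch_model()
# print(json.dumps(model.get("json_request_schema"), indent=2))
```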

Request parameters

This model follows the standard OpenAI chat completions request body. See the chat completions reference for the full parameter list.
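Because the model accepts the standard OpenAI chat completions body, common tuning parameters can be added directly to the request payload. This is a sketch using widely supported OpenAI-style fields; which parameters this particular model honors is defined by its json_request_schema, so treat the specific fields below as assumptions and consult the chat completions reference.

```python
# Sketch of a request body with common OpenAI-style parameters.
# The specific fields supported are defined by the model's json_request_schema.
payload = {
    "model": "gemini-2-5-flash-lite-preview-09-2025",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this codebase."},
    ],
    "temperature": 0.2,   # lower values give more deterministic output
    "max_tokens": 1024,   # cap on the number of generated tokens
    "stream": False,      # set True for server-sent event streaming
}
```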