What is the Inference API?
The Inference API gives you access to hundreds of AI models through a single, consistent interface. Generate text, images, and videos without managing infrastructure or juggling multiple provider SDKs.

Capabilities:

- Text Generation: Chat completions, tool calling, vision, structured output
- Image Generation: Text-to-image, image-to-image editing
- Video Generation: Text-to-video, image-to-video, reference-to-video, video-to-video editing
Quick Starts
Chat
Text generation in minutes
Image Generation
Text-to-image in minutes
Video Generation
Text-to-video in minutes
API Reference
Chat Completions
Text generation, vision, tool calling
Image Generation
Text-to-image generation
Image Editing
Edit images with text prompts
Video Generation
Text-to-video, image-to-video, multi-shot
Async Queue
Background image/video generation
Models
List, search, and manage models
Model References
Kling O3 Pro: Reference to Video
Multi-shot video with reference images, elements, and audio
Kling O3 Pro: Video to Video Edit
Edit existing videos with text instructions and reference images
Seedance 2.0: Reference to Video
Generate video from text, images, video, and audio references
Topaz Starlight Precise 2.5
Video upscaling and restoration up to 4K
Authentication
All requests require a bearer token in the Authorization header.
Base URL
All inference endpoints live under https://hub.oxen.ai/api/ai. The SDK appends /chat/completions automatically.
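For example, a raw chat-completions request can be sketched with Python's standard library as follows. The token value is a placeholder, and the request body follows the usual chat-completions shape (model plus a messages array); adjust to the parameter schema of the model you call.

```python
import json
import urllib.request

BASE_URL = "https://hub.oxen.ai/api/ai"

def build_chat_request(token: str, model: str, messages: list) -> urllib.request.Request:
    """Build an authenticated POST to /ai/chat/completions (not yet sent)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {token}",   # bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_API_KEY", "claude-sonnet-4-6",
                         [{"role": "user", "content": "Hello!"}])
# Send with urllib.request.urlopen(req) once a real token is set.
```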
Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /ai/chat/completions | POST | Text generation (chat, vision, tool use) |
| /ai/images/generate | POST | Image generation |
| /ai/images/edit | POST | Image editing |
| /ai/videos/generate | POST | Video generation |
| /ai/queue | POST | Async image/video generation |
| /ai/queue | GET | List queued generations |
| /ai/queue/:generation_id | GET | Get generation status |
| /ai/queue/:generation_id | DELETE | Cancel a queued generation |
| /ai/models | GET | List available models |
| /ai/models/:id | GET | Get model details and parameter schema |
| /ai/models/search | GET | Search models by name |
| /ai/models/:id/activate | POST | Activate a custom model deployment |
| /ai/models/:id/deactivate | POST | Deactivate a custom model deployment |
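The queue endpoints combine naturally into a submit-then-poll loop. A minimal sketch of the polling half: fetch_status stands in for a real GET /ai/queue/:generation_id call, and the status strings ("queued", "processing", "completed") are illustrative assumptions, not the API's exact names.

```python
import time

def wait_for_generation(generation_id: str, fetch_status,
                        poll_interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll GET /ai/queue/:generation_id until it leaves a pending state.

    fetch_status(generation_id) should return the parsed JSON status dict;
    the status values checked here are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(generation_id)
        if status.get("status") not in ("queued", "processing"):
            return status  # finished (or failed) -- stop polling
        time.sleep(poll_interval)
    raise TimeoutError(f"generation {generation_id} did not finish in {timeout}s")

# Demo with a stubbed fetcher that completes on the third poll:
responses = iter([{"status": "queued"}, {"status": "processing"},
                  {"status": "completed", "url": "https://example.com/out.mp4"}])
result = wait_for_generation("gen_123", lambda _id: next(responses), poll_interval=0.0)
```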
Common Parameters
These parameters are accepted across multiple endpoints:

| Parameter | Type | Description |
|---|---|---|
| model | string | Required. The model to use (e.g. claude-sonnet-4-6, flux-2-dev, kling-video-o3-pro-reference-to-video). |
| response_format | string | "url" (default) returns a hosted URL. "b64_json" returns base64-encoded bytes inline. Supported on image and video endpoints. |
| target_namespace | string | Namespace to save results and bill to. Defaults to your user. Can be an organization name. |
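As an illustration, a request body combining these common parameters with an image prompt might look like the sketch below. The prompt field name is an assumption for illustration; check the model's request_schema for its actual endpoint-specific fields.

```python
import json

# Common parameters shared across endpoints, plus an endpoint-specific field.
payload = {
    "model": "flux-2-dev",            # required on every endpoint
    "response_format": "b64_json",    # inline base64 instead of a hosted URL
    "target_namespace": "my-org",     # save results to (and bill) an organization
    "prompt": "a lighthouse at dusk", # endpoint-specific field (assumption)
}
body = json.dumps(payload)
```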
Discovering Models
List all models, optionally filtered by developer. Fetching a single model returns a request_schema field with the complete parameter definitions, types, defaults, and constraints for that model.
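A sketch of building those request URLs with urllib. The developer query-parameter name is inferred from the description above and should be treated as an assumption.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://hub.oxen.ai/api/ai"

def list_models_url(developer=None) -> str:
    """URL for GET /ai/models, optionally filtered (param name is an assumption)."""
    url = f"{BASE_URL}/models"
    if developer:
        url += "?" + urllib.parse.urlencode({"developer": developer})
    return url

def model_details_url(model_id: str) -> str:
    """URL for GET /ai/models/:id; the response includes request_schema."""
    return f"{BASE_URL}/models/{urllib.parse.quote(model_id, safe='')}"
```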
Pricing
Pricing varies by model:

| Method | How it works | Examples |
|---|---|---|
| token | Per input/output token | GPT, Claude, Gemini |
| time | Per second of compute time | Custom models, Llama, Qwen |
| per_image | Fixed cost per image | FLUX, DALL-E |
| per_video_output_second | Cost per second of output video | Kling, Sora |

A model's pricing is described by fields such as input_cost_per_token, output_cost_per_token, cost_per_image, cost_per_second, cost_per_second_with_audio, and cost_per_second_high_res.
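For a token-priced model, the estimated cost of a call follows directly from these fields. A worked sketch with hypothetical rates (not actual pricing):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_cost_per_token: float, output_cost_per_token: float) -> float:
    """Estimated cost for a token-priced model."""
    return input_tokens * input_cost_per_token + output_tokens * output_cost_per_token

# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
cost = token_cost(10_000, 2_000, 3e-6, 15e-6)
# 10_000 * 3e-6 + 2_000 * 15e-6 = 0.03 + 0.03 = 0.06
```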
Error Format
Errors use one of two formats. Possible error codes: unauthenticated, invalid_params, resource_not_found, unknown_error.
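A minimal sketch of dispatching on those error codes. The JSON shape assumed here (a code either at the top level or nested under an error object) is an illustration of handling both formats, not the API's documented schema.

```python
def classify_error(body: dict) -> str:
    """Map an error response body to a suggested action.

    Field names ("error", "code") are assumptions covering two common shapes.
    """
    code = body.get("error", {}).get("code") or body.get("code")
    if code == "unauthenticated":
        return "check your API token"
    if code == "invalid_params":
        return "fix the request parameters"
    if code == "resource_not_found":
        return "check the model or generation id"
    return "retry or report (unknown_error)"

action = classify_error({"error": {"code": "invalid_params"}})
```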
Need help? Join our Discord community.