Endpoint
POST /api/ai/chat/completions
Compatible with the OpenAI chat completions format. Supports streaming, multimodal input (images and video), tool calling, and structured output.
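Because the endpoint follows the OpenAI format, you can also call it directly over HTTP. A minimal sketch using Python's requests library, assuming the API key is sent as a Bearer token (the same scheme the OpenAI SDK uses):

import requests

# Direct call to the endpoint; Bearer auth is an assumption here.
resp = requests.post(
    "https://hub.oxen.ai/api/ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])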
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | yes | — | Model name (e.g. claude-sonnet-4-6, gpt-5-4-2026-03-05, gemini-3-1-flash-lite-preview) |
| messages | array | yes | — | Array of message objects. Must not be empty. |
| stream | boolean | no | false | Stream the response as server-sent events. |
| max_tokens | integer | no | varies | Maximum tokens in the response. |
| temperature | number | no | varies | Sampling temperature (0-2). |
| top_p | number | no | — | Nucleus sampling parameter. |
| frequency_penalty | number | no | — | Penalize repeated tokens. |
| presence_penalty | number | no | — | Penalize tokens already present. |
| tools | array | no | — | Tool/function definitions for tool calling. |
| tool_choice | string/object | no | — | Control tool selection behavior. |
| parallel_tool_calls | boolean | no | — | Allow parallel tool calls. |
| response_format | object | no | — | Constrain response format (e.g. {"type": "json_object"}). Support varies by provider. |
Each message has a role and content:
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hi there!"}
]
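The optional parameters from the table above plug into the same call. A sketch combining several of them (the values are illustrative; defaults vary by model):

from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

# Illustrative sampling settings; tune for your use case.
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about the ocean."},
    ],
    max_tokens=200,
    temperature=0.7,
    top_p=0.9,
    presence_penalty=0.2,
)
print(response.choices[0].message.content)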
Vision (multimodal)
Use a content array to include images or video:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
Video input:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this video"},
    {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
  ]
}
Image and video URLs must be publicly accessible.
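From Python, a vision request looks the same as a text request, with the content array in place of a string. A sketch, assuming the chosen model supports image input:

from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",  # must be a vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)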
Examples
Basic text generation
from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Say hello in exactly 3 words."}],
    max_tokens=50,
    temperature=0.1,
)

print(response.choices[0].message.content)
Response
{
  "id": "chatcmpl-97eab7db-fe67-4b29-900c-ed5260c654d4",
  "object": "chat.completion",
  "created": 1775090332,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello, how are you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 5,
    "total_tokens": 20
  }
}
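The SDK exposes the same fields as attributes, so you can inspect the stop reason and token usage directly on the response object from the example above:

# Continues from the basic text generation example.
print(response.choices[0].finish_reason)  # "stop"
print(response.usage.prompt_tokens, response.usage.completion_tokens, response.usage.total_tokens)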
Streaming
from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="gemini-3-1-flash-lite-preview",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
When stream is true, the response is delivered as server-sent events. Each chunk carries a delta instead of a full message:
data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}],"created":1775090334,"id":"chatcmpl-...","model":"gemini-3-1-flash-lite-preview","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],...}
data: [DONE]
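If you consume the stream without the SDK, each data: line is a JSON chunk until the [DONE] sentinel. A minimal sketch with requests, again assuming Bearer auth:

import json
import requests

with requests.post(
    "https://hub.oxen.ai/api/ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gemini-3-1-flash-lite-preview",
        "messages": [{"role": "user", "content": "Say hello"}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # SSE payload lines are prefixed with "data: "; skip keep-alives.
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            print(content, end="", flush=True)
print()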
Tool calling

from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5-4-2026-03-05",
    messages=[
        {"role": "system", "content": "Use tools when appropriate."},
        {"role": "user", "content": "What is the weather in San Francisco?"},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
)

tool_call = response.choices[0].message.tool_calls[0]
print(f"{tool_call.function.name}({tool_call.function.arguments})")
When the model uses a tool, finish_reason is "tool_calls":
{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "content": null,
      "role": "assistant",
      "tool_calls": [{
        "id": "call_GRNwPXnbuQW4Sa3QNB3FYkYw",
        "index": 0,
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"San Francisco\"}"
        }
      }]
    }
  }]
}
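To complete the loop, run the tool yourself and send the result back as a role "tool" message that references the call id, then let the model produce a final answer. A sketch continuing the example above (the weather data here is made up; get_weather is your own implementation):

import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Illustrative result; in practice, call your real get_weather implementation.
weather = {"location": args["location"], "temperature_f": 61, "conditions": "foggy"}

followup = client.chat.completions.create(
    model="gpt-5-4-2026-03-05",
    messages=[
        {"role": "system", "content": "Use tools when appropriate."},
        {"role": "user", "content": "What is the weather in San Francisco?"},
        response.choices[0].message,  # the assistant message containing the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(weather)},
    ],
)
print(followup.choices[0].message.content)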
Structured output (JSON mode)
from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="gpt-5-4-2026-03-05",
    messages=[{"role": "user", "content": "List 3 colors as a JSON array"}],
    response_format={"type": "json_object"},
    max_tokens=100,
)

print(response.choices[0].message.content)
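Since the content is returned as a JSON string, parse it before use (the shape of the parsed value depends on your prompt):

import json

# Continues from the structured output example above.
data = json.loads(response.choices[0].message.content)
print(data)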
Errors
| Condition | Error |
|---|---|
| No model specified | "You must specify a model to call" |
| Model not found | "Model not found: <name>" |
| Empty messages | "Messages array cannot be empty" |
| Insufficient credits | Credit-related error message |
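With the OpenAI SDK, these conditions surface as exceptions. A sketch of defensive handling; the exception classes come from the openai Python library, and which class each condition maps to is an assumption:

import openai
from openai import OpenAI

client = OpenAI(
    base_url="https://hub.oxen.ai/api/ai",
    api_key="YOUR_API_KEY",
)

try:
    response = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except openai.BadRequestError as e:
    # e.g. empty messages array or invalid parameters
    print(f"Bad request: {e}")
except openai.NotFoundError as e:
    # e.g. model not found (assuming it is reported as a 404)
    print(f"Not found: {e}")
except openai.APIStatusError as e:
    # other API errors, e.g. insufficient credits
    print(f"API error {e.status_code}: {e}")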