What is an Evaluation?
In Oxen.ai, an evaluation lets you test a model on a dataset row by row to see how well it performs. You provide a prompt, choose a model, and run the evaluation on any dataset file in your repository. The system inserts column values into the variables marked with {variable_name} in your prompt to give the model context.
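For example, if the file has columns named product and review (hypothetical column names), a prompt template such as "Classify the sentiment of this review of {product}: {review}" would be filled in with each row's column values before being sent to the model.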


How to Run an Evaluation
Navigate to the dataset file you want to evaluate the model on. Click the “Actions” button and select “Run Inference”. Write your prompt, referencing columns with {variable_name}, choose a model, and try it on a few rows with the “Run Sample” button before running it against the full dataset.




Programmatically Run an Evaluation
If you need to run a model as part of a larger workflow, you can use the Oxen.ai API to programmatically run a model on a dataset. Currently the API is only exposed over HTTP requests and requires a valid API key in the request header. To kick off a model inference job, send a POST request to the /api/repos/:namespace/:repo_name/evaluations/:resource endpoint.
For example, if the file you want to process is at:
https://oxen.ai/ox/customer-intents/main/data.parquet
the parameters should be:
:namespace -> ox
:repo_name -> customer-intents
:resource -> main/data.parquet (combination of the branch name and the file path)
Save the evaluation.id from the response, as you will need it to check the status of the job or retrieve the results later.
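As a rough sketch, a request like the following could kick off a job using Python's requests library. The API host, auth header format, and the JSON body fields (name, prompt, model) are illustrative assumptions, not the documented schema; check the Oxen.ai API reference for the exact values.

```python
import os
import requests

API_KEY = os.environ["OXEN_API_KEY"]

# API host assumed; the path follows the endpoint described above
url = "https://hub.oxen.ai/api/repos/ox/customer-intents/evaluations/main/data.parquet"

body = {
    "name": "intent-classification",             # label for the run (assumed field)
    "prompt": "Classify the intent of: {text}",  # {text} filled from a dataset column (assumed field and column)
    "model": "gpt-4o-mini",                      # model identifier (assumed field)
}

resp = requests.post(
    url,
    json=body,
    headers={"Authorization": f"Bearer {API_KEY}"},  # auth header format assumed
)
resp.raise_for_status()

evaluation = resp.json()["evaluation"]   # response shape assumed
evaluation_id = evaluation["id"]         # save this to poll the job later
print(evaluation_id)
```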
To check the status of the job, send a GET request to the /api/repos/:namespace/:repo_name/evaluations/:evaluation_id endpoint.
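Continuing the sketch above, you could poll that endpoint until the job finishes. The response shape and status values shown are assumptions; adjust them to match the actual API response.

```python
import os
import time
import requests

API_KEY = os.environ["OXEN_API_KEY"]
evaluation_id = "..."  # the evaluation.id saved from the POST response above

# API host assumed; path follows the status endpoint described above
status_url = f"https://hub.oxen.ai/api/repos/ox/customer-intents/evaluations/{evaluation_id}"

while True:
    resp = requests.get(status_url, headers={"Authorization": f"Bearer {API_KEY}"})
    resp.raise_for_status()
    evaluation = resp.json()["evaluation"]     # response shape assumed
    status = evaluation.get("status")          # field name assumed
    print(status)
    if status not in ("pending", "running"):   # terminal-state names are assumptions
        break
    time.sleep(10)                             # poll every 10 seconds
```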
We will be adding Python SDK support for this in the near future.
Supported Models
Oxen.ai supports foundation models from Anthropic, Google, Meta, and OpenAI, as well as the ability to evaluate your own fine-tuned models. The models may have multi-modal inputs and outputs such as text, images, or embeddings. To see which model best suits your task, visit our models page.