Data Type Detection
By default, Oxen.ai detects the data type of a file based on its file extension and content type. The default data types are:
- tabular: csv, tsv, jsonl, parquet, arrow
- text: txt
- image: png, jpg, jpeg, gif, bmp, tiff, webp
- video: mp4, mov
- audio: mp3, wav, m4a, ogg, flac
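As a rough illustration of extension-based detection, the sketch below maps a file name to one of the default data types listed above. It only looks at the extension, whereas Oxen also considers the content type. The helper name detect_data_type and the choice to return None for unknown extensions are hypothetical, not part of Oxen's API.

```python
from pathlib import Path
from typing import Optional

# Default extension table, copied from the list of default data types.
DEFAULT_DATA_TYPES = {
    "tabular": {"csv", "tsv", "jsonl", "parquet", "arrow"},
    "text": {"txt"},
    "image": {"png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp"},
    "video": {"mp4", "mov"},
    "audio": {"mp3", "wav", "m4a", "ogg", "flac"},
}

# Invert the table once so each lookup is a single dict access.
EXTENSION_TO_TYPE = {
    ext: data_type
    for data_type, exts in DEFAULT_DATA_TYPES.items()
    for ext in exts
}


def detect_data_type(path: str) -> Optional[str]:
    """Hypothetical helper: guess a data type from the file extension only.

    Returns None when the extension is not in the default table.
    """
    ext = Path(path).suffix.lower().lstrip(".")
    return EXTENSION_TO_TYPE.get(ext)


if __name__ == "__main__":
    for name in ["train.csv", "images/cat.JPG", "notes.txt", "model.bin"]:
        print(name, "->", detect_data_type(name))
```

Lowercasing the extension means cat.JPG and cat.jpg are both treated as images.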
Tabular Data
When you add a tabular file to Oxen, it automatically detects and versions the schema of the data. Under the hood, Oxen uses Polars to infer the column names and data types. To list all the schemas that have been detected and committed, use the oxen schemas subcommand.
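To make the idea of schema inference concrete, here is a minimal sketch that infers column names and types from CSV text. It swaps Polars for a simple standard-library heuristic (try int, then float, else str), so it will not match Polars' type names or edge-case behavior. The schema_hash function is also hypothetical: it only illustrates that identical schemas can share one ID while any change to a column name or type produces a new one; Oxen's actual hashing scheme is not described here.

```python
import csv
import hashlib
import io
from typing import Dict, List


def _infer_type(values: List[str]) -> str:
    """Pick the narrowest type that fits every non-empty value in a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue  # at least one value does not fit; try a wider type
    return "str"


def infer_schema(csv_text: str) -> Dict[str, str]:
    """Return an ordered {column_name: type} mapping for CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = {
        name: [row[i] if i < len(row) else "" for row in rows]
        for i, name in enumerate(header)
    }
    return {name: _infer_type(vals) for name, vals in columns.items()}


def schema_hash(schema: Dict[str, str]) -> str:
    """Hypothetical: hash the ordered (name, type) pairs into a stable ID."""
    canonical = ";".join(f"{name}:{dtype}" for name, dtype in schema.items())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    data = "file,label,min_x,width\nimages/1.jpg,cat,10,42.5\nimages/2.jpg,dog,3,17.0\n"
    schema = infer_schema(data)
    print(schema)  # {'file': 'str', 'label': 'str', 'min_x': 'int', 'width': 'float'}
    print(schema_hash(schema)[:16])
```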
View Schema
To view a specific schema, pass a schema hash, name, or path to the oxen schemas command.
Add Schema
Schemas are automatically detected when you add csv, tsv, jsonl, parquet, and arrow files to Oxen. Before a schema is committed, you can see the detected schemas in the output of the oxen status command.
To view the staged schemas, use the --staged flag on the oxen schemas command.
You can also view the schema of an individual file with the oxen df command and the --schema flag.
Additional Metadata
You can also add additional information to the schema. This is useful if you want to provide context about the data for a UI, for data fetching, or for any other reason. Notice the empty metadata column in the schema above. You can add arbitrary JSON blobs to the schema itself, as well as to each column.
Metadata may provide useful information for your end application:
- Transforms you want to perform.
- How you want to render the data.
- Information about the data itself, such as a description of the schema or a column.
Schema Metadata
At the root of each schema is an Optional<json::Value> metadata value. This is useful for adding information about the schema itself. For example, you can add a description of the schema or a JSON blob that gives context to a data renderer.
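The shape described above, an optional JSON value at the schema root plus optional metadata on each column, can be modeled roughly as follows. This is a Python analogue for illustration only: the class names Schema and Field and the key names fields, dtype, and description are assumptions, not Oxen's actual on-disk format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional


@dataclass
class Field:
    name: str
    dtype: str
    # Per-column metadata: any JSON-compatible value, or None when unset.
    metadata: Optional[Any] = None


@dataclass
class Schema:
    fields: List[Field] = field(default_factory=list)
    # Root-level metadata, the Python analogue of Optional<json::Value>.
    metadata: Optional[Any] = None


schema = Schema(fields=[Field("file", "str"), Field("label", "str")])
# Attach an arbitrary JSON blob describing the schema as a whole.
schema.metadata = {"description": "Example description of this schema"}
print(json.dumps(asdict(schema), indent=2))
```

Because the metadata is typed as "any JSON-compatible value", a renderer can read whatever keys it understands and ignore the rest.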
Column Metadata
You can also add metadata to specific columns. Say you want to add information about the root directory of the images to the file column. You could do the following:
The -c flag stands for column and the -m flag stands for metadata. The metadata is a JSON blob that can store any information you want.
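Conceptually, the -c and -m flags select a column by name and attach a parsed JSON blob to it. The sketch below mimics that behavior on a plain dict; the helper add_column_metadata, the dict layout, and the {"root": "images"} key are hypothetical illustrations, not what the Oxen CLI runs internally.

```python
import json
from typing import Any, Dict


def add_column_metadata(schema: Dict[str, Any], column: str, metadata_json: str) -> Dict[str, Any]:
    """Hypothetical helper mirroring the -c (column) and -m (metadata) flags.

    Parses the JSON blob and attaches it to the named column, returning a new
    schema dict and leaving the input unchanged.
    """
    metadata = json.loads(metadata_json)  # raises json.JSONDecodeError on malformed input
    fields = []
    found = False
    for f in schema["fields"]:
        if f["name"] == column:
            f = {**f, "metadata": metadata}  # copy instead of mutating the original field
            found = True
        fields.append(f)
    if not found:
        raise KeyError(f"column {column!r} not found in schema")
    return {**schema, "fields": fields}


schema = {
    "metadata": None,
    "fields": [
        {"name": "file", "dtype": "str", "metadata": None},
        {"name": "label", "dtype": "str", "metadata": None},
    ],
}
updated = add_column_metadata(schema, "file", '{"root": "images"}')
print(updated["fields"][0])  # {'name': 'file', 'dtype': 'str', 'metadata': {'root': 'images'}}
```

Rejecting unknown column names up front avoids silently storing metadata that no column will ever use.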
The OxenHub UI uses schema metadata to render more complex data types, for example to display images inline directly in a data frame.

Commit The Schema
Schema changes are not saved until you commit them. To view the schemas staged for commit, use the --staged flag. When the staged schemas look right, save them with the commit subcommand.
Name Schema
It is nice to have human-readable names to refer to schemas by. Use the oxen schemas name command to name a schema.
Remove Schema
If you have accidentally staged a schema, you can remove it with the oxen schemas rm command.