Data Frames
Oxen makes it easy to work with tabular data as well as other data formats.
Upload Your Data
You can upload data directly to a repository using either the web interface or the CLI. On the web, click the Add Files button in the repository and select your file.
From the command line, add files to a repository with the oxen add command.
Look At Your Data
Oxen.ai makes it easy to look at your data in a tabular format.
When files are committed to a repository, Oxen automatically detects the format of your data and loads it into a DataFrame if it is a csv, tsv, parquet, json, jsonl, ndjson, or arrow file. Behind the scenes, Oxen uses the Polars library to load your data in a performant and efficient manner.
Query Your Data
All Oxen data frames can be queried with SQL. When using the UI, we also provide a Text2SQL interface to help you get started. We automatically translate natural language questions into SQL queries and return the results in a tabular format.
Edit Your Data
You can also edit data frames directly in the UI. Double click on a cell to edit it, and use the buttons in the side panel to add, delete, and modify rows. You can also rename columns, add new columns, and remove columns.
When you feel good about your changes, you can commit them to the repository with the Commit button in the top right.
Oxen CLI
Oxen also provides command line tools to interact with data frames. This makes it easy to manipulate data files before committing them to the repository.
oxen df
oxen df is a handy subcommand for interacting with data frames locally. For example, oxen df <FILENAME> displays the contents of a tabular data file.
For the SpamOrHam dataset, the output reports 4,774 rows and 2 columns. The output is automatically truncated to 10 entries. To display the entire data set, use the --full flag.
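As a minimal sketch of the two commands just described, the following builds a small hypothetical file (messages.csv and its columns are made up for illustration and are not the SpamOrHam data) and previews it, first truncated and then in full. The guard skips the oxen calls on machines where the CLI is not installed.

```shell
# Build a hypothetical 12-row CSV so the 10-entry truncation is visible.
{
  echo 'id,label'
  for i in $(seq 1 12); do echo "$i,ham"; done
} > messages.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  oxen df messages.csv          # truncated preview
  oxen df messages.csv --full   # every row
fi
```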
You can also use oxen df options to view your data with modifications. These changes won't be written anywhere unless you use the --write or --output flags.
Uploading Data
Before modifying your data, add it to a repository to preserve its history. This can be done in the UI, Python, or CLI.
If you've pushed to the Oxen Hub, you can view, edit, and query your data directly using the UI.
Editing Data Frames
Once you've added your data to an Oxen repository, you can interact with data frames even if they're not downloaded locally. Oxen exposes a CRUD interface that makes this possible.
All of these operations are exposed over HTTP, so you are not limited to using the Python library. Check out our HTTP reference docs to see how to interact with your data programmatically.
You can also edit data files locally with oxen df --write. Any modifications you make with this flag set will be written back to the original file and register as "modified" in your Oxen repository.
Oxen uses a combination of Polars and DuckDB under the hood, and uses the Apache Arrow data format to provide powerful cross-application functionality.
Useful Commands
There are many ways you might want to view, transform, and filter your data on the command line before committing changes to the dataset. oxen df provides several options that can help with this.
For these examples, we'll use our CatDogBBox repository.
Convert Dataset Format
Oxen allows you to quickly transform data files between data formats. When you run oxen df with --output, the resulting data frame will be written to disk as a new file of the specified type.
Some formats like parquet and arrow are more efficient for different tasks, but are not human-readable like tsv or csv. These are tradeoffs you'll have to decide on for your application. Oxen currently supports the following file extensions: csv, tsv, parquet, arrow, json, jsonl.
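A conversion might look like the sketch below. The file annotations.csv and its columns are hypothetical stand-ins, not the actual CatDogBBox files; the output type is taken from the extension passed to --output.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Write the same data frame to disk as parquet.
  oxen df annotations.csv --output annotations.parquet
fi
```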
SQL Query
Oxen has a powerful SQL query engine built into the CLI. You can run SQL queries on your data frames with the --sql flag.
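The sketch below counts rows per label in a hypothetical file. It assumes the data frame is exposed to the query under the table name df; treat that name, the file, and the columns as assumptions to check against the CLI help.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # "df" as the table name is an assumption, not confirmed by this page.
  oxen df annotations.csv --sql "SELECT label, COUNT(*) AS n FROM df GROUP BY label"
fi
```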
View Specific Columns
If you only need a subset of your data frame's columns, you can specify them in a comma separated list with --columns.
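For instance, to keep only two columns of a hypothetical file (the file name and column names here are illustrative assumptions):

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Show just the file and label columns.
  oxen df annotations.csv --columns 'file,label'
fi
```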
Take Indices
You can also view particular rows by index using --take.
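A possible invocation is sketched below. It assumes --take accepts a comma separated list of row indices; that argument format, like the sample file, is an assumption rather than something this page states.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # View the rows at indices 1 and 2 (list format is an assumption).
  oxen df annotations.csv --take '1,2'
fi
```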
Unique
Oxen can efficiently compute all the unique values of a given column or set of columns using the --unique option.
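To illustrate, this sketch asks for the distinct values of a label column in a hypothetical file; the file and column names are assumptions.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Compute the unique values of the label column.
  oxen df annotations.csv --unique 'label'
fi
```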
Concatenate (vstack)
If you've filtered down your data and want to stack it back into a single frame, use the --vstack option. It takes a variable length list of files you'd like to concatenate.
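A rough sketch of stacking two filtered files back together follows. The files cats.csv and dogs.csv are hypothetical and share the same columns; --output is used so the combined frame is saved rather than only displayed.

```shell
# Two hypothetical files with identical columns.
printf '%s\n' 'file,label' 'images/001.jpg,cat' > cats.csv
printf '%s\n' 'file,label' 'images/000.jpg,dog' > dogs.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Stack dogs.csv under cats.csv and save the result.
  oxen df cats.csv --vstack dogs.csv --output combined.csv
fi
```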
Add Column
Your data might not match the schema of a data frame you want to combine with, in which case you may need to add a column to match it. You can do this, filling the new column with a default value, using --add-col 'col:val:dtype'.
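The 'col:val:dtype' pattern could be filled in as sketched here. The column name is_cute, the default value unknown, and the dtype spelling str are illustrative assumptions; check the CLI help for the dtype names it accepts.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Preview the frame with an extra column; "str" as a dtype name is an assumption.
  oxen df annotations.csv --add-col 'is_cute:unknown:str'
fi
```

Without --write or --output, this only changes what is displayed; annotations.csv itself is left untouched.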
Add Row
You can also append new rows to the data frame. The --add-row option takes in a comma separated list of values and automatically parses the correct dtypes.
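As an example under the same caveats (hypothetical file, hypothetical values), the row passed to --add-row supplies one value per column, in column order:

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Preview the frame with one more row appended.
  oxen df annotations.csv --add-row 'images/003.jpg,cat,100,100'
fi
```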
Randomize
Often, you'll want to randomize data before splitting into train and test sets, or just to peek at different data values. This can be done with the --randomize flag.
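Shuffling a hypothetical file before a split might look like this; the file names are assumptions, and --output is added so the shuffled order is saved to a new file.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  # Shuffle the rows and write them to a new file.
  oxen df annotations.csv --randomize --output shuffled.csv
fi
```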
Sort
You can sort your data with the --sort flag. You can sort the data by the values of any column in your data frame.
Reverse
You can also reverse the order of a data table. By default --sort sorts in ascending order, but this can be switched with the --reverse flag.
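Putting the two flags together, a sketch on a hypothetical file: the first call sorts by a width column in ascending order, the second flips it to descending. The file and column names are assumptions.

```shell
# Hypothetical sample file; the column names are assumptions for illustration.
printf '%s\n' 'file,label,width,height' \
  'images/000.jpg,dog,499,375' \
  'images/001.jpg,cat,320,240' \
  'images/002.jpg,dog,640,480' > annotations.csv

# Run only if the oxen CLI is installed.
if command -v oxen >/dev/null 2>&1; then
  oxen df annotations.csv --sort 'width'             # ascending
  oxen df annotations.csv --sort 'width' --reverse   # descending
fi
```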