Look At Your Data
oxen df
Oxen uses the df command for all CLI actions involving data frames. For example, oxen df <FILENAME> displays the contents of tabular data files.
To see the full contents, use the --full flag.
You can also use oxen df options to view your data with modifications. These changes won't be written anywhere unless you use the --write or --output flags.
Uploading Data
Before modifying your data, add it to a repository to preserve its history. This can be done in the UI, Python, or CLI.
Editing Data Frames
Once you've added your data to an Oxen repository, you can interact with data frames even if they're not downloaded locally. Oxen exposes a CRUD interface that makes this possible.
To save changes, use oxen df --write. Any modifications you make with this flag set will be written back to the original file and register as "modified" in your Oxen repository.
Useful Commands
There are many ways you might want to view, transform, and filter your data on the command line before committing changes to the dataset. oxen df provides several options that can help with this.
For these examples, we'll use our CatDogBBox repository.
Convert Dataset Format
Oxen allows you to quickly transform data files between data formats. When you run oxen df with --output, the resulting data frame will be written to disk as a new file of the specified type.
Formats such as parquet and arrow are more efficient for some tasks, but are not human readable the way tsv or csv are. You'll have to weigh these tradeoffs for your application. Oxen currently supports the following file extensions: csv, tsv, parquet, arrow, json, jsonl.
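As a rough illustration of the conversion step, the sketch below assembles an oxen df invocation from Python. The helper name conversion_command, the file path, and the argument order (input file first, then --output) are assumptions made for the example; only the --output flag and the extension list come from the text above. The command is built but not executed.

```python
from pathlib import PurePosixPath

# Extensions listed above as supported by oxen df.
SUPPORTED_EXTENSIONS = {"csv", "tsv", "parquet", "arrow", "json", "jsonl"}


def conversion_command(source: str, target_ext: str) -> list:
    """Build an argv list that converts `source` to `target_ext` (hypothetical helper)."""
    ext = target_ext.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {target_ext!r}")
    # Keep the same file name and swap only the extension.
    target = PurePosixPath(source).with_suffix(f".{ext}")
    # Assumed argument order: input file, then --output and the new path.
    return ["oxen", "df", source, "--output", str(target)]


# Hypothetical file path, used only for illustration.
command = conversion_command("annotations/train.csv", "parquet")
print(" ".join(command))
```

The returned list can be passed to subprocess.run on a machine where the oxen CLI is installed.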
SQL Query
Oxen has a powerful SQL query engine built in to the CLI. You can run SQL queries on your data frames with the --sql flag.
View Specific Columns
If you only need a subset of your data frame's columns, you can specify them in a comma separated list with --columns.
Take Indices
You can also view particular rows using --take.
Unique
Oxen can efficiently compute all the unique values of a given column or set of columns using the --unique option.
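To make "unique values of a set of columns" concrete, here is a small plain-Python model of the idea. It is not Oxen's implementation, and the rows and column names are made up for the example; it simply collects each distinct combination of values once, in first-seen order.

```python
def unique_values(rows, columns):
    """Return the distinct value combinations for `columns`, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical rows, for illustration only.
rows = [
    {"file": "a.jpg", "label": "cat", "split": "train"},
    {"file": "b.jpg", "label": "dog", "split": "train"},
    {"file": "c.jpg", "label": "cat", "split": "train"},
    {"file": "d.jpg", "label": "cat", "split": "test"},
]
labels = unique_values(rows, ["label"])
label_splits = unique_values(rows, ["label", "split"])
```

With a single column the result lists each value once; with several columns it lists each distinct combination once.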
Concatenate (vstack)
If you've filtered down your data and want to stack it back into a single frame, use --vstack. This option takes a variable length list of files you'd like to concatenate.
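Vertical stacking can be pictured as appending the rows of each file under a shared header. The sketch below models that with the standard csv module. It is an illustration of the concept, not Oxen's code, and it assumes that every input must have the same columns in the same order; the sample data is invented.

```python
import csv
import io


def vstack_csv(sources):
    """Concatenate CSV texts row-wise, requiring identical headers."""
    header = None
    stacked = []
    for text in sources:
        reader = csv.reader(io.StringIO(text))
        current_header = next(reader)
        if header is None:
            header = current_header
        elif current_header != header:
            # Assumption: frames with different schemas cannot be stacked.
            raise ValueError(f"schema mismatch: {current_header} != {header}")
        stacked.extend(reader)
    return header, stacked


# Hypothetical file contents, for illustration only.
cats = "file,label\ncat_1.jpg,cat\ncat_2.jpg,cat\n"
dogs = "file,label\ndog_1.jpg,dog\n"
header, rows = vstack_csv([cats, dogs])
```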
Add Column
Your data might not match the schema of a data frame you want to combine with, in which case you may need to add a column to match it. You can do this and project default values with --add-col 'col:val:dtype'.
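The 'col:val:dtype' spec packs a column name, a default value, and a type into one string. The following sketch shows one way such a spec could be unpacked and applied to every row. It is a plain-Python model, not Oxen's parser, and the dtype names in CASTS (str, i32, f64, bool) are assumptions chosen for the example.

```python
# Assumed dtype names, for illustration; check the Oxen docs for the real list.
CASTS = {
    "str": str,
    "i32": int,
    "f64": float,
    "bool": lambda text: text.lower() == "true",
}


def add_column(rows, spec):
    """Add a column with a constant default to every row, given 'col:val:dtype'."""
    # Name is everything before the first colon, dtype everything after the last,
    # so a default value that itself contains a colon still parses.
    name, rest = spec.split(":", 1)
    value, dtype = rest.rsplit(":", 1)
    default = CASTS[dtype](value)
    return [{**row, name: default} for row in rows]


# Hypothetical rows, for illustration only.
rows = [{"file": "cat_1.jpg"}, {"file": "dog_1.jpg"}]
with_flag = add_column(rows, "is_verified:false:bool")
```

Every row receives the same default, which is what lets two frames with different columns be brought to a common schema before stacking.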
Add Row
You can also append new rows to the data frame. The --add-row option takes in a comma separated list of values and automatically parses the correct dtypes.
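To give a feel for what parsing a comma separated row into typed values involves, here is a minimal sketch. It infers each type from the text itself, which is an assumption: Oxen may instead cast values according to the data frame's existing schema. The column names and values are hypothetical.

```python
def parse_value(text):
    """Infer bool, int, or float from text, falling back to str."""
    stripped = text.strip()
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            continue
    return stripped


def add_row(columns, rows, values):
    """Return a new row list with the parsed `values` appended as one row."""
    parsed = [parse_value(value) for value in values.split(",")]
    if len(parsed) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(parsed)}")
    return rows + [dict(zip(columns, parsed))]


# Hypothetical schema and values, for illustration only.
columns = ["file", "label", "min_x", "min_y"]
rows = add_row(columns, [], "cat_3.jpg,cat,10.5,20")
```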
Randomize
Often, you'll want to randomize data before splitting into train and test sets, or just to peek at different data values. This can be done with the --randomize flag.
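The shuffle-then-split workflow mentioned above can be sketched in a few lines of plain Python. This models the idea rather than the CLI: the seed and train_fraction parameters belong to the sketch only and are not documented oxen df options, and the file names are made up.

```python
import random


def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows`, then cut it into train and test lists."""
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows, for illustration only.
rows = [f"image_{i}.jpg" for i in range(10)]
train, test = shuffle_and_split(rows)
```

Shuffling before the cut matters when the file is ordered by label or by capture date, since a straight split would otherwise give the two sets different distributions.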
Sort
You can sort your data with the --sort flag, using the values of any column in your data frame.
Reverse
You can also reverse the order of a data table. By default --sort sorts in ascending order, but this can be switched with the --reverse flag.
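The relationship between sorting and reversing can be modeled as two separate steps: order the rows by one column, then optionally flip the whole table. The sketch below is a plain-Python illustration under that reading, not Oxen's implementation, and the rows are invented.

```python
def sort_rows(rows, column, reverse=False):
    """Sort rows ascending by `column`, then optionally reverse the table."""
    ordered = sorted(rows, key=lambda row: row[column])
    return ordered[::-1] if reverse else ordered


# Hypothetical rows, for illustration only.
rows = [
    {"file": "a.jpg", "width": 640},
    {"file": "b.jpg", "width": 320},
    {"file": "c.jpg", "width": 1024},
]
ascending = sort_rows(rows, "width")
descending = sort_rows(rows, "width", reverse=True)
```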