🐮 Learn The Basics
Oxen makes versioning your datasets as easy as versioning your code. You can install it with Homebrew, with pip, or from our releases page.
brew tap Oxen-AI/oxen
brew install oxen
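After installing, it can help to confirm that the executable is reachable before continuing. The sketch below relies only on the standard `command -v` shell builtin; the variable name `oxen_status` is a hypothetical choice for this example.

```shell
# Report whether the oxen executable is reachable on PATH.
# oxen_status is an illustrative variable, not something Oxen reads.
if command -v oxen >/dev/null 2>&1; then
  oxen_status="found at $(command -v oxen)"
else
  oxen_status="not found on PATH"
fi
echo "oxen: $oxen_status"
```

If the check reports that oxen is not found, opening a new terminal session so that PATH is reloaded is a reasonable first thing to try.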
Clone Dataset
Clone your first Oxen repository from the OxenHub.
oxen clone https://hub.oxen.ai/ox/CatDogBBox
Initialize User
Each change you make will be associated with a name and email. Set them before you get started so you know who changed what. The user data is saved by default in ~/.config/oxen/user_config.toml.
oxen config --name "Bessie Oxington" --email "bessie@yourcompany.com"
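To confirm that the name and email were saved, the config file can be inspected directly. This is a minimal sketch that assumes the default path mentioned above; `OXEN_USER_CONFIG` is a hypothetical variable name used only here.

```shell
# Default location of the Oxen user config, as stated above.
# OXEN_USER_CONFIG is an illustrative name, not a variable Oxen reads.
OXEN_USER_CONFIG="${HOME}/.config/oxen/user_config.toml"

if [ -f "$OXEN_USER_CONFIG" ]; then
  # Print the saved user settings
  cat "$OXEN_USER_CONFIG"
else
  echo "No user config at $OXEN_USER_CONFIG yet; run oxen config first."
fi
```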
Create Repository
Initialize your first Oxen repository, and commit the first version of your data.
# Initialize the repository
oxen init
# Write data to a file
printf '%s\n' 'name,age' 'bob,12' 'jane,13' > people.csv
# Stage the data for commit
oxen add people.csv
# Commit the changes with a message
oxen commit -m "Adding my data"
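The steps above can be bundled into a small helper for trying Oxen out in a scratch directory. This is a sketch rather than an official workflow: the function name `first_commit` is hypothetical, and it uses only the oxen commands already shown in this section.

```shell
# first_commit DIR: create DIR, initialize an Oxen repository in it,
# and commit a small CSV. first_commit is a hypothetical helper name.
first_commit() {
  if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "usage: first_commit <directory>" >&2
    return 1
  fi
  repo_dir="$1"
  mkdir -p "$repo_dir" || return 1
  (
    # Work in a subshell so the caller's working directory is unchanged
    cd "$repo_dir" || exit 1
    oxen init || exit 1
    # printf '%s\n' writes each argument on its own line
    printf '%s\n' 'name,age' 'bob,12' 'jane,13' > people.csv
    oxen add people.csv || exit 1
    oxen commit -m "Adding my data"
  )
}
```

For example, `first_commit /tmp/oxen-demo` would leave a repository with one commit in /tmp/oxen-demo, assuming oxen is installed and a user has been configured.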
Version Your Data
Once your data has been committed, you can always return to that version.
Overwrite the file, move it, or delete it with confidence. It doesn't matter: Oxen will always have a copy of the data as it was at the previous commit.
Create Branch
It is good practice to create a new branch for changes you make to your data. This will allow you to easily compare the parallel versions of your data over time.
# Create and check out a branch named `modify-data`
oxen checkout -b modify-data
# Overwrite data in existing file
printf '%s\n' 'name,age' 'bob,12' 'jane,13' 'joe,14' > people.csv
Compare Changes
View the change you made with the oxen diff command. This will show you the changes you made to your data since the last commit.
oxen diff people.csv
Once you push your changes to OxenHub, you can view them in your commit history.
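As a rough cross-check that does not depend on Oxen, the two versions of the file from this walkthrough can be compared with the standard POSIX diff tool. This is a swapped-in illustration of the expected change (one added row), not a reproduction of what oxen diff prints; the scratch directory and the file names before.csv and after.csv are hypothetical.

```shell
# Recreate both versions of people.csv in a scratch directory
scratch=$(mktemp -d)
printf '%s\n' 'name,age' 'bob,12' 'jane,13' > "$scratch/before.csv"
printf '%s\n' 'name,age' 'bob,12' 'jane,13' 'joe,14' > "$scratch/after.csv"

# diff exits non-zero when the files differ, so guard it with `|| true`.
# Lines starting with ">" exist only in the newer file.
added_rows=$(diff "$scratch/before.csv" "$scratch/after.csv" | grep '^>' || true)
echo "$added_rows"

rm -rf "$scratch"
```

This prints `> joe,14`, the single row added on the `modify-data` branch.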
Revert Changes
If you are not happy with the changes you made to your data, you can revert them to the previous commit with the oxen restore command.
oxen restore people.csv
Commit Changes
Once you are happy with the changes you have made to your data, you can commit them to the repository with a new message.
oxen add people.csv
oxen commit -m "Adding Joe to the dataset"
View History
To see the commit history of your repository, you can use the oxen log command.
oxen log
Checkout Main Branch
Once you are done making changes to your data, you can return to the main branch with the oxen checkout command.
Never fear, the file has now been reverted to the initial commit, but your changes are saved in the branch you created.
oxen checkout main
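One quick way to see that switching branches really swapped the file contents is to count the data rows in people.csv on each branch. The helper below is a sketch using only standard tools; `data_rows` is a hypothetical name, and it assumes a simple CSV with one header line and no newlines embedded in fields.

```shell
# data_rows FILE: number of rows in a CSV, not counting the header line.
# data_rows is a hypothetical helper name used for this sketch.
data_rows() {
  # tail -n +2 skips the header; tr strips the padding some wc builds add
  tail -n +2 "$1" | wc -l | tr -d ' '
}
```

Assuming the Joe row was committed on `modify-data`, running `data_rows people.csv` should report 2 on `main` and 3 after `oxen checkout modify-data`.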
List Branches
To see the branches in your repository, you can use the oxen branch command.
oxen branch
Push Data
Once your data has been committed locally, you can sync it to the OxenHub.
OxenHub is a free service that allows you to collaborate on your data in the cloud. You can create a free account at https://oxen.ai.
# Go create repo at https://oxen.ai
# ...
oxen config --set-remote origin https://hub.oxen.ai/<namespace>/<repo_name>
oxen config --auth hub.oxen.ai <your_auth_token>
oxen push origin main
# To push your other branch, change the branch name from `main` to `modify-data`
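When more than one branch needs to be synced, the push command above can be repeated in a loop. The following is a sketch under the assumption that the remote is named origin and that the remote URL and auth token were configured as shown; `push_branches` is a hypothetical helper name.

```shell
# push_branches BRANCH...: push each named branch to the remote `origin`,
# stopping at the first failure. push_branches is a hypothetical helper.
push_branches() {
  if [ "$#" -eq 0 ]; then
    echo "usage: push_branches <branch> [<branch>...]" >&2
    return 1
  fi
  for branch in "$@"; do
    echo "Pushing $branch to origin"
    oxen push origin "$branch" || return 1
  done
}
```

For this walkthrough, `push_branches main modify-data` would push both branches one after the other.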
Clone Data
Clone your data faster than ever before. Oxen has been optimized to the core to make pulling large datasets as fast as possible.
oxen clone https://hub.oxen.ai/ox/CatDogBBox
Pull Changes
Only pull the changes you need. Oxen will only pull the files that have changed since the last time you pulled.
oxen pull origin main
Download Individual Files
With Oxen you do not need to download the entire dataset to your local machine. You can download only the subset of files or directories you need.
oxen clone https://hub.oxen.ai/ox/CatDogBBox --shallow
cd CatDogBBox
oxen remote download annotations/test.csv
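The same pattern extends to several files. The helper below is a sketch that combines the shallow clone and download commands shown above; `fetch_files` is a hypothetical name, and the sketch assumes the clone creates a directory named after the last segment of the repository URL, as in the `cd CatDogBBox` step.

```shell
# fetch_files REPO_URL PATH...: shallow-clone a repository, then download
# only the listed paths. fetch_files is a hypothetical helper name.
fetch_files() {
  if [ "$#" -lt 2 ]; then
    echo "usage: fetch_files <repo_url> <path> [<path>...]" >&2
    return 1
  fi
  repo_url="$1"
  shift
  # Assumption: the clone directory matches the last URL segment
  repo_dir=$(basename "$repo_url")
  oxen clone "$repo_url" --shallow || return 1
  (
    # Work in a subshell so the caller's working directory is unchanged
    cd "$repo_dir" || exit 1
    for path in "$@"; do
      oxen remote download "$path" || exit 1
    done
  )
}
```

For example, `fetch_files https://hub.oxen.ai/ox/CatDogBBox annotations/test.csv` mirrors the three commands above; additional paths can be appended as extra arguments.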