Setup User

Before you commit, set up your local Oxen user name and email. These show up in oxen log and in the OxenHub dashboard to record who changed what.

oxen config --name "YOUR_NAME" --email "YOUR_EMAIL"

To push to a remote or clone private repos, you will have to set up your API key. You can obtain an API key by creating an account on Oxen.ai and going to your profile.

oxen config --auth hub.oxen.ai $YOUR_API_KEY
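
If you configure several machines, the two config commands above can be wrapped in a small helper that reads its values from environment variables, so the API key is never hardcoded in a script. This is only a sketch: the function name configure_oxen_user and the variable names OXEN_USER_NAME, OXEN_USER_EMAIL, and OXEN_API_KEY are hypothetical choices for this example, not something Oxen defines. Only the oxen config invocations are taken from the commands above.

```shell
# Sketch: configure the Oxen user (and optionally the API key) from
# environment variables. The variable names are assumptions made for this
# example; Oxen itself does not read them.
configure_oxen_user() {
  # Stop with a clear message if a required value is missing.
  : "${OXEN_USER_NAME:?set OXEN_USER_NAME first}"
  : "${OXEN_USER_EMAIL:?set OXEN_USER_EMAIL first}"

  oxen config --name "$OXEN_USER_NAME" --email "$OXEN_USER_EMAIL" || return 1

  # The API key is only needed to push or to clone private repos,
  # so skip this step when no key is provided.
  if [ -n "${OXEN_API_KEY:-}" ]; then
    oxen config --auth hub.oxen.ai "$OXEN_API_KEY"
  fi
}
```

After exporting the three variables, calling configure_oxen_user with no arguments runs both steps in order.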

Clone

There are a few ways to clone an Oxen repository, depending on how much data you want to transfer. The default oxen clone with no flags downloads the latest commit from the main branch.

oxen clone https://hub.oxen.ai/ox/CatDogBBox

To fetch the latest commit from a specific branch you can use the -b flag.

oxen clone https://hub.oxen.ai/ox/CatDogBBox -b my-pets

Shallow Clone

Downloading all the data for a commit may still be more expensive than you need. With the --shallow flag, you download only the minimal metadata required to interact with the remote.

oxen clone https://hub.oxen.ai/ox/CatDogBBox --shallow -b my-pets

This is especially handy for appending data via the remote workspace. After cloning with the --shallow flag, you will notice that there are no data files in your working directory. You can still see the data on the remote branch with the oxen remote subcommands.

oxen remote ls

You can also download any subset of the data by using oxen remote download. This is useful if you only need a specific set of files and directories for training or testing.

oxen remote download test.csv
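
The shallow-clone steps above can be chained into one helper. This is a minimal sketch, not an official workflow: the function name shallow_fetch and its three arguments are hypothetical, and it assumes the clone lands in a directory named after the last component of the repository URL. The oxen commands themselves are the ones shown above.

```shell
# Sketch: shallow-clone a branch, list what is on the remote, then download
# a single file. The body runs in a subshell, so the caller's working
# directory is left unchanged.
shallow_fetch() (
  repo_url="$1"   # e.g. https://hub.oxen.ai/ox/CatDogBBox
  branch="$2"     # e.g. my-pets
  file="$3"       # e.g. test.csv

  oxen clone "$repo_url" --shallow -b "$branch" || exit 1

  # Assumption: the repository is cloned into a directory named after the
  # final path component of the URL (CatDogBBox in the example above).
  cd "$(basename "$repo_url")" || exit 1

  oxen remote ls || exit 1
  oxen remote download "$file"
)
```

For the repository used in this guide, the call would be shallow_fetch https://hub.oxen.ai/ox/CatDogBBox my-pets test.csv.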

Clone All

Lastly, if you want to clone the entire commit history locally, you can use the --all flag. This is handy if you want to pull a full history and push it to a new remote, or if your workflow requires quickly swapping between commits locally. For running experiments, training, or testing, a subset of the data is often all you need, so one of the lighter clone options above may be a better fit.

oxen clone https://hub.oxen.ai/ox/CatDogBBox --all

Initialize Local Repository

If you do not have a remote dataset, you can initialize one locally.

Similar to git: create a new directory, navigate into it, and run:

oxen init
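
Spelled out, the three steps just described (create a directory, enter it, initialize) might look like the sketch below. The function name init_repo and the directory name in the usage note are hypothetical; only oxen init comes from the text above.

```shell
# Sketch: create a directory if needed and initialize an Oxen repository
# inside it. Runs in a subshell so the caller stays in its current directory.
init_repo() (
  dir="$1"

  mkdir -p "$dir" || exit 1
  cd "$dir" || exit 1
  oxen init
)
```

Calling init_repo my-dataset would create my-dataset if it does not exist and run oxen init inside it.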

Stage Data

You can stage the changes you want to commit by passing a full file path or directory to the oxen add command.

oxen add images/

View Status

To see what data is tracked, staged, or not yet added to the repository you can use the status command.

Note: since we are dealing with large datasets with many files, status rolls up the changes and summarizes them for you.

oxen status
On branch main -> e76dd52a4fc13a6f

Directories to be committed
  added: images with added 8108 files

Files to be committed:
  new file: images/000000000042.jpg
  new file: images/000000000074.jpg
  new file: images/000000000109.jpg
  new file: images/000000000307.jpg
  new file: images/000000000309.jpg
  new file: images/000000000394.jpg
  new file: images/000000000400.jpg
  new file: images/000000000443.jpg
  new file: images/000000000490.jpg
  new file: images/000000000575.jpg
  ... and 8098 others

Untracked Directories
  (use "oxen add <dir>..." to update what will be committed)
  annotations/ (3 items)

You can always paginate through the changes with the -s (skip) and -l (limit) params on the status command. Run oxen status --help for more info.
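
As an illustration of paging with skip and limit, the helper below turns a zero-based page number and a page size into the two flags. It is a sketch under an assumption: that -s and -l each take a single integer argument, which the text above implies but does not spell out, so confirm with oxen status --help. The function name status_page is hypothetical.

```shell
# Sketch: show one "page" of status output.
# Assumes -s (skip) and -l (limit) each accept an integer argument.
status_page() {
  page="$1"   # zero-based page number
  size="$2"   # number of entries per page

  # Skip everything that belongs to the earlier pages.
  skip=$((page * size))

  oxen status -s "$skip" -l "$size"
}
```

With this helper, status_page 0 10 asks for the first ten entries and status_page 2 10 skips the first twenty.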

Commit Changes

To commit the staged changes with a message, use:

oxen commit -m "Some informative commit message"

Log

You can see the history of changes on your current branch by running:

oxen log
commit 6b958e268656b0c5

Author: Ox
Date:   Fri, 21 Oct 2022 16:08:39 -0700

    adding 10,000 training images

commit e76dd52a4fc13a6f

Author: Ox
Date:   Fri, 21 Oct 2022 16:05:22 -0700

    Initialized Repo 🐂
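
Taken together, the stage, status, commit, and log steps above form one small loop that you repeat for each batch of data. A minimal sketch, in which the function name snapshot and its arguments are hypothetical and the oxen commands are the ones already shown:

```shell
# Sketch: stage a path, review the summary, commit it, and show the history.
# Each step must succeed before the next one runs.
snapshot() {
  path="$1"      # file or directory to stage, e.g. images/
  message="$2"   # commit message

  oxen add "$path" || return 1
  oxen status || return 1
  oxen commit -m "$message" || return 1
  oxen log
}
```

For example, snapshot images/ "adding training images" stages the images directory and commits it with that message.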

Reverting To Commit

If you ever want to return your working directory to a point in your commit history, supply the commit id from your history to the checkout command.

oxen checkout COMMIT_ID

Restore Working Directory

The restore command comes in handy if you made some changes locally and want to revert them, for example if you accidentally deleted, modified, or staged a file.

oxen restore path/to/file.txt

Restore defaults to restoring the files to the current HEAD. For more detailed options, as well as how to unstage files, refer to the restore documentation.

Removing Data

To stage a file to be removed from the next commit, use the oxen rm command. Removing data from a commit can be useful if you find errors or simply want to create a smaller subset of data on a separate branch for debugging or testing.

oxen rm path/to/file.txt

Note: the file must be committed in the history for this to work. If you want to remove a file that has not been committed yet, simply use your /bin/rm command.

To recursively remove a directory use the -r flag.

oxen rm -r path/to/dir

If you accidentally staged a file that you do not want to commit, you can also use oxen rm with the --staged flag to unstage the file or directory.

oxen rm --staged -r path/to/dir
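
The choice between the plain and the recursive form can be made automatically by checking whether the path is a directory. The two helpers below are a sketch with hypothetical names (remove_tracked and unstage); they only combine the oxen rm forms shown above and assume the path still exists in your working directory so that the directory test can see it.

```shell
# Sketch: stage a committed file or directory for removal,
# adding -r only when the path is a directory.
remove_tracked() {
  if [ -d "$1" ]; then
    oxen rm -r "$1"
  else
    oxen rm "$1"
  fi
}

# Sketch: unstage a file or directory that was added by mistake.
unstage() {
  if [ -d "$1" ]; then
    oxen rm --staged -r "$1"
  else
    oxen rm --staged "$1"
  fi
}
```

For instance, unstage path/to/dir expands to the recursive command shown above, while unstage path/to/file.txt omits -r.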

Once data has been committed, a version of it always lives in the .oxen/versions directory. As of right now, there is no way to completely remove it from the repository history. This functionality is in our backlog for sensitive data that was accidentally committed.

Advanced Features

Oxen has many more advanced features such as computing diffs between tabular data as well as convenient Data Frame manipulation through the oxen df command.