Versioning 101
The first thing you need to know about Oxen.ai is that it has both remote and local workflows. Remote workflows allow you to add files directly to the remote without pulling any data locally. Say you want to add a file to a dataset like ImageNet, with 1 million files: you should not have to wait for every file to clone locally just to add yours.
Client and Server
The open source version control tools come with a server to sync data to and a client that can interact with data locally and remotely. The client and server share a common core library that is written in Rust and is used to quickly sync data between the two. The server exposes a REST API that can be used to interact with data. Oxen.ai's clients include a command line interface, as well as bindings for Rust, Python, and HTTP interfaces to make it easy to integrate into your workflow.
Installation
Oxen makes versioning your datasets as easy as versioning your code. You can install through Homebrew or pip, or from our releases page.
Remote vs Local Workflow
In the world of version control, there are two main paradigms: centralized and decentralized. Centralized version control systems allow remote-first workflows, where you do not need a full copy of the data on your local machine. Decentralized version control systems like Git duplicate all the data to every node in your network by default.
Remote Workflow
To get started with the remote workflow, you need to set up an oxen-server. Oxen.ai provides both an open source server and a hosted solution that can be used to sync data between your local machine and the cloud. To try the hosted solution, you can create a free account at https://oxen.ai.
To learn how to set up the open source server, check out the server documentation.
Remote Repository
If a remote repository already exists, you simply pass in the namespace/name of the remote repository you want to connect to.
Create a Remote Repository
If you do not already have a remote repository, you can create one directly from Python. You may want to start with an empty remote repository and add your data later.
By default, creating a repository adds a README.md file to the repository with an initial commit. If you want to create an empty repository without adding a README.md, you can pass empty=True to the create method.
Add Files
You can add files to the remote repository by passing the path to the file and the destination directory. This will upload the file to the remote repository and stage it for commit.
Commit Changes
You can commit changes to the remote repository by passing a message.
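The add and commit steps above can be sketched as a small helper. This is a minimal sketch rather than the library's documented usage: it works against any object that exposes add and commit methods, and it assumes the destination directory is passed as a keyword argument named dst. The RecordingRepo class is a hypothetical stand-in so the example runs without a server; with the Oxen Python package installed, a RemoteRepo connected to your namespace/name would take its place.

```python
class RecordingRepo:
    """Hypothetical stand-in for a remote repo; it only records the calls it receives."""

    def __init__(self):
        self.staged = []    # (path, dst) pairs waiting for a commit
        self.commits = []   # (message, files) tuples, oldest first

    def add(self, path, dst=""):
        # A real client would upload the file and stage it on the remote.
        self.staged.append((path, dst))

    def commit(self, message):
        # Turn everything staged so far into one commit and clear the staging area.
        self.commits.append((message, list(self.staged)))
        self.staged.clear()


def add_and_commit(repo, files, dst, message):
    """Stage every path in `files` under `dst`, then commit them together."""
    if not files:
        raise ValueError("nothing to add")
    for path in files:
        repo.add(path, dst=dst)  # assumption: the destination keyword is `dst`
    repo.commit(message)


repo = RecordingRepo()
add_and_commit(repo, ["cat_1.jpg", "cat_2.jpg"], dst="images", message="Add two cat images")
print(repo.commits)
```

Staging several files and committing once keeps related changes in a single commit, which makes the history easier to read later.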
File Exploration
To see the files in the remote repository, you can use the ls method.
Note: the directories are paginated, so you will need to use the page_num parameter to view the next page of results.
There are also total_pages, page_number, and total_entries attributes that give you information about the pagination.
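Walking through every page can be sketched as a loop that keeps requesting pages until total_pages is reached. This is a sketch under stated assumptions, not the library's own iterator: the Page class and fake_ls function are hypothetical stand-ins, the entries attribute name is assumed, and pages are assumed to be numbered from 1. Only page_num, page_number, total_pages, and total_entries come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Page:
    """Hypothetical shape of one page of results; `entries` is an assumed attribute name."""
    entries: List[str]
    page_number: int
    total_pages: int
    total_entries: int


def iter_entries(list_page: Callable[[int], Page]) -> Iterator[str]:
    """Yield every entry by requesting pages 1..total_pages in order."""
    page_num = 1  # assumption: page numbering starts at 1
    while True:
        page = list_page(page_num)
        yield from page.entries
        if page_num >= page.total_pages:
            break
        page_num += 1


def fake_ls(page_num: int, page_size: int = 2) -> Page:
    """Stand-in for a paginated listing call, backed by a fixed list of names."""
    names = ["README.md", "annotations", "images", "labels.csv", "train.csv"]
    total_pages = -(-len(names) // page_size)  # ceiling division
    start = (page_num - 1) * page_size
    return Page(names[start:start + page_size], page_num, total_pages, len(names))


all_names = list(iter_entries(fake_ls))
print(all_names)
```

With a real repository, the callable passed to iter_entries would wrap the ls method and forward its argument as page_num.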
Downloading Data
You can download individual files and folders if you do not need the entire data repository for your job.
Checkout a Branch
If you have data on a separate branch that you want to view, you can check out that branch by passing the branch name to the checkout method.
Create a New Branch
The checkout method also allows you to create a new branch if the branch does not exist.
View Branches
To see all the branches in the remote repository, you can use the branches method.
Workspaces
Under the hood, remote collaboration is enabled through a concept called a workspace. A workspace can be thought of as a working copy of changes that is stored on the remote server. Just like you can add files before committing locally, you can add files to a workspace on the remote server before committing. This allows you to build up a set of changes remotely before committing them in bulk.
The RemoteRepo.add method is a shortcut for creating a workspace and adding files to it. It creates an ephemeral workspace, adds the files to it, and deletes the workspace after committing.
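The workspace lifecycle described above (create, stage, commit, delete) can be illustrated with a toy in-memory model. This is only a conceptual sketch: ToyServer and its methods are hypothetical and do not mirror the real server or client API. It shows that staged files live in the workspace until a commit moves them into history and removes the workspace.

```python
import itertools


class ToyServer:
    """In-memory toy model of a remote: committed history plus open workspaces."""

    def __init__(self):
        self.commits = []       # list of (message, {path: data})
        self.workspaces = {}    # workspace id -> {path: data} staged so far
        self._ids = itertools.count(1)

    def create_workspace(self):
        workspace_id = f"ws-{next(self._ids)}"
        self.workspaces[workspace_id] = {}
        return workspace_id

    def add(self, workspace_id, path, data):
        # Staging only touches the workspace; the commit history is unchanged.
        self.workspaces[workspace_id][path] = data

    def commit(self, workspace_id, message):
        # Committing turns the staged files into one commit and removes the workspace.
        staged = self.workspaces.pop(workspace_id)
        self.commits.append((message, staged))


server = ToyServer()
ws = server.create_workspace()
server.add(ws, "images/cat_1.jpg", b"...")
server.add(ws, "images/cat_2.jpg", b"...")
staged_before_commit = len(server.workspaces[ws])
commits_before = len(server.commits)
server.commit(ws, "Add two cat images")
print(commits_before, staged_before_commit, len(server.commits), ws in server.workspaces)
```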
To learn more about workspaces, check out the workspaces documentation.
Connect Local to Remote
Remote repositories are identified by a remote URL. This is the URL that you can use to clone the repository.
Local Workflow
The local workflow looks a lot like Git. The downside is that you have to duplicate all the data locally. The upside is that Oxen is optimized to make local workflows fast.
Clone Dataset
Clone your first Oxen repository from the OxenHub.
Initialize User
Each change you make will be associated with a name and email. Set them before you get started so you know who changed what. The user data is saved by default in ~/.config/oxen/user_config.toml.
Create Repository
Initialize your first Oxen repository, and commit the first version of your data.
Version Your Data
Once your data has been committed, you can always return to that version. Overwrite the file, move it, or delete it with confidence; it doesn't matter. Oxen will always have a copy of the data as it was at the previous commit.
Create Branch
It is good practice to create a new branch for changes you make to your data. This will allow you to easily compare the parallel versions of your data over time.
Delete Branch
Once finished with a branch, you can delete it.
Diff Changes
View the changes you made with the oxen diff command. This will show you the changes you made to your data since the last commit.
Restore Changes
If you are not happy with the changes you made to your data, you can restore them to the previous commit with the oxen restore command.
Commit Changes
Once you are happy with the changes you have made to your data, you can commit them to the repository with a new message.
View History
To see the commit history of your repository, you can use the oxen log command.
Checkout Main Branch
Once you are done making changes to your data, you can return to the main branch with the oxen checkout command.
Never fear: the file has now been reverted to the initial commit, but your changes are saved in the branch you created.
List Branches
To see the branches in your repository, you can use the oxen branch command.
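If you prefer to drive the local commands from a script, the CLI calls named in this section (oxen diff, oxen restore, oxen log, oxen checkout, oxen branch) can be assembled in Python. This is a minimal sketch: the helper names and the file path are hypothetical, the exact arguments each command accepts should be checked against the CLI help, and nothing is executed unless RUN is switched on and the oxen binary is on your PATH. Note that restore discards uncommitted changes to the file.

```python
import shutil
import subprocess


def oxen_cmd(*args):
    """Build the argument list for an oxen CLI call, e.g. oxen_cmd("diff", "data.csv")."""
    return ["oxen", *args]


def review_steps(path, branch="main"):
    """Commands to inspect a change, discard it, read history, switch branch, and list branches."""
    return [
        oxen_cmd("diff", path),
        oxen_cmd("restore", path),
        oxen_cmd("log"),
        oxen_cmd("checkout", branch),
        oxen_cmd("branch"),
    ]


steps = review_steps("annotations/train.csv")  # hypothetical file path
for cmd in steps:
    print(" ".join(cmd))

# Only run the commands when explicitly enabled and the oxen binary is installed.
RUN = False
if RUN and shutil.which("oxen"):
    for cmd in steps:
        subprocess.run(cmd, check=True)
```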