Repository Classes

There are a few basic python classes you can use to interact with Oxen repositories. The full list of Python APIs can be found in the API Documentation.

Remote vs Local

Oxen has the concept of Remote Repositories and Local Repositories. One of the core tenets of Oxen is that data should feel like it is local, even if it is not. Hence the APIs for Local vs Remote are very similar, the only difference is where you are performing the operation.

Remote Repositories

Remote Repositories only download pointers and metadata, so that you can interact with the data as if it was local.

Here is the full documentation for the RemoteRepo.

Integrate with Pandas

The fastest way to integrate Oxen into your existing workflow is to use the fact that Oxen gives you direct access to files and directories given a specific revision.

For example, let’s load a data file given a specific commit into Pandas

Python
from oxen import RemoteRepo
import pandas as pd

# Connect to the remote repo
repo = RemoteRepo("ox/CatDogBBox")
# Specify the version of the file you want to download
branch = repo.get_branch("my-pets")
# Download takes a file or directory a commit id
repo.download("annotations", revision=branch.commit_id)
# Once you have the data locally, use whatever library you want to explore the data
df = pd.read_csv("annotations/train.csv")
print(df.head())

All the files are also accessible directly over http, which removes some of the boilerplate as long as the files are of a reasonable size.

The url structure is https://hub.oxen.ai/api/repos/:namespace/:repo_name/file/:revision/:file_path

Python
import pandas as pd

df = pd.read_csv("https://hub.oxen.ai/api/repos/ox/CatDogBBox/file/main/annotations/test.csv")
print(df.head())

Add Files

Oxen has the concept of Remote Workspaces that make it easy to add data to a remote repository without ever downloading it locally.

Python
from oxen import RemoteRepo

# Connect to the Remote
repo = RemoteRepo("ox/CatDogBBox")
# Create a branch on the remote and check it out
# similar to oxen checkout -b add-images
repo.create_checkout_branch("add-images")

Local Repositories

(Local) Repos have all the files versioned and accessible on your local machine. They duplicate the data between your working directory and a hidden .oxen directory so that you can quickly swap between versions and run experiments.

If you are creating a new repository from scratch, this is a great place to start. The workflow is very similar to git in terms of initializing a repository, adding data, committing, and pushing to a remote.

Let’s walk through some basic operations.

Init

Assuming you are creating a brand new repository, first you will have to create an empty directory, point your LocalRepo to it and run init().

import os
from oxen import Repo

# Create an empty directory named CatsAndDogs
directory = "CatsAndDogs"
os.makedirs(directory)

# Initialize the Oxen Repository
repo = Repo(directory)
repo.init()

Add Files

Now let’s create a README.md file and add it to the local staging area.

import os
from oxen import Repo

# write a file called README.md to disk
directory = "CatsAndDogs"
file_name = "README.md"
file_path = os.path.join(directory, file_name)

# Open the file in write mode
with open(file_path, "w") as file:
    # Write the title to the file
    file.write("# " + directory + "\n")

# Assuming the Repo is already initialized
repo = Repo(directory)
# add the path relative to the dir
repo.add(file_name)
# list added files
status = repo.status()
print(status.added_files())

You should see that we have one file added [README.md]

Commit Staged Files

With your README.md staged you can now commit with a message

from oxen import Repo

# Assuming you have already added the data
repo = Repo(directory)
repo.commit("Adding README.md")

🎉 Congratulations you have just versioned your first file! Now to sync it with the rest of your team.

Configure Remote

The easiest way to create a remote is in the Oxen Hub web interface.

Then once you have a remote created, set the remote on the repo object.

from oxen import Repo

# Once you have data committed that you want to sync
repo = Repo(directory)
# You can have multiple named remotes
username = "YOUR_USERNAME"
repo_name = "REMOTE_REPO_NAME"
remote_name = "origin"
remote_url = f"https://hub.oxen.ai/{username}/{repo_name}"
repo.set_remote(remote_name, remote_url)

Push to Remote

With your remote set and auth key configured, you are ready to push the data!

from oxen import Repo

# Once you have committed data and set the remote, it's time to push your branch
repo = Repo(directory)
remote_name = "origin"
remote_branch = "main"
repo.push(remote_name, remote_branch)