Oxen’s interface mirrors git, but shines in many areas where git or git-lfs fall short. Oxen is built from the ground up for data and is optimized to handle both large datasets and large files.

Oxen consists of a command line interface, as well as bindings for Rust 🦀, Python 🐍, and HTTP interfaces 🌎 to make it easy to integrate into your workflow.

🎥 Oxen 101

To learn more about Oxen, you can watch our 101 video below, or work through the getting started guides for Python or the CLI.

🌾 What kind of data?

Oxen is designed to efficiently manage large datasets, including those with large individual files, for example CSV files with millions of rows. It also handles datasets comprising millions of individual files and directories such as the complete collection of ImageNet images.

🚀 Built for speed

One of the main reasons datasets are hard to maintain is the sheer cost of indexing the data and transferring it over the network. We wanted to be able to index hundreds of thousands of images, videos, audio files, and text files in seconds.
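To make "indexing" concrete, here is a minimal Python sketch of the general idea behind content-based indexing: hash every file, then compare two snapshots to see what changed. This is an illustration only and not Oxen's actual implementation; the helper names `hash_file`, `index_directory`, and `changed_files` are hypothetical, and a real tool would add parallelism and caching on top of this.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def index_directory(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the hash of its contents."""
    return {
        path.relative_to(root).as_posix(): hash_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two indexes and report added, removed, and modified paths."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in shared if old[p] != new[p]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        (root / "images" / "cat.jpg").write_bytes(b"fake cat bytes")
        (root / "labels.csv").write_text("file,label\ncat.jpg,cat\n")
        before = index_directory(root)

        # Simulate a new version of the dataset: one edit, one new file.
        (root / "labels.csv").write_text("file,label\ncat.jpg,kitten\n")
        (root / "images" / "dog.jpg").write_bytes(b"fake dog bytes")
        after = index_directory(root)

        print(changed_files(before, after))
```

Run as a script, this prints `{'added': ['images/dog.jpg'], 'removed': [], 'modified': ['labels.csv']}`. Hashing every file one at a time like this is exactly the kind of work that gets slow at hundreds of thousands of files, which is why indexing speed matters.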

Watch below as we version hundreds of thousands of images in seconds 🔥

But speed is only the beginning.

✅ Features

Oxen is built to be ergonomic, easy to use, and easy to learn. If you know how to use git, you know how to use Oxen.

  • 🔥 Fast (efficient indexing and syncing of data)
  • 🧠 Easy to learn (same commands as git)
  • 💪 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc)
  • 🗄️ Index lots of files (millions of images? no problem)
  • 📊 Native DataFrame processing (index, compare and serve up DataFrames)
  • 📈 Tracks changes over time (never worry about losing the state of your data)
  • 🤝 Collaborate with your team (sync to an oxen-server)
  • 🌎 Workspaces to interact with the data without downloading it
  • 👀 Better data visualization on OxenHub

⚒️ Installation

⬇️ Cloning Datasets

The fastest way to get up and running with Oxen is by cloning a dataset. Explore the many public datasets available today on OxenHub.

⬆️ Pushing Datasets

Create your own repository and share your datasets with your team or the world by pushing them to OxenHub.

📚 Learn The Basics

There are many ways to use Oxen. You can use the command line interface, the Python library, or the OxenHub web interface. Learn the basics of each below.

🕵️ Explore Use Cases

See example repositories for inspiration.

⭐️ Every GitHub Star Gives an Ox its Wings

No really.

We hooked up the GitHub webhook for stars to an OxenHub repository. Using a combination of Oxen’s Python library, remote workspaces, and Stable Diffusion XL, we generate a unique Ox for each user and attempt to give it wings.

Go find your own in our ox/FlyingOxen repository.

🌾 Why Build Oxen?

Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like.

If you have ever tried to version large datasets with git-lfs and gotten frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning.

If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with the name:

s3://data/images_july_2022_final_2_no_really_final.tar.gz

then you know the problem. We built Oxen to be the tool we wish we had.

🤖 Built for AI

If you are building an AI application, data is the lifeblood. Data is constantly changing over time, and data differentiates your model from the competition.

Whether you are building your own model from scratch, fine-tuning a pre-trained model, or using a model as a service, you will need to manage and compare the inputs and outputs over time to ensure your model is improving.

We version our code, why not our data?

Versioning your data means you can experiment on models in parallel with different data. The more experiments you run, the smarter your model becomes, and more robust models lead to better products.
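As a rough illustration of what comparing data over time means in practice, the Python sketch below diffs two versions of a small CSV dataset by a key column. It uses only the standard library and is not Oxen's own comparison logic; the function names and the sample rows in `V1` and `V2` are made up for the example.

```python
import csv
import io


def load_rows(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Parse CSV text into a mapping from each row's key value to the full row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_versions(old_csv: str, new_csv: str, key: str) -> dict[str, list[str]]:
    """Report which keys were added, removed, or changed between two versions."""
    old, new = load_rows(old_csv, key), load_rows(new_csv, key)
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in shared if old[k] != new[k]),
    }


# Two hypothetical versions of the same labeled dataset.
V1 = (
    "id,prompt,label\n"
    "1,a photo of an ox,ox\n"
    "2,a photo of a cow,ox\n"
    "3,a blurry photo,unknown\n"
)
V2 = (
    "id,prompt,label\n"
    "1,a photo of an ox,ox\n"
    "2,a photo of a cow,cow\n"
    "4,an ox with wings,ox\n"
)

if __name__ == "__main__":
    print(diff_versions(V1, V2, key="id"))
```

Run as a script, this prints `{'added': ['4'], 'removed': ['3'], 'changed': ['2']}`: row 2 had its label corrected, row 3 was dropped, and row 4 is new. With versioned data, this kind of question can be asked between any two points in a dataset's history.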

🐂 Why the name Oxen?

“Oxen” comes from the fact that we will plow, maintain, and version your data like a good farmer tends to their fields 🌾. During the agricultural revolution, the plow and offloading work to Oxen helped people specialize and start working on other important societal tasks. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product.