Oxen.AI’s interface mirrors git, but shines in many areas where git and git-lfs fall short. Oxen is built from the ground up for data and is optimized to handle large datasets and large files.

Oxen.AI comprises a command line interface, as well as bindings for Rust 🦀, Python 🐍, and HTTP interfaces 🌎, to make it easy to integrate into your workflow.

✅ Features

Oxen is built around ergonomics and ease of use, and it is easy to learn. If you know how to use git, you know how to use Oxen.

Oxen Hub Features

  • 🚀 Model Inference: No-code model inference.

  • 🏷️ Labeling Images: Edit any of your datasets straight from our UI.

  • 📝 Text2SQL: Instant Text to SQL generation to ask your data questions.

  • ๐Ÿ” Embeddings Search: Instant Text to SQL generation to ask your data questions.

Oxen Open Source Features

  • 🔥 Fast: Efficient indexing and syncing of datasets of any size (millions of images? no problem)

  • 🌎 Workspaces: Interact with your data without downloading it

  • 🧠 Intuitive: Same commands as git

  • 💪 Handles large, unstructured files: images, videos, audio, text, parquet, arrow, json, models, etc.

  • 📊 Native DataFrame processing: index, compare and serve up DataFrames

  • 📈 Versioning: Never worry about losing the state of your data

  • 🤝 Distributed Collaboration: sync to an oxen-server

🌾 What kind of data?

Oxen.ai is designed to efficiently manage large datasets, including those with large individual files, such as CSV files with millions of rows. It also handles datasets made up of millions of individual files and directories, such as the complete collection of ImageNet images.

The backend is agnostic to data type, so feel free to add any binary blobs. We automatically detect certain data types on upload so that we can render them within the UI. Specifically, file types such as csv, tsv, jsonl, parquet, and arrow turn into beautiful data tables, and images, audio, and video files render and play natively.
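As a rough illustration of type detection on upload, the sketch below maps a file's extension to how a UI might render it. The tabular extensions are the ones this section lists; the image, audio, and video extensions, the `render_kind` function, and the `RENDER_KINDS` table are hypothetical, and Oxen's real detection may inspect more than the file extension.

```python
# Hypothetical sketch of extension-based file type detection.
# Not Oxen's actual logic; the mapping below is an illustrative assumption.
from pathlib import PurePosixPath

RENDER_KINDS = {
    # Tabular formats named in the README render as data tables.
    "table": {".csv", ".tsv", ".jsonl", ".parquet", ".arrow"},
    # Media extensions below are example choices, not an official list.
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp"},
    "audio": {".wav", ".mp3", ".flac"},
    "video": {".mp4", ".mov", ".webm"},
}


def render_kind(path: str) -> str:
    """Return how a file would be displayed, falling back to an opaque binary blob."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive: ".JPG" -> ".jpg"
    for kind, extensions in RENDER_KINDS.items():
        if suffix in extensions:
            return kind
    return "binary"


if __name__ == "__main__":
    for name in ["train.parquet", "images/cat.JPG", "clip.mp4", "weights.bin"]:
        print(name, "->", render_kind(name))
```

Anything unrecognized still falls through to "binary", which matches the idea that the backend accepts any blob and only the display changes.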

🚀 Built for speed

One of the main reasons datasets are hard to maintain is the sheer cost of indexing the data and transferring it over the network. We wanted to be able to index hundreds of thousands of images, videos, audio files, and text files in seconds.
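One common way to make indexing fast is to hash every file's contents in parallel and keep a map from path to digest, so unchanged files can be skipped on the next commit or sync. The sketch below assumes SHA-256 and a thread pool because both ship with Python; `build_index` and `hash_file` are hypothetical names, and this is not a description of Oxen's internal indexing.

```python
# Hypothetical sketch: build a content-addressed index of a directory in parallel.
# Illustrates a general technique, not Oxen's indexing code.
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Tuple

CHUNK_SIZE = 1 << 20  # read in 1 MiB chunks so large files never sit fully in memory


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: str, workers: int = 8) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its content hash."""
    root_path = Path(root)
    files = sorted(p for p in root_path.rglob("*") if p.is_file())

    def entry(p: Path) -> Tuple[str, str]:
        return p.relative_to(root_path).as_posix(), hash_file(p)

    # Threads overlap disk reads; CPython's hashlib also releases the GIL
    # while hashing large buffers, so hashing itself can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(entry, files))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "images"))
        samples = [("images/a.txt", b"hello"), ("images/b.txt", b"hello"), ("labels.csv", b"id,label\n")]
        for name, data in samples:
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(data)
        index = build_index(tmp)
        print(len(index), "files,", len(set(index.values())), "unique blobs")
```

Because identical contents produce identical digests, a content-addressed store only needs to keep and transfer each unique blob once.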

Watch below as we version hundreds of thousands of images in seconds 🔥

But speed is only the beginning. Think of Oxen.ai as a set of building blocks to build your dream workflow on top of.

โš’๏ธ Installation

โฌ‡๏ธ Cloning Datasets

The fastest way to get up and running with Oxen is by cloning a dataset. Explore the many public datasets available today on OxenHub.

โฌ†๏ธ Pushing Datasets

Create your own repository and push it to OxenHub to share your datasets with your team or the world.

📚 Learn The Basics

There are many ways to use Oxen: the command line interface, the Python library, or the OxenHub web interface. Learn the basics of each below.

๐Ÿ•ต๏ธ Explore Use Cases

See example repositories for inspiration.

🌾 Why Build Oxen?

Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like.

If you have ever tried git-lfs to version large datasets and become frustrated, we feel your pain. Solutions like git-lfs are too slow at the scale of data we need for machine learning.

If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with a name like:

s3://data/images_july_2022_final_2_no_really_final.tar.gz

then you know the problem. We built Oxen to be the tool we wish we had.

🤖 Built for AI

If you are building an AI application, data is its lifeblood. Data changes constantly over time, and data is what differentiates your model from the competition.

Whether you are building your own model from scratch, fine-tuning a pre-trained model, or using a model as a service, you will need to manage and compare the inputs and outputs over time to ensure your model is improving.

We version our code, why not our data?

Versioning your data means you can experiment on models in parallel with different data. The more experiments you run, the better your model can become, and more robust models lead to better products.

๐Ÿ‚ Why the name Oxen?

โ€œOxenโ€ comes from the fact that we will plow, maintain, and version your data like a good farmer tends to their fields ๐ŸŒพ. During the agricultural revolution, the plow and offloading work to Oxen helped people specialize and start working on other important societal tasks. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product.