Oxen’s interface mirrors git, but it shines in many areas where git and git-lfs fall short. Oxen is built from the ground up for data and is optimized to handle both large datasets and large files.
oxen init
oxen add images/
oxen add annotations/*.parquet
oxen commit -m "Adding 200k images and their corresponding annotations"
oxen push origin main
🌾 What kind of data?
Oxen is designed to efficiently manage large datasets, including those with large individual files, for example, CSV files with millions of rows. It also handles datasets comprising millions of individual files and directories, such as the complete collection of ImageNet images.
🚀 Built for speed
One of the main reasons datasets are hard to maintain is how slow it is to index the data and transfer it over the network. We wanted to be able to index hundreds of thousands of images, videos, audio files, and text files in seconds.
Watch below as we version hundreds of thousands of images in seconds 🔥
But speed is only the beginning.
Oxen is built to be ergonomic, easy to use, and easy to learn. If you know how to use git, you know how to use Oxen.
- 🔥 Fast (efficient indexing and syncing of data)
- 🧠 Easy to learn (same commands as git)
- 💪 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc.)
- 🗄️ Index lots of files (millions of images? no problem)
- 📊 Native DataFrame processing (index, compare and serve up DataFrames)
- 📈 Tracks changes over time (never worry about losing the state of your data)
- 🤝 Collaborate with your team (sync to an oxen-server)
- 🌎 Remote Workspaces to interact with the data without downloading it
- 👀 Better data visualization on OxenHub
brew tap Oxen-AI/oxen
brew install oxen
⬇️ Cloning Datasets
The fastest way to get up and running with oxen is by cloning a dataset. Explore the many public datasets we have today on the OxenHub.
oxen clone https://hub.oxen.ai/ox/CatDogBBox
⬇️ Pushing Datasets
Create and share your own datasets with your team or the world by pushing them to OxenHub.
oxen push origin main
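The add, commit, and push steps shown at the top of this page can be wrapped in a small shell function. This is only a sketch: `publish_dataset` is a hypothetical helper name, not part of Oxen. It assumes `oxen init` has already been run in the current directory, that a remote named `origin` is configured, and that `oxen commit` accepts `-m` the way `git commit` does.

```shell
# Hypothetical helper: stage a path, commit it, and push to origin/main.
# Stops at the first step that fails.
publish_dataset() {
  # $1: file or directory to stage (for example images/)
  # $2: commit message
  oxen add "$1" &&
    oxen commit -m "$2" &&
    oxen push origin main
}
```

For example, `publish_dataset images/ "Adding 200k images"` would stage the `images/` directory, record a commit, and push it in one call.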
📚 Learn The Basics
There are many ways to use Oxen. You can use the command line interface, the Python library, or the OxenHub web interface. Learn the basics of each below.
- Command Line Interface: Learn how to use the Oxen command line interface
- Get started with the Python library
- Use the OxenHub web interface
- Host Oxen in your own infrastructure
🕵️ Explore Use Cases
See example repositories for inspiration.
- Classify images, detect objects, perform semantic segmentation, and more.
- Natural Language Processing: Build chatbots, analyze sentiment, answer questions, and more.
- Classify audio, detect speakers, transcribe speech, and more.
- Generate images, text, music, and more.
⭐️ Every GitHub Star Gives an Ox its Wings
We hooked up the GitHub webhook for stars to an OxenHub repository. Using a combination of Oxen’s Python library, remote workspaces, and Stable Diffusion XL, we generate a unique Ox for each user and attempt to give it wings.
Go find your own in our ox/FlyingOxen repository.
🌾 Why Build Oxen?
Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use or as ergonomic as we would like.
If you have ever tried git-lfs to version large datasets and become frustrated, we feel your pain. Solutions like git-lfs are too slow at the scale of data we need for machine learning.
If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with the name:
We built Oxen to be the tool we wish we had.
🤖 Built for AI
If you are building an AI application, data is the lifeblood. Data is constantly changing over time, and data differentiates your model from the competition.
Whether you are building your own model from scratch, fine-tuning a pre-trained model, or using a model as a service, you will need to manage and compare the inputs and outputs over time to ensure your model is improving.
Versioning your data means you can experiment on models in parallel with different data. The more experiments you run, the smarter your model becomes, and more robust models lead to better products.
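One way to keep parallel experiments apart is to give each one its own branch. The sketch below assumes that Oxen mirrors git's `checkout -b` and `commit -m` here; `start_experiment` is a hypothetical helper name, and the branch name and data path are placeholders.

```shell
# Hypothetical helper: create a branch for an experiment and commit
# the data that differs on it. Stops at the first step that fails.
start_experiment() {
  # $1: branch name for the experiment
  # $2: file or directory holding the data that changes in this experiment
  oxen checkout -b "$1" &&
    oxen add "$2" &&
    oxen commit -m "Data for experiment $1"
}
```

Calling `start_experiment more-cats images/cats/` would record the changed data on a `more-cats` branch, leaving the data on the original branch untouched for other experiments.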
🐂 Why the name Oxen?
“Oxen” comes from the fact that we will plow, maintain, and version your data like a good farmer tends to their fields 🌾. During the agricultural revolution, the plow and offloading work to Oxen helped people specialize and start working on other important societal tasks. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product.