Performance
Oxen is fast. Simple as that.
Core Principle
Oxen was designed from the ground up to be fast. Whether you have many small files, a few large files, or a mix of both, Oxen intelligently hashes, packages, and syncs the data as fast as an Ox physically can.
Food 101 Dataset
The Food 101 dataset has 100k images in many different sub directories. Here is the Food 101 Dataset on Oxen.ai.
~ TLDR ~
- ✅ Oxen syncs all the images in about 3 minutes
- 🦥 DVC backed by S3 took 16 minutes
- 🦥 git+git lfs syncing GitHub took over an hour
🐂 Oxen
oxen add images # 12.90 secs
oxen commit -m "adding images # 34.77 secs
oxen push origin main # 150.22 secs
Total time or ~3 min
to sync to Oxen.
Git + Git LFS
Compare this to a system like git lfs on the same dataset.
Git-LFS is also many more commands to keep track of in your head and easy to mess up.
git init
git lfs install
git lfs track "*.jpg"
git add .gitattributes
git add images # 132.82 secs
git commit -m "adding images"
git push origin main # 79.96 min
Total time pushing to hugging face: 82+ min
DVC + S3 Backend
DVC is built on top of git + an open source project and can be synced to S3 for storage.
You have to keep track of which commands are dvc and which are git, and the commands are not as intuitive as Oxen. It is easy to track the wrong things in your git repo.
git init
dvc init
dvc add images/ # Executed in 249.16 secs
git add images.dvc .gitignore
git commit -m "adding images"
git remote add origin https://github.com/owner/repository.git
dvc remote add --default datastore s3://my-bucket
git push origin main
dvc push # Executed in 719.79 secs
Total: 968.95 = 16 min
aws s3 cp
NOTE: This test was on CelebA dataset with 200k images, so not apples to apples with the ones above. We did the same test in oxen and it took ~6 minutes.
You may currently be storing your training data in AWS s3 buckets. Even is slower than syncing to Oxen. Not to mention it lacks other features you gain with Oxen.
The AWS S3 tool syncs each image sequentially and takes about 38 minutes to complete. Oxen optimizes the file transfer, compresses the data, and has a 5-10x performance improvement depending on your network and compute.
time aws s3 cp images/ s3://testing-celeba --recursive
________________________________________________________
Executed in 38.87 mins