Performance
Oxen is fast. Simple as that.
Core Principle
Oxen was designed from the ground up to be fast. Whether you have many small files, a few large files, or a mix of both, Oxen intelligently hashes, packages, and syncs the data as fast as an Ox physically can.
Food 101 Dataset
The Food 101 dataset has roughly 100k images spread across many subdirectories. Here is the Food 101 Dataset on Oxen.ai.
~ TLDR ~
- ✅ Oxen syncs all the images in about 3 minutes
- 🦥 DVC backed by S3 took 16 minutes
- 🦥 git + git lfs syncing to GitHub took over an hour
🐂 Oxen
Total time: ~3 min to sync to Oxen.
Git + Git LFS
Compare this to a system like git lfs on the same dataset.
Git LFS also requires many more commands to keep track of, and they are easy to get wrong.
Total time pushing to Hugging Face: 82+ min
DVC + S3 Backend
DVC is an open source project built on top of git, and it can be synced to S3 for storage.
You have to keep track of which commands are dvc and which are git, and the commands are not as intuitive as Oxen's. It is easy to track the wrong things in your git repo.
Total: 968.95 s ≈ 16 min
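To put the three Food 101 totals on the same scale, here is a minimal sketch that converts each to seconds and expresses it as a multiple of Oxen's time. It assumes the DVC total of 968.95 is in seconds, and it treats "~3 min" and "82+ min" as exact, so the ratios are only rough. The names `totals_seconds` and `slowdown_vs_oxen` are illustrative and not part of any tool.

```python
# Totals reported above for the Food 101 dataset, converted to seconds.
# "~3 min" and "82+ min" are approximate, so treat the ratios as rough.
totals_seconds = {
    "oxen": 3 * 60,      # ~3 min
    "dvc_s3": 968.95,    # assumed to be seconds (about 16 min)
    "git_lfs": 82 * 60,  # 82+ min
}


def slowdown_vs_oxen(totals):
    """Return how many times longer each tool took than Oxen."""
    baseline = totals["oxen"]
    return {name: secs / baseline for name, secs in totals.items()}


ratios = slowdown_vs_oxen(totals_seconds)
for name, ratio in ratios.items():
    minutes = totals_seconds[name] / 60
    print(f"{name}: {minutes:.1f} min, {ratio:.1f}x Oxen's time")
```

With these inputs, DVC + S3 comes out at roughly 5.4x Oxen's time and Git + Git LFS at roughly 27x.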
aws s3 cp
NOTE: This test was on the CelebA dataset with 200k images, so it is not apples to apples with the ones above. We did the same test in Oxen and it took ~6 minutes.
You may currently be storing your training data in AWS S3 buckets. Even copying directly to S3 with aws s3 cp is slower than syncing to Oxen, and it lacks the other features you gain with Oxen.
The AWS S3 tool syncs each image sequentially and takes about 38 minutes to complete. Oxen optimizes the file transfer, compresses the data, and has a 5-10x performance improvement depending on your network and compute.
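As a quick sanity check on the CelebA numbers, the sketch below divides the reported aws s3 cp time by the reported Oxen time and checks that the result falls in the stated 5-10x range. Both inputs are the approximate figures quoted in this section, and the variable names are illustrative.

```python
# Approximate CelebA (200k images) timings quoted above, in minutes.
aws_s3_cp_minutes = 38
oxen_minutes = 6

# Speedup of Oxen relative to aws s3 cp on this dataset.
speedup = aws_s3_cp_minutes / oxen_minutes
in_stated_range = 5 <= speedup <= 10

print(f"Oxen speedup over aws s3 cp: {speedup:.1f}x (within 5-10x: {in_stated_range})")
```

This gives a speedup of about 6.3x, which sits inside the quoted range.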