Oxen can be used to compare data frames and return a tabular diff.
There is more information about the diff in the Diff Getting Started Documentation.
For example, comparing two data frames gives you an output data frame in which the .oxen.diff.status column shows whether each row was added, removed, or modified.
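The idea of a status column can be illustrated with a minimal, standard-library sketch. This is not Oxen's implementation: the function names tabular_diff and row_id, the literal status strings, and the use of SHA-256 are all assumptions made for illustration. Rows are matched on key columns and then checked for changes in the compared columns.

```python
import hashlib

STATUS_COLUMN = ".oxen.diff.status"


def row_id(row, keys):
    """Combine the key values of a row and hash them into one identifier."""
    joined = "\x1f".join(str(row[k]) for k in keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def tabular_diff(left, right, keys, compares):
    """Hypothetical helper: label rows as added, removed, or modified."""
    left_by_id = {row_id(r, keys): r for r in left}
    right_by_id = {row_id(r, keys): r for r in right}
    out = []
    for rid, row in right_by_id.items():
        if rid not in left_by_id:
            # Present only on the right side.
            out.append({**row, STATUS_COLUMN: "added"})
        elif any(row[c] != left_by_id[rid][c] for c in compares):
            # Same key on both sides, but a compared value changed.
            out.append({**row, STATUS_COLUMN: "modified"})
    for rid, row in left_by_id.items():
        if rid not in right_by_id:
            # Present only on the left side.
            out.append({**row, STATUS_COLUMN: "removed"})
    return out


left = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 4, "label": "fish"},
]
right = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "wolf"},
    {"id": 3, "label": "bird"},
]

result = tabular_diff(left, right, keys=["id"], compares=["label"])
for row in result:
    print(row)
```

Unchanged rows are omitted from the output in this sketch; whether the real diff keeps them is not stated here.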
Compares data from two paths and returns a diff respecting the type of data.
Arguments:
path - os.PathLike
The path to diff. If to is not provided, this will compare the data frame to the previous commit.
to - os.PathLike
An optional second path to compare to. If provided, this will be the right side of the diff.
repo_dir - os.PathLike
The path to the oxen repository. Must be provided if compare_to is not provided, or if revision_left or revision_right is provided. If not provided, the repository will be searched for in the current working directory.
revision_left - str
The left revision to compare. Can be a commit hash or branch name.
revision_right - str
The right revision to compare. Can be a commit hash or branch name.
output - os.PathLike
The path to save the diff to. If not provided, the diff will not be saved.
keys - list[str]
Only for tabular diffs. The keys to compare on. These are used to join the two data frames. Keys are combined and hashed to create an identifier for each row.
compares - list[str]
Only for tabular diffs. The columns to compare. These are used to compare the values of the two data frames.

The Diff class wraps many types of diffs and provides a consistent interface. For example, the diff can be tabular or text. Eventually we will extend this to support other types of diffs such as images, audio, etc.
Returns the format of the diff, i.e. tabular, text, etc.
Returns the tabular diff if the diff is tabular.
Returns the text diff if the diff is text.
Resolves the diff type and returns the appropriate diff object.
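The wrapper pattern described above can be sketched as follows. This is only an illustration of a format-dispatching wrapper: the class names DiffSketch, TabularDiffSketch, and TextDiffSketch are hypothetical stand-ins, and the member names format, tabular, text, and get are assumed from the four descriptions rather than taken from Oxen's source.

```python
from dataclasses import dataclass, field


@dataclass
class TabularDiffSketch:
    """Hypothetical stand-in for a tabular diff payload."""
    rows: list = field(default_factory=list)


@dataclass
class TextDiffSketch:
    """Hypothetical stand-in for a text diff payload."""
    lines: list = field(default_factory=list)


class DiffSketch:
    """Wraps one concrete diff and exposes a single, consistent interface."""

    def __init__(self, inner):
        self._inner = inner

    @property
    def format(self):
        # Report which kind of diff is wrapped.
        if isinstance(self._inner, TabularDiffSketch):
            return "tabular"
        if isinstance(self._inner, TextDiffSketch):
            return "text"
        return "unknown"

    @property
    def tabular(self):
        # The tabular diff, or None when the wrapped diff is not tabular.
        return self._inner if self.format == "tabular" else None

    @property
    def text(self):
        # The text diff, or None when the wrapped diff is not text.
        return self._inner if self.format == "text" else None

    def get(self):
        # Resolve the diff type and return the matching diff object.
        if self.format == "tabular":
            return self.tabular
        if self.format == "text":
            return self.text
        raise ValueError("unsupported diff type")


table_diff = DiffSketch(TabularDiffSketch(rows=[{"id": 3}]))
text_diff = DiffSketch(TextDiffSketch(lines=["+ new line"]))
print(table_diff.format, text_diff.format)
```

Callers that do not care about the diff type can call get(), while callers that expect a specific type can check format first and then read tabular or text.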