The DataFrame class allows you to perform CRUD operations on a remote data frame.
If you pass in a Workspace or a RemoteRepo, the data is indexed into DuckDB on an oxen-server without downloading the data locally.
Index a data frame in a workspace.
Initialize the DataFrame class. The data frame is indexed into DuckDB on init.
Throws an error if the data frame does not exist.
Arguments:
remote - str, RemoteRepo, or Workspace: The workspace or remote repo the data frame is in.
path - str: The path of the data frame file in the repository.
host - str: The host of the oxen-server. Defaults to "hub.oxen.ai".
branch - Optional[str]: The branch of the remote repo. Defaults to None.
scheme - str: The scheme of the remote repo. Defaults to "https".
Get the url of the data frame.
Get the size of the data frame. Returns a tuple of (rows, columns).
Get the page size of the data frame for pagination in the list() command.
Returns:
The page size of the data frame.
Get the total number of pages in the data frame for pagination in the list() command.
Returns:
The total number of pages in the data frame.
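As a rough illustration of how the size, the page size, and the total page count relate, the sketch below assumes the total is the row count divided by the page size, rounded up. The helper name `total_pages` is hypothetical and not part of the library, and the server may compute the value differently.

```python
def total_pages(num_rows: int, page_size: int = 100) -> int:
    """Pages needed to hold num_rows rows, assuming ceiling division."""
    if num_rows < 0:
        raise ValueError("num_rows must not be negative")
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: a partially filled last page still counts.
    return (num_rows + page_size - 1) // page_size
```

Under this assumption, a data frame with 250 rows and the default page size of 100 would span three pages.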
List the rows within the data frame.
Arguments:
page_num - int: The page number of the data frame to list. The page size currently defaults to 100.
Returns:
A list of rows from the data frame.
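Reading a whole data frame means requesting one page at a time. The sketch below shows one way to do that, with two assumptions: `fetch_page` is a stand-in for whatever call lists a single page of rows, and pages are numbered from 1. Neither is confirmed by the text above.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def iter_rows(fetch_page: Callable[[int], List[Row]], total_pages: int) -> Iterator[Row]:
    """Yield every row by requesting pages 1..total_pages in order.

    fetch_page is a hypothetical callable that returns the rows of one page.
    The 1-based page numbering is an assumption.
    """
    for page_num in range(1, total_pages + 1):
        for row in fetch_page(page_num):
            yield row
```

Because this is a generator, only one page of rows is held in memory at a time.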
Insert a single row of data into the data frame.
Arguments:
data - dict: A dictionary representing a single row of data. The keys must match a subset of the columns in the data frame. If a column is not present in the dictionary, it will be set to an empty value.
Returns:
The id of the row that was inserted.
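The rule for row keys can be checked locally before sending data. The following is a minimal sketch of that rule, not the server's implementation. The helper `normalize_row` is hypothetical, and representing an "empty value" as `None` is an assumption.

```python
from typing import Any, Dict, List


def normalize_row(data: Dict[str, Any], columns: List[str], empty: Any = None) -> Dict[str, Any]:
    """Return a full row for the given columns, filling missing ones with `empty`.

    Raises KeyError if `data` has a key that is not a known column,
    mirroring the "keys must match a subset of the columns" rule.
    """
    unknown = [key for key in data if key not in columns]
    if unknown:
        raise KeyError(f"columns not in data frame: {unknown}")
    return {column: data.get(column, empty) for column in columns}
```

For example, with columns `prompt`, `response`, and `label`, passing only `{"prompt": "hi"}` yields a row whose other two columns hold the empty value.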
Generate the SQL from the attributes.
Generate the SQL from the attributes.
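To make "SQL from the attributes" concrete, here is a sketch of turning a dictionary of column values into a parameterized WHERE clause. The function `where_clause` is hypothetical. The `?` placeholders, the double-quoted identifiers, and the AND combination are all assumptions, and the library's generated SQL may look different.

```python
from typing import Any, Dict, List, Tuple


def where_clause(attrs: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a WHERE clause and its parameter list from column/value pairs."""
    parts: List[str] = []
    params: List[Any] = []
    for column, value in attrs.items():
        # Quote the identifier and escape embedded double quotes.
        name = '"' + column.replace('"', '""') + '"'
        if value is None:
            # In SQL, "= NULL" never matches, so NULL needs IS NULL.
            parts.append(f"{name} IS NULL")
        else:
            parts.append(f"{name} = ?")
            params.append(value)
    if not parts:
        return "", []
    return "WHERE " + " AND ".join(parts), params
```

Keeping values in a separate parameter list, rather than formatting them into the string, avoids quoting problems and SQL injection.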
Get the embedding from the data frame.
Check if the embeddings column is indexed in the data frame.
Index the embeddings in the data frame.
Sort the data frame by the embedding.
Get the nearest neighbors to the embedding.
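The embedding operations above run on the server. As an illustration of what ranking rows by closeness to an embedding means, the sketch below does it in memory with cosine similarity. The similarity metric, the column name `embedding`, and both function names are assumptions made for this example, not the library's API.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    rows: List[Dict[str, Any]],
    query: Sequence[float],
    column: str = "embedding",
    k: int = 5,
) -> List[Dict[str, Any]]:
    """Return the k rows whose embedding is most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(row[column], query),
        reverse=True,  # most similar first
    )
    return ranked[:k]
```

This brute-force scan compares the query against every row, which is the cost an embedding index is meant to avoid on large data frames.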
Get a single row of data by attributes.
Get a single row of data by index.
Arguments:
idx - int: The index of the row to get.
Returns:
A dictionary representing the row.
Get a single row of data by id.
Arguments:
id - str: The id of the row to get.
Returns:
A dictionary representing the row.
Update a single row of data by id.
Arguments:
id - str: The id of the row to update.
data - dict: A dictionary representing a single row of data. The keys must match a subset of the columns in the data frame. If a column is not present in the dictionary, it will be set to an empty value.
Returns:
The updated row as a dictionary.
Delete a single row of data by id.
Arguments:
id - str: The id of the row to delete.
Unstage any changes to the schema or contents of a data frame.
Commit the current changes to the data frame.
Arguments:
message - str: The message to commit the changes.
branch - str: The branch to commit the changes to. Defaults to the current branch.
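A typical edit inserts rows and then commits them. The sketch below wraps that flow in a helper that accepts any object with `insert_row(data)` and `commit(message)` methods. Those method names are assumptions, since this reference describes the operations without naming them, and `insert_rows_and_commit` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def insert_rows_and_commit(df: Any, rows: Iterable[Dict[str, Any]], message: str) -> List[str]:
    """Insert each row, then commit once; return the ids of the inserted rows.

    `df` is any object exposing insert_row(data) -> id and commit(message).
    Both method names are assumed here.
    """
    ids = [df.insert_row(row) for row in rows]
    if ids:
        # Skip the commit when nothing was inserted, to avoid an empty commit.
        df.commit(message)
    return ids
```

Committing once after all inserts keeps related changes together in a single commit instead of one commit per row.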