🏷️ Build a Custom Labeling Tool
Rate examples from your dataset and write them back to a data frame before committing.
When building AI applications, looking at data and labeling successes and failures is an important part of the development process. This example walks through a labeling workflow that uses the Oxen Python API to fetch rows from a data frame one by one and write the results back to the same data frame. We will label the text of each SMS message as “spam” or “ham” depending on its content. The interface is built with native Marimo UI components.
Feel free to download the code from this Notebook and run it in your own repository to follow along. The final result will look like this:
Fetching the Rows
The RemoteRepo and DataFrame classes make it easy to read from and write to a data frame in a repository. To fetch data, specify the namespace, the repository name, and the path to the data frame.
To write data back to the data frame, we need to specify a workspace_name when instantiating the DataFrame class. This is because changes are written to a temporary workspace before being committed, which lets you review them in the UI before they enter the commit history.
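As a rough sketch of the setup described above, the helper below opens a data frame inside a named workspace. The function name open_labeling_frame is hypothetical, and the constructor shapes used here (RemoteRepo taking a "namespace/repo_name" string, DataFrame taking the repo, a path, and a workspace_name keyword) are assumptions that should be checked against the Oxen Python API reference.

```python
def open_labeling_frame(namespace: str, repo_name: str, path: str, workspace_name: str):
    """Open a remote data frame for editing inside a named workspace.

    Assumption: RemoteRepo accepts a "namespace/repo_name" identifier and
    DataFrame accepts (repo, path, workspace_name=...). Verify both against
    the Oxen Python API reference before relying on this.
    """
    # Imported lazily so this module can be loaded without the oxen package.
    from oxen import DataFrame, RemoteRepo

    repo = RemoteRepo(f"{namespace}/{repo_name}")
    # Edits go to the named workspace, not straight into the commit history.
    return DataFrame(repo, path, workspace_name=workspace_name)
```

Passing the workspace name explicitly keeps every labeling session pointed at the same temporary workspace, so edits accumulate there until they are committed.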
To fetch a row, we can use the get_row method, which returns the Row object at the specified index.
To find the number of rows in the data frame, we can use the size() function, which returns the width and height of the data frame.
Iterating through the data frame
Let's add some helper functions to increment and decrement the index, and get the row at the current index.
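One possible shape for these helpers is sketched below. The function names are hypothetical, and clamping the index at the first and last row (instead of wrapping around) is an assumption about the desired behavior. The frame argument is anything that exposes the get_row method mentioned earlier, and num_rows is the row count taken from size().

```python
def clamp_index(index: int, num_rows: int) -> int:
    """Keep index inside the valid range [0, num_rows - 1]."""
    if num_rows <= 0:
        raise ValueError("the data frame has no rows to label")
    return max(0, min(index, num_rows - 1))


def next_index(index: int, num_rows: int) -> int:
    """Move forward one row, stopping at the last row."""
    return clamp_index(index + 1, num_rows)


def prev_index(index: int, num_rows: int) -> int:
    """Move back one row, stopping at the first row."""
    return clamp_index(index - 1, num_rows)


def current_row(frame, index: int, num_rows: int):
    """Fetch the row at the (clamped) current index.

    Assumption: `frame` exposes get_row(index), as described in the text.
    """
    return frame.get_row(clamp_index(index, num_rows))
```

Keeping the index arithmetic in pure functions makes it easy to check the boundary cases without a network connection.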
Updating the Rows
The label_picker will call the update_category function when the user selects a new label. This function updates the category in the data frame.
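The update step could be sketched as follows. To keep the example runnable offline it uses InMemoryFrame, a hypothetical stand-in and not the Oxen DataFrame class. Its update_row method, the "category" column name, and the two sample messages are all assumptions made for illustration. The real API may address rows by an id instead of a position.

```python
VALID_LABELS = ("spam", "ham")


class InMemoryFrame:
    """Hypothetical stand-in for a workspace-backed data frame."""

    def __init__(self, rows):
        self._rows = [dict(row) for row in rows]

    def get_row(self, index):
        # Return a copy so callers cannot mutate stored rows by accident.
        return dict(self._rows[index])

    def update_row(self, index, data):
        # Assumed method name; merges the new values into the stored row.
        self._rows[index].update(data)


def update_category(frame, index, label):
    """Write the chosen label into the row's category column."""
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    frame.update_row(index, {"category": label})
    return frame.get_row(index)


# Made-up sample rows, for illustration only.
frame = InMemoryFrame(
    [
        {"text": "WIN a free prize now!!!", "category": ""},
        {"text": "See you at lunch?", "category": ""},
    ]
)
updated = update_category(frame, 0, "spam")
```

Validating the label before writing guards against a typo in the picker options ending up in the data frame.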
Setting up the UI
We will keep track of which row is being labeled using the mo.state reactive state variable. This sets up a getter and setter for the state variable.
Then we can use a radio button for the categories and a few buttons to move between rows.
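A minimal sketch of this setup is shown below, assuming marimo is installed. The build_controls function, its parameters, and the button labels are hypothetical. on_label stands for whatever callback writes the selected label back, and num_rows is the row count of the data frame.

```python
from typing import Callable

LABELS = ["spam", "ham"]


def build_controls(num_rows: int, on_label: Callable[[str], None]):
    """Create the index state, a label picker, and navigation buttons."""
    # Imported lazily so this module can be loaded without marimo installed.
    import marimo as mo

    # mo.state returns a (getter, setter) pair for a reactive value.
    get_index, set_index = mo.state(0)

    # The radio group calls on_label with the newly selected option.
    label_picker = mo.ui.radio(options=LABELS, label="Label", on_change=on_label)

    # The setter accepts a function of the current value; clamp at both ends.
    prev_button = mo.ui.button(
        label="Previous",
        on_change=lambda _: set_index(lambda i: max(i - 1, 0)),
    )
    next_button = mo.ui.button(
        label="Next",
        on_change=lambda _: set_index(lambda i: min(i + 1, num_rows - 1)),
    )
    return get_index, label_picker, mo.hstack([prev_button, next_button])
```

In a notebook, the returned elements would typically be assigned to top-level names in a cell so that Marimo can track them and rerun the cells that read get_index.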
Viewing changes
The changes will be written back to a temporary workspace. We can view the changes by clicking the “View Changes” link in the UI. This is populated with the workspace_url method on the DataFrame class.
Under the hood, we have indexed the data frame into a temporary read/write workspace. You can see the changes in the Oxen Diff UI and confirm them before committing.
Once you are happy with the changes, you can commit them. They will be written back to the original data frame and added to the commit history.
Full Code
The full code is available here.