Embeddings represent data as vectors that can be used for machine learning tasks like search, clustering, classification, and more.
In the `title` column, you will see all of the top results have something in common with wild animals. The fourth result is Musk ox, which is another large bovine animal 🐂. In the `similarity` column, you will see a score between 0 and 1 that indicates how similar the two vectors are.
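To make the similarity score concrete, here is a minimal sketch of how such a score can be computed and used to rank rows. It assumes the score is cosine similarity, which this section does not state explicitly. The function names and the tiny 3-dimensional vectors are made up for illustration, and this is not oxen's implementation. In general cosine similarity ranges from -1 to 1; scores land between 0 and 1 when the vectors point in broadly the same direction.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, rows):
    """Return (title, score) pairs, most similar first."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up vectors; real embeddings have hundreds of dimensions.
rows = [
    ("Sailing", [0.0, 0.0, 1.0]),
    ("Musk ox", [0.8, 0.6, 0.0]),
    ("Ox", [1.0, 0.0, 0.0]),
]
ranked = rank_by_similarity([1.0, 0.0, 0.0], rows)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

The row whose vector points in the same direction as the query comes first, and unrelated rows fall to the bottom with a score near 0.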
`WHERE` clause that we use to filter the rows that we want to find similar rows to.
For the `WHERE` clause here, using an id or primary key is a good idea. If multiple rows match the query, their embeddings will be averaged together.
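As a rough illustration of what averaging means when several rows match, the sketch below takes the element-wise mean of the matching vectors to produce a single query vector. The function name `average_embeddings` and the sample values are hypothetical. How oxen performs this step internally is not shown in this section.

```python
def average_embeddings(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same length")
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Two made-up embeddings standing in for rows matched by the WHERE clause.
matched = [[1.0, 0.0], [0.0, 1.0]]
query_vector = average_embeddings(matched)
print(query_vector)  # [0.5, 0.5]
```

Because the averaged vector sits between all of the matched rows, a filter that matches unrelated rows can produce a query vector that is close to none of them. That is one reason filtering on an id or primary key tends to give cleaner results.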
With a `title_embeddings` column, the underlying SQL that is generated will look like this.
Indexing loads a `csv`, `parquet`, or `jsonl` file into DuckDB. If the file contains embeddings, you can specify the name of the column that holds them.
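Before indexing, it can be useful to sanity-check that the embeddings column is well formed. The sketch below is independent of oxen and uses only the Python standard library. It assumes a `jsonl` file where each row stores its embedding as a JSON array of numbers and every vector has the same length. The helper name `read_embeddings` and the sample rows are hypothetical.

```python
import io
import json


def read_embeddings(lines, column):
    """Parse JSONL lines and return the vectors stored under `column`."""
    vectors = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if column not in row:
            raise KeyError(f"line {line_no}: missing column {column!r}")
        vectors.append([float(x) for x in row[column]])
    # Similarity between vectors of different sizes is not meaningful.
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding sizes: {sorted(dims)}")
    return vectors


# In-memory stand-in for a small jsonl file with made-up values.
sample = io.StringIO(
    '{"id": 1, "title": "Ox", "title_embeddings": [0.1, 0.2]}\n'
    '{"id": 2, "title": "Musk ox", "title_embeddings": [0.3, 0.4]}\n'
)
vectors = read_embeddings(sample, "title_embeddings")
print(len(vectors), "rows,", len(vectors[0]), "dimensions")
```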
Running `oxen df index` without the `--embeddings` flag only indexes the data into DuckDB so that you can query it with SQL; it does not enable nearest neighbor search. When you pass in the `--embeddings` flag, oxen will automatically run the following SQL commands to enable nearest neighbor search.