Metadata Filtering

Shaped lets you filter the items returned by the Rank API based on their metadata (i.e. the columns returned by the item fetch queries). This works for both personalized and non-personalized ranking queries. There are two primary use cases:

  1. Personalized search, e.g. filtering items by user-defined keyword matching or a metadata-specific query.
  2. Category pages, e.g. filtering items for a specific recommendation UI element (such as a carousel or feed). This means you can create one Shaped model and use it for a variety of carousels, based on domain knowledge of what will resonate with your customers.

In this guide we'll show you how to use the metadata filtering feature to power some of these use-cases.

Metadata Filtering in the Rank API

Our Rank API endpoints (rank and similar) accept an optional filter_predicate argument that defines the metadata filter predicate for your request. The predicate language is a standard SQL expression, e.g. "category = 'sports'" or "publish_year >= 2023". The following operators and functions are currently supported:

* >, >=, <, <=, =
* AND, OR, NOT
* IS NULL, IS NOT NULL
* IS TRUE, IS NOT TRUE, IS FALSE, IS NOT FALSE
* IN
* LIKE, NOT LIKE
* regexp_match(column, pattern)
* CAST
* array_has(sequential_column, value)
* array_has_any(sequential_column, values)
* array_has_all(sequential_column, values)

For example, the following predicate string is acceptable:

((label IN [10, 20]) AND (note.email IS NOT NULL))
OR NOT note.created
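
Predicates like the one above are often assembled from several independent conditions. As a minimal sketch, the shell function below joins clauses with AND and wraps each one in parentheses so that operator precedence stays explicit. The name join_and is hypothetical and is not part of the Shaped CLI; the column names are taken from the examples in this guide.

```shell
# Hypothetical helper (not part of the Shaped CLI): join clauses with AND,
# wrapping each clause in parentheses so precedence stays explicit.
join_and() {
  result=""
  for clause in "$@"; do
    # Prepend " AND " only when a previous clause has already been added.
    result="${result:+$result AND }($clause)"
  done
  printf '%s\n' "$result"
}

predicate=$(join_and "category = 'sports'" "publish_year >= 2023")
printf '%s\n' "$predicate"
```

The resulting string can then be passed to the CLI as --filter_predicate "$predicate".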

Filtering by category:

shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'category IN ["sports", "news"]'

Filtering by set membership in a sequence-valued category column:

shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'array_has_any(category_sequence, ["sports", "news"])'
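
When the values come from application code rather than being typed by hand, the bracketed list used by IN and array_has_any in the examples above can be generated. The following is a minimal sketch, assuming plain string values that contain no double quotes (no escaping is attempted). The name quote_list is hypothetical and is not part of the Shaped CLI.

```shell
# Hypothetical helper (not part of the Shaped CLI): render string values as a
# bracketed, double-quoted list in the style used by the examples in this guide.
# Assumes the values contain no double quotes; no escaping is attempted.
quote_list() {
  items=""
  for value in "$@"; do
    # Separate entries with ", " once the first entry has been added.
    items="${items:+$items, }\"$value\""
  done
  printf '[%s]\n' "$items"
}

values=$(quote_list sports news)
printf 'category IN %s\n' "$values"
printf 'array_has_any(category_sequence, %s)\n' "$values"
```

Each printed line is a predicate string in the same form as the ones passed to --filter_predicate above.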

Filtering by year:

shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'publish_year >= 2023'

Note that if a user_id is provided, the filtered results are personalized to that user. If no user_id is provided, the filtered results default to trending, non-personalized results.
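
To serve both cases from one script, the --user_id flag can be added only when a user is known. The sketch below assumes that omitting the flag is how a request is made without a user_id, and it reuses the model name and limit from the examples above. The function name rank_filtered and the SHAPED_BIN variable are hypothetical; SHAPED_BIN exists only so the command can be swapped out (for example with echo) for a dry run.

```shell
# Hypothetical wrapper around the `shaped rank` command shown in this guide.
# $1: user id (pass an empty string for a non-personalized request)
# $2: filter predicate string
rank_filtered() {
  user_id="$1"
  predicate="$2"
  # Rebuild the function's positional parameters as the CLI argument list.
  set -- rank --model-name personalized_video_search
  if [ -n "$user_id" ]; then
    set -- "$@" --user_id "$user_id"
  fi
  set -- "$@" --limit 5 --filter_predicate "$predicate"
  # SHAPED_BIN is an assumption of this sketch; it defaults to the real CLI.
  "${SHAPED_BIN:-shaped}" "$@"
}

# Usage:
#   rank_filtered 3 'publish_year >= 2023'    # personalized to user 3
#   rank_filtered "" 'publish_year >= 2023'   # no user_id supplied
```

Building the argument list with set -- keeps the predicate as a single argument, so spaces and quotes inside it are not re-split by the shell.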

Under the hood

Internally, we use the Lance data format to support the metadata filter predicates. See the Lance documentation for more information.

Conclusion

Metadata filtering is a powerful feature that can be used for many discovery and search use-cases. We hope you find it useful!