Filtering Results
Shaped lets you filter the items returned by the Rank API based on their metadata (i.e. the columns found in the items fetch queries). You can do this for both personalized and non-personalized ranking queries. There are two primary use cases for this:
- Personalized search: Filtering items by a user-defined keyword match or a metadata-specific query.
- Category pages: Filtering items for a specific recommendation UI element (e.g. a carousel or feed). This means you can create one Shaped model and use it for a variety of different carousels, based on prior domain knowledge of what will resonate with your customers.
In this guide we'll show you how to use the filter predicate feature to power some of these use cases.
Supported Operations
The filter predicate language is a standard SQL expression, e.g. "category = 'sports'" or "publish_year >= 2023". Here are the currently supported operators:
* >, >=, <, <=, =
* AND, OR, NOT
* IS NULL, IS NOT NULL
* IS TRUE, IS NOT TRUE, IS FALSE, IS NOT FALSE
* IN
* LIKE, NOT LIKE
* regexp_match(column, pattern)
* CAST
* array_has(sequential_column, value)
* array_has_any(sequential_column, values)
* array_has_all(sequential_column, values)
For example, the following predicate string is acceptable:
((label IN [10, 20]) AND (note.email IS NOT NULL))
OR NOT note.created
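Since a predicate is just a string, it can be convenient to assemble it from smaller pieces. The following is a minimal Python sketch of that idea, not part of the Shaped SDK: the helper names (sql_literal, compare, in_list, all_of) are hypothetical, and escaping a single quote by doubling it is an assumption based on standard SQL rather than something this guide confirms.

```python
# Hypothetical helpers for composing filter predicate strings.
# None of these functions are part of Shaped; they only build text.

ALLOWED_COMPARISONS = {">", ">=", "<", "<=", "="}


def sql_literal(value):
    """Render a Python value as a SQL literal (quoting rules are assumed)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded single quote.
    return "'" + str(value).replace("'", "''") + "'"


def compare(column, op, value):
    """Build a comparison such as: publish_year >= 2023"""
    if op not in ALLOWED_COMPARISONS:
        raise ValueError(f"unsupported operator: {op}")
    return f"{column} {op} {sql_literal(value)}"


def in_list(column, values):
    """Build an IN clause using the bracket syntax shown in this guide."""
    items = ", ".join(sql_literal(v) for v in values)
    return f"{column} IN [{items}]"


def all_of(*predicates):
    """Join predicates with AND, parenthesizing each part."""
    return " AND ".join(f"({p})" for p in predicates)


predicate = all_of(
    compare("category", "=", "sports"),
    compare("publish_year", ">=", 2023),
)
print(predicate)
```

Parenthesizing every part keeps operator precedence explicit when AND and OR are mixed, at the cost of a slightly noisier string.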
Filter Predicate Examples
Filtering by Category
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'category IN ["sports", "news"]'
Filtering a Sequence Category Column
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'array_has_any(category_sequence, ["sports", "news"])'
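To make the difference between the three array functions concrete, here is a plain-Python model of how we understand their semantics. It is illustrative only: it does not call Shaped or Lance, and the item_categories value is a made-up example of a sequence column.

```python
# Plain-Python model of the array predicate semantics (illustrative only).

def array_has(column_values, value):
    """True if the sequence column contains the given value."""
    return value in column_values


def array_has_any(column_values, values):
    """True if the sequence column contains at least one of the values."""
    return any(v in column_values for v in values)


def array_has_all(column_values, values):
    """True if the sequence column contains every one of the values."""
    return all(v in column_values for v in values)


# Hypothetical category_sequence value for a single item.
item_categories = ["sports", "highlights"]

print(array_has(item_categories, "sports"))                # True
print(array_has_any(item_categories, ["sports", "news"]))  # True
print(array_has_all(item_categories, ["sports", "news"]))  # False
```

In other words, array_has_any suits "show items in any of these categories" carousels, while array_has_all suits stricter "must match every tag" filters.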
Filtering By Year
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'publish_year >= 2023'
Note that if the user_id is provided, the filtered results are personalized to that user. If the user_id is not provided, the filtered results are trending, non-personalized results by default.
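If you generate these commands from a script, the sketch below shows one way to switch between the personalized and non-personalized forms. It only assembles the command line using the flags shown in this guide and relies on Python's shlex module for shell quoting; rank_command is a hypothetical helper, and we assume that omitting --user_id is how the CLI expresses "no user".

```python
import shlex


def rank_command(model_name, filter_predicate, user_id=None, limit=5):
    """Build a `shaped rank` command line as a shell-safe string.

    When user_id is None the --user_id flag is left out, which (per the
    note above) should yield trending, non-personalized results.
    """
    args = ["shaped", "rank", "--model-name", model_name]
    if user_id is not None:
        args += ["--user_id", str(user_id)]
    args += ["--limit", str(limit), "--filter_predicate", filter_predicate]
    # shlex.join quotes each argument so the predicate survives the shell.
    return shlex.join(args)


personalized = rank_command(
    "personalized_video_search", "publish_year >= 2023", user_id=3
)
trending = rank_command("personalized_video_search", "publish_year >= 2023")

print(personalized)
print(trending)
```

Letting shlex handle the quoting avoids hand-escaping predicates that contain quotes, brackets, or comparison operators.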
Internally we use the Lance data format to support the filter predicate. Take a look at the Lance docs for more information.