Metadata Filtering
Shaped lets you filter the items returned by the Rank API based on their metadata (i.e. the columns found in the items fetch queries). You can do this for both personalized and non-personalized ranking queries. There are two primary use cases for this:
- Personalized search, e.g. filtering items by user-defined keyword matching or a metadata-specific query.
- Category pages, e.g. filtering items for a specific recommendation UI element (e.g. a carousel or feed). This lets you create one Shaped model and reuse it across a variety of carousels, based on domain knowledge about what will resonate with your customers.
In this guide we'll show you how to use the metadata filtering feature to power some of these use-cases.
Metadata Filtering in the Rank API
Our Rank API endpoints (rank and similar) accept an optional filter_predicate argument that defines the metadata filter predicate for your request. The predicate language is a standard SQL expression, e.g. "category = 'sports'" or "publish_year >= 2023". Here are the currently supported operators:
* >, >=, <, <=, =
* AND, OR, NOT
* IS NULL, IS NOT NULL
* IS TRUE, IS NOT TRUE, IS FALSE, IS NOT FALSE
* IN
* LIKE, NOT LIKE
* regexp_match(column, pattern)
* CAST
* array_has(sequential_column, value)
* array_has_any(sequential_column, values)
* array_has_all(sequential_column, values)
For example, the following predicate string is acceptable:
((label IN [10, 20]) AND (note.email IS NOT NULL))
OR NOT note.created
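When predicates are assembled from user input or application state, it can help to build the string programmatically rather than by hand. The following is a minimal sketch, not part of any Shaped SDK: the helper names (sql_literal, in_list, all_of) are hypothetical, and it assumes string literals use single quotes (as in "category = 'sports'") with embedded single quotes escaped by doubling, as in standard SQL.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (assumes standard SQL quoting)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: single quotes inside strings are escaped by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def in_list(column, values):
    """Build a `column IN [...]` clause using the bracket style shown above."""
    return f"{column} IN [{', '.join(sql_literal(v) for v in values)}]"


def all_of(*clauses):
    """Join clauses with AND, parenthesizing each to keep precedence explicit."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Hypothetical usage with the column names from this guide's examples.
predicate = all_of(
    in_list("category", ["sports", "news"]),
    "publish_year >= 2023",
)
print(predicate)
```

The resulting string can then be passed as the filter predicate of a rank request. Parenthesizing every clause is a deliberate choice here: it avoids surprises when AND and OR are mixed in a larger expression.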
Filtering by category:
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'category IN ["sports", "news"]'
Filtering by set membership on a sequence category column:
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'array_has_any(category_sequence, ["sports", "news"])'
Filtering by year:
shaped rank --model-name personalized_video_search --user_id 3 --limit 5 \
--filter_predicate 'publish_year >= 2023'
Note that if the user_id is provided, the filtered results are personalized to that user. If the user_id is not provided, the filtered results return trending non-personalized results by default.
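The difference between the personalized and non-personalized call comes down to whether the user ID flag is present. The sketch below builds the argument list for the CLI commands shown above without executing them. The build_rank_command helper is hypothetical and only uses the flags that appear in this guide (--model-name, --user_id, --limit, --filter_predicate).

```python
import shlex


def build_rank_command(model_name, filter_predicate, user_id=None, limit=5):
    """Return the argv list for a `shaped rank` call (hypothetical helper)."""
    argv = ["shaped", "rank", "--model-name", model_name]
    if user_id is not None:
        # With a user ID, the filtered results are personalized to that user.
        argv += ["--user_id", str(user_id)]
    # Without a user ID, the flag is omitted and trending results are returned.
    argv += ["--limit", str(limit), "--filter_predicate", filter_predicate]
    return argv


personalized = build_rank_command(
    "personalized_video_search", "publish_year >= 2023", user_id=3
)
trending = build_rank_command("personalized_video_search", "publish_year >= 2023")

# shlex.join quotes the predicate so it reaches the CLI as a single argument.
print(shlex.join(personalized))
print(shlex.join(trending))
```

Building the command as a list and quoting it with shlex.join avoids shell-escaping mistakes when the predicate contains spaces, quotes, or brackets.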
Under the hood
Internally, we use the Lance data format to support the metadata filter predicates. See the Lance docs for more information.
Conclusion
Metadata filtering is a powerful feature that can be used for many discovery and search use-cases. We hope you find it useful!