
Retrieval Configuration

If you recall, a four-stage recommendation system typically consists of the following steps:

  1. Retrieval: Retrieve an initial candidate set of relevant items.
  2. Filtering: Filter the retrieved items based on some criteria.
  3. Scoring: Score the filtered items by their relevance to the input query.
  4. Ordering: Re-rank the items based on the scores and other business objectives.
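The four stages above can be pictured as a chain of functions, where each stage narrows or reorders the output of the previous one. The sketch below is a toy illustration and not Shaped's implementation. The Item fields (in_stock, relevance, boost) and every function name are hypothetical, and the retrieval stage is only a placeholder that takes the first k catalog items.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    item_id: str
    in_stock: bool      # hypothetical attribute used by the filtering stage
    relevance: float    # hypothetical model score for the current user
    boost: float = 0.0  # hypothetical business-objective adjustment


def retrieve(catalog: list[Item], k: int) -> list[Item]:
    # Stage 1 (placeholder): take the first k items as the candidate set.
    return catalog[:k]


def filter_items(candidates: list[Item], keep: Callable[[Item], bool]) -> list[Item]:
    # Stage 2: drop candidates that fail the criterion.
    return [item for item in candidates if keep(item)]


def score(candidates: list[Item]) -> list[tuple[Item, float]]:
    # Stage 3: pair each remaining candidate with its relevance score.
    return [(item, item.relevance) for item in candidates]


def order(scored: list[tuple[Item, float]], n: int) -> list[str]:
    # Stage 4: re-rank by relevance plus business boost and keep the top n.
    ranked = sorted(scored, key=lambda pair: pair[1] + pair[0].boost, reverse=True)
    return [item.item_id for item, _ in ranked[:n]]


def recommend(catalog: list[Item], k: int, n: int) -> list[str]:
    candidates = retrieve(catalog, k)
    filtered = filter_items(candidates, lambda item: item.in_stock)
    return order(score(filtered), n)


catalog = [
    Item("a", in_stock=True, relevance=0.9),
    Item("b", in_stock=False, relevance=0.95),
    Item("c", in_stock=True, relevance=0.5, boost=0.6),
    Item("d", in_stock=True, relevance=0.7),
]
print(recommend(catalog, k=3, n=2))  # ['c', 'a']
```

Item "b" has the highest relevance, but the filtering stage removes it. Item "c" then overtakes "a" only because the ordering stage adds its business boost.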

In this guide, we'll cover the first step, retrieval: the retrieval strategies Shaped supports and how to configure them through our CLI and endpoints.

Retrieval Strategies

Shaped supports the following retrieval strategies:

  1. Personalized KNN: Use a k-nearest neighbor retrieval algorithm on the underlying user and item embeddings.
  2. Trending Retriever: Retrieve recently popular items, e.g. items with the most interactions in the last few days.
  3. Toplist Retriever: Retrieve the globally most popular items, e.g. items with the most interactions of all time.
  4. Cold Start Retriever: Retrieve items with few to no interactions.
  5. Chronological Retriever: Retrieve the most recent items based on either the time they were created or their first interaction.
  6. Random Retriever: Retrieve a random set of items. Useful for testing.
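To make the differences between these strategies concrete, here is a rough sketch of each one over a toy interaction log of (user_id, item_id, timestamp) tuples. It illustrates the ideas under stated assumptions and is not how Shaped implements them. All function names are hypothetical. The KNN retriever does an exact, brute-force search that scores items by their dot product with the user embedding (one common similarity choice; production systems usually rely on an approximate index). "Trending" is assumed to mean the most interactions within a fixed time window.

```python
from __future__ import annotations

import random
from collections import Counter

# Interactions are (user_id, item_id, timestamp) tuples; timestamps are in days.


def _top_by_count(counts: Counter, k: int) -> list[str]:
    # Highest count first; ties broken by item_id so results are deterministic.
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def personalized_knn(user_vec, item_vecs: dict, k: int) -> list[str]:
    # Exact nearest neighbours by dot product between user and item embeddings.
    scores = {item: sum(u * v for u, v in zip(user_vec, vec)) for item, vec in item_vecs.items()}
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


def trending(interactions: list, now: float, window: float, k: int) -> list[str]:
    # Most interactions within the last `window` days.
    recent = Counter(item for _, item, ts in interactions if now - ts <= window)
    return _top_by_count(recent, k)


def toplist(interactions: list, k: int) -> list[str]:
    # Most interactions of all time.
    return _top_by_count(Counter(item for _, item, _ in interactions), k)


def cold_start(catalog: list[str], interactions: list, k: int) -> list[str]:
    # Fewest interactions first; items never interacted with count as 0.
    counts = Counter(item for _, item, _ in interactions)
    return sorted(catalog, key=lambda item: (counts[item], item))[:k]


def chronological(catalog: list[str], interactions: list, k: int, created_at: dict | None = None) -> list[str]:
    # Newest first, using the creation time if known, else the first interaction time.
    created_at = created_at or {}
    first_seen: dict = {}
    for _, item, ts in interactions:
        first_seen[item] = min(ts, first_seen.get(item, ts))
    times = {item: created_at.get(item, first_seen.get(item)) for item in catalog}
    dated = [item for item in catalog if times[item] is not None]
    return sorted(dated, key=lambda item: (-times[item], item))[:k]


def random_retriever(catalog: list[str], k: int, seed: int | None = None) -> list[str]:
    # Uniform sample without replacement.
    return random.Random(seed).sample(catalog, min(k, len(catalog)))


catalog = ["a", "b", "c", "d"]
interactions = [
    ("u1", "a", 1), ("u2", "a", 2), ("u3", "a", 3),  # "a": popular, but long ago
    ("u1", "b", 9), ("u2", "b", 10),                 # "b": popular recently
    ("u3", "c", 10),                                 # "c": one recent interaction
]                                                    # "d": no interactions yet
item_vecs = {"a": (0.2, 0.9), "b": (0.8, 0.1), "c": (0.5, 0.5), "d": (0.9, 0.0)}

print(personalized_knn((1.0, 0.0), item_vecs, k=2))                      # ['d', 'b']
print(trending(interactions, now=10, window=3, k=2))                     # ['b', 'c']
print(toplist(interactions, k=2))                                        # ['a', 'b']
print(cold_start(catalog, interactions, k=2))                            # ['d', 'c']
print(chronological(catalog, interactions, k=2, created_at={"d": 11}))   # ['d', 'c']
```

The same log yields different candidates depending on the strategy. Toplist favors "a" because of its all-time count, while trending ignores "a" entirely because its interactions fall outside the window.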

How to Configure These Retrievers

By default, Shaped retrieves 300 items from the personalized KNN, trending, toplist, chronological, and cold-start retrievers. Each of these retrieval strategies can be configured within a model definition, or per request using the config option of several of the rank endpoints. This is useful if you want to add retrieval biases to the model, e.g. fetching more recent or more popular items.
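When several retrievers are enabled, their outputs must be combined into one candidate set before filtering. One simple way to picture this is a deduplicating union, assuming each retriever returns a ranked list and contributes its top N items. The function names are hypothetical, and so is the assumption that per-request values override the model's defaults key by key. This guide does not describe Shaped's actual merging logic.

```python
from __future__ import annotations


def effective_counts(defaults: dict[str, int], overrides: dict[str, int] | None = None) -> dict[str, int]:
    # Assumption: per-request values replace the model's defaults key by key.
    return {**defaults, **(overrides or {})}


def merge_candidates(ranked: dict[str, list[str]], counts: dict[str, int]) -> list[str]:
    # Take the top counts[name] items from each retriever (0 if unset) and
    # union them, keeping only the first occurrence of each item.
    seen: set[str] = set()
    merged: list[str] = []
    for name, items in ranked.items():
        for item in items[: counts.get(name, 0)]:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Hypothetical ranked outputs from each retriever for one user.
ranked = {
    "knn": ["a", "b", "c"],
    "chronological": ["d", "a"],
    "trending": ["b", "e"],
    "toplist": ["a", "f"],
    "random": ["g"],
}
# Chronological-only counts, as in a model that uses just that retriever.
model_defaults = {"knn": 0, "chronological": 300, "trending": 0, "toplist": 0, "random": 0}

print(merge_candidates(ranked, effective_counts(model_defaults)))  # ['d', 'a']
print(merge_candidates(ranked, effective_counts(model_defaults, {"chronological": 1, "trending": 1})))  # ['d', 'b']
```

Because duplicates collapse, the merged set can be smaller than the sum of the requested counts. In this sketch, the order of retrievers only affects the order of first occurrences, which later scoring and ordering stages would rearrange anyway.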

At Model Definition

Here's how you can configure a model's default retrievers at model creation. In this example, we'll configure the model to use only the chronological retriever:

$ cat create_model.yaml

model:
  name: your-model-name
  inference_config:
    knn: 0
    chronological: 300
    trending: 0
    toplist: 0
    random: 0
connectors:
  - type: Dataset
    id: your-file-connector
    name: your-file-connector
fetch:
  events: |
    SELECT user_id, item_id, label, created_at
    FROM your-file-connector

At Rank Time

Here's how you can override a model's default retrievers at rank time. In this example, we'll mix in candidate items from the trending retriever by requesting 150 items each from the chronological and trending retrievers:

curl{model_name}/rank \
-H "x-api-key: <API_KEY>" \
-H "Content-Type: application/json" \
-d '{
"user_id": "user_id",
"config": {
"inference_config": {
"knn": 0,
"chronological": 150,
"trending": 150,
"toplist": 0,
"random": 0