
Overview

Shaped provides a declarative interface for defining your recommendation and search models. In the definition you specify:

  1. The data you want to use to train your ranking model
  2. Your objective function, e.g. clicks or add-to-cart.
  3. Any customizations you want to make to the ranking model, for example how much exploration vs. exploitation to do, how much to weight diversity vs. relevance, which filters to apply, and which retrievers and models to use.

A Simple Example

Here's an example of what a model definition file looks like:

model_config_template.yaml
```yaml
model:
  name: for_you_feed_v1
  description: "Recommend items to show in the for you feed"
  pagination_store_ttl: 0 # Seconds.
  train_schedule: "@once"
  inference_config:
    exploration_factor: 0
    diversity_factor: 0
    boosting_factor: 0
    diversity_attributes:
      - category_field
    retrieval_k: 0
    retriever_k_override:
      knn: 0
      chronological: 0
      toplist: 0
      trending: 0
      random: 0
      cold_start: 0
    limit: 0
connectors:
  - id: connector1
    name: connector1
    type: Dataset
fetch:
  events: | ...
  users: | ..
  items: | ...
  global_filters: | ...
  personal_filters: | ...
```
Shaped uses DuckDB SQL syntax in the fetch section of the model config. Check out the DuckDB SQL Docs.

Under the hood, Shaped parses the SQL syntax tree of these queries to make sure they run efficiently against the underlying source. If you still run into performance issues with the DuckDB transform layer, some connectors also support native queries that can limit the data that is fetched.

Model Config

| Field | Description |
| --- | --- |
| name | Assigns a name to your model. It's common to describe the use case and append a version to help with development. |
| description | Describes your model. |
| pagination_store_ttl | Shaped handles pagination of results for you by adding all served item ids to a 'pagination store' and then filtering these items out of the candidate set for subsequent requests. pagination_store_ttl sets how long, in seconds, items are kept in the pagination store. Speak to your Solution Engineer to find the optimal value for your use case. |
| train_schedule | Shaped can retrain your model continuously on a recurring cadence. By default, all models are trained once. For production use cases, we recommend a daily schedule so that your model stays up to date. Speak to your Solution Engineer if you need a more frequent train schedule than daily. |
| inference_config | Customizations to how the backend ranks your results at inference time. See Inference Config for details. |
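The pagination behaviour described for pagination_store_ttl can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Shaped's implementation: the PaginationStore class, its methods, the per-user keying, and the explicit `now` timestamps are all hypothetical.

```python
from collections import defaultdict


class PaginationStore:
    """Hypothetical in-memory pagination store; entries live for ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # user_id -> {item_id: timestamp (seconds) when the item was last served}
        self._served: dict[str, dict[str, float]] = defaultdict(dict)

    def record_served(self, user_id: str, item_ids: list[str], now: float) -> None:
        # Remember every item returned to this user, stamped with the serve time.
        for item_id in item_ids:
            self._served[user_id][item_id] = now

    def filter_candidates(self, user_id: str, candidates: list[str], now: float) -> list[str]:
        # Only items served less than ttl_seconds ago are still excluded.
        live = {
            item_id
            for item_id, served_at in self._served.get(user_id, {}).items()
            if now - served_at < self.ttl_seconds
        }
        return [item_id for item_id in candidates if item_id not in live]


store = PaginationStore(ttl_seconds=600)
store.record_served("u1", ["a", "b"], now=0)
print(store.filter_candidates("u1", ["a", "b", "c"], now=60))   # ['c']
print(store.filter_candidates("u1", ["a", "b", "c"], now=700))  # ['a', 'b', 'c']
```

Within the TTL, already-served items are removed from the next page; once the TTL has elapsed they become eligible again.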

Inference Config

| Field | Description |
| --- | --- |
| exploration_factor | Rate of exploration vs. exploitation (between 0.0 and 1.0), used in Shaped's bandit algorithms. |
| diversity_factor | How much to weight diversity vs. relevance (between 0.0 and 1.0). This is used by the Maximal Marginal Relevance reorderer. |
| boosting_factor | Probability of a boosted item at each item position (between 0.0 and 1.0). What counts as a "boosted item" is described in more depth in the boosting documentation. |
| diversity_attributes | List of attribute names (strings) to consider for diversity reordering. If left empty, all attributes may be used for diversity reordering. |
| retrieval_k | Number of candidate items retrieved by the model. |
| retriever_k_override | Number of items to retrieve from each type of retriever (KNN, chronological, toplist, trending, random, cold start). |
| limit | Number of items to return in the response. |
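diversity_factor feeds a Maximal Marginal Relevance (MMR) reorderer. The sketch below shows the standard greedy MMR idea: at each step, pick the item with the best trade-off between its relevance and its similarity to items already chosen. Several assumptions here do not come from Shaped: diversity_factor is treated as 1 - lambda in the MMR formula, similarity is the share of diversity attributes on which two items agree, and the function name mmr_rerank is hypothetical.

```python
def mmr_rerank(
    items: list[str],
    relevance: dict[str, float],
    attributes: dict[str, dict[str, str]],
    diversity_attributes: list[str],
    diversity_factor: float,
    limit: int,
) -> list[str]:
    """Greedy Maximal Marginal Relevance reordering (illustrative sketch)."""

    def similarity(a: str, b: str) -> float:
        # Assumption: similarity = share of diversity attributes on which a and b agree.
        # An empty diversity_attributes list falls back to every attribute either item has.
        keys = diversity_attributes or sorted(set(attributes[a]) | set(attributes[b]))
        if not keys:
            return 0.0
        matches = sum(attributes[a].get(k) == attributes[b].get(k) for k in keys)
        return matches / len(keys)

    lam = 1.0 - diversity_factor  # assumption: diversity_factor maps to 1 - lambda
    selected: list[str] = []
    remaining = list(items)
    while remaining and len(selected) < limit:
        def mmr_score(item: str) -> float:
            # Penalize an item by its similarity to the closest item already selected.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
attributes = {"a": {"category": "shoes"}, "b": {"category": "shoes"}, "c": {"category": "hats"}}
print(mmr_rerank(["a", "b", "c"], relevance, attributes, ["category"], 0.0, 3))  # ['a', 'b', 'c']
print(mmr_rerank(["a", "b", "c"], relevance, attributes, ["category"], 0.5, 3))  # ['a', 'c', 'b']
```

With diversity_factor at 0 the order is pure relevance; at 0.5 the second "shoes" item drops below the less relevant but different "hats" item.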

Connectors

Add the datasets created in the previous step to the connectors section of your model config.

```yaml
connectors:
  - id: likes
    name: likes
    type: Dataset
  - id: impressions
    name: impressions
    type: Dataset
  - id: reported_posts
    name: reported_posts
    type: Dataset
  - id: users
    name: users
    type: Dataset
  - id: user_favourite_categories
    name: user_favourite_categories
    type: Dataset
  - id: items
    name: items
    type: Dataset
  - id: items_categories
    name: items_categories
    type: Dataset
```

Fetch Feature Data

Events

Events must include user_id, item_id, created_at, event_value, and label.

  • event_value: Describes the event and is useful for analysis.
  • label: A numerical value where anything greater than 0 is positive and 0 or less is negative. It also helps weight the events.

For simplicity, you might start with a binary label system.

```yaml
events: |
  SELECT
    user_id,
    item_id,
    created_at,
    1 AS label,
    'like' AS event_value
  FROM likes

  UNION ALL

  SELECT
    user_id,
    item_id,
    created_at,
    0 AS label,
    'impression' AS event_value
  FROM impressions

  UNION ALL

  SELECT
    user_id,
    item_id,
    created_at,
    0 AS label,
    'reported_post' AS event_value
  FROM reported_posts
```
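To see what the events query produces, here is a plain-Python equivalent of its three UNION ALL branches, run over hypothetical in-memory rows that stand in for the likes, impressions, and reported_posts datasets, followed by the positive/negative split implied by the label rule. It is an illustration only; Shaped runs the actual query in DuckDB.

```python
from datetime import datetime

# Hypothetical source rows: (user_id, item_id, created_at)
likes = [("u1", "i1", datetime(2024, 1, 1, 9, 0))]
impressions = [("u1", "i1", datetime(2024, 1, 1, 8, 59)), ("u1", "i2", datetime(2024, 1, 1, 8, 59))]
reported_posts = [("u1", "i3", datetime(2024, 1, 1, 9, 5))]


def to_events(rows, label, event_value):
    # Mirrors one SELECT branch: attach a constant label and event_value to each row.
    return [
        {"user_id": u, "item_id": i, "created_at": ts, "label": label, "event_value": event_value}
        for u, i, ts in rows
    ]


# Equivalent of the three UNION ALL branches in the events query.
events = (
    to_events(likes, 1, "like")
    + to_events(impressions, 0, "impression")
    + to_events(reported_posts, 0, "reported_post")
)

# Labels greater than 0 are positive; 0 or less are negative.
positives = [e for e in events if e["label"] > 0]
negatives = [e for e in events if e["label"] <= 0]
print(len(events), len(positives), len(negatives))  # 4 1 3
```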

Users and Items

In the users and items sections, include all the users and items on your platform, along with attributes that will help the model draw connections between them.

```yaml
users: |
  SELECT
    user_id,
    created_at,
    country,
    occupation,
    gender
  FROM users
items: |
  SELECT
    i.item_id,
    i.created_at,
    i.price,
    i.deleted,
    i.public,
    ARRAY_AGG(DISTINCT ic.category) AS categories
  FROM items i
  LEFT JOIN items_categories ic ON ic.item_id = i.item_id
  WHERE
    i.deleted = false
    AND i.public = true
  GROUP BY i.item_id, i.created_at, i.price, i.deleted, i.public
```
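The items query joins categories onto items, collapses them into one list per item, and drops deleted or private items. Here is a plain-Python sketch of the same logic over hypothetical sample rows. Categories are sorted only to make the output deterministic (SQL does not guarantee ARRAY_AGG order), and items without any category get an empty list here, which may not match how DuckDB aggregates the NULLs that a LEFT JOIN produces.

```python
from collections import defaultdict

# Hypothetical rows standing in for the items and items_categories datasets.
items = [
    {"item_id": "i1", "created_at": "2024-01-01", "price": 10.0, "deleted": False, "public": True},
    {"item_id": "i2", "created_at": "2024-01-02", "price": 25.0, "deleted": True, "public": True},
    {"item_id": "i3", "created_at": "2024-01-03", "price": 5.0, "deleted": False, "public": True},
]
items_categories = [
    {"item_id": "i1", "category": "shoes"},
    {"item_id": "i1", "category": "sale"},
    {"item_id": "i1", "category": "shoes"},  # duplicate row, collapsed like DISTINCT
    {"item_id": "i2", "category": "hats"},
]

# Collect the distinct categories of each item (the ARRAY_AGG(DISTINCT ...) step).
categories_by_item = defaultdict(set)
for row in items_categories:
    categories_by_item[row["item_id"]].add(row["category"])

# Keep only non-deleted, public items and emit one row per item with its category list.
result = [
    {**item, "categories": sorted(categories_by_item.get(item["item_id"], set()))}
    for item in items
    if not item["deleted"] and item["public"]
]
print([(r["item_id"], r["categories"]) for r in result])  # [('i1', ['sale', 'shoes']), ('i3', [])]
```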

Global Filters (Optional)

The global filters define the items that should be filtered out for all users. For example, you may want your rankings to guarantee that all of the following items are filtered out:

  • Products that are out of stock
  • Certain categories
  • Older items

The global filter requires only an item_id column. Say you wanted to exclude a certain category for all users:

```yaml
global_filters: |
  SELECT
    item_id
  FROM items_categories
  WHERE
    category = 'category_to_exclude'
```
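Conceptually, a global filter is a set difference applied to every user's candidates. A minimal sketch with hypothetical data and a hypothetical helper name (apply_global_filter); it illustrates the semantics, not Shaped's internals:

```python
def apply_global_filter(candidates: list[str], global_filter_ids: set[str]) -> list[str]:
    # Remove every item_id returned by the global_filters query, preserving rank order.
    return [item_id for item_id in candidates if item_id not in global_filter_ids]


# Hypothetical (item_id, category) rows from items_categories.
items_categories = [("i1", "shoes"), ("i2", "category_to_exclude"), ("i2", "shoes"), ("i3", "hats")]

# Equivalent of: SELECT item_id FROM items_categories WHERE category = 'category_to_exclude'
excluded = {item_id for item_id, category in items_categories if category == "category_to_exclude"}

print(apply_global_filter(["i2", "i1", "i3"], excluded))  # ['i1', 'i3']
```

Because the filter is keyed on item_id, an item in the excluded category (i2 above) is removed even if it also belongs to other categories.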

Personal Filters (Optional)

The personal filters define items that should be filtered out for a specific user. For example, you may want your model to:

  • Filter out all videos a user has watched before
  • Remove posts from blocked and muted users
  • Remove items which are not available in the user's country

The personal filter must output a view containing user_id and item_id pairs to be filtered out. Specifically, if there is a user_id, item_id row, then that user_id will never be shown that item_id.

If you want to filter out any items a user has seen before:

```yaml
personal_filters: |
  SELECT
    user_id,
    item_id
  FROM impressions
```
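A personal filter works the same way but per user: each (user_id, item_id) row hides one item from one user. The helpers below are hypothetical and only illustrate those semantics, using sample impressions rows.

```python
from collections import defaultdict


def build_personal_filter(pairs):
    # Group the (user_id, item_id) rows returned by personal_filters by user.
    blocked = defaultdict(set)
    for user_id, item_id in pairs:
        blocked[user_id].add(item_id)
    return blocked


def apply_personal_filter(user_id, candidates, blocked):
    # A (user_id, item_id) row means that user is never shown that item.
    hidden = blocked.get(user_id, set())
    return [item_id for item_id in candidates if item_id not in hidden]


impressions = [("u1", "i1"), ("u1", "i2"), ("u2", "i3")]
blocked = build_personal_filter(impressions)
print(apply_personal_filter("u1", ["i1", "i2", "i3"], blocked))  # ['i3']
print(apply_personal_filter("u2", ["i1", "i2", "i3"], blocked))  # ['i1', 'i2']
```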

Creating Your Model With The Shaped CLI

Putting it all together gives the following complete model config, which you can use with the Shaped CLI to create your model.

model_config.yaml
```yaml
model:
  name: for_you_feed_v1
  description: "Recommend items to show in the for you feed"
  pagination_store_ttl: 0 # Seconds.
connectors:
  - id: likes
    name: likes
    type: Dataset
  - id: impressions
    name: impressions
    type: Dataset
  - id: reported_posts
    name: reported_posts
    type: Dataset
  - id: users
    name: users
    type: Dataset
  - id: user_favourite_categories
    name: user_favourite_categories
    type: Dataset
  - id: items
    name: items
    type: Dataset
  - id: items_categories
    name: items_categories
    type: Dataset
fetch:
  events: |
    SELECT
      user_id,
      item_id,
      created_at,
      1 AS label,
      'like' AS event_value
    FROM likes

    UNION ALL

    SELECT
      user_id,
      item_id,
      created_at,
      0 AS label,
      'impression' AS event_value
    FROM impressions

    UNION ALL

    SELECT
      user_id,
      item_id,
      created_at,
      0 AS label,
      'reported_post' AS event_value
    FROM reported_posts
  users: |
    SELECT
      user_id,
      created_at,
      country,
      occupation,
      gender
    FROM users
  items: |
    SELECT
      i.item_id,
      i.created_at,
      i.price,
      i.deleted,
      i.public,
      ARRAY_AGG(DISTINCT ic.category) AS categories
    FROM items i
    LEFT JOIN items_categories ic ON ic.item_id = i.item_id
    WHERE
      i.deleted = false
      AND i.public = true
    GROUP BY i.item_id, i.created_at, i.price, i.deleted, i.public
  global_filters: |
    SELECT
      item_id
    FROM items_categories
    WHERE
      category = 'category_to_exclude'
  personal_filters: |
    SELECT
      user_id,
      item_id
    FROM impressions
```

Use the Shaped CLI to create your model:

```shell
shaped create-model --file model_config.yaml
```

You'll be able to see your model build on the Shaped Dashboard.