User, Item & Event Features

Now that you've created an initial model using your behavioural event data, you can start enriching it with user, item and event features. These features have the most impact on recommendation performance when you have cold-start users or items, i.e. users or items without many interaction events. They also help when your interactions are biased or when you simply don't have many of them. Adding user and item features is also one of our ways of ingesting users and items that aren't found in your interaction table (e.g. because they're new and don't have any interactions yet).

User and Item attribute model enrichment

To demonstrate how you can add user and item features, let's continue from the video recommendation example shown in the 'Your first model' guide. As well as the events table that you previously mapped, you may also have user and item tables with the following columns:

User

  1. user_id: the unique user identifier (matching the user within your interaction events).
  2. created_at: the timestamp the user signed up.
  3. gender: the user's gender.
  4. age: the user's numerical age.

Item

  1. item_id: the unique video identifier (matching the video within your interaction events).
  2. created_at: the timestamp the video was published.
  3. description: the video's description.
  4. hashtags: the video's categorical tags.

To ingest these attributes with Shaped, you just need to add the following users and items queries to your original model creation schema. The users query requires a user_id column, which acts as the primary key and should match the user_id column in the interaction events. The items query requires an item_id column, which acts as the primary key and should match the item_id column in the interaction events. All other columns are treated as features.

interaction_and_feature_video_recommendations.yaml
model:
  name: interaction_and_feature_video_recommendations
  description: My first model
connectors:
  - type: Dataset
    id: click_events
    name: click_events
  - type: Dataset
    id: users
    name: users
  - type: Dataset
    id: videos
    name: videos
fetch:
  events: |
    SELECT user_id, item_id, created_at, (CASE WHEN event = 'click' THEN 1 ELSE 0 END) as label
    FROM click_events
  users: |
    SELECT user_id, created_at, gender, age
    FROM users
  items: |
    SELECT item_id, created_at, description, hashtags
    FROM videos

$ shaped create-model --file interaction_and_feature_video_recommendations.yaml

Once Shaped ingests your user and item attributes, it derives and encodes the features and feeds them to your downstream recommendation and retrieval models. Shaped also uses the item list to determine the cold-start item pool, which allows your rankings to include new items without interactions, in proportion to the chosen exploration_factor.

How is Shaped understanding my features?

Deriving types

Under the hood, we derive the type of each feature column by looking at samples of your fetched data. For example, the following types would be derived for the create model request above:

Column               Derived Type
users.gender         Categorical
users.age            Numerical
users.created_at     Timestamp
items.hashtags       Set[Categorical]
items.description    Text
items.created_at     Timestamp
Tip: You can view Shaped's derived query types using the View Model API.

Feature encoding

Once the feature types have been derived, we evaluate different feature encoders to generate the inputs for your downstream ranking and retrieval models. For the structured data types (e.g. Categorical, Numerical, Timestamp, Set[...]), we use typical machine-learning encoding methods. For the unstructured data types, we have a set of pre-trained language, vision and audio encoders that we fine-tune on your dataset to distill the best encoding for your use case.

What about event features?

Although we didn't show how to add event context features to your model, it works in the same way. That is, any non-required columns found in your events query (i.e. columns not named user_id, item_id, label or created_at) are treated as features, and they're derived and encoded in the same way.
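
As a rough sketch of what that could look like, the events query below selects two extra columns alongside the required ones. The columns device_type and watch_seconds are hypothetical and assume your click_events table contains them; they are not part of the original example. The rest of the file (model, connectors, and the users and items queries) would stay as in the schema shown earlier.

```yaml
fetch:
  # device_type and watch_seconds are hypothetical event context columns,
  # assumed to exist in click_events. Because they are not named user_id,
  # item_id, label or created_at, they would be treated as event features.
  events: |
    SELECT
      user_id,
      item_id,
      created_at,
      (CASE WHEN event = 'click' THEN 1 ELSE 0 END) as label,
      device_type,
      watch_seconds
    FROM click_events
```

Under the type derivation described above, you might expect a column like device_type to be derived as Categorical and watch_seconds as Numerical, but the actual derived types depend on your data and can be checked with the View Model API.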

Next steps

You've just learned how to incorporate user, item and event attributes into your model. This will improve your model's performance, particularly when you don't have many interactions or the interactions are of low quality. In the next guide, we'll show you how to continue improving your model by adding custom filters that define your business logic.