Adding User, Item & Event Features
Now that you've created an initial model using your behavioural event data, you can start enriching it with user, item and event features. These features have the most impact on recommendation performance when you have cold-start users or items, i.e. users or items with few interaction events. They also help when your interactions are biased or when you simply don't have many of them. Adding user and item features is also one way to ingest users and items that aren't found in your interaction table (e.g. because they're new and don't have any interactions yet).
User and Item attribute model enrichment
To demonstrate how to add user and item features, let's continue from the video recommendation example shown in the 'Your first model' guide. As well as the events table that you previously mapped, you may also have user and item tables with the following columns:
User
- user_id: the unique user identifier (matching the user within your interaction events).
- created_at: the timestamp the user signed up.
- gender: the user's gender.
- age: the user's numerical age.
Item
- item_id: the unique video identifier (matching the video within your interaction events).
- created_at: the timestamp the video was published.
- description: the video's description.
- hashtags: the video's categorical tags.
To ingest these attributes with Shaped, you just need to add the following users and items queries to your original model creation schema. The users query requires a user_id column, which should be the primary key that the user_id column in your interaction events refers to. The items query requires an item_id column, which should be the primary key that the item_id column in your interaction events refers to. All other columns are treated as features.
model:
    name: interaction_and_feature_video_recommendations
connectors:
    - type: BigQuery
      name: bigquery_connector
      location: us-west1
      project_id: rocket-ship-234123
      dataset: video_db
fetch:
    events: |
        SELECT user_id, item_id, created_at, (CASE WHEN event = 'click' THEN 1 ELSE 0 END) as label
        FROM bigquery_connector.click_events
    users: |
        SELECT user_id, created_at, gender, age
        FROM bigquery_connector.users
    items: |
        SELECT item_id, created_at, description, hashtags
        FROM bigquery_connector.videos
$ shaped create-model --file interaction_and_feature_video_recommendations.yaml
Once Shaped ingests your user and item attributes, it derives and encodes the
features and feeds them to your downstream recommendation and retrieval models.
Shaped also uses the item list to determine the cold-start item pool, which allows
your rankings to include new items without interactions, in proportion to the
chosen exploration_factor.
How is Shaped understanding my features?
Deriving types
Under the hood, we derive the type of each feature column by looking at samples of your fetched data. For example, the following types would be derived for the create model request above:
Column | Derived Type |
---|---|
users.gender | Categorical |
users.age | Numerical |
users.created_at | Timestamp |
items.hashtags | Set[Categorical] |
items.description | Text |
items.created_at | Timestamp |
You can view Shaped's derived query types using the View Model API.
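To make the idea of deriving types from samples more concrete, here is a minimal Python sketch. It is not Shaped's implementation: the function name derive_type, the "Unknown" fallback and the word-count threshold used to separate Text from Categorical are all assumptions made for illustration.

```python
from datetime import datetime


def derive_type(samples):
    """Guess a feature type from a handful of sampled column values.

    Illustrative heuristic only; the rules and thresholds are assumptions.
    """
    values = [v for v in samples if v is not None]
    if not values:
        return "Unknown"
    # Collections of values become a set type over their element type.
    if all(isinstance(v, (list, set, tuple)) for v in values):
        flat = [x for v in values for x in v]
        return f"Set[{derive_type(flat)}]"
    if all(isinstance(v, datetime) for v in values):
        return "Timestamp"
    # bool is a subclass of int, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "Numerical"
    if all(isinstance(v, str) for v in values):
        # Assumed rule: strings averaging more than 3 words are free text.
        avg_words = sum(len(v.split()) for v in values) / len(values)
        return "Text" if avg_words > 3 else "Categorical"
    return "Categorical"
```

With sample values shaped like the example columns, this sketch returns Categorical for short gender strings, Numerical for ages, and Set[Categorical] for lists of hashtags.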
Feature encoding
Once the feature types have been derived, we evaluate different feature encoders to
generate the inputs for your downstream ranking and retrieval models. For the
structured data types, e.g. Categorical, Numerical, Timestamp, Set[...],
we use typical machine-learning encoding methods. For the unstructured data types we
have a set of pre-trained language, vision and audio encoders that we fine-tune on your
dataset to distill the best encoding for your use-case.
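As a rough illustration of what typical encodings for the structured types can look like, the sketch below uses only the Python standard library. The specific choices (one-hot, multi-hot, standardization and a cyclical hour-of-day encoding) are common techniques picked for this example; the guide does not state which encoders Shaped actually selects, and all function names are hypothetical.

```python
import math
from datetime import datetime


def one_hot(value, vocabulary):
    """Categorical: 1.0 at the position of the matching vocabulary entry."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def multi_hot(values, vocabulary):
    """Set[Categorical]: 1.0 for every vocabulary entry present in the set."""
    present = set(values)
    return [1.0 if v in present else 0.0 for v in vocabulary]


def standardize(value, mean, std):
    """Numerical: zero-mean, unit-variance scaling (0.0 if std is zero)."""
    return (value - mean) / std if std > 0 else 0.0


def encode_hour_of_day(ts):
    """Timestamp: map the hour of day onto a circle so 23:00 is close to 00:00."""
    hour = ts.hour + ts.minute / 60
    angle = 2 * math.pi * hour / 24
    return [math.sin(angle), math.cos(angle)]
```

For example, one_hot("female", ["female", "male"]) gives [1.0, 0.0], and multi_hot(["cats", "diy"], ["cats", "dogs", "diy"]) gives [1.0, 0.0, 1.0].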
What about event features?
Although we didn't show how to add event context features to your model, it works
in the same way. That is, any non-required columns in your events query (i.e. columns
not named user_id, item_id, label or created_at) are treated as features, and
their types are derived and encoded in the same way.
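The rule above can be sketched in a few lines of Python. The helper split_event_columns and the extra columns device_type and session_length are hypothetical and exist only to illustrate how the required event columns are separated from the ones treated as features.

```python
# Required event columns named in this guide; everything else is a feature.
REQUIRED_EVENT_COLUMNS = {"user_id", "item_id", "label", "created_at"}


def split_event_columns(columns):
    """Partition event query columns into (required, features), keeping order."""
    required = [c for c in columns if c in REQUIRED_EVENT_COLUMNS]
    features = [c for c in columns if c not in REQUIRED_EVENT_COLUMNS]
    return required, features


# Hypothetical events query that selects two extra context columns.
required, features = split_event_columns(
    ["user_id", "item_id", "created_at", "label", "device_type", "session_length"]
)
```

Here device_type and session_length end up in features, so they would go through the same type derivation and encoding as user and item features.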
Next steps
You've just learned how to incorporate user, item and event attributes into your model. This will improve your model's performance, particularly when you don't have many interactions or when the interactions are of low quality. In the next guide we'll show you how to continue improving your model by adding custom filters that define your business logic.