Overview
Modern recommendation and ranking models perform best when they can draw on all the data available. In addition to traditional tabular data types such as categorical (enum) and numerical (scalar) variables, Shaped understands complex data types like images, audio, language and video. It does this by converting these data types into embeddings using pre-trained understanding models. These embeddings are then fed into the ranking models to improve their understanding of the input and the performance of the final ranking.
For example, if you are building a social post recommendation model, the content of a post is crucial to understanding its relevance to a user, and embedding models can capture that content. This matters even more when you lack interaction data for the post (e.g. a newly created post on the platform). This is known as the cold-start problem and will be discussed later.
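To make the cold-start point concrete, here is a minimal sketch of how content embeddings let a model score posts that have no interactions yet. The three-dimensional vectors are made-up stand-ins for what a pre-trained model would produce, and the helper names (`cosine_similarity`, `mean_vector`) are hypothetical; this is not Shaped's API or its actual ranking method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Assumption: toy embeddings of posts the user has already liked.
liked_posts = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
user_profile = mean_vector(liked_posts)

# Assumption: brand-new posts with no interaction data, only content embeddings.
new_posts = {
    "new_hiking_post": [0.85, 0.15, 0.05],
    "new_cooking_post": [0.05, 0.10, 0.90],
}

# Rank the new posts by how close their content is to the user's profile.
ranked = sorted(
    new_posts,
    key=lambda post_id: cosine_similarity(user_profile, new_posts[post_id]),
    reverse=True,
)
print(ranked)
```

Even though neither new post has been seen by any user, the one whose content resembles the user's past likes is ranked first.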
Below are the data types that Shaped supports:
🔠 Categoricals
Discrete category type data (e.g. categories, IDs, event types).
🔢 Numericals
Quantitative, continuous data (e.g. price, ratings, counts).
🕒 Timestamps
Temporal data types such as timestamps and dates (e.g. created_at, last_signin).
🖼️ Images
Unstructured image data (e.g. product images, content thumbnails).
🔤 Language
Unstructured text data (e.g. bios, descriptions, reviews).
📍 Locations
Geospatial data types (e.g. store locations, user coordinates).
🔗 Cross Features
Derived features combining multiple inputs (e.g. user-item interactions, advanced embeddings).
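
As a rough illustration of how several of these data types can end up as model inputs, the sketch below encodes one record with common, generic techniques: one-hot encoding for a categorical, min-max scaling for a numerical, a cyclical hour-of-day encoding for a timestamp, haversine distance for a location, and a simple string cross of two categoricals. All function names, field values, and ranges are assumptions chosen for the example; Shaped's internal encodings may differ, and image and language columns are omitted because they would go through pre-trained embedding models.

```python
import math
from datetime import datetime, timezone

def one_hot(value, vocabulary):
    """Categorical: 1.0 at the matching vocabulary position, 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def min_max_scale(value, low, high):
    """Numerical: rescale into [0, 1] given an assumed known range."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)

def cyclical_hour(ts):
    """Timestamp: encode hour of day on a circle so 23:00 sits next to 00:00."""
    angle = 2 * math.pi * ts.hour / 24
    return [math.sin(angle), math.cos(angle)]

def haversine_km(lat1, lon1, lat2, lon2):
    """Location: great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def cross_feature(a, b):
    """Cross feature: combine two categorical values into one derived category."""
    return f"{a}_x_{b}"

# Hypothetical record; every value here is invented for the example.
created_at = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
features = (
    one_hot("shoes", ["shirts", "shoes", "hats"])
    + [min_max_scale(50.0, 0.0, 200.0)]
    + cyclical_hour(created_at)
    + [haversine_km(0.0, 0.0, 0.0, 1.0)]
)
crossed = cross_feature("shoes", "US")
print(features, crossed)
```

The crossed value is itself a categorical, so in practice it would be encoded the same way as any other category before being passed to a model.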