Choose a model policy

This guide provides an overview of models available for training in Shaped, helping you choose the right model for your use case.

Model types

You can train two types of models in Shaped:

  • Embedding models, which learn dense vector representations of users and items for search and similarity
  • Scoring models, which take rich feature sets for a (user, item, context) pair and output a relevance score to rank candidates

These models can be configured in the training block of your config, under models:

training:
  models:
    - name: my_elsa_score_model
      policy_type: elsa
      strategy: early_stopping
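
For example, an engine that trains an ELSA embedding model alongside a gradient-boosted scoring model could look roughly like the sketch below. The lightgbm policy_type value and the second model block are illustrative assumptions; check the policy reference for the exact slugs and fields.

training:
  models:
    - name: my_elsa_model
      policy_type: elsa
      strategy: early_stopping
    - name: my_ranking_model
      policy_type: lightgbm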

You don't need a model policy!

You can get a lot of functionality out of an engine with no trained embedding or scoring models.

Add an interaction table: Any engine with an interaction_table will rank items by popularity, trending, and so on. The items table will have columns that expose each item's rank relative to other items; see Derived columns.

Use Hugging Face embeddings for similarity search: If you create an embedding in the index config, you can use it to find items that are similar to a single item, or items that are complementary to a set of items; see Complement items.

Use value modeling: All engines support value modeling by default, so you can begin by scoring items with expressions. Here's an example that uses popular rank with a time decay:

  ORDER BY score(
    expression='(1000 - item._derived_popular_rank) / ((((now_seconds() - item.published_at) / 3600) + 2) ** 1.8)',
    input_user_id='$user_id',
    input_interactions_item_ids='$interaction_item_ids'
  )
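
Written out as a formula, that expression computes a time-decayed popularity score: items with a better popularity rank score higher, and the score decays as the item ages.

  score = (1000 - popular_rank) / ((hours_since_published + 2) ^ 1.8)

where hours_since_published = (now_seconds() - item.published_at) / 3600.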


Embedding models

SVD (Singular Value Decomposition)

Model Summary

SVD implements matrix factorization using iterative algorithms inspired by singular value decomposition (like Funk SVD or SVD++). It can incorporate user/item biases and is typically trained using Stochastic Gradient Descent (SGD).
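
As a reference, the biased matrix-factorization prediction that Funk-style SVD models typically learn is:

  r_hat(u, i) = mu + b_u + b_i + p_u · q_i

where mu is the global mean rating, b_u and b_i are the learned user and item biases, and p_u, q_i are the latent factor vectors. SGD fits the biases and factors to minimize squared error on observed ratings plus a regularization term.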

When to Use This Model

  • You have explicit feedback data (ratings, reviews with scores)
  • You want to model user and item biases (systematic tendencies to rate above or below the average)
  • You prefer SGD-based optimization over ALS
  • You need a straightforward matrix factorization approach
  • You're working with medium-sized datasets

When Not to Use This Model

  • You have only implicit feedback (ALS is typically better for this)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want the most scalable solution (ELSA may be better)
  • You need to model sequential patterns (use sequential models)

Sample Use Cases / Item Types

  • Movie recommendations with explicit ratings (e.g., MovieLens)
  • Product recommendations with review scores
  • Restaurant or service recommendations with ratings
  • Any domain with explicit user feedback

ELSA (Efficient Latent Sparse Autoencoder)

Model Summary

ELSA is a scalable shallow linear autoencoder for implicit feedback collaborative filtering that learns item-item relationships by reconstructing user interaction vectors. Unlike EASE, it uses a factorized hidden layer structure (low-rank plus sparse) to improve scalability.
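
Roughly speaking, where EASE learns a full item-item weight matrix, ELSA represents it through a low-rank item embedding matrix A (with L2-normalized rows) and a diagonal correction, along the lines of:

  W ≈ A · A^T − I

so reconstructing a user's interaction vector x becomes x · (A · A^T − I), and only the (num_items × k) matrix A has to be stored and trained.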

When to Use This Model

  • You have large-scale datasets and need scalability
  • You're working with implicit feedback data
  • You want better scalability than EASE while maintaining similar performance
  • You need efficient item-item similarity computation
  • You want a modern autoencoder approach for collaborative filtering

When Not to Use This Model

  • You have very small datasets (EASE might be simpler)
  • You need to incorporate item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You have explicit feedback only (SVD might be more appropriate)

Sample Use Cases / Item Types

  • Large e-commerce platforms with millions of products
  • Content streaming services with extensive catalogs
  • Social media feed recommendations
  • Any large-scale implicit feedback recommendation system

EASE (Embarrassingly Shallow Autoencoder)

Model Summary

EASE is a simple linear autoencoder for item-based collaborative filtering that uses a closed-form solution, avoiding iterative optimization for computational efficiency.
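
For reference, the closed-form solution from the EASE paper derives the item-item weight matrix B from the regularized Gram matrix of the interaction matrix X:

  P = (X^T · X + λI)^(-1)
  B_ij = -P_ij / P_jj for i ≠ j, with B_jj = 0

A single matrix inversion replaces iterative training, which is why EASE is fast on medium-sized catalogs but becomes memory-bound as the item count grows.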

When to Use This Model

  • You want the simplest possible autoencoder approach
  • You need fast training with a closed-form solution
  • You have implicit feedback data
  • You're working with medium-sized datasets
  • You want a quick baseline or prototype

When Not to Use This Model

  • You have very large-scale datasets (ELSA scales better)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You need the highest accuracy (more complex models may perform better)

Sample Use Cases / Item Types

  • Medium-sized e-commerce sites
  • Content recommendation systems
  • Product similarity for "customers who bought this also bought"
  • Quick prototypes and baselines

Two-Tower

Model Summary

The Two-Tower architecture separates user and item computation into two distinct neural networks that output embeddings in the same vector space. Item embeddings can be pre-computed offline, enabling efficient Approximate Nearest Neighbor (ANN) search at inference time.
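
Conceptually, each tower maps its own inputs into a shared embedding space, and relevance is a simple similarity between the two outputs:

  score(user, item) = f_user(user features) · f_item(item features)

Because f_item depends only on item features, item embeddings can be computed offline and indexed for ANN search; only the user tower needs to run at request time.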

When to Use This Model

  • You have rich item metadata (text descriptions, categories, images)
  • You need efficient large-scale vector search and retrieval
  • You want to leverage both collaborative and content signals
  • You need production-ready embeddings for real-time recommendations
  • You want the best performance for general item similarity tasks

When Not to Use This Model

  • You have only interaction data without item features (use ALS/ELSA)
  • You need to model strict sequential patterns (use SASRec/BERT4Rec)
  • You have very limited compute resources
  • You need a simple baseline model

Sample Use Cases / Item Types

  • E-commerce with product descriptions, categories, images (e.g., clothing, electronics)
  • Content platforms with rich metadata (articles, videos with descriptions)
  • Job recommendations with job descriptions and requirements
  • Real estate with property descriptions and features
  • Any domain where items have rich textual or categorical attributes

BeeFormer

Model Summary

BeeFormer fine-tunes pre-trained sentence Transformers using user-item interaction data to bridge semantic similarity (from language models) and interaction-based similarity (from collaborative patterns).

When to Use This Model

  • You have rich text content (product descriptions, titles, reviews)
  • You want to combine semantic understanding with behavioral signals
  • You need good cold-start performance for new items
  • You want to leverage pre-trained language model knowledge
  • You have items with detailed textual descriptions

When Not to Use This Model

  • You have minimal or no text content for items
  • You only have interaction data without item descriptions
  • You need the fastest training possible
  • You have very limited text data quality
  • You want a pure collaborative filtering approach

Sample Use Cases / Item Types

  • E-commerce with detailed product descriptions (fashion, furniture, electronics)
  • Content platforms (articles, blog posts, research papers)
  • Job boards with detailed job descriptions
  • Real estate with property descriptions
  • Books, movies, or media with synopses and reviews
  • Any domain where semantic understanding of text content matters

Item2Vec

Model Summary

Item2Vec adapts the Word2Vec algorithm (CBOW or Skip-gram) to learn item embeddings from user interaction sequences, capturing co-occurrence patterns within a context window.
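
In the skip-gram-with-negative-sampling variant, items that appear within the same context window are treated as positive pairs, and embeddings are trained to score those pairs above randomly sampled negatives, roughly maximizing:

  L = Σ_(i, j co-occurring) [ log σ(v_i · v_j) + Σ_(k sampled negatives) log σ(−v_i · v_k) ]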

When to Use This Model

  • You have sequential interaction data (user sessions, browsing history)
  • You want to capture item co-occurrence patterns
  • You need a simple, efficient sequential embedding approach
  • You're working with session-based recommendations
  • You want to model "items frequently viewed together"

When Not to Use This Model

  • You need to model strict temporal order and dependencies (use SASRec/BERT4Rec)
  • You want to leverage item content features (use Two-Tower or BeeFormer)
  • You only have non-sequential interaction data
  • You need the highest accuracy for sequential recommendations

Sample Use Cases / Item Types

  • E-commerce browsing sessions (items viewed in same session)
  • Music playlists and song sequences
  • Video watching sequences
  • Shopping cart co-occurrence patterns
  • Session-based web recommendations

SASRec (Self-Attentive Sequential Recommendation)

Model Summary

SASRec utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences, capturing both short-term and long-range dependencies to predict the next item. It's unidirectional, processing sequences from past to future.

When to Use This Model

  • You have sequential interaction data with clear temporal order
  • You need to predict the "next item" in a sequence
  • You want to capture both short-term and long-term patterns
  • You need state-of-the-art sequential recommendation performance
  • You're building session-based or sequential recommendation systems

When Not to Use This Model

  • You don't have sequential or temporal data
  • You want general item similarity rather than next-item prediction
  • You need bidirectional context understanding (use BERT4Rec)
  • You have very short sequences (Item2Vec might be simpler)
  • You want to leverage item content features primarily (use Two-Tower)

Sample Use Cases / Item Types

  • E-commerce next-item prediction (what to buy next)
  • Video streaming next-video recommendations
  • Music playlist continuation
  • News feed article sequences
  • Gaming item progression recommendations
  • Any domain with clear sequential user behavior

BERT4Rec

Model Summary

BERT4Rec adapts the bidirectional Transformer architecture (BERT) for sequential recommendation, using bidirectional self-attention to learn context-aware item representations by considering both past and future context.

When to Use This Model

  • You have sequential data and want bidirectional context understanding
  • You need richer item representations than unidirectional models
  • You want to leverage both past and future context in sequences
  • You're working with sequences where context matters in both directions
  • You need state-of-the-art sequential recommendation with bidirectional attention

When Not to Use This Model

  • You need real-time next-item prediction (unidirectional is more natural)
  • You have very long sequences (self-attention cost grows quickly with sequence length)
  • You want the simplest sequential model (Item2Vec or SASRec)
  • You don't have sequential data
  • You primarily need general item similarity (use Two-Tower or ALS)

Sample Use Cases / Item Types

  • Context-aware sequential recommendations
  • Playlist generation with full context
  • Reading sequences where future context matters
  • Educational content sequences
  • Any sequential recommendation where bidirectional understanding helps

GSASRec (Generalized Self-Attentive Sequential Recommendation)

Model Summary

GSASRec is an enhancement of SASRec designed to mitigate overconfidence issues from negative sampling by improving prediction calibration while retaining the core self-attention mechanism.

When to Use This Model

  • You're experiencing overconfidence issues with SASRec
  • You need better calibrated predictions for sequential recommendations
  • You want improved negative sampling strategies
  • You need the benefits of SASRec with better training stability
  • You're working on production systems where calibration matters

When Not to Use This Model

  • You don't have sequential data
  • You want the simplest sequential model (use SASRec or Item2Vec)
  • You need general item similarity (use Two-Tower or ALS)
  • You're just starting with sequential recommendations (SASRec is simpler)
  • You want to leverage item content primarily (use Two-Tower or BeeFormer)

Sample Use Cases / Item Types

  • Production sequential recommendation systems
  • E-commerce next-item prediction with calibrated scores
  • Video streaming with improved prediction confidence
  • Any sequential recommendation where prediction calibration is critical

Embedding Models - Quick Decision Guide

For general item similarity and vector search:

  • Best choice: Two-Tower (if you have item features) or ALS/ELSA (if you only have interactions)

For text-rich items:

  • Best choice: BeeFormer (combines semantic + behavioral) or Two-Tower (if you have other features too)

For sequential recommendations:

  • Best choice: SASRec (unidirectional) or BERT4Rec (bidirectional), or Item2Vec (simpler co-occurrence)

For large-scale implicit feedback:

  • Best choice: ELSA (scalable) or ALS (simple and effective)

For explicit feedback:

  • Best choice: SVD (with biases) or ALS (can work with explicit too)

Scoring models

LightGBM

Model Summary

LightGBM is a highly efficient Gradient Boosting Decision Tree (GBDT) framework optimized for speed and memory usage. It builds ensembles of decision trees with histogram-based splitting, leaf-wise growth, and techniques like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), making it well-suited for large, sparse, and high-cardinality feature spaces. It supports regression, classification, and learning-to-rank objectives such as LambdaRank/LambdaMART.

When to Use This Model

  • You have large-scale tabular datasets with many categorical and numerical features
  • You need fast training and inference for ranking or CTR prediction workloads
  • You want a learning-to-rank objective (e.g., LambdaRank) for slate or search result ranking
  • You need a production-ready, memory-efficient GBDT implementation as a strong baseline
  • You want to combine rich behavioral, content, and affinity features into a single scoring model

When Not to Use This Model

  • You primarily work with small or medium-sized datasets and prefer more conservative defaults (use XGBoost)
  • You want to explicitly model high-order feature interactions via deep networks (use DeepFM or Wide & Deep)
  • You have only sparse interaction data and no rich features (use ALS/ELSA or other embedding models)
  • You need sequence-aware modeling of user behavior (use SASRec, BERT4Rec, or other sequential models)
  • You only need a simple trending or heuristic baseline (use Rising Popularity or value-model expressions)

Sample Use Cases / Item Types

  • E-commerce CTR prediction and ranking for product search and browse results
  • Feed ranking and homepage personalization using mixed behavioral and content features
  • Ad ranking and sponsored content placement with learning-to-rank objectives
  • Re-ranking of retrieved candidates in multi-stage recommendation architectures

XGBoost

Model Summary

XGBoost is a widely used GBDT framework known for its performance, rich regularization options, and robustness on structured data. It builds ensembles of decision trees with level-wise growth, supports various objectives (classification, regression, ranking), and includes regularization (L1/L2) and advanced handling of missing values. It is often favored when you need strong baselines and interpretability on small to medium-sized tabular datasets.

When to Use This Model

  • You work with structured/tabular data and want a strong, regularized baseline model
  • You have small to medium-sized datasets where XGBoost’s conservative defaults work well
  • You need robust handling of noisy data, missing values, and mixed feature types
  • You want a well-understood, battle-tested framework for CTR prediction or ranking
  • You care about interpretability via tree-based feature importance and decision paths

When Not to Use This Model

  • You need maximum training and inference speed on very large datasets (use LightGBM)
  • You want neural models that can learn complex high-order feature interactions (use DeepFM or Wide & Deep)
  • You primarily have sparse interaction-only signals without many engineered features (use ALS/ELSA or other embedding models)
  • You need sequence-aware modeling of user behavior (use SASRec, BERT4Rec, or other sequential models)
  • You mostly care about non-personalized trending content (use Rising Popularity)

Sample Use Cases / Item Types

  • CTR prediction for recommendation or advertising on moderate-scale datasets
  • Ranking search and browse results using handcrafted and learned features
  • Scoring candidates in multi-stage recommendation pipelines where robustness matters
  • Rapid prototyping and A/B testing of feature sets before deploying more complex models

Wide & Deep

Model Summary

Wide & Deep jointly trains a wide linear model and a deep neural network to combine memorization and generalization. The wide part consumes raw sparse features and manually engineered cross-product features, excelling at memorizing frequent co-occurrence patterns, while the deep part consumes dense embeddings and continuous features to learn non-linear, high-level representations. Their outputs are combined to produce the final score.

When to Use This Model

  • You want to combine memorized feature interactions with generalization to unseen combinations
  • You can invest in engineering high-value cross-product features for the wide part
  • You have rich categorical features that can be embedded and fed into a deep network
  • You are building recommendation or search ranking similar to app stores or content feeds

When Not to Use This Model

  • You do not want to maintain manual feature engineering for a wide component (use DeepFM instead)
  • You prefer tree-based models with simpler feature workflows (use LightGBM or XGBoost)
  • You primarily need retrieval-stage embeddings rather than point-wise scoring (use Two-Tower, ALS, or BeeFormer)
  • You care mainly about sequential order in interactions (use SASRec, BERT4Rec, or other sequential models)

Sample Use Cases / Item Types

  • App recommendation and app store ranking (apps, games, extensions)
  • Personalized content ranking for homepages, feeds, and carousels
  • Search ranking where memorized query–item patterns and generalization both matter
  • CTR and conversion modeling in environments with rich, hand-crafted cross features

DeepFM

Model Summary

DeepFM combines a Factorization Machine (FM) component and a deep neural network (DNN), sharing the same input embeddings. The FM component efficiently models low-order feature interactions (linear and pairwise), while the deep component captures high-order, non-linear interactions. This removes the need for manual feature crossing as in Wide & Deep, while still covering a wide spectrum of interactions.
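
Schematically, with the shared input embeddings, the prediction combines the two components:

  y_hat = σ(y_FM + y_DNN)
  y_FM  = w_0 + Σ_i w_i · x_i + Σ_(i < j) <v_i, v_j> · x_i · x_j

The pairwise FM term captures second-order interactions directly, while the DNN over the same embeddings captures higher-order, non-linear interactions.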

When to Use This Model

  • You need a strong CTR or conversion prediction model that captures both low- and high-order feature interactions
  • You have many sparse categorical features and continuous features, and want to avoid manual feature crossing
  • You care about modeling interaction terms between user, item, and context features
  • You want a neural scoring model that can outperform linear or pure FM-based approaches

When Not to Use This Model

  • You have relatively simple feature sets and want a fast, tree-based baseline (use LightGBM or XGBoost)
  • You do not have the infrastructure to train and serve neural models in production
  • You mainly need retrieval-stage embeddings rather than point-wise scoring (use Two-Tower, ALS, or BeeFormer)
  • You primarily need to capture sequential patterns in user behavior (use SASRec, BERT4Rec, or other sequential models)

Sample Use Cases / Item Types

  • CTR prediction for feeds, recommendations, and advertising with many categorical IDs
  • Ranking items in e-commerce or content platforms using rich user, item, and context features
  • Scoring candidates in ranking stages after retrieval from embeddings or lexical/vector search
  • Personalization tasks where complex feature interactions drive performance

Ngram

Model Summary

The Ngram policy is a simple, frequency-based sequential model. It estimates probabilities of the next item given the preceding (n-1) items by counting n-grams in historical interaction sequences, with optional Laplace smoothing to handle unseen patterns. It focuses on short-term co-occurrence and locality in sequences rather than long-range dependencies.
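
With Laplace (add-α) smoothing, the next-item probability is estimated directly from sequence counts:

  P(next = i | previous n−1 items) = (count(previous n−1 items followed by i) + α) / (count(previous n−1 items) + α · |catalog|)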

When to Use This Model

  • You want a simple, interpretable baseline for sequential or session-based recommendations
  • You have sparse interaction data where complex neural sequence models may overfit
  • You mostly care about short-term co-occurrence patterns in the latest interactions
  • You need a fast, low-compute sequential model with minimal training complexity

When Not to Use This Model

  • You need to capture long-range dependencies or rich temporal structure (use SASRec, GSASRec, or BERT4Rec)
  • You have dense sequential data and can benefit from more expressive neural models
  • You primarily need general item similarity or retrieval embeddings (use Two-Tower, ALS, or Item2Vec)
  • You mainly focus on global trending content rather than per-user sequences (use Rising Popularity)

Sample Use Cases / Item Types

  • Session-based recommendations on small or medium-sized sites
  • Next-click or next-view prediction from a short recent interaction history
  • Baseline models for evaluating more sophisticated sequential architectures
  • Educational or low-traffic environments where interpretability and simplicity are preferred

Rising Popularity

Model Summary

Rising Popularity is a rule-based policy that ranks items based on recent changes in engagement, emphasizing momentum rather than absolute volume. It compares engagement in a recent window against a previous baseline to surface items that are rapidly gaining traction, making it useful for trending and “what’s hot” experiences.
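
The exact scoring is internal to the policy, but an illustrative momentum score in the spirit of this description (the window length W and the smoothing constant are assumptions, not the actual parameters) would be:

  rising_score(item) = interactions(item, last W hours) / (interactions(item, previous W hours) + smoothing)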

When to Use This Model

  • You want to surface items that are rapidly gaining engagement or momentum
  • You need a non-personalized trending or “Top rising” experience
  • You are handling cold-start or anonymous users where you lack rich history
  • You want a simple, robust baseline for time-sensitive content without training a complex model

When Not to Use This Model

  • You need highly personalized rankings driven by user history (use ALS, ELSA, Two-Tower, or sequential models)
  • You care more about long-term relevance than short-term spikes (use LightGBM, XGBoost, DeepFM, or Wide & Deep)
  • Your catalog or usage patterns change slowly and recent engagement is not a strong signal
  • You need fine-grained control over feature interactions and objectives (use learning-to-rank or neural scoring models)

Sample Use Cases / Item Types

  • Trending or “Top rising” sections on homepages and category pages
  • Surfacing viral or breaking news articles, videos, or social posts
  • Cold-start experiences for new or logged-out users before personalization is available
  • Fallback or blended signals in multi-signal ranking strategies (e.g., mixing trending with personalized scores)

Scoring Models - Quick Decision Guide

For general CTR and ranking on tabular data:

  • Best choice: LightGBM or XGBoost, depending on dataset size and performance needs

For rich feature interaction modeling without manual crosses:

  • Best choice: DeepFM (or Wide & Deep if you are already invested in feature engineering)

For simple or resource-constrained sequential setups:

  • Best choice: Ngram (baseline), or reuse sequential embedding models in a scoring stage

For trending and cold-start experiences:

  • Best choice: Rising Popularity, often blended with personalized scores from other models