Choose a model policy

This guide provides an overview of models available for training in Shaped, helping you choose the right model for your use case.

Model types

You can train two types of models in Shaped:

  • Embedding models, which learn dense vector representations of users and items for search and similarity
  • Scoring models, which take rich feature sets for a (user, item, context) pair and output a relevance score to rank candidates

These models can be configured in the training block of your config, under models:

training:
  models:
    - name: my_elsa_score_model
      policy_type: elsa
      strategy: early_stopping
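
For example, an engine that trains an ELSA embedding model alongside a gradient-boosted scoring model could look roughly like the sketch below. The lightgbm policy_type value and the second model block are illustrative assumptions; check the policy reference for the exact slugs and fields.

training:
  models:
    - name: my_elsa_model
      policy_type: elsa
      strategy: early_stopping
    - name: my_ranking_model
      policy_type: lightgbm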

You don't need a model policy!

You can get a lot of functionality out of an engine with no trained embedding or scoring models.

Add an interaction table: Any engine with an interaction_table will rank items by popularity, trending, and so on. The items table will have columns that expose each item's rank relative to other items; see Derived columns.

Use Hugging Face embeddings for similarity search: If you create an embedding in the index config, you can use it to find items that are similar to a single item, or items that are complementary to a set of items; see Complement items.

Use value modeling: All engines support value modeling by default, so you can begin by scoring items with expressions. Here's an example that uses popular rank with a time decay:

  ORDER BY score(
    expression='(1000 - item._derived_popular_rank) / ((((now_seconds() - item.published_at) / 3600) + 2) ** 1.8)',
    input_user_id='$user_id',
    input_interactions_item_ids='$interaction_item_ids'
  )
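
Written out as a formula, that expression computes a time-decayed popularity score: items with a better popularity rank score higher, and the score decays as the item ages.

  score = (1000 - popular_rank) / ((hours_since_published + 2) ^ 1.8)

where hours_since_published = (now_seconds() - item.published_at) / 3600.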


Embedding models

SVD (Singular Value Decomposition)

Model Summary

SVD implements matrix factorization using iterative algorithms inspired by singular value decomposition (like Funk SVD or SVD++). It can incorporate user/item biases and is typically trained using Stochastic Gradient Descent (SGD).
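
As a reference, the biased matrix-factorization prediction that Funk-style SVD models typically learn is:

  r_hat(u, i) = mu + b_u + b_i + p_u · q_i

where mu is the global mean rating, b_u and b_i are the learned user and item biases, and p_u, q_i are the latent factor vectors. SGD fits the biases and factors to minimize squared error on observed ratings plus a regularization term.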

When to Use This Model

  • You have explicit feedback data (ratings, reviews with scores)
  • You want to model user and item biases (systematic tendencies to rate above or below the average)
  • You prefer SGD-based optimization over ALS
  • You need a straightforward matrix factorization approach
  • You're working with medium-sized datasets

When Not to Use This Model

  • You have only implicit feedback (ALS is typically better for this)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want the most scalable solution (ELSA may be better)
  • You need to model sequential patterns (use sequential models)

Sample Use Cases / Item Types

  • Movie recommendations with explicit ratings (e.g., MovieLens)
  • Product recommendations with review scores
  • Restaurant or service recommendations with ratings
  • Any domain with explicit user feedback

ELSA (Efficient Latent Sparse Autoencoder)

Model Summary

ELSA is a scalable shallow linear autoencoder for implicit feedback collaborative filtering that learns item-item relationships by reconstructing user interaction vectors. Unlike EASE, it uses a factorized hidden layer structure (low-rank plus sparse) to improve scalability.
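
Roughly speaking, where EASE learns a full item-item weight matrix, ELSA represents it through a low-rank item embedding matrix A (with L2-normalized rows) and a diagonal correction, along the lines of:

  W ≈ A · A^T − I

so reconstructing a user's interaction vector x becomes x · (A · A^T − I), and only the (num_items × k) matrix A has to be stored and trained.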

When to Use This Model

  • You have large-scale datasets and need scalability
  • You're working with implicit feedback data
  • You want better scalability than EASE while maintaining similar performance
  • You need efficient item-item similarity computation
  • You want a modern autoencoder approach for collaborative filtering

When Not to Use This Model

  • You have very small datasets (EASE might be simpler)
  • You need to incorporate item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You have explicit feedback only (SVD might be more appropriate)

Sample Use Cases / Item Types

  • Large e-commerce platforms with millions of products
  • Content streaming services with extensive catalogs
  • Social media feed recommendations
  • Any large-scale implicit feedback recommendation system

EASE (Embarrassingly Shallow Autoencoder)

Model Summary

EASE is a simple linear autoencoder for item-based collaborative filtering that uses a closed-form solution, avoiding iterative optimization for computational efficiency.
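
For reference, the closed-form solution from the EASE paper derives the item-item weight matrix B from the regularized Gram matrix of the interaction matrix X:

  P = (X^T · X + λI)^(-1)
  B_ij = -P_ij / P_jj for i ≠ j, with B_jj = 0

A single matrix inversion replaces iterative training, which is why EASE is fast on medium-sized catalogs but becomes memory-bound as the item count grows.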

When to Use This Model

  • You want the simplest possible autoencoder approach
  • You need fast training with a closed-form solution
  • You have implicit feedback data
  • You're working with medium-sized datasets
  • You want a quick baseline or prototype

When Not to Use This Model

  • You have very large-scale datasets (ELSA scales better)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You need the highest accuracy (more complex models may perform better)

Sample Use Cases / Item Types

  • Medium-sized e-commerce sites
  • Content recommendation systems
  • Product similarity for "customers who bought this also bought"
  • Quick prototypes and baselines

Two-Tower

Model Summary

The Two-Tower architecture separates user and item computation into two distinct neural networks that output embeddings in the same vector space. Item embeddings can be pre-computed offline, enabling efficient Approximate Nearest Neighbor (ANN) search at inference time.
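
Conceptually, each tower maps its own inputs into a shared embedding space, and relevance is a simple similarity between the two outputs:

  score(user, item) = f_user(user features) · f_item(item features)

Because f_item depends only on item features, item embeddings can be computed offline and indexed for ANN search; only the user tower needs to run at request time.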

When to Use This Model

  • You have rich item metadata (text descriptions, categories, images)
  • You need efficient large-scale vector search and retrieval
  • You want to leverage both collaborative and content signals
  • You need production-ready embeddings for real-time recommendations
  • You want the best performance for general item similarity tasks

When Not to Use This Model

  • You have only interaction data without item features (use ALS/ELSA)
  • You need to model strict sequential patterns (use SASRec/BERT4Rec)
  • You have very limited compute resources
  • You need a simple baseline model

Sample Use Cases / Item Types

  • E-commerce with product descriptions, categories, images (e.g., clothing, electronics)
  • Content platforms with rich metadata (articles, videos with descriptions)
  • Job recommendations with job descriptions and requirements
  • Real estate with property descriptions and features
  • Any domain where items have rich textual or categorical attributes

BeeFormer

Model Summary

BeeFormer fine-tunes pre-trained sentence Transformers using user-item interaction data to bridge semantic similarity (from language models) and interaction-based similarity (from collaborative patterns).

When to Use This Model

  • You have rich text content (product descriptions, titles, reviews)
  • You want to combine semantic understanding with behavioral signals
  • You need good cold-start performance for new items
  • You want to leverage pre-trained language model knowledge
  • You have items with detailed textual descriptions

When Not to Use This Model

  • You have minimal or no text content for items
  • You only have interaction data without item descriptions
  • You need the fastest training possible
  • You have very limited text data quality
  • You want a pure collaborative filtering approach

Sample Use Cases / Item Types

  • E-commerce with detailed product descriptions (fashion, furniture, electronics)
  • Content platforms (articles, blog posts, research papers)
  • Job boards with detailed job descriptions
  • Real estate with property descriptions
  • Books, movies, or media with synopses and reviews
  • Any domain where semantic understanding of text content matters

Item2Vec

Model Summary

Item2Vec adapts the Word2Vec algorithm (CBOW or Skip-gram) to learn item embeddings from user interaction sequences, capturing co-occurrence patterns within a context window.
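
In the skip-gram-with-negative-sampling variant, items that appear within the same context window are treated as positive pairs, and embeddings are trained to score those pairs above randomly sampled negatives, roughly maximizing:

  L = Σ_(i, j co-occurring) [ log σ(v_i · v_j) + Σ_(k sampled negatives) log σ(−v_i · v_k) ]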

When to Use This Model

  • You have sequential interaction data (user sessions, browsing history)
  • You want to capture item co-occurrence patterns
  • You need a simple, efficient sequential embedding approach
  • You're working with session-based recommendations
  • You want to model "items frequently viewed together"

When Not to Use This Model

  • You need to model strict temporal order and dependencies (use SASRec/BERT4Rec)
  • You want to leverage item content features (use Two-Tower or BeeFormer)
  • You only have non-sequential interaction data
  • You need the highest accuracy for sequential recommendations

Sample Use Cases / Item Types

  • E-commerce browsing sessions (items viewed in same session)
  • Music playlists and song sequences
  • Video watching sequences
  • Shopping cart co-occurrence patterns
  • Session-based web recommendations

SASRec (Self-Attentive Sequential Recommendation)

Model Summary

SASRec utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences, capturing both short-term and long-range dependencies to predict the next item. It's unidirectional, processing sequences from past to future.

When to Use This Model

  • You have sequential interaction data with clear temporal order
  • You need to predict the "next item" in a sequence
  • You want to capture both short-term and long-term patterns
  • You need state-of-the-art sequential recommendation performance
  • You're building session-based or sequential recommendation systems

When Not to Use This Model

  • You don't have sequential or temporal data
  • You want general item similarity rather than next-item prediction
  • You need bidirectional context understanding (use BERT4Rec)
  • You have very short sequences (Item2Vec might be simpler)
  • You want to leverage item content features primarily (use Two-Tower)

Sample Use Cases / Item Types

  • E-commerce next-item prediction (what to buy next)
  • Video streaming next-video recommendations
  • Music playlist continuation
  • News feed article sequences
  • Gaming item progression recommendations
  • Any domain with clear sequential user behavior

BERT4Rec

Model Summary

BERT4Rec adapts the bidirectional Transformer architecture (BERT) for sequential recommendation, using bidirectional self-attention to learn context-aware item representations by considering both past and future context.

When to Use This Model

  • You have sequential data and want bidirectional context understanding
  • You need richer item representations than unidirectional models
  • You want to leverage both past and future context in sequences
  • You're working with sequences where context matters in both directions
  • You need state-of-the-art sequential recommendation with bidirectional attention

When Not to Use This Model

  • You need real-time next-item prediction (unidirectional is more natural)
  • You have very long sequences (self-attention cost grows quickly with sequence length)
  • You want the simplest sequential model (Item2Vec or SASRec)
  • You don't have sequential data
  • You primarily need general item similarity (use Two-Tower or ALS)

Sample Use Cases / Item Types

  • Context-aware sequential recommendations
  • Playlist generation with full context
  • Reading sequences where future context matters
  • Educational content sequences
  • Any sequential recommendation where bidirectional understanding helps

GSASRec (Generalized Self-Attentive Sequential Recommendation)

Model Summary

GSASRec is an enhancement of SASRec designed to mitigate overconfidence issues from negative sampling by improving prediction calibration while retaining the core self-attention mechanism.

When to Use This Model

  • You're experiencing overconfidence issues with SASRec
  • You need better calibrated predictions for sequential recommendations
  • You want improved negative sampling strategies
  • You need the benefits of SASRec with better training stability
  • You're working on production systems where calibration matters

When Not to Use This Model

  • You don't have sequential data
  • You want the simplest sequential model (use SASRec or Item2Vec)
  • You need general item similarity (use Two-Tower or ALS)
  • You're just starting with sequential recommendations (SASRec is simpler)
  • You want to leverage item content primarily (use Two-Tower or BeeFormer)

Sample Use Cases / Item Types

  • Production sequential recommendation systems
  • E-commerce next-item prediction with calibrated scores
  • Video streaming with improved prediction confidence
  • Any sequential recommendation where prediction calibration is critical

Embedding Models - Quick Decision Guide

For general item similarity and vector search:

  • Best choice: Two-Tower (if you have item features) or ALS/ELSA (if you only have interactions)

For text-rich items:

  • Best choice: BeeFormer (combines semantic + behavioral) or Two-Tower (if you have other features too)

For sequential recommendations:

  • Best choice: SASRec (unidirectional) or BERT4Rec (bidirectional), or Item2Vec (simpler co-occurrence)

For large-scale implicit feedback:

  • Best choice: ELSA (scalable) or ALS (simple and effective)

For explicit feedback:

  • Best choice: SVD (with biases) or ALS (can work with explicit too)

Scoring models

LightGBM

Model Summary

LightGBM is a highly efficient Gradient Boosting Decision Tree (GBDT) framework optimized for speed and memory usage. It builds ensembles of decision trees with histogram-based splitting, leaf-wise growth, and techniques like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), making it well-suited for large, sparse, and high-cardinality feature spaces. It supports regression, classification, and learning-to-rank objectives such as LambdaRank/LambdaMART.

When to Use This Model

  • You have large-scale tabular datasets with many categorical and numerical features
  • You need fast training and inference for ranking or CTR prediction workloads
  • You want a learning-to-rank objective (e.g., LambdaRank) for slate or search result ranking
  • You need a production-ready, memory-efficient GBDT implementation as a strong baseline
  • You want to combine rich behavioral, content, and affinity features into a single scoring model

When Not to Use This Model

  • You primarily work with small or medium-sized datasets and prefer more conservative defaults (use XGBoost)
  • You want to explicitly model high-order feature interactions via deep networks (use DeepFM or Wide & Deep)
  • You have only sparse interaction data and no rich features (use ALS/ELSA or other embedding models)
  • You need sequence-aware modeling of user behavior (use SASRec, BERT4Rec, or other sequential models)
  • You only need a simple trending or heuristic baseline (use Rising Popularity or value-model expressions)

Sample Use Cases / Item Types

  • E-commerce CTR prediction and ranking for product search and browse results
  • Feed ranking and homepage personalization using mixed behavioral and content features
  • Ad ranking and sponsored content placement with learning-to-rank objectives
  • Re-ranking of retrieved candidates in multi-stage recommendation architectures

XGBoost

Model Summary

XGBoost is a widely used GBDT framework known for its performance, rich regularization options, and robustness on structured data. It builds ensembles of decision trees with level-wise growth, supports various objectives (classification, regression, ranking), and includes regularization (L1/L2) and advanced handling of missing values. It is often favored when you need strong baselines and interpretability on small to medium-sized tabular datasets.

When to Use This Model

  • You work with structured/tabular data and want a strong, regularized baseline model
  • You have small to medium-sized datasets where XGBoost’s conservative defaults work well
  • You need robust handling of noisy data, missing values, and mixed feature types
  • You want a well-understood, battle-tested framework for CTR prediction or ranking
  • You care about interpretability via tree-based feature importance and decision paths

When Not to Use This Model

  • You need maximum training and inference speed on very large datasets (use LightGBM)
  • You want neural models that can learn complex high-order feature interactions (use DeepFM or Wide & Deep)
  • You primarily have sparse interaction-only signals without many engineered features (use ALS/ELSA or other embedding models)
  • You need sequence-aware modeling of user behavior (use SASRec, BERT4Rec, or other sequential models)
  • You mostly care about non-personalized trending content (use Rising Popularity)

Sample Use Cases / Item Types

  • CTR prediction for recommendation or advertising on moderate-scale datasets
  • Ranking search and browse results using handcrafted and learned features
  • Scoring candidates in multi-stage recommendation pipelines where robustness matters
  • Rapid prototyping and A/B testing of feature sets before deploying more complex models

Wide & Deep

Model Summary

Wide & Deep jointly trains a wide linear model and a deep neural network to combine memorization and generalization. The wide part consumes raw sparse features and manually engineered cross-product features, excelling at memorizing frequent co-occurrence patterns, while the deep part consumes dense embeddings and continuous features to learn non-linear, high-level representations. Their outputs are combined to produce the final score.

When to Use This Model

  • You want to combine memorized feature interactions with generalization to unseen combinations
  • You can invest in engineering high-value cross-product features for the wide part
  • You have rich categorical features that can be embedded and fed into a deep network
  • You are building recommendation or search ranking similar to app stores or content feeds

When Not to Use This Model

  • You do not want to maintain manual feature engineering for a wide component (use DeepFM instead)
  • You prefer tree-based models with simpler feature workflows (use LightGBM or XGBoost)
  • You primarily need retrieval-stage embeddings rather than point-wise scoring (use Two-Tower, ALS, or BeeFormer)
  • You care mainly about sequential order in interactions (use SASRec, BERT4Rec, or other sequential models)

Sample Use Cases / Item Types

  • App recommendation and app store ranking (apps, games, extensions)
  • Personalized content ranking for homepages, feeds, and carousels
  • Search ranking where memorized query–item patterns and generalization both matter
  • CTR and conversion modeling in environments with rich, hand-crafted cross features

DeepFM

Model Summary

DeepFM combines a Factorization Machine (FM) component and a deep neural network (DNN), sharing the same input embeddings. The FM component efficiently models low-order feature interactions (linear and pairwise), while the deep component captures high-order, non-linear interactions. This removes the need for manual feature crossing as in Wide & Deep, while still covering a wide spectrum of interactions.
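
Schematically, with the shared input embeddings, the prediction combines the two components:

  y_hat = σ(y_FM + y_DNN)
  y_FM  = w_0 + Σ_i w_i · x_i + Σ_(i < j) <v_i, v_j> · x_i · x_j

The pairwise FM term captures second-order interactions directly, while the DNN over the same embeddings captures higher-order, non-linear interactions.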

When to Use This Model

  • You need a strong CTR or conversion prediction model that captures both low- and high-order feature interactions
  • You have many sparse categorical features and continuous features, and want to avoid manual feature crossing
  • You care about modeling interaction terms between user, item, and context features
  • You want a neural scoring model that can outperform linear or pure FM-based approaches

When Not to Use This Model

  • You have relatively simple feature sets and want a fast, tree-based baseline (use LightGBM or XGBoost)
  • You do not have the infrastructure to train and serve neural models in production
  • You mainly need retrieval-stage embeddings rather than point-wise scoring (use Two-Tower, ALS, or BeeFormer)
  • You primarily need to capture sequential patterns in user behavior (use SASRec, BERT4Rec, or other sequential models)

Sample Use Cases / Item Types

  • CTR prediction for feeds, recommendations, and advertising with many categorical IDs
  • Ranking items in e-commerce or content platforms using rich user, item, and context features
  • Scoring candidates in ranking stages after retrieval from embeddings or lexical/vector search
  • Personalization tasks where complex feature interactions drive performance

Ngram

Model Summary

The Ngram policy is a simple, frequency-based sequential model. It estimates probabilities of the next item given the preceding (n-1) items by counting n-grams in historical interaction sequences, with optional Laplace smoothing to handle unseen patterns. It focuses on short-term co-occurrence and locality in sequences rather than long-range dependencies.
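
With Laplace (add-α) smoothing, the next-item probability is estimated directly from sequence counts:

  P(next = i | previous n−1 items) = (count(previous n−1 items followed by i) + α) / (count(previous n−1 items) + α · |catalog|)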

When to Use This Model

  • You want a simple, interpretable baseline for sequential or session-based recommendations
  • You have sparse interaction data where complex neural sequence models may overfit
  • You mostly care about short-term co-occurrence patterns in the latest interactions
  • You need a fast, low-compute sequential model with minimal training complexity

When Not to Use This Model

  • You need to capture long-range dependencies or rich temporal structure (use SASRec, GSASRec, or BERT4Rec)
  • You have dense sequential data and can benefit from more expressive neural models
  • You primarily need general item similarity or retrieval embeddings (use Two-Tower, ALS, or Item2Vec)
  • You mainly focus on global trending content rather than per-user sequences (use Rising Popularity)

Sample Use Cases / Item Types

  • Session-based recommendations on small or medium-sized sites
  • Next-click or next-view prediction from a short recent interaction history
  • Baseline models for evaluating more sophisticated sequential architectures
  • Educational or low-traffic environments where interpretability and simplicity are preferred

Rising Popularity

Model Summary

Rising Popularity is a rule-based policy that ranks items based on recent changes in engagement, emphasizing momentum rather than absolute volume. It compares engagement in a recent window against a previous baseline to surface items that are rapidly gaining traction, making it useful for trending and “what’s hot” experiences.
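
The exact scoring is internal to the policy, but an illustrative momentum score in the spirit of this description (the window length W and the smoothing constant are assumptions, not the actual parameters) would be:

  rising_score(item) = interactions(item, last W hours) / (interactions(item, previous W hours) + smoothing)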

When to Use This Model

  • You want to surface items that are rapidly gaining engagement or momentum
  • You need a non-personalized trending or “Top rising” experience
  • You are handling cold-start or anonymous users where you lack rich history
  • You want a simple, robust baseline for time-sensitive content without training a complex model

When Not to Use This Model

  • You need highly personalized rankings driven by user history (use ALS, ELSA, Two-Tower, or sequential models)
  • You care more about long-term relevance than short-term spikes (use LightGBM, XGBoost, DeepFM, or Wide & Deep)
  • Your catalog or usage patterns change slowly and recent engagement is not a strong signal
  • You need fine-grained control over feature interactions and objectives (use learning-to-rank or neural scoring models)

Sample Use Cases / Item Types

  • Trending or “Top rising” sections on homepages and category pages
  • Surfacing viral or breaking news articles, videos, or social posts
  • Cold-start experiences for new or logged-out users before personalization is available
  • Fallback or blended signals in multi-signal ranking strategies (e.g., mixing trending with personalized scores)

Scoring Models - Quick Decision Guide

For general CTR and ranking on tabular data:

  • Best choice: LightGBM or XGBoost, depending on dataset size and performance needs

For rich feature interaction modeling without manual crosses:

  • Best choice: DeepFM (or Wide & Deep if you are already invested in feature engineering)

For simple or resource-constrained sequential setups:

  • Best choice: Ngram (baseline), or reuse sequential embedding models in a scoring stage

For trending and cold-start experiences:

  • Best choice: Rising Popularity, often blended with personalized scores from other models