Choosing a model policy

This guide provides an overview of embedding models available for training in Shaped, helping you choose the right model for vector search and item similarity tasks.


SVD (Singular Value Decomposition)

Model Summary

SVD implements matrix factorization using iterative algorithms inspired by singular value decomposition (like Funk SVD or SVD++). It can incorporate user/item biases and is typically trained using Stochastic Gradient Descent (SGD).
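
A minimal sketch of this style of biased matrix factorization, assuming the standard Funk-SVD prediction r̂(u,i) = μ + b_u + b_i + p_u·q_i trained with SGD. The toy ratings and hyperparameters below are illustrative, not Shaped's implementation.

```python
import numpy as np

# (user, item, rating) triples -- explicit feedback toy data
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0)]
n_users, n_items, k = 3, 3, 8
lr, reg, epochs = 0.01, 0.02, 100

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))     # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))     # item latent factors
b_u, b_i = np.zeros(n_users), np.zeros(n_items)  # user / item biases
mu = np.mean([r for _, _, r in ratings])         # global mean rating

for _ in range(epochs):
    for u, i, r in ratings:
        err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
        b_u[u] += lr * (err - reg * b_u[u])
        b_i[i] += lr * (err - reg * b_i[i])
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))

print(mu + b_u[0] + b_i[2] + P[0] @ Q[2])   # predicted rating for user 0 on unseen item 2
```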

When to Use This Model

  • You have explicit feedback data (ratings, reviews with scores)
  • You want to model user and item biases (systematic tendencies to rate above or below the global average)
  • You prefer SGD-based optimization over ALS
  • You need a straightforward matrix factorization approach
  • You're working with medium-sized datasets

When Not to Use This Model

  • You have only implicit feedback (ALS is typically better for this)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want the most scalable solution (ELSA may be better)
  • You need to model sequential patterns (use sequential models)

Sample Use Cases / Item Types

  • Movie recommendations with explicit ratings (e.g., MovieLens)
  • Product recommendations with review scores
  • Restaurant or service recommendations with ratings
  • Any domain with explicit user feedback

ELSA (Efficient Latent Sparse Autoencoder)

Model Summary

ELSA is a scalable shallow linear autoencoder for implicit feedback collaborative filtering that learns item-item relationships by reconstructing user interaction vectors. Unlike EASE, it uses a factorized hidden layer structure (low-rank plus sparse) to improve scalability.
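
A minimal sketch of the scoring side of this idea, keeping only the low-rank factor: a learned item-embedding matrix A with unit-norm rows reconstructs a user's interaction vector via x(AAᵀ − I). How A is trained is omitted, and the matrix and interaction vector below are made up, not Shaped's implementation.

```python
import numpy as np

n_items, k = 6, 3
rng = np.random.default_rng(0)

# Stand-in for a learned low-rank item embedding matrix (rows L2-normalized)
A = rng.normal(size=(n_items, k))
A /= np.linalg.norm(A, axis=1, keepdims=True)

x = np.array([1.0, 0, 1.0, 0, 0, 0])   # one user's implicit interaction vector
scores = x @ A @ A.T - x                # x (A Aᵀ - I): item-item reconstruction of the history
scores[x > 0] = -np.inf                 # mask items the user already interacted with
print(np.argsort(-scores)[:3])          # top-3 recommended item indices
```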

When to Use This Model

  • You have large-scale datasets and need scalability
  • You're working with implicit feedback data
  • You want better scalability than EASE while maintaining similar performance
  • You need efficient item-item similarity computation
  • You want a modern autoencoder approach for collaborative filtering

When Not to Use This Model

  • You have very small datasets (EASE might be simpler)
  • You need to incorporate item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You have explicit feedback only (SVD might be more appropriate)

Sample Use Cases / Item Types

  • Large e-commerce platforms with millions of products
  • Content streaming services with extensive catalogs
  • Social media feed recommendations
  • Any large-scale implicit feedback recommendation system

EASE (Embarrassingly Shallow Autoencoder)

Model Summary

EASE is a simple linear autoencoder for item-based collaborative filtering that uses a closed-form solution, avoiding iterative optimization for computational efficiency.
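
A minimal sketch of the closed-form fit (as in Steck, 2019) on a toy implicit-feedback matrix; the regularization strength here is arbitrary, not a recommended setting.

```python
import numpy as np

# Toy user-item implicit interaction matrix (rows = users, columns = items)
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
lam = 10.0   # L2 regularization strength

# Closed-form EASE solution: B has zero diagonal so items cannot "explain" themselves
G = X.T @ X + lam * np.eye(X.shape[1])
P = np.linalg.inv(G)
B = np.eye(X.shape[1]) - P / np.diag(P)   # item-item weight matrix

scores = X @ B   # reconstructed scores for every user in one matrix product
print(np.round(scores, 2))
```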

When to Use This Model

  • You want the simplest possible autoencoder approach
  • You need fast training with a closed-form solution
  • You have implicit feedback data
  • You're working with medium-sized datasets
  • You want a quick baseline or prototype

When Not to Use This Model

  • You have very large-scale datasets (ELSA scales better)
  • You need to leverage item content features (use Two-Tower or BeeFormer)
  • You want to model sequential patterns (use sequential models)
  • You need the highest accuracy (more complex models may perform better)

Sample Use Cases / Item Types

  • Medium-sized e-commerce sites
  • Content recommendation systems
  • Product similarity for "customers who bought this also bought"
  • Quick prototypes and baselines

Two-Tower

Model Summary

The Two-Tower architecture separates user and item computation into two distinct neural networks that output embeddings in the same vector space. Item embeddings can be pre-computed offline, enabling efficient Approximate Nearest Neighbor (ANN) search at inference time.
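
A minimal PyTorch sketch of the architecture, assuming made-up feature dimensions: two independent MLP towers map user and item features into the same unit-length embedding space, so item vectors can be computed offline and indexed for ANN search while user vectors are computed at request time.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=-1)  # unit-length embeddings

user_tower, item_tower = Tower(in_dim=32), Tower(in_dim=48)

users = torch.randn(4, 32)     # batch of user feature vectors (request time)
items = torch.randn(100, 48)   # item feature vectors (pre-computed offline, stored in an ANN index)

scores = user_tower(users) @ item_tower(items).T   # dot-product relevance, shape (4, 100)
print(scores.shape)
```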

When to Use This Model

  • You have rich item metadata (text descriptions, categories, images)
  • You need efficient large-scale vector search and retrieval
  • You want to leverage both collaborative and content signals
  • You need production-ready embeddings for real-time recommendations
  • You want the best performance for general item similarity tasks

When Not to Use This Model

  • You have only interaction data without item features (use ALS/ELSA)
  • You need to model strict sequential patterns (use SASRec/BERT4Rec)
  • You have very limited compute resources
  • You need a simple baseline model

Sample Use Cases / Item Types

  • E-commerce with product descriptions, categories, images (e.g., clothing, electronics)
  • Content platforms with rich metadata (articles, videos with descriptions)
  • Job recommendations with job descriptions and requirements
  • Real estate with property descriptions and features
  • Any domain where items have rich textual or categorical attributes

BeeFormer

Model Summary

BeeFormer fine-tunes pre-trained sentence Transformers using user-item interaction data to bridge semantic similarity (from language models) and interaction-based similarity (from collaborative patterns).
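
A rough sketch of the training idea, assuming a stand-in MLP in place of a real pre-trained sentence Transformer and random tensors in place of tokenized item descriptions: item embeddings produced from text are optimized so that an EASE/ELSA-style item-item reconstruction of the interaction matrix improves, pulling the semantic space toward behavioral similarity. Illustrative only, not the exact BeeFormer objective.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_users, n_items, text_dim, emb_dim = 20, 12, 16, 8

item_text = torch.randn(n_items, text_dim)            # stand-in for encoded item descriptions
X = (torch.rand(n_users, n_items) < 0.2).float()      # implicit user-item interaction matrix

# Stand-in for a pre-trained sentence Transformer being fine-tuned
encoder = nn.Sequential(nn.Linear(text_dim, 32), nn.ReLU(), nn.Linear(32, emb_dim))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(200):
    A = nn.functional.normalize(encoder(item_text), dim=-1)  # item embeddings from "text"
    W = A @ A.T
    W = W - torch.diag(torch.diag(W))                         # zero diagonal: no self-recommendation
    loss = ((X @ W - X) ** 2).mean()                          # reconstruct behavior from semantic space
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(loss))   # reconstruction error after fine-tuning the text encoder
```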

When to Use This Model

  • You have rich text content (product descriptions, titles, reviews)
  • You want to combine semantic understanding with behavioral signals
  • You need good cold-start performance for new items
  • You want to leverage pre-trained language model knowledge
  • You have items with detailed textual descriptions

When Not to Use This Model

  • You have minimal or no text content for items
  • You only have interaction data without item descriptions
  • You need the fastest training possible
  • You have very limited text data quality
  • You want a pure collaborative filtering approach

Sample Use Cases / Item Types

  • E-commerce with detailed product descriptions (fashion, furniture, electronics)
  • Content platforms (articles, blog posts, research papers)
  • Job boards with detailed job descriptions
  • Real estate with property descriptions
  • Books, movies, or media with synopses and reviews
  • Any domain where semantic understanding of text content matters

Item2Vec

Model Summary

Item2Vec adapts the Word2Vec algorithm (CBOW or Skip-gram) to learn item embeddings from user interaction sequences, capturing co-occurrence patterns within a context window.
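
A minimal sketch using gensim's Word2Vec directly, treating each user session as a "sentence" and each item ID as a "word"; the sessions and item IDs below are made up.

```python
from gensim.models import Word2Vec

# Each user session is a "sentence"; each item ID is a "word"
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_1", "item_7"],
    ["item_9", "item_2", "item_3"],
]

model = Word2Vec(sentences=sessions, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=50)       # sg=1 -> Skip-gram
print(model.wv.most_similar("item_7", topn=3))       # items that co-occur with item_7
```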

When to Use This Model

  • You have sequential interaction data (user sessions, browsing history)
  • You want to capture item co-occurrence patterns
  • You need a simple, efficient sequential embedding approach
  • You're working with session-based recommendations
  • You want to model "items frequently viewed together"

When Not to Use This Model

  • You need to model strict temporal order and dependencies (use SASRec/BERT4Rec)
  • You want to leverage item content features (use Two-Tower or BeeFormer)
  • You only have non-sequential interaction data
  • You need the highest accuracy for sequential recommendations

Sample Use Cases / Item Types

  • E-commerce browsing sessions (items viewed in same session)
  • Music playlists and song sequences
  • Video watching sequences
  • Shopping cart co-occurrence patterns
  • Session-based web recommendations

SASRec (Self-Attentive Sequential Recommendation)

Model Summary

SASRec utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences, capturing both short-term and long-range dependencies to predict the next item. It's unidirectional, processing sequences from past to future.
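
A minimal PyTorch sketch of the core mechanism, assuming illustrative sizes: item and position embeddings pass through a causally-masked Transformer encoder, and each position's output scores the next item against the item embedding table.

```python
import torch
import torch.nn as nn

n_items, d, max_len = 1000, 64, 50
item_emb = nn.Embedding(n_items + 1, d, padding_idx=0)   # ID 0 reserved for padding
pos_emb = nn.Embedding(max_len, d)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)

seq = torch.randint(1, n_items + 1, (8, max_len))         # batch of item-ID sequences
h = item_emb(seq) + pos_emb(torch.arange(max_len))
mask = nn.Transformer.generate_square_subsequent_mask(max_len)  # causal: attend to the past only
h = encoder(h, mask=mask)

logits = h @ item_emb.weight.T   # (batch, position, n_items + 1): next-item scores at every step
print(logits.shape)
```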

When to Use This Model

  • You have sequential interaction data with clear temporal order
  • You need to predict the "next item" in a sequence
  • You want to capture both short-term and long-term patterns
  • You need state-of-the-art sequential recommendation performance
  • You're building session-based or sequential recommendation systems

When Not to Use This Model

  • You don't have sequential or temporal data
  • You want general item similarity rather than next-item prediction
  • You need bidirectional context understanding (use BERT4Rec)
  • You have very short sequences (Item2Vec might be simpler)
  • You want to leverage item content features primarily (use Two-Tower)

Sample Use Cases / Item Types

  • E-commerce next-item prediction (what to buy next)
  • Video streaming next-video recommendations
  • Music playlist continuation
  • News feed article sequences
  • Gaming item progression recommendations
  • Any domain with clear sequential user behavior

BERT4Rec

Model Summary

BERT4Rec adapts the bidirectional Transformer architecture (BERT) for sequential recommendation, using bidirectional self-attention to learn context-aware item representations by considering both past and future context.
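
A minimal PyTorch sketch of the training setup, reusing the same embedding/encoder pieces as the SASRec sketch above: roughly 15% of positions are replaced with a [MASK] ID and the bidirectional (unmasked) encoder predicts them from both left and right context. Sizes and the masking rate are illustrative.

```python
import torch
import torch.nn as nn

n_items, d, max_len = 1000, 64, 50
MASK = n_items + 1                                         # extra ID acting as the [MASK] token
item_emb = nn.Embedding(n_items + 2, d, padding_idx=0)
pos_emb = nn.Embedding(max_len, d)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)

seq = torch.randint(1, n_items + 1, (8, max_len))
targets = seq.clone()
masked = torch.rand(seq.shape) < 0.15                      # mask ~15% of positions
seq = torch.where(masked, torch.full_like(seq, MASK), seq)

h = encoder(item_emb(seq) + pos_emb(torch.arange(max_len)))  # no causal mask: bidirectional attention
logits = h @ item_emb.weight.T
loss = nn.functional.cross_entropy(logits[masked], targets[masked])
print(float(loss))
```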

When to Use This Model

  • You have sequential data and want bidirectional context understanding
  • You need richer item representations than unidirectional models
  • You want to leverage both past and future context in sequences
  • You're working with sequences where context matters in both directions
  • You need state-of-the-art sequential recommendation with bidirectional attention

When Not to Use This Model

  • You need real-time next-item prediction (unidirectional is more natural)
  • You have very long sequences (computational complexity)
  • You want the simplest sequential model (Item2Vec or SASRec)
  • You don't have sequential data
  • You primarily need general item similarity (use Two-Tower or ALS)

Sample Use Cases / Item Types

  • Context-aware sequential recommendations
  • Playlist generation with full context
  • Reading sequences where future context matters
  • Educational content sequences
  • Any sequential recommendation where bidirectional understanding helps

GSASRec (Generalized Self-Attentive Sequential Recommendation)

Model Summary

GSASRec is an enhancement of SASRec designed to mitigate overconfidence issues from negative sampling by improving prediction calibration while retaining the core self-attention mechanism.
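
A rough sketch of the calibration idea: with only a few sampled negatives, plain binary cross-entropy pushes positive probabilities toward 1 and the model becomes overconfident, so the positive term is down-weighted by an exponent beta (derived from the sampling rate in the gSASRec paper). Here beta is fixed and the loss simplified for illustration; this is not the exact GSASRec implementation.

```python
import torch

def gbce_style_loss(pos_scores, neg_scores, beta=0.7):
    # Down-weighted positive term counteracts overconfidence from sampled negatives
    pos_term = -beta * torch.log(torch.sigmoid(pos_scores) + 1e-9)
    neg_term = -torch.log(1 - torch.sigmoid(neg_scores) + 1e-9).sum(dim=-1)
    return (pos_term + neg_term).mean()

pos = torch.randn(32)       # score of the true next item for each sequence in the batch
neg = torch.randn(32, 4)    # scores of 4 sampled negatives per sequence
print(float(gbce_style_loss(pos, neg)))
```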

When to Use This Model

  • You're experiencing overconfidence issues with SASRec
  • You need better calibrated predictions for sequential recommendations
  • You want improved negative sampling strategies
  • You need the benefits of SASRec with better training stability
  • You're working on production systems where calibration matters

When Not to Use This Model

  • You don't have sequential data
  • You want the simplest sequential model (use SASRec or Item2Vec)
  • You need general item similarity (use Two-Tower or ALS)
  • You're just starting with sequential recommendations (SASRec is simpler)
  • You want to leverage item content primarily (use Two-Tower or BeeFormer)

Sample Use Cases / Item Types

  • Production sequential recommendation systems
  • E-commerce next-item prediction with calibrated scores
  • Video streaming with improved prediction confidence
  • Any sequential recommendation where prediction calibration is critical

Quick Decision Guide

For general item similarity and vector search:

  • Best choice: Two-Tower (if you have item features) or ALS/ELSA (if you only have interactions)

For text-rich items:

  • Best choice: BeeFormer (combines semantic + behavioral) or Two-Tower (if you have other features too)

For sequential recommendations:

  • Best choice: SASRec (unidirectional) or BERT4Rec (bidirectional); Item2Vec is a simpler co-occurrence-based option

For large-scale implicit feedback:

  • Best choice: ELSA (scalable) or ALS (simple and effective)

For explicit feedback:

  • Best choice: SVD (with biases) or ALS (also handles explicit feedback)