
SASRec (Sequential)

Description

The SASRec (Self-Attentive Sequential Recommendation) policy utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences. By weighing the importance of all previous items, it captures both short-term and long-range dependencies to predict the next item the user is likely to interact with.

Policy Type: sasrec
Supports: embedding_policy, scoring_policy
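
Conceptually, the model embeds each item in the sequence, adds positional embeddings, applies causally masked self-attention so each position can only attend to earlier interactions, and scores candidate items against the hidden state of the most recent interaction. The sketch below illustrates that idea in PyTorch; it is not the platform's implementation. The class name SASRecSketch, the use of nn.TransformerEncoder, and the right-padding convention are assumptions for illustration, but the constructor arguments mirror the hyperparameters listed in the next section.

import torch
import torch.nn as nn

class SASRecSketch(nn.Module):
    """Minimal SASRec-style encoder: item + positional embeddings,
    causal self-attention, and next-item scoring by dot product."""

    def __init__(self, n_items, hidden_size=64, n_heads=2, n_layers=2,
                 max_seq_length=50, hidden_dropout_prob=0.2):
        super().__init__()
        self.item_emb = nn.Embedding(n_items + 1, hidden_size, padding_idx=0)  # id 0 = padding
        self.pos_emb = nn.Embedding(max_seq_length, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=n_heads,
            dim_feedforward=4 * hidden_size,   # plays the role of inner_size
            dropout=hidden_dropout_prob,
            activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, item_seq, seq_lengths):
        # item_seq: (batch, seq_len) item ids, right-padded with 0
        # seq_lengths: (batch,) number of real items in each row
        seq_len = item_seq.size(1)
        positions = torch.arange(seq_len, device=item_seq.device)
        x = self.item_emb(item_seq) + self.pos_emb(positions)
        # Causal mask: position i may only attend to positions <= i
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=item_seq.device), diagonal=1)
        hidden = self.encoder(x, mask=causal)          # (batch, seq_len, hidden_size)
        # Take the hidden state of each sequence's last real item
        last = hidden[torch.arange(item_seq.size(0)), seq_lengths - 1]
        # Score every item as a candidate "next item"
        return last @ self.item_emb.weight.T           # (batch, n_items + 1)

model = SASRecSketch(n_items=1000)
seqs = torch.tensor([[3, 7, 42, 9, 0]])                # one sequence of 4 items, right-padded to 5
scores = model(seqs, seq_lengths=torch.tensor([4]))
predicted_next = scores.argmax(dim=-1)                 # highest-scoring candidate item id

Because padding sits after the last real item and attention is causally masked, real positions never attend to padding, so no separate padding mask is needed in this sketch.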

Hyperparameter tuning

  • batch_size: Number of samples processed before updating model weights.
  • eval_batch_size: Batch size used during model evaluation.
  • n_epochs: Number of complete passes through the training dataset.
  • negative_samples_count: Number of negative samples drawn per positive example for contrastive learning (see the sketch after this list).
  • device: Compute device used for training and inference (e.g., CPU or GPU).
  • hidden_size: Size of the hidden layers in the transformer.
  • inner_size: Size of the feed-forward network inner layer.
  • learning_rate: Learning rate for gradient descent optimization.
  • attn_dropout_prob: Dropout probability for attention layers.
  • hidden_act: Activation function.
  • hidden_dropout_prob: Dropout probability for hidden layers.
  • n_heads: Number of attention heads in the transformer.
  • n_layers: Number of transformer layers.
  • layer_norm_eps: Epsilon added for numerical stability in layer normalization.
  • initializer_range: Standard deviation of the normal distribution used to initialize model weights.
  • mask_rate: Fraction of tokens to mask during training.
  • loss_type: Type of loss function used during training.
  • max_seq_length: Maximum length of input sequences.
  • sample_strategy
  • append_item_features: Whether to append item features.
  • append_item_embeddings: Whether to append item embeddings.
  • use_candidate_embeddings: Whether to use candidate embeddings.
  • sample_seed: Random seed used for sampling.
  • sample_ratio
  • eval_step: Interval, in epochs, between evaluations on the validation data.
  • early_stopping_step: Number of evaluation rounds without improvement before training stops early.
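
The sequence- and sampling-related settings above come together when training examples are built: histories longer than max_seq_length are truncated to the most recent items, shorter ones are padded, and negative_samples_count items the user has not interacted with are drawn for each positive target. The sketch below shows one common way to do this; the function name build_training_example, the right-padding with id 0, the assumption that item ids start at 1, and the uniform random negative sampling are illustrative assumptions, not the platform's exact behavior.

import random

def build_training_example(history, n_items, max_seq_length=50,
                           negative_samples_count=2, seed=None):
    """Turn one user's interaction history into a training example:
    (input item ids, sequence length, positive target, negative targets)."""
    rng = random.Random(seed)
    *inputs, positive = history[-(max_seq_length + 1):]    # keep only the most recent items
    seq_length = len(inputs)
    inputs = inputs + [0] * (max_seq_length - seq_length)   # right-pad with id 0
    seen = set(history)
    negatives = []
    while len(negatives) < negative_samples_count:
        candidate = rng.randint(1, n_items)                  # item ids assumed to start at 1
        if candidate not in seen:                            # negatives are items the user has not interacted with
            negatives.append(candidate)
    return inputs, seq_length, positive, negatives

# A user who interacted with items 3, 7, 42 and then 9, in that order:
print(build_training_example([3, 7, 42, 9], n_items=1000, max_seq_length=5, seed=0))
# ([3, 7, 42, 0, 0], 3, 9, [two random item ids not in the history])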

V1 API

policy_configs:
  scoring_policy:  # Can also be used under embedding_policy
    policy_type: sasrec
    # Training Hyperparameters
    batch_size: 1000             # Samples per training batch
    n_epochs: 1                  # Number of training epochs
    negative_samples_count: 2    # Negative samples per positive for contrastive loss
    learning_rate: 0.001         # Optimizer learning rate
    # Architecture Hyperparameters
    hidden_size: 64              # Dimensionality of hidden layers/embeddings
    n_heads: 2                   # Number of self-attention heads
    n_layers: 2                  # Number of Transformer layers
    attn_dropout_prob: 0.2       # Dropout rate in attention mechanism
    hidden_act: "gelu"           # Activation function (e.g., "gelu", "relu")
    max_seq_length: 50           # Maximum input sequence length

Usage

Use this model when:

  • You have sequential interaction data with clear temporal order
  • You need to predict the "next item" in a sequence
  • You want to capture both short-term and long-term patterns
  • You need state-of-the-art sequential recommendation performance
  • You're building session-based or sequential recommendation systems

Choose a different model when:

  • You don't have sequential or temporal data
  • You want general item similarity rather than next-item prediction
  • You need bidirectional context understanding (use BERT4Rec)
  • You have very short sequences (Item2Vec might be simpler)
  • You want to leverage item content features primarily (use Two-Tower)

Use cases

  • E-commerce next-item prediction (what to buy next)
  • Video streaming next-video recommendations
  • Music playlist continuation
  • News feed article sequences
  • Gaming item progression recommendations
  • Any domain with clear sequential user behavior

Reference