SASRec (Sequential)
Description
The SASRec (Self-Attentive Sequential Recommendation) policy utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences. By weighing the importance of all previous items, it captures both short-term and long-range dependencies to predict the next item the user is likely to interact with.
Policy Type: sasrec
Supports: embedding_policy, scoring_policy
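To make the mechanism concrete, below is a minimal, illustrative PyTorch sketch of the idea described above. It is not the policy's actual implementation: the class name TinySASRec, the layer choices, and all sizes are placeholders that simply mirror the hyperparameters documented below (hidden_size, n_heads, n_layers, max_seq_length).

```python
# Hedged sketch of SASRec's core idea: embed the interaction history, apply
# causally-masked self-attention, and score the final hidden state against
# every item embedding to rank the next item.
import torch
import torch.nn as nn

class TinySASRec(nn.Module):
    def __init__(self, n_items, hidden_size=64, n_heads=2, n_layers=2, max_seq_length=50):
        super().__init__()
        self.item_emb = nn.Embedding(n_items + 1, hidden_size, padding_idx=0)  # id 0 = padding
        self.pos_emb = nn.Embedding(max_seq_length, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=n_heads, dim_feedforward=4 * hidden_size,
            dropout=0.2, activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, seq):                                  # seq: (batch, seq_len) of item IDs
        seq_len = seq.size(1)
        positions = torch.arange(seq_len, device=seq.device)
        x = self.item_emb(seq) + self.pos_emb(positions)     # item + positional embeddings
        # Causal mask: each position attends only to earlier interactions.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=seq.device), diagonal=1)
        h = self.encoder(x, mask=causal)
        last_hidden = h[:, -1, :]                            # state after the most recent item
        return last_hidden @ self.item_emb.weight.T          # score every candidate item

model = TinySASRec(n_items=10_000)
history = torch.randint(1, 10_001, (8, 50))                  # batch of 8 user histories
scores = model(history)                                      # (8, 10_001); highest scores = likely next items
```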
Hyperparameter tuning
- batch_size: Number of samples processed before updating model weights.
- eval_batch_size: Batch size used during model evaluation.
- n_epochs: Number of complete passes through the training dataset.
- negative_samples_count: Number of negative samples per positive sample for contrastive learning (a training-loss sketch follows this list).
- device: Hardware device used for training (e.g., CPU or GPU).
- hidden_size: Size of the hidden layers in the transformer.
- inner_size: Size of the feed-forward network inner layer.
- learning_rate: Learning rate for gradient descent optimization.
- attn_dropout_prob: Dropout probability for attention layers.
- hidden_act: Activation function.
- hidden_dropout_prob: Dropout probability for hidden layers.
- n_heads: Number of attention heads in the transformer.
- n_layers: Number of transformer layers.
- layer_norm_eps: Epsilon used for layer normalization.
- initializer_range: Range used for weight initialization.
- mask_rate: Fraction of tokens to mask during training.
- loss_type: Type of training loss.
- max_seq_length: Maximum length of input sequences.
- sample_strategy: Strategy used for sampling training data.
- append_item_features: Whether to append item features.
- append_item_embeddings: Whether to append item embeddings.
- use_candidate_embeddings: Whether to use candidate embeddings.
- sample_seed: Random seed used for sampling.
- sample_ratio: Fraction of data used when sampling.
- eval_step: Interval between evaluations during training.
- early_stopping_step: Number of evaluation steps without improvement before training stops early.
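The sketch below illustrates how negative_samples_count is typically used during training: each observed "next item" is contrasted against a handful of randomly drawn items. This is a hedged example of a sampled binary cross-entropy loss of the kind SASRec-style models commonly train with; the loss the policy actually uses is governed by loss_type and may differ.

```python
# Illustrative negative-sampling loss (assumed BCE-style; not necessarily the
# policy's exact objective, which is controlled by loss_type).
import torch
import torch.nn.functional as F

def sampled_bce_loss(seq_repr, item_emb, positive_ids, negative_samples_count=2):
    """seq_repr: (batch, hidden) sequence representations.
    item_emb: (n_items + 1, hidden) item embedding table (row 0 = padding).
    positive_ids: (batch,) the true next item for each sequence."""
    batch_size, n_items = seq_repr.size(0), item_emb.size(0) - 1
    # Score the true next item for each sequence.
    pos_scores = (seq_repr * item_emb[positive_ids]).sum(-1)              # (batch,)
    # Draw random negatives per positive (collisions ignored for brevity).
    neg_ids = torch.randint(1, n_items + 1,
                            (batch_size, negative_samples_count),
                            device=seq_repr.device)
    neg_scores = torch.einsum("bh,bkh->bk", seq_repr, item_emb[neg_ids])  # (batch, k)
    # Push positive scores up and sampled negative scores down.
    return (F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
            + F.binary_cross_entropy_with_logits(neg_scores, torch.zeros_like(neg_scores)))
```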
V1 API
```yaml
policy_configs:
  scoring_policy: # Can also be used under embedding_policy
    policy_type: sasrec

    # Training Hyperparameters
    batch_size: 1000            # Samples per training batch
    n_epochs: 1                 # Number of training epochs
    negative_samples_count: 2   # Negative samples per positive for contrastive loss
    learning_rate: 0.001        # Optimizer learning rate

    # Architecture Hyperparameters
    hidden_size: 64             # Dimensionality of hidden layers/embeddings
    n_heads: 2                  # Number of self-attention heads
    n_layers: 2                 # Number of Transformer layers
    attn_dropout_prob: 0.2      # Dropout rate in attention mechanism
    hidden_act: "gelu"          # Activation function (e.g., "gelu", "relu")
    max_seq_length: 50          # Maximum input sequence length
```
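The remaining hyperparameters from the list above can be set in the same block. The values below are illustrative placeholders, not documented defaults; consult the tuning list above for what each field controls.

```yaml
policy_configs:
  scoring_policy:
    policy_type: sasrec
    # Regularization / initialization (illustrative values)
    hidden_dropout_prob: 0.2
    inner_size: 256
    layer_norm_eps: 1.0e-12
    initializer_range: 0.02
    # Training / evaluation control (illustrative values)
    eval_batch_size: 1000
    eval_step: 1
    early_stopping_step: 5
    mask_rate: 0.2
    sample_seed: 42
```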
Usage
Use this model when:
- You have sequential interaction data with clear temporal order
- You need to predict the "next item" in a sequence
- You want to capture both short-term and long-term patterns
- You need state-of-the-art sequential recommendation performance
- You're building session-based or sequential recommendation systems
Choose a different model when:
- You don't have sequential or temporal data
- You want general item similarity rather than next-item prediction
- You need bidirectional context understanding (use BERT4Rec)
- You have very short sequences (Item2Vec might be simpler)
- You want to leverage item content features primarily (use Two-Tower)
Use cases
- E-commerce next-item prediction (what to buy next)
- Video streaming next-video recommendations
- Music playlist continuation
- News feed article sequences
- Gaming item progression recommendations
- Any domain with clear sequential user behavior
Reference
- Kang, W. C., & McAuley, J. (2018). Self-attentive sequential recommendation. ICDM.