
SASRec (Sequential)

Description

The SASRec (Self-Attentive Sequential Recommendation) policy utilizes the Transformer architecture's self-attention mechanism to model user interaction sequences. By weighing the importance of all previous items, it captures both short-term and long-range dependencies to predict the next item the user is likely to interact with.

Policy Type: sasrec
Supports: embedding_policy, scoring_policy
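
Conceptually, the model embeds each item in the sequence, adds positional embeddings, applies causally masked self-attention so each position can only attend to earlier interactions, and scores candidate items against the hidden state of the most recent interaction. The sketch below illustrates that idea in PyTorch; it is not the platform's implementation. The class name SASRecSketch, the use of nn.TransformerEncoder, and the right-padding convention are assumptions for illustration, but the constructor arguments mirror the hyperparameters listed in the next section.

import torch
import torch.nn as nn

class SASRecSketch(nn.Module):
    """Minimal SASRec-style encoder: item + positional embeddings,
    causal self-attention, and next-item scoring by dot product."""

    def __init__(self, n_items, hidden_size=64, n_heads=2, n_layers=2,
                 max_seq_length=50, hidden_dropout_prob=0.2):
        super().__init__()
        self.item_emb = nn.Embedding(n_items + 1, hidden_size, padding_idx=0)  # id 0 = padding
        self.pos_emb = nn.Embedding(max_seq_length, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=n_heads,
            dim_feedforward=4 * hidden_size,   # plays the role of inner_size
            dropout=hidden_dropout_prob,
            activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, item_seq, seq_lengths):
        # item_seq: (batch, seq_len) item ids, right-padded with 0
        # seq_lengths: (batch,) number of real items in each row
        seq_len = item_seq.size(1)
        positions = torch.arange(seq_len, device=item_seq.device)
        x = self.item_emb(item_seq) + self.pos_emb(positions)
        # Causal mask: position i may only attend to positions <= i
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=item_seq.device), diagonal=1)
        hidden = self.encoder(x, mask=causal)          # (batch, seq_len, hidden_size)
        # Take the hidden state of each sequence's last real item
        last = hidden[torch.arange(item_seq.size(0)), seq_lengths - 1]
        # Score every item as a candidate "next item"
        return last @ self.item_emb.weight.T           # (batch, n_items + 1)

model = SASRecSketch(n_items=1000)
seqs = torch.tensor([[3, 7, 42, 9, 0]])                # one sequence of 4 items, right-padded to 5
scores = model(seqs, seq_lengths=torch.tensor([4]))
predicted_next = scores.argmax(dim=-1)                 # highest-scoring candidate item id

Because padding sits after the last real item and attention is causally masked, real positions never attend to padding, so no separate padding mask is needed in this sketch.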

Hyperparameter tuning

  • batch_size: Number of samples processed before updating model weights.
  • eval_batch_size: Batch size used during model evaluation.
  • n_epochs: Number of complete passes through the training dataset.
  • negative_samples_count: Number of negative samples drawn per positive example for contrastive learning (see the sketch after this list).
  • device: Compute device used for training and inference (e.g., CPU or GPU).
  • hidden_size: Size of the hidden layers in the transformer.
  • inner_size: Size of the feed-forward network inner layer.
  • learning_rate: Learning rate for gradient descent optimization.
  • attn_dropout_prob: Dropout probability for attention layers.
  • hidden_act: Activation function.
  • hidden_dropout_prob: Dropout probability for hidden layers.
  • n_heads: Number of attention heads in the transformer.
  • n_layers: Number of transformer layers.
  • layer_norm_eps: Epsilon added for numerical stability in layer normalization.
  • initializer_range: Standard deviation of the normal distribution used to initialize model weights.
  • mask_rate: Fraction of tokens to mask during training.
  • loss_type: Type of loss function used during training.
  • max_seq_length: Maximum length of input sequences.
  • sample_strategy
  • append_item_features: Whether to append item features.
  • append_item_embeddings: Whether to append item embeddings.
  • use_candidate_embeddings: Whether to use candidate embeddings.
  • sample_seed: Random seed used for sampling.
  • sample_ratio
  • eval_step: Interval, in epochs, between evaluations on the validation data.
  • early_stopping_step: Number of evaluation rounds without improvement before training stops early.
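
The sequence- and sampling-related settings above come together when training examples are built: histories longer than max_seq_length are truncated to the most recent items, shorter ones are padded, and negative_samples_count items the user has not interacted with are drawn for each positive target. The sketch below shows one common way to do this; the function name build_training_example, the right-padding with id 0, the assumption that item ids start at 1, and the uniform random negative sampling are illustrative assumptions, not the platform's exact behavior.

import random

def build_training_example(history, n_items, max_seq_length=50,
                           negative_samples_count=2, seed=None):
    """Turn one user's interaction history into a training example:
    (input item ids, sequence length, positive target, negative targets)."""
    rng = random.Random(seed)
    *inputs, positive = history[-(max_seq_length + 1):]    # keep only the most recent items
    seq_length = len(inputs)
    inputs = inputs + [0] * (max_seq_length - seq_length)   # right-pad with id 0
    seen = set(history)
    negatives = []
    while len(negatives) < negative_samples_count:
        candidate = rng.randint(1, n_items)                  # item ids assumed to start at 1
        if candidate not in seen:                            # negatives are items the user has not interacted with
            negatives.append(candidate)
    return inputs, seq_length, positive, negatives

# A user who interacted with items 3, 7, 42 and then 9, in that order:
print(build_training_example([3, 7, 42, 9], n_items=1000, max_seq_length=5, seed=0))
# ([3, 7, 42, 0, 0], 3, 9, [two random item ids not in the history])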

V1 API

policy_configs:
  scoring_policy:  # Can also be used under embedding_policy
    policy_type: sasrec
    # Training Hyperparameters
    batch_size: 1000             # Samples per training batch
    n_epochs: 1                  # Number of training epochs
    negative_samples_count: 2    # Negative samples per positive for contrastive loss
    learning_rate: 0.001         # Optimizer learning rate
    # Architecture Hyperparameters
    hidden_size: 64              # Dimensionality of hidden layers/embeddings
    n_heads: 2                   # Number of self-attention heads
    n_layers: 2                  # Number of Transformer layers
    attn_dropout_prob: 0.2       # Dropout rate in attention mechanism
    hidden_act: "gelu"           # Activation function (e.g., "gelu", "relu")
    max_seq_length: 50           # Maximum input sequence length

Usage

Use this model when:

  • You have sequential interaction data with clear temporal order
  • You need to predict the "next item" in a sequence
  • You want to capture both short-term and long-term patterns
  • You need state-of-the-art sequential recommendation performance
  • You're building session-based or sequential recommendation systems

Choose a different model when:

  • You don't have sequential or temporal data
  • You want general item similarity rather than next-item prediction
  • You need bidirectional context understanding (use BERT4Rec)
  • You have very short sequences (Item2Vec might be simpler)
  • You want to leverage item content features primarily (use Two-Tower)

Use cases

  • E-commerce next-item prediction (what to buy next)
  • Video streaming next-video recommendations
  • Music playlist continuation
  • News feed article sequences
  • Gaming item progression recommendations
  • Any domain with clear sequential user behavior

Reference