BERT4Rec (Sequential)

Description

The BERT4Rec policy adapts the bidirectional Transformer architecture (BERT) for sequential recommendation. Unlike unidirectional models (like SASRec), it uses bidirectional self-attention and is typically trained using a masked item prediction objective (predicting masked items based on both past and future context within the sequence). This allows it to learn rich, context-aware item representations.
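To make the training objective concrete, below is a minimal, hypothetical PyTorch sketch of the Cloze-style masking step described above. This is not Shaped's internal code: a fraction of item IDs in each interaction sequence is replaced with a reserved [MASK] token, and the model is trained to recover the originals from context on both sides. All names and constants here (`NUM_ITEMS`, `MASK_ID`, `MASK_PROB`, `mask_sequence`) are illustrative assumptions.

```python
import torch

NUM_ITEMS = 1000          # assumed size of the item vocabulary
PAD_ID = 0                # padding token
MASK_ID = NUM_ITEMS + 1   # reserved [MASK] token id (assumption)
MASK_PROB = 0.15          # masking ratio, a common choice in the BERT4Rec paper

def mask_sequence(seq: torch.Tensor, mask_prob: float = MASK_PROB):
    """Randomly mask non-padding positions; return (inputs, labels).

    Labels are -100 at unmasked positions so that a cross-entropy loss
    with ignore_index=-100 only scores the masked predictions.
    """
    labels = torch.full_like(seq, -100)
    candidates = seq != PAD_ID
    mask = candidates & (torch.rand(seq.shape) < mask_prob)
    labels[mask] = seq[mask]   # target: the original item id
    inputs = seq.clone()
    inputs[mask] = MASK_ID     # hide the item from the model
    return inputs, labels

batch = torch.tensor([[12, 7, 431, 88, 5, 0, 0, 0]])  # right-padded history
inputs, labels = mask_sequence(batch)
print(inputs)   # masked positions show MASK_ID
print(labels)   # -100 everywhere except the masked positions
```

Because the masked position is predicted from both its left and right neighbors, no causal (left-to-right) attention mask is needed, which is the key difference from unidirectional models like SASRec.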

Policy Type: `bert4rec`
Supports: `embedding_policy`, `scoring_policy`

Configuration Example

scoring_policy_bert4rec.yaml
```yaml
policy_configs:
  scoring_policy: # Can also be used under embedding_policy
    policy_type: bert4rec

    # Training Hyperparameters
    batch_size: 1000           # Samples per training batch
    n_epochs: 1                # Number of training epochs
    negative_samples_count: 2  # Negative samples used in the loss calculation
    learning_rate: 0.001       # Optimizer learning rate
    dropout_rate: 0.2          # General dropout rate for regularization

    # Architecture Hyperparameters
    hidden_size: 64            # Dimensionality of hidden layers/embeddings
    n_heads: 2                 # Number of self-attention heads
    n_layers: 2                # Number of Transformer layers
    max_seq_length: 50         # Maximum input sequence length
```
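For intuition on how the architecture hyperparameters fit together, the sketch below wires them into a standard PyTorch `nn.TransformerEncoder`. This is an illustrative approximation, not Shaped's implementation; the item-vocabulary size (`NUM_ITEMS`) and the 4x feed-forward width are assumptions that are not part of the config above.

```python
import torch
import torch.nn as nn

HIDDEN_SIZE = 64      # hidden_size
N_HEADS = 2           # n_heads (must divide hidden_size evenly)
N_LAYERS = 2          # n_layers
MAX_SEQ_LENGTH = 50   # max_seq_length
DROPOUT = 0.2         # dropout_rate
NUM_ITEMS = 1000      # assumed item-vocabulary size, not part of the config

# Item + learned positional embeddings (+2 slots for [PAD] and [MASK] tokens)
item_emb = nn.Embedding(NUM_ITEMS + 2, HIDDEN_SIZE, padding_idx=0)
pos_emb = nn.Embedding(MAX_SEQ_LENGTH, HIDDEN_SIZE)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=HIDDEN_SIZE,
        nhead=N_HEADS,
        dim_feedforward=4 * HIDDEN_SIZE,  # conventional 4x width (assumption)
        dropout=DROPOUT,
        batch_first=True,
    ),
    num_layers=N_LAYERS,
)
output_head = nn.Linear(HIDDEN_SIZE, NUM_ITEMS + 2)  # scores over the vocabulary

ids = torch.randint(1, NUM_ITEMS, (4, MAX_SEQ_LENGTH))  # batch of 4 sequences
positions = torch.arange(MAX_SEQ_LENGTH).unsqueeze(0)
hidden = encoder(item_emb(ids) + pos_emb(positions))    # no causal mask passed
logits = output_head(hidden)                            # (4, 50, NUM_ITEMS + 2)
```

Note that no causal attention mask is supplied to the encoder, so every position attends to the full sequence in both directions; this is precisely what distinguishes BERT4Rec from SASRec's left-to-right attention.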

Reference

Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., & Jiang, P. (2019). BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of CIKM 2019. arXiv:1904.06690.