BERT4Rec (Sequential)
Description
The BERT4Rec policy adapts the bidirectional Transformer architecture (BERT) for sequential recommendation. Unlike unidirectional models such as SASRec, it uses bidirectional self-attention and is typically trained with a masked item prediction (Cloze) objective: randomly masked items in a sequence are predicted from both their past and future context. This allows the model to learn rich, context-aware item representations.
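To make the masking objective concrete, here is a minimal PyTorch sketch of the Cloze-style training step; `MASK_ID`, `MASK_PROB`, and `mask_sequence` are illustrative names, not part of this library's API.

```python
import torch

MASK_ID = 0      # hypothetical reserved id for the [mask] token
MASK_PROB = 0.2  # fraction of positions to mask (an assumption, not a library default)

def mask_sequence(seq: torch.Tensor, mask_prob: float = MASK_PROB):
    """Randomly replace items with MASK_ID; the model must recover them
    from both left and right context via bidirectional attention."""
    mask = torch.rand(seq.shape) < mask_prob
    masked_seq = seq.masked_fill(mask, MASK_ID)
    labels = seq.masked_fill(~mask, -100)  # -100 is ignored by cross-entropy
    return masked_seq, labels

seq = torch.tensor([[12, 7, 31, 5, 18, 42]])  # one user's item-id sequence
masked_seq, labels = mask_sequence(seq)
# Training would minimize cross_entropy(logits.transpose(1, 2), labels),
# where logits = model(masked_seq) has shape (batch, seq_len, n_items).
```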
Policy Type: bert4rec
Supports: embedding_policy, scoring_policy
Configuration Example
scoring_policy_bert4rec.yaml
```yaml
policy_configs:
  scoring_policy:  # Can also be used under embedding_policy
    policy_type: bert4rec

    # Training Hyperparameters
    batch_size: 1000           # Samples per training batch
    n_epochs: 1                # Number of training epochs
    negative_samples_count: 2  # Negative samples (often relevant for loss calculation)
    learning_rate: 0.001       # Optimizer learning rate
    dropout_rate: 0.2          # General dropout rate for regularization

    # Architecture Hyperparameters
    hidden_size: 64            # Dimensionality of hidden layers/embeddings
    n_heads: 2                 # Number of self-attention heads
    n_layers: 2                # Number of Transformer layers
    max_seq_length: 50         # Maximum input sequence length
```
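For intuition, the architecture hyperparameters above map roughly onto a standard Transformer encoder, as in the PyTorch sketch below; `n_items` is a hypothetical catalogue size, and the library's actual implementation may differ.

```python
import torch.nn as nn

n_items = 10_000  # hypothetical catalogue size (an assumption)
hidden_size, n_heads, n_layers, max_seq_length = 64, 2, 2, 50

item_emb = nn.Embedding(n_items + 1, hidden_size)  # +1 reserves an id for [mask]
pos_emb = nn.Embedding(max_seq_length, hidden_size)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=hidden_size, nhead=n_heads, dropout=0.2, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
# No causal mask is passed to the encoder, so every position attends to both
# past and future items; this is the "bidirectional" part of BERT4Rec.
```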
Reference
- Sun, F., et al. (2019). BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. CIKM 2019.