BERT4Rec (Sequential)
Description
The BERT4Rec policy adapts the bidirectional Transformer architecture (BERT) for sequential recommendation. Unlike unidirectional models such as SASRec, it uses bidirectional self-attention and is typically trained with a masked item prediction (Cloze) objective: randomly masked items in a sequence are predicted from both their past and future context. This allows the model to learn rich, context-aware item representations.
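To make the masking objective concrete, here is a minimal PyTorch sketch of the Cloze-style training step; `MASK_ID`, `MASK_PROB`, and `mask_sequence` are illustrative names, not part of this library's API.

```python
import torch

MASK_ID = 0      # hypothetical reserved id for the [mask] token
MASK_PROB = 0.2  # fraction of positions to mask (an assumption, not a library default)

def mask_sequence(seq: torch.Tensor, mask_prob: float = MASK_PROB):
    """Randomly replace items with MASK_ID; the model must recover them
    from both left and right context via bidirectional attention."""
    mask = torch.rand(seq.shape) < mask_prob
    masked_seq = seq.masked_fill(mask, MASK_ID)
    labels = seq.masked_fill(~mask, -100)  # -100 is ignored by cross-entropy
    return masked_seq, labels

seq = torch.tensor([[12, 7, 31, 5, 18, 42]])  # one user's item-id sequence
masked_seq, labels = mask_sequence(seq)
# Training would minimize cross_entropy(logits.transpose(1, 2), labels),
# where logits = model(masked_seq) has shape (batch, seq_len, n_items).
```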
Policy Type: bert4rec
Supports: embedding_policy, scoring_policy
Configuration Example
scoring_policy_bert4rec.yaml
```yaml
policy_configs:
  scoring_policy:  # Can also be used under embedding_policy
    policy_type: bert4rec

    # Training Hyperparameters
    batch_size: 1000           # Samples per training batch
    n_epochs: 1                # Number of training epochs
    negative_samples_count: 2  # Negative samples (often relevant for loss calculation)
    learning_rate: 0.001       # Optimizer learning rate
    dropout_rate: 0.2          # General dropout rate for regularization

    # Architecture Hyperparameters
    hidden_size: 64            # Dimensionality of hidden layers/embeddings
    n_heads: 2                 # Number of self-attention heads
    n_layers: 2                # Number of Transformer layers
    max_seq_length: 50         # Maximum input sequence length
```
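For intuition, the architecture hyperparameters above map roughly onto a standard Transformer encoder, as in the PyTorch sketch below; `n_items` is a hypothetical catalogue size, and the library's actual implementation may differ.

```python
import torch.nn as nn

n_items = 10_000  # hypothetical catalogue size (an assumption)
hidden_size, n_heads, n_layers, max_seq_length = 64, 2, 2, 50

item_emb = nn.Embedding(n_items + 1, hidden_size)  # +1 reserves an id for [mask]
pos_emb = nn.Embedding(max_seq_length, hidden_size)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=hidden_size, nhead=n_heads, dropout=0.2, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
# No causal mask is passed to the encoder, so every position attends to both
# past and future items; this is the "bidirectional" part of BERT4Rec.
```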
Reference
- Sun, F., et al. (2019). BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. CIKM 2019.