BeeFormer (Neural Retrieval)

Description

The beeFormer policy implements a recommendation model that fine-tunes a pre-trained sentence Transformer on user-item interaction data. It aims to bridge the gap between purely semantic similarity (from the language model) and interaction-based similarity (from collaborative patterns). A base sentence Transformer serves as the encoder, and its weights are updated with an interaction-based loss (conceptually similar to ELSA's reconstruction loss). This produces embeddings that reflect both content meaning and behavioral patterns, potentially improving cold-start performance and enabling knowledge transfer.
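As a rough sketch of the idea, the hypothetical PyTorch snippet below fine-tunes a sentence Transformer with an ELSA-like reconstruction loss over a batch of user-item interactions. The model name, the training_step helper, and the exact loss form are illustrative assumptions, not this library's internals.

import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base embedder

def training_step(item_texts, interactions, optimizer):
    """One gradient step (illustrative, not the library's actual loop).

    item_texts:   list of item description strings
    interactions: (num_users, num_items) binary user-item tensor
    """
    # Encode item texts; gradients flow back into the Transformer weights.
    features = encoder.tokenize(item_texts)
    item_emb = F.normalize(encoder(features)["sentence_embedding"], dim=-1)

    # ELSA-style residual reconstruction of the interaction matrix.
    x = interactions.float()
    recon = (x @ item_emb) @ item_emb.T - x
    loss = F.mse_loss(F.normalize(recon, dim=-1), F.normalize(x, dim=-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (illustrative):
# opt = torch.optim.Adam(encoder.parameters(), lr=1e-5)
# loss = training_step(texts, torch.tensor(interaction_matrix), opt)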

Policy Type: beeformer
Supports: embedding_policy, scoring_policy

Hyperparameter tuning

  • device: Device to run training on (e.g., cpu or cuda).
  • seed: Random seed for reproducibility.
  • lr: Learning rate for gradient descent optimization.
  • use_scheduler: Whether to use a learning rate scheduler.
  • epochs: Number of complete passes through the training dataset.
  • max_output: Number of output items sampled uniformly per training step (negative sampling); see the sketch after this list.
  • batch_size: Number of samples processed before updating model weights.
  • top_k: Optimize only for top-k predictions on the output.
  • embedder: Identifier of the base sentence-Transformer model to fine-tune.
  • use_images: Use image features.
  • max_seq_length: Maximum sequence length for tokenized input.
  • embedder_batch_size: Batch size for the embedder model (e.g., BERT).
  • train_distributed: Train on multiple devices.
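
To make max_output concrete, here is a minimal, hypothetical illustration of uniform sampling over output item columns; the library's actual sampling scheme is an assumption here.

import torch

def sample_output_columns(interactions, max_output):
    """Keep a uniform random subset of item columns for the loss (illustrative).

    interactions: (batch_users, num_items) binary tensor
    max_output:   number of item columns retained this step
    """
    num_items = interactions.shape[1]
    cols = torch.randperm(num_items)[:max_output]  # uniform, without replacement
    return interactions[:, cols], cols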

V1 API

policy_configs:
  embedding_policy: # Or scoring_policy
    policy_type: beeformer
    max_seq_length: 384 # Max sequence length for the transformer input
    # Training hyperparameters
    batch_size: 1 # Samples per training batch (may use gradient accumulation internally)
    epochs: 1 # Total training epochs
    lr: 1e-5 # Learning rate (typically small for fine-tuning)
    max_output: 1 # Number of sampled output items (negative sampling)
    top_k: 0 # Restrict optimization to top-k predictions (0 = no restriction)
    embedder_batch_size: 100 # Internal batch size for base transformer inference
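
For orientation, here is a minimal sketch of how embeddings produced by a tuned beeformer policy could be used for scoring. The dot-product scoring and the history-sum user representation are assumptions for illustration; the library's scoring path is internal.

import numpy as np

def score_items(user_history, item_embeddings):
    """Score every catalog item for one user (illustrative).

    user_history:    indices of items the user interacted with
    item_embeddings: (num_items, dim) L2-normalized item embeddings
    """
    # Represent the user as the normalized sum of their items' embeddings.
    user_vec = item_embeddings[user_history].sum(axis=0)
    user_vec /= np.linalg.norm(user_vec) + 1e-12
    return item_embeddings @ user_vec  # cosine similarity per item

# Example: top-5 recommendations from random embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
top5 = np.argsort(-score_items([3, 17, 256], emb))[:5]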

Usage

Use this model when:

  • You have rich text content (product descriptions, titles, reviews)
  • You want to combine semantic understanding with behavioral signals
  • You need good cold-start performance for new items
  • You want to leverage pre-trained language model knowledge
  • You have items with detailed textual descriptions

Choose a different model when:

  • You have minimal or no text content for items
  • You only have interaction data without item descriptions
  • You want a pure collaborative filtering approach

Use cases

  • E-commerce with detailed product descriptions (fashion, furniture, electronics)
  • Content platforms (articles, blog posts, research papers)
  • Job boards with detailed job descriptions
  • Real estate with property descriptions
  • Books, movies, or media with synopses and reviews
  • Any domain where semantic understanding of text content matters
