BeeFormer (Neural Retrieval)
Description
The beeFormer policy implements a recommendation model that fine-tunes a pre-trained sentence Transformer on user-item interaction data. It aims to bridge the gap between purely semantic similarity (from the language model) and interaction-based similarity (from collaborative patterns). The base sentence Transformer serves as the item encoder, and its weights are updated with an interaction-based training objective (conceptually similar to ELSA's reconstruction loss). This produces embeddings that reflect both content meaning and behavioral patterns, potentially improving cold-start performance and enabling knowledge transfer across datasets.
Policy Type: beeformer
Supports: embedding_policy, scoring_policy
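The exact training loop depends on the implementation, but the core idea behind the ELSA-style objective can be sketched as follows. This is a minimal illustration only: the toy interaction matrix, the embedding dimension, and the plain tensor A standing in for the sentence Transformer's item embeddings are assumptions made for the example, not the policy's actual code.

```python
import torch
import torch.nn.functional as F

# Toy implicit-feedback matrix X: rows = users, columns = items.
X = torch.tensor([[1., 0., 1., 0.],
                  [0., 1., 1., 0.],
                  [1., 1., 0., 1.]])

n_items, dim = X.shape[1], 8

# Stand-in for the item embeddings; in beeFormer these come from encoding
# item texts with the sentence Transformer, so the gradient of this loss
# flows back into the Transformer's weights.
A = torch.randn(n_items, dim, requires_grad=True)

def interaction_loss(X, A):
    A = F.normalize(A, dim=-1)                 # L2-normalize item embeddings
    scores = X @ A @ A.T - X                   # ELSA-style prediction X(AA^T - I)
    return F.mse_loss(F.normalize(scores, dim=-1),
                      F.normalize(X, dim=-1))  # reconstruct the interactions

loss = interaction_loss(X, A)
loss.backward()  # here this only reaches A; in the full model it reaches the encoder
print(float(loss))
```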
Hyperparameter tuning
- device: Compute device to run training on.
- seed: Random seed for reproducibility.
- lr: Learning rate for gradient descent optimization.
- use_scheduler: Whether to use a learning rate scheduler.
- epochs: Number of complete passes through the training dataset.
- max_output: Negative sampling hyperparameter; output item columns are sampled uniformly (see the sketch after this list).
- batch_size: Number of samples processed before updating model weights.
- top_k: Optimize only for top-k predictions on the output.
- embedder: Base embedding model (sentence Transformer) to fine-tune.
- use_images: Use image features.
- max_seq_length: Maximum sequence length for tokenized input.
- embedder_batch_size: Batch size for the embedder model (e.g., BERT).
- train_distributed: Train on multiple devices.
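To illustrate what max_output and top_k control, here is a minimal sketch of restricting the training signal to a uniformly sampled subset of item columns and, optionally, to the top-k predictions within that subset. The function name, its signature, and the exact sampling scheme are assumptions based on the parameter descriptions above, not the actual implementation.

```python
import torch

def sample_output_columns(X, scores, max_output, top_k=0):
    """Restrict the training targets to a subset of item columns (illustrative).

    X:      (users, items) batch of interactions
    scores: (users, items) predicted scores for the same batch
    """
    n_items = X.shape[1]
    # Uniform negative sampling: keep at most `max_output` item columns.
    cols = torch.randperm(n_items)[: min(max_output, n_items)]
    X_sub, scores_sub = X[:, cols], scores[:, cols]
    if top_k > 0:
        # Optimize only the top-k predictions per user within the sample.
        top = scores_sub.topk(min(top_k, scores_sub.shape[1]), dim=1).indices
        X_sub = X_sub.gather(1, top)
        scores_sub = scores_sub.gather(1, top)
    return X_sub, scores_sub

X = torch.randint(0, 2, (4, 100)).float()
scores = torch.randn(4, 100)
X_s, s_s = sample_output_columns(X, scores, max_output=10, top_k=5)
print(X_s.shape)  # torch.Size([4, 5])
```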
V1 API
policy_configs:
  embedding_policy: # Or scoring_policy
    policy_type: beeformer
    max_seq_length: 384 # Max sequence length for the transformer input
    # Training hyperparameters
    batch_size: 1 # Samples per training batch (may use gradient accumulation internally)
    epochs: 1 # Total training epochs
    lr: 1e-5 # Learning rate (typically small for fine-tuning)
    max_output: 1 # Negative sampling / loss hyperparameter
    top_k: 0 # Optimize only top-k predictions (0 = no restriction)
    embedder_batch_size: 100 # Internal batch size for base transformer inference
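Once training has finished, the policy's embeddings can serve both retrieval and scoring. The sketch below shows one plausible way to consume them, assuming an ELSA-style scoring rule; the model name, the item texts, and the user vector are illustrative, and in practice the encoder would be the fine-tuned beeFormer checkpoint rather than the off-the-shelf base model.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Illustrative base model; in practice, load the fine-tuned checkpoint instead.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

item_texts = [
    "Wireless noise-cancelling headphones with 30h battery life",
    "Ergonomic office chair with adjustable lumbar support",
    "Stainless steel espresso machine with 15 bar pump",
    "Bluetooth mechanical keyboard with hot-swappable switches",
]

# Item embeddings come from text alone, so cold-start items only need a description.
A = F.normalize(model.encode(item_texts, convert_to_tensor=True), dim=-1)

# One user's implicit-feedback vector over the catalogue (1 = interacted).
x = torch.tensor([1., 0., 0., 1.], device=A.device)

# ELSA-style scoring: project the history through the item embeddings
# and subtract the already-seen interactions.
scores = x @ A @ A.T - x
print(scores.argsort(descending=True))  # item indices ranked for recommendation
```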
Usage
Use this model when:
- You have rich text content (product descriptions, titles, reviews)
- You want to combine semantic understanding with behavioral signals
- You need good cold-start performance for new items
- You want to leverage pre-trained language model knowledge
- You have items with detailed textual descriptions
Choose a different model when:
- You have minimal or no text content for items
- You only have interaction data without item descriptions
- You want a pure collaborative filtering approach
Use cases
- E-commerce with detailed product descriptions (fashion, furniture, electronics)
- Content platforms (articles, blog posts, research papers)
- Job boards with detailed job descriptions
- Real estate with property descriptions
- Books, movies, or media with synopses and reviews
- Any domain where semantic understanding of text content matters
References
- Vančura, V., Kordík, P., & Straka, M. (2024). beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems. RecSys '24 / arXiv.
- Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP.
- Vančura, V., et al. (2022). Scalable Linear Shallow Autoencoder for Collaborative Filtering. WSDM. (ELSA paper, relevant to beeFormer's loss).