BeeFormer (Neural Retrieval)
Description
The beeFormer policy implements a recommendation model that fine-tunes a pre-trained sentence Transformer on user-item interaction data. It aims to bridge the gap between purely semantic similarity (from the language model) and interaction-based similarity (from collaborative patterns). The base sentence Transformer serves as the item encoder, and its weights are updated with an interaction-based training objective (conceptually similar to ELSA's reconstruction loss). This produces embeddings that reflect both content meaning and behavioral patterns, potentially improving cold-start performance and enabling knowledge transfer across datasets.
Policy Type: beeformer
Supports: embedding_policy, scoring_policy
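The exact training loop depends on the implementation, but the core idea behind the ELSA-style objective can be sketched as follows. This is a minimal illustration only: the toy interaction matrix, the embedding dimension, and the plain tensor A standing in for the sentence Transformer's item embeddings are assumptions made for the example, not the policy's actual code.

```python
import torch
import torch.nn.functional as F

# Toy implicit-feedback matrix X: rows = users, columns = items.
X = torch.tensor([[1., 0., 1., 0.],
                  [0., 1., 1., 0.],
                  [1., 1., 0., 1.]])

n_items, dim = X.shape[1], 8

# Stand-in for the item embeddings; in beeFormer these come from encoding
# item texts with the sentence Transformer, so the gradient of this loss
# flows back into the Transformer's weights.
A = torch.randn(n_items, dim, requires_grad=True)

def interaction_loss(X, A):
    A = F.normalize(A, dim=-1)                 # L2-normalize item embeddings
    scores = X @ A @ A.T - X                   # ELSA-style prediction X(AA^T - I)
    return F.mse_loss(F.normalize(scores, dim=-1),
                      F.normalize(X, dim=-1))  # reconstruct the interactions

loss = interaction_loss(X, A)
loss.backward()  # here this only reaches A; in the full model it reaches the encoder
print(float(loss))
```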
Hyperparameter tuning
- device: Compute device to run training on.
- seed: Random seed for reproducibility.
- lr: Learning rate for gradient descent optimization.
- use_scheduler: Whether to use a learning rate scheduler.
- epochs: Number of complete passes through the training dataset.
- max_output: Negative sampling hyperparameter; output item columns are sampled uniformly (see the sketch after this list).
- batch_size: Number of samples processed before updating model weights.
- top_k: Optimize only for top-k predictions on the output.
- embedder: Base embedding model (sentence Transformer) to fine-tune.
- use_images: Use image features.
- max_seq_length: Maximum sequence length for tokenized input.
- embedder_batch_size: Batch size for the embedder model (e.g., BERT).
- train_distributed: Train on multiple devices.
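To illustrate what max_output and top_k control, here is a minimal sketch of restricting the training signal to a uniformly sampled subset of item columns and, optionally, to the top-k predictions within that subset. The function name, its signature, and the exact sampling scheme are assumptions based on the parameter descriptions above, not the actual implementation.

```python
import torch

def sample_output_columns(X, scores, max_output, top_k=0):
    """Restrict the training targets to a subset of item columns (illustrative).

    X:      (users, items) batch of interactions
    scores: (users, items) predicted scores for the same batch
    """
    n_items = X.shape[1]
    # Uniform negative sampling: keep at most `max_output` item columns.
    cols = torch.randperm(n_items)[: min(max_output, n_items)]
    X_sub, scores_sub = X[:, cols], scores[:, cols]
    if top_k > 0:
        # Optimize only the top-k predictions per user within the sample.
        top = scores_sub.topk(min(top_k, scores_sub.shape[1]), dim=1).indices
        X_sub = X_sub.gather(1, top)
        scores_sub = scores_sub.gather(1, top)
    return X_sub, scores_sub

X = torch.randint(0, 2, (4, 100)).float()
scores = torch.randn(4, 100)
X_s, s_s = sample_output_columns(X, scores, max_output=10, top_k=5)
print(X_s.shape)  # torch.Size([4, 5])
```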
V1 API
policy_configs:
  embedding_policy: # Or scoring_policy
    policy_type: beeformer
    max_seq_length: 384 # Max sequence length for the transformer input
    # Training hyperparameters
    batch_size: 1 # Samples per training batch (may use gradient accumulation internally)
    epochs: 1 # Total training epochs
    lr: 1e-5 # Learning rate (typically small for fine-tuning)
    max_output: 1 # Negative sampling / loss hyperparameter
    top_k: 0 # Optimize only top-k predictions (0 = no restriction)
    embedder_batch_size: 100 # Internal batch size for base transformer inference
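Once training has finished, the policy's embeddings can serve both retrieval and scoring. The sketch below shows one plausible way to consume them, assuming an ELSA-style scoring rule; the model name, the item texts, and the user vector are illustrative, and in practice the encoder would be the fine-tuned beeFormer checkpoint rather than the off-the-shelf base model.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

# Illustrative base model; in practice, load the fine-tuned checkpoint instead.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

item_texts = [
    "Wireless noise-cancelling headphones with 30h battery life",
    "Ergonomic office chair with adjustable lumbar support",
    "Stainless steel espresso machine with 15 bar pump",
    "Bluetooth mechanical keyboard with hot-swappable switches",
]

# Item embeddings come from text alone, so cold-start items only need a description.
A = F.normalize(model.encode(item_texts, convert_to_tensor=True), dim=-1)

# One user's implicit-feedback vector over the catalogue (1 = interacted).
x = torch.tensor([1., 0., 0., 1.], device=A.device)

# ELSA-style scoring: project the history through the item embeddings
# and subtract the already-seen interactions.
scores = x @ A @ A.T - x
print(scores.argsort(descending=True))  # item indices ranked for recommendation
```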
Usage
Use this model when:
- You have rich text content (product descriptions, titles, reviews)
- You want to combine semantic understanding with behavioral signals
- You need good cold-start performance for new items
- You want to leverage pre-trained language model knowledge
- You have items with detailed textual descriptions
Choose a different model when:
- You have minimal or no text content for items
- You only have interaction data without item descriptions
- You want a pure collaborative filtering approach
Use cases
- E-commerce with detailed product descriptions (fashion, furniture, electronics)
- Content platforms (articles, blog posts, research papers)
- Job boards with detailed job descriptions
- Real estate with property descriptions
- Books, movies, or media with synopses and reviews
- Any domain where semantic understanding of text content matters
References
- Vančura, V., Kordík, P., & Straka, M. (2024). beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems. RecSys '24 / arXiv.
- Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP.
- Vančura, V., et al. (2022). Scalable Linear Shallow Autoencoder for Collaborative Filtering. WSDM. (ELSA paper, relevant to beeFormer's loss).