DeepFM (Neural Scoring)

Description

The DeepFM (Deep Factorization Machine) policy combines a Factorization Machine (FM) component and a deep neural network (DNN) component that share the same input feature embeddings.

  • FM Component: Models low-order interactions (linear terms and pairwise, second-order interactions) efficiently using dot products of the feature embeddings; see the formula after this list.
  • Deep Component: An MLP that learns high-order, non-linear feature interactions implicitly from the shared embeddings.
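
For reference, a minimal sketch of the FM prediction in its standard form (following Rendle's factorization machines; the exact parameterization inside this policy may differ), where w_0 is a global bias, w_i the first-order weights, and v_i the shared embedding of feature i:

    \hat{y}_{\mathrm{FM}} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j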

This avoids the need for manual feature crossing (unlike the wide part of Wide & Deep) and captures both low- and high-order interactions in one model. The outputs of the two components are combined for the final prediction.
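
To make the shared-embedding design concrete, here is a minimal PyTorch sketch of a DeepFM-style forward pass. It is illustrative only: the class name, field handling, and sigmoid output head are assumptions rather than this policy's internal implementation.

```python
import torch
import torch.nn as nn

class DeepFMSketch(nn.Module):
    """Minimal DeepFM sketch: an FM term and an MLP over shared embeddings."""

    def __init__(self, num_features: int, num_fields: int,
                 embedding_dim: int = 16,
                 deep_hidden_units=(128, 64, 32),
                 dropout: float = 0.2):
        super().__init__()
        # Shared embedding table used by both the FM and deep components.
        self.embedding = nn.Embedding(num_features, embedding_dim)
        # First-order (linear) weights plus a global bias for the FM part.
        self.linear = nn.Embedding(num_features, 1)
        self.bias = nn.Parameter(torch.zeros(1))
        # Deep component: an MLP over the concatenated field embeddings.
        layers, in_dim = [], num_fields * embedding_dim
        for units in deep_hidden_units:
            layers += [nn.Linear(in_dim, units), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = units
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fields) of feature indices, already offset per field.
        emb = self.embedding(x)                         # (batch, fields, dim)
        # FM 2nd-order term via the square-of-sum minus sum-of-squares identity.
        square_of_sum = emb.sum(dim=1).pow(2)           # (batch, dim)
        sum_of_squares = emb.pow(2).sum(dim=1)          # (batch, dim)
        fm_2nd = 0.5 * (square_of_sum - sum_of_squares).sum(dim=1, keepdim=True)
        fm_1st = self.linear(x).sum(dim=1) + self.bias  # (batch, 1)
        deep = self.mlp(emb.flatten(start_dim=1))       # (batch, 1)
        # Combine the FM and deep outputs for the final prediction.
        return torch.sigmoid(fm_1st + fm_2nd + deep)

# Example: score a batch of 4 examples with 3 categorical fields.
model = DeepFMSketch(num_features=1000, num_fields=3)
scores = model(torch.randint(0, 1000, (4, 3)))
print(scores.shape)  # torch.Size([4, 1])
```

The square-of-sum identity lets the FM term score all pairwise embedding interactions in O(fields × dim) time rather than enumerating every feature pair explicitly.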

Policy Type: deepfm
Supports: scoring_policy

Premium Model

This model requires the Standard Plan or higher.

Hyperparameter tuning

  • embedding_dim: Dimensionality of feature embeddings (shared).
  • deep_hidden_units: Layer sizes for the deep MLP component.
  • activation_fn: Activation for deep layers.
  • dropout: Dropout rate for deep layers.
  • learning_rate: Optimizer learning rate.

V1 API

policy_configs:
  scoring_policy:
    policy_type: deepfm
    # Architecture
    embedding_dim: 16                 # Dimensionality of feature embeddings (shared)
    deep_hidden_units: [128, 64, 32]  # Layer sizes for the deep MLP component
    activation_fn: "relu"             # Activation for deep layers
    dropout: 0.2                      # Dropout rate for deep layers
    # Training Control
    learning_rate: 0.001              # Optimizer learning rate
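
As a hedged illustration of how these fields could shape the network (assuming PyYAML and PyTorch; the platform's actual config loader and model builder are not documented here, so all names below are illustrative), the sketch below parses a config like the one above, stacks one Linear → activation → Dropout block per deep_hidden_units entry, and wires learning_rate into an Adam optimizer:

```python
import yaml
import torch.nn as nn
import torch.optim as optim

config_text = """
policy_configs:
  scoring_policy:
    policy_type: deepfm
    embedding_dim: 16
    deep_hidden_units: [128, 64, 32]
    activation_fn: relu
    dropout: 0.2
    learning_rate: 0.001
"""
cfg = yaml.safe_load(config_text)["policy_configs"]["scoring_policy"]

# Hypothetical mapping from activation_fn strings to PyTorch modules.
ACTIVATIONS = {"relu": nn.ReLU, "tanh": nn.Tanh, "sigmoid": nn.Sigmoid}

def build_deep_component(num_fields: int, cfg: dict) -> nn.Sequential:
    """Stack Linear -> activation -> Dropout once per deep_hidden_units entry."""
    layers = []
    in_dim = num_fields * cfg["embedding_dim"]  # concatenated shared embeddings
    for units in cfg["deep_hidden_units"]:
        layers += [nn.Linear(in_dim, units),
                   ACTIVATIONS[cfg["activation_fn"]](),
                   nn.Dropout(cfg["dropout"])]
        in_dim = units
    layers.append(nn.Linear(in_dim, 1))  # scalar logit from the deep component
    return nn.Sequential(*layers)

deep = build_deep_component(num_fields=10, cfg=cfg)
optimizer = optim.Adam(deep.parameters(), lr=cfg["learning_rate"])
print(deep)
```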

Usage

Use this model when:

  • You need a strong CTR or conversion prediction model that can learn both low- and high-order feature interactions
  • You have rich categorical and continuous features and want to avoid manual feature crossing
  • You care about modeling interaction terms between many sparse features (e.g., user, item, and context features)
  • You want a neural scoring model that generalizes better than purely linear or FM-only approaches

Choose a different model when:

  • You have relatively simple feature sets and want a fast, tree-based baseline (use LightGBM or XGBoost)
  • You do not have the infrastructure or appetite to train and serve neural models in production
  • You mainly need retrieval-stage embeddings rather than point-wise scoring (use Two-Tower, ALS, or BeeFormer)
  • You primarily need to capture sequential patterns in user behavior (use SASRec, BERT4Rec, or other sequential models)

Use cases

  • CTR prediction for feeds, recommendations, and advertising with many categorical IDs
  • Ranking items in e-commerce or content platforms using rich user, item, and context features
  • Scoring candidates in a ranking stage after retrieval from embeddings or search
  • Personalization tasks where complex feature interactions drive performance
