
LightGBM (GBT)

Description

The LightGBM policy uses the LightGBM framework, a highly efficient gradient boosting implementation. It builds an ensemble of decision trees sequentially, with each new tree correcting the errors of the previous ones. It is optimized for speed and memory usage, handles large datasets well, and supports a range of objectives, including classification, regression, and learning-to-rank objectives such as LambdaRank (LightGBM's implementation of LambdaMART).
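The policy itself is configured through YAML (see the V1 API section below). As a rough illustration of what a LambdaRank training loop looks like, here is a minimal sketch using the open-source lightgbm package directly; the feature matrix, labels, and query grouping are synthetic placeholders, not the policy's actual inputs.

# Minimal sketch of LambdaRank training with the open-source `lightgbm` package.
# The data is synthetic; the managed policy builds its own feature set from
# interactions, so treat this purely as an illustration.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))       # feature matrix (e.g. user/item/affinity features)
y = rng.integers(0, 4, size=1000)     # graded relevance label per (query, item) row
group = [50] * 20                     # 20 queries, 50 candidate items each

ranker = lgb.LGBMRanker(
    objective="lambdarank",           # LightGBM's LambdaMART-style LTR objective
    n_estimators=100,                 # kept small for the sketch
    learning_rate=0.05,
    num_leaves=31,
)
ranker.fit(X, y, group=group)
scores = ranker.predict(X[:50])       # scores used to order one query's candidates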

Policy Type: lightgbm
Supports: scoring_policy

Premium Model

This model requires the Standard Plan or higher.

Hyperparameter tuning

  • event_values: List of event value strings to filter interactions by.
  • objective: Objective function (regression, binary, lambdarank, rank_xendcg).
  • n_estimators: Number of boosting iterations.
  • max_depth: Maximum depth of the tree. -1 means no limit.
  • num_leaves: Maximum number of leaves in one tree.
  • min_child_weight: Minimum sum Hessian in one leaf.
  • learning_rate: Learning rate (shrinkage) for gradient boosting.
  • colsample_bytree: Fraction of feature columns sampled when building each tree.
  • subsample: Fraction of training rows sampled for each bagging iteration.
  • subsample_freq: Bagging frequency; rows are resampled every subsample_freq iterations (0 disables bagging).
  • zero_as_missing: Treat zero values as missing.
  • bin_construct_sample_cnt: Number of samples used to construct bins.
  • verbose: Verbosity level of LightGBM logging.
  • verbose_eval: How often evaluation results are logged during training.
  • num_threads: Number of threads used for training.
  • enable_resume: Whether to enable resume functionality.
  • lambdarank_truncation_level: Number of top pairs used in the pairwise loss. Should be set slightly higher than the k used for NDCG@k (see the sketch after this list).
  • calibrate: Whether to calibrate output probabilities.
  • event_value_user_affinity_features: Whether to use event value user affinity features.
  • event_value_affinity_features_value_filter
  • rolling_window_hours
  • negative_affinity_features: Whether to use negative affinity features.
  • content_affinity_features: Whether to use content affinity features.
  • content_affinity_features_batch_size: Batch size for content affinity features.
  • content_affinity_max_num_latest_items
  • container_categorical_to_multi_hot: Whether to convert container categorical features to multi-hot encodings.
  • container_to_container_affinities: Whether to use container-to-container affinities.
  • point_in_time_item_feature: Whether to use point-in-time item features.
  • drop_user_id: Whether to drop user ID.
  • drop_item_id: Whether to drop item ID.
  • early_stopping_rounds: Number of rounds for early stopping.
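Two of these knobs interact in ways that are easy to get wrong: in LightGBM, num_leaves is the primary complexity control and is effectively capped at 2^max_depth when a depth limit is set, and lambdarank_truncation_level should sit slightly above the k at which you evaluate NDCG@k. The small sketch below is hedged: the depth, leaf count, and k value are illustrative assumptions, not recommendations.

# Hedged sanity checks for tree-shape and LTR truncation settings.
max_depth = 8
num_leaves = 31
k = 10                                  # suppose offline evaluation uses NDCG@10

assert num_leaves <= 2 ** max_depth, "num_leaves is capped by 2**max_depth when max_depth > 0"
lambdarank_truncation_level = k + 5     # slightly higher than k, per the note above
print(num_leaves, lambdarank_truncation_level)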

V1 API

policy_configs:
  scoring_policy:
    policy_type: lightgbm
    # Core Parameters
    objective: "lambdarank"   # Key for LTR: "binary", "regression", "lambdarank", "rank_xendcg"
    n_estimators: 1000        # Number of trees (boosting rounds)
    learning_rate: 0.05       # Step size shrinkage
    # Tree Structure Parameters
    max_depth: 8              # Max tree depth (-1 for no limit)
    num_leaves: 31            # Max leaves per tree (consider tuning based on data)
    min_child_weight: 1e-3    # Minimum sum of instance weight (hessian) needed in a child
    # Regularization & Sampling Parameters
    colsample_bytree: 0.8     # Fraction of features considered per tree
    subsample: 0.8            # Fraction of data sampled per boosting iteration (bagging)
    subsample_freq: 5         # Frequency for bagging (0 means disabled)
    # Other Parameters
    calibrate: false          # Calibrate output probabilities (usually for binary objective)
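If you prototype these values against a local LightGBM model before setting them in the policy config, the sklearn-style names above map onto native LightGBM parameters as shown below. This is a hedged sketch assuming the usual parameter aliases (subsample → bagging_fraction, colsample_bytree → feature_fraction, subsample_freq → bagging_freq, min_child_weight → min_sum_hessian_in_leaf); the dataset construction and validation split are placeholders.

# Hedged sketch: the example config expressed as native LightGBM parameters,
# trained with early stopping against a held-out validation set.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(800, 20)), rng.integers(0, 2, size=800)
X_valid, y_valid = rng.normal(size=(200, 20)), rng.integers(0, 2, size=200)

params = {
    "objective": "binary",            # "lambdarank" would also require query groups
    "learning_rate": 0.05,
    "max_depth": 8,
    "num_leaves": 31,
    "min_sum_hessian_in_leaf": 1e-3,  # min_child_weight
    "feature_fraction": 0.8,          # colsample_bytree
    "bagging_fraction": 0.8,          # subsample
    "bagging_freq": 5,                # subsample_freq
}
train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
booster = lgb.train(
    params,
    train_set,
    num_boost_round=1000,             # n_estimators
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # early_stopping_rounds
)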

Usage

Use this model when:

  • You have large-scale datasets with millions of rows and many categorical features
  • You need fast training and inference for ranking or CTR prediction workloads
  • You want a learning-to-rank objective such as LambdaRank/LambdaMART for slate ranking
  • You need a production-ready, memory-efficient GBDT implementation with strong baseline performance
  • You want to combine rich feature sets (behavioral, content, and affinity features) into a single scoring model

Choose a different model when:

  • You primarily work with medium-sized datasets and prefer stronger regularization defaults (use XGBoost)
  • You want to explicitly learn high-order feature interactions via deep networks (use DeepFM or Wide & Deep)
  • You have extremely sparse interaction-only data and no rich feature set (use ALS/ELSA or other embedding models)
  • You need sequence-aware modeling of user behavior (use SASRec, BERT4Rec, or other sequential models)
  • You only need a simple trending or heuristic baseline (use Rising Popularity or value-model expressions)

Use cases

  • E-commerce CTR prediction and ranking for product search and browse results
  • Feed ranking and homepage personalization using mixed behavioral and content features
  • Ad ranking and sponsored content placement where learning-to-rank objectives are important
  • Re-ranking of retrieved candidates in multi-stage recommendation architectures
  • Any tabular recommendation or ranking problem with a wide variety of engineered features

References