LightGBM (GBT)
Description
The LightGBM policy uses the LightGBM framework, a highly efficient gradient boosting implementation. It builds an ensemble of decision trees sequentially, with each new tree correcting the errors made by the previous ones. LightGBM is optimized for speed and memory usage, scales well to large datasets, and supports a range of objectives, including binary classification, regression, and specialized learning-to-rank objectives such as lambdarank (an implementation of LambdaMART).
Policy Type: lightgbm
Supports: scoring_policy
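As a rough illustration of what this policy trains, the sketch below fits a ranker with the lambdarank objective using LightGBM's scikit-learn API. The synthetic data, group sizes, and parameter values are illustrative assumptions, not the policy's actual pipeline.

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # 1,000 items with 20 features each
y = rng.integers(0, 4, size=1000)      # graded relevance labels in {0, 1, 2, 3}
group = [100] * 10                     # 10 queries with 100 candidate items each

ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=1000,
    learning_rate=0.05,
    num_leaves=31,
)
ranker.fit(X, y, group=group)          # group marks where each query's items start and end
scores = ranker.predict(X[:100])       # higher score = ranked higher within a query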
Hyperparameter tuning
- objective: Objective function (regression, binary, lambdarank, rank_xendcg).
- n_estimators: Number of boosting iterations.
- max_depth: Maximum depth of each tree; -1 means no limit.
- num_leaves: Maximum number of leaves in one tree.
- min_child_weight: Minimum sum of Hessians in one leaf.
- learning_rate: Learning rate (shrinkage) for gradient boosting.
- colsample_bytree: Fraction of columns subsampled on each iteration.
- subsample: Fraction of training data subsampled on each iteration.
- subsample_freq: Bagging frequency.
- zero_as_missing: Treat zero values as missing.
- bin_construct_sample_cnt: Number of samples used to construct bins.
- verbose: Logging verbosity.
- verbose_eval: How often evaluation results are logged during training.
- num_threads: Number of threads used for training.
- enable_resume: Whether to enable resume functionality.
- lambdarank_truncation_level: Number of pairs used in the pairwise loss. Should be set slightly higher than the k value used for NDCG@k (see the sketch after this list).
- calibrate: Whether to calibrate output probabilities.
- event_value_user_affinity_features: Whether to use event value user affinity features.
- event_value_affinity_features_value_filter
- rolling_window_hours
- negative_affinity_features: Whether to use negative affinity features.
- content_affinity_features: Whether to use content affinity features.
- content_affinity_features_batch_size: Batch size for content affinity features.
- content_affinity_max_num_latest_items
- container_categorical_to_multi_hot: Whether to convert container categorical features to multi-hot encoding.
- container_to_container_affinities: Whether to use container-to-container affinities.
- point_in_time_item_feature: Whether to use the point-in-time item feature.
- drop_user_id: Whether to drop the user ID feature.
- drop_item_id: Whether to drop the item ID feature.
- early_stopping_rounds: Number of rounds without improvement before training stops early.
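To make the lambdarank_truncation_level guidance concrete, the sketch below targets NDCG@10, sets the truncation level slightly above k, and wires up early stopping against a validation set. The data shapes, validation split, and stopping patience are illustrative assumptions.

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
X_tr = rng.normal(size=(800, 20))
y_tr = rng.integers(0, 4, size=800)
X_va = rng.normal(size=(200, 20))
y_va = rng.integers(0, 4, size=200)

k = 10                                  # we evaluate NDCG@10
ranker = lgb.LGBMRanker(
    objective="lambdarank",
    n_estimators=1000,
    lambdarank_truncation_level=k + 5,  # slightly higher than the NDCG cutoff k
)
ranker.fit(
    X_tr, y_tr, group=[80] * 10,        # 10 training queries, 80 items each
    eval_set=[(X_va, y_va)], eval_group=[[20] * 10],
    eval_at=[k],                        # report NDCG@10 on the validation queries
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)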
V1 API
policy_configs:
  scoring_policy:
    policy_type: lightgbm
    # Core Parameters
    objective: "lambdarank"  # Key for LTR: "binary", "regression", "lambdarank", "rank_xendcg"
    n_estimators: 1000       # Number of trees (boosting rounds)
    learning_rate: 0.05      # Step size shrinkage
    # Tree Structure Parameters
    max_depth: 8             # Max tree depth (-1 for no limit)
    num_leaves: 31           # Max leaves per tree (consider tuning based on data)
    min_child_weight: 1e-3   # Minimum sum of instance weights (Hessians) needed in a child
    # Regularization & Sampling Parameters
    colsample_bytree: 0.8    # Fraction of features considered per tree
    subsample: 0.8           # Fraction of data sampled per boosting iteration (bagging)
    subsample_freq: 5        # Bagging frequency (0 disables bagging)
    # Other Parameters
    calibrate: false         # Calibrate output probabilities (usually for binary objective)
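For the calibrate option, a plausible reading is that output probabilities are post-processed with a standard calibration method. The sketch below shows one common way to do this, wrapping a binary LightGBM classifier in scikit-learn's CalibratedClassifierCV; this is an assumption about the mechanism, not the policy's actual implementation.

import numpy as np
import lightgbm as lgb
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)       # binary labels

base = lgb.LGBMClassifier(objective="binary", n_estimators=200)
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)  # 3-fold calibration
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:5])[:, 1]   # calibrated P(y = 1)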
References
- Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS.
- Burges, C. J. C. (2010). From RankNet to LambdaRank to LambdaMART: An Overview. Microsoft Research Technical Report. (LambdaMART explanation).