Understanding the feature_parallel distributed learning algorithm in LightGBMClassifier
I want to understand the feature_parallel algorithm in LightGBMClassifier. The linked documentation describes how it is done traditionally and how LightGBM aims to improve it.
The two approaches are as follows (verbatim from the linked site):
Traditional Feature_parallel:
Feature parallel aims to parallelize the “Find Best Split” in the decision tree. The procedure of traditional feature parallel is:
- Partition data vertically (different machines have different feature set).
- Workers find local best split point {feature, threshold} on local feature set.
- Communicate local best splits with each other and get the best one.
- Worker with best split to perform split, then send the split result of data to other workers.
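To make sure I follow the traditional scheme, here is a toy single-process simulation I put together. It is only a sketch of my understanding: the gain criterion, worker layout, and candidate thresholds are my own simplifications, not LightGBM's actual code.

```python
# Toy, single-process simulation of TRADITIONAL feature parallel.
# Workers, gain function, and "network" (plain Python lists) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                   # full dataset: 1000 rows, 8 features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)  # toy binary target

def best_over_features(X_cols, y, feature_ids):
    """Best {feature, threshold} over the columns this worker owns (toy variance-reduction gain)."""
    best = {"gain": -np.inf, "feature": None, "threshold": None}
    for col, fid in enumerate(feature_ids):
        for thr in np.quantile(X_cols[:, col], np.linspace(0.1, 0.9, 9)):
            left, right = y[X_cols[:, col] <= thr], y[X_cols[:, col] > thr]
            if len(left) == 0 or len(right) == 0:
                continue
            gain = len(y) * y.var() - len(left) * left.var() - len(right) * right.var()
            if gain > best["gain"]:
                best = {"gain": gain, "feature": fid, "threshold": thr}
    return best

# Step 1: partition the data VERTICALLY -- worker 0 owns features 0-3, worker 1 owns 4-7.
partitions = [list(range(0, 4)), list(range(4, 8))]

# Step 2: each worker finds its local best split on the features it owns.
local_bests = [best_over_features(X[:, cols], y, cols) for cols in partitions]

# Step 3: communicate the small local-best records and pick the global winner.
winner = int(np.argmax([b["gain"] for b in local_bests]))
best = local_bests[winner]

# Step 4: only the winner owns the winning feature's values, so it must compute the
# row partition and send one boolean per row to everyone else -- the costly step.
goes_left = X[:, best["feature"]] <= best["threshold"]
print(f"worker {winner} won with feature {best['feature']} <= {best['threshold']:.3f}; "
      f"broadcasting {goes_left.size} booleans to the other workers")
```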
Feature_parallel in LightGBM:
Since feature parallel cannot speed up well when #data is large, we make a little change: instead of partitioning data vertically, every worker holds the full data. Thus, LightGBM doesn’t need to communicate for the split result of data since every worker knows how to split data
The procedure of feature parallel in LightGBM (my questions are in parentheses):
- Workers find local best split point {feature, threshold} on the local feature set. (Why call it local? Since it is found on the entire data, isn't this the global best split?)
- Communicate local best splits with each other and get the best one. (But earlier it was mentioned that LightGBM doesn't need to communicate the split result of data, since every worker knows how to split the data.)
- Perform best split. (Did we not already have the best split?)
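And here is my toy simulation of the LightGBM variant, using the same toy data and the same simplifications as above (again my own assumptions, not LightGBM internals). This is where my confusion about "local" and "communicate" comes from:

```python
# Toy, single-process simulation of LightGBM-style feature parallel: every
# "worker" holds the FULL data but searches only its assigned feature subset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)

def gain(fid, thr):
    """Toy variance-reduction gain of splitting on feature fid at threshold thr."""
    mask = X[:, fid] <= thr
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return -np.inf
    return len(y) * y.var() - len(left) * left.var() - len(right) * right.var()

# Each worker sees ALL rows but is assigned a feature subset; its result is
# "local" only in the sense of "best over my features", not "best on my rows".
assignments = [range(0, 4), range(4, 8)]
local_bests = []
for fids in assignments:
    candidates = [(gain(f, t), f, t)
                  for f in fids
                  for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, 9))]
    local_bests.append(max(candidates))

# The remaining communication is small: each worker sends one
# (gain, feature, threshold) record, and all workers agree on the global best.
best_gain, best_feature, best_threshold = max(local_bests)

# "Perform best split": nothing per-row is sent, because every worker holds the
# full data and recomputes the identical row partition on its own.
goes_left = X[:, best_feature] <= best_threshold
print(f"global best: feature {best_feature} <= {best_threshold:.3f}; "
      f"each worker splits its own full copy ({int(goes_left.sum())} rows go left)")
```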
Moreover, if each worker retains the entire dataset, then it is capable of building the entire estimator by itself. What is distributed about the learning here?
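For context, this is roughly how I would turn on the feature-parallel learner with the core LightGBM Python API. It is only a minimal sketch: the data and the commented-out network settings are placeholders, and the LightGBMClassifier wrapper may expose these options differently.

```python
# Minimal sketch of selecting the feature-parallel tree learner in LightGBM.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] > 0).astype(int)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "tree_learner": "feature",   # selects feature-parallel split finding
    # Settings needed for a real multi-machine run (placeholder values):
    # "num_machines": 2,
    # "machines": "10.0.0.1:12400,10.0.0.2:12400",
    # "local_listen_port": 12400,
}

# Without the network settings this is effectively a single-machine run, which is
# also the point of my last question: one worker holding the full dataset can
# build the whole model by itself.
booster = lgb.train(params, train_set, num_boost_round=10)
print(booster.num_trees())
```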
Topic gradient-boosting-decision-trees lightgbm decision-trees
Category Data Science