Tabular: Added XGBoost Model#691
Conversation
Job PR-691-1 is done.
This looks awesome! Really nicely put together. I will take a deeper look tomorrow and provide review feedback, as well as pull this into my local machine and test it on some datasets.
Innixma
left a comment
Added some initial comments. We are in code-freeze at present as we are working on a major modularization PR. This should be complete in 2 weeks at which point we can rebase this PR with mainline to prepare for merging. My initial basic tests and runs look good and I will plan to benchmark it more heavily after modularization is merged into mainline.
Thanks again for the high quality PR!
autogluon/utils/tabular/ml/models/xgboost/hyperparameters/searchspaces.py
Sorry for the late reply; I was on a long vacation last week. Thanks!
Innixma
left a comment
Thanks for the changes, looks good! Last thing before benchmarking will be rebasing with mainline. We are still moving a few things around after modularization and I will let you know later this week when it is ready to rebase.
Mainline should be stable now, feel free to rebase. Things to be aware of for rebase:
(force-pushed from eab8d5e to 4dda5d1)
Job PR-691-11 is done.
@sackoh Benchmark results are good! Here are the results:
AutoGluon_4h_2020_11_02_xgb_XGBoostClassifier VS all (the `>`, `<`, `=`, and `% less avg. errors` columns compare each framework against AutoGluon_4h_2020_11_02_xgb_XGBoostClassifier):

| # | framework | > | < | = | % less avg. errors | time_train_s | metric_error | time_infer_s | loss_rescaled | rank | rank=1_count | rank=2_count | rank=3_count | rank>3_count | error_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AutoGluon_4h_2020_11_02_xgb_CatboostClassifier | 23 | 14 | 0 | 6.098552 | 258.030727 | 0.275211 | 0.041744 | 0.072145 | 3.470588 | 8 | 10 | 3 | 16 | 0 |
| 1 | AutoGluon_4h_2020_11_02_xgb_XGBoostClassifier | 0 | 0 | 37 | 0.000000 | 608.679669 | 0.280103 | 1.821426 | 0.097839 | 3.926471 | 2 | 7 | 9 | 19 | 0 |
| 2 | AutoGluon_4h_2020_11_02_xgb_LightGBMClassifierXT | 20 | 16 | 1 | -1.879454 | 14.797889 | 0.308314 | 0.123659 | 0.119923 | 4.279412 | 3 | 5 | 8 | 21 | 0 |
| 3 | AutoGluon_4h_2020_11_02_xgb_LightGBMClassifier... | 13 | 21 | 0 | -6.096218 | 56.137859 | 0.304770 | 0.341352 | 0.127638 | 4.485294 | 7 | 2 | 2 | 23 | 3 |
| 4 | AutoGluon_4h_2020_11_02_xgb_LightGBMClassifier | 6 | 31 | 0 | -9.069855 | 16.127528 | 0.334640 | 0.082547 | 0.184649 | 5.485294 | 1 | 3 | 3 | 30 | 0 |
| 5 | AutoGluon_4h_2020_11_02_xgb_NeuralNetClassifier | 12 | 25 | 0 | -10.257584 | 87.022387 | 0.301530 | 0.737617 | 0.183621 | 6.441176 | 8 | 0 | 3 | 26 | 0 |
| 6 | AutoGluon_4h_2020_11_02_xgb_RandomForestClassi... | 9 | 27 | 0 | -15.483156 | 27.738800 | 0.353887 | 0.605831 | 0.199359 | 6.500000 | 1 | 3 | 1 | 31 | 1 |
| 7 | AutoGluon_4h_2020_11_02_xgb_RandomForestClassi... | 9 | 27 | 0 | -16.195367 | 17.687104 | 0.351978 | 1.356308 | 0.196360 | 6.661765 | 2 | 1 | 2 | 31 | 1 |
| 8 | AutoGluon_4h_2020_11_02_xgb_ExtraTreesClassifi... | 7 | 29 | 0 | -17.629729 | 6.924649 | 0.396179 | 1.014320 | 0.229516 | 7.117647 | 2 | 3 | 2 | 29 | 1 |
| 9 | AutoGluon_4h_2020_11_02_xgb_ExtraTreesClassifi... | 7 | 29 | 0 | -18.391755 | 6.582585 | 0.394828 | 1.222700 | 0.231477 | 7.367647 | 2 | 2 | 1 | 31 | 1 |
| 10 | AutoGluon_4h_2020_11_02_xgb_KNeighborsClassifi... | 2 | 35 | 0 | -56.051629 | 1.742805 | 0.887229 | 7.447910 | 0.903906 | 10.823529 | 1 | 0 | 1 | 35 | 0 |
| 11 | AutoGluon_4h_2020_11_02_xgb_KNeighborsClassifi... | 2 | 35 | 0 | -56.862477 | 1.741305 | 0.891297 | 7.526989 | 0.931777 | 11.441176 | 0 | 1 | 0 | 36 | 0 |
XGBoost is the 2nd best single model, beaten only by CatBoost on average. XGBoost had 0 failures which is great.
A few things to take note of:

- XGBoost takes a very long time to train, averaging 2.5x longer than CatBoost and 40x longer than LightGBM. This could in part be due to the lower learning rate used (CatBoost and LightGBM are using 0.1, but they might do better with 0.03 as you did). CatBoost was previously the slowest model we had, so XGBoost is very slow compared to most of the models. A quick way to test this is sketched right after this list.
- XGBoost takes a long time to infer. It is 20x slower than LightGBM and 50x slower than CatBoost. This is a concern for users who want fast inference speed, so it would be good for us to look into ways to speed this up in future.
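A quick way to test both hypotheses would be to override the XGBoost hyperparameters when fitting. Below is a minimal sketch, assuming the `XGB` hyperparameters key used elsewhere in this PR; the specific values are illustrative, not a recommendation:

```python
# Sketch only: raise the learning rate to match CatBoost/LightGBM and try a faster
# tree construction method; 'hist' and 'gpu_hist' are standard XGBoost tree_method values.
hyperparameters = {
    'XGB': {
        'n_estimators': 10000,
        'learning_rate': 0.1,    # the value CatBoost/LightGBM used in this benchmark
        'tree_method': 'hist',   # or 'gpu_hist' to test GPU acceleration
    },
}
```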
Because of these concerns, I recommend not adding XGBoost to the default configs yet, until we deep dive into potential solutions for training/inference speed (GPU acceleration perhaps?), and to avoid adding a new hard dependency to the requirements (the plan is to add it as an optional dependency via `pip install autogluon.tabular[xgboost]`, as part of a future `pip install autogluon.tabular[full]` alongside FastAI, Torch, and other models).
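For reference, the optional-dependency route usually means guarding the import with an actionable error message. A minimal sketch follows; the helper name mirrors the `try import xgboost` commit in this PR, but the exact function and wording here are illustrative:

```python
def try_import_xgboost():
    # Sketch: fail fast with an install hint if the optional dependency is missing.
    try:
        import xgboost  # noqa: F401
    except ImportError:
        raise ImportError(
            "Unable to import dependency xgboost. "
            "Try installing it via `pip install xgboost`."
        )
```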
I've added other comments to improve the current code. As an example, early stopping is still using the final iteration instead of the best iteration; once fixed, this will likely improve XGBoost's performance by a good margin. Once these comments are addressed, I'll be happy to approve and merge the PR!
Thanks again for all the work that went into this!
tabular/src/autogluon/tabular/task/tabular_prediction/hyperparameter_configs.py
tabular/src/autogluon/tabular/task/tabular_prediction/tabular_prediction.py
```python
if feature == original_feature:
    importance_dict[feature] += value

return importance_dict
```
Can you add (requires rebase first):
```python
from ...features.feature_metadata import R_OBJECT

def _get_default_auxiliary_params(self) -> dict:
    default_auxiliary_params = super()._get_default_auxiliary_params()
    extra_auxiliary_params = dict(
        ignored_type_group_raw=[R_OBJECT],
    )
    default_auxiliary_params.update(extra_auxiliary_params)
    return default_auxiliary_params
```
This way the model will no longer crash if given object dtype input, and will just drop it instead. This is in preparation for multi-modal tabular+text support (#756)
Great! I really wanted the functions to ignore object dtype.
```python
bst = self.model.get_booster()
self.params_trained['n_estimators'] = bst.best_iteration + 1
self.params_trained['best_ntree_limit'] = bst.best_ntree_limit
```
Currently, the model is using the final trained iteration during prediction instead of the best iteration (found during early stopping):
```
[757]	validation_0-logloss:0.26971
Stopping. Best iteration:
[721]	validation_0-logloss:0.26962
Saving AutogluonModels/ag-20201103_200907/models/XGBoostClassifier/model.pkl
	-0.2697 = Validation log_loss score
	29.67s = Training runtime
	0.16s = Validation runtime
```
I believe the predict call has to be updated to `self.model.predict(data, ntree_limit=bst.best_ntree_limit)`. Therefore, at the end of `_fit`, we can set a variable `self._best_ntree_limit = bst.best_ntree_limit` and then call `self.model.predict(data, ntree_limit=self._best_ntree_limit)`.
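For illustration, the end of `_fit` could then look roughly like this; a sketch, where only the `self._best_ntree_limit` line is the suggested addition and the rest mirrors the existing code in this PR:

```python
# Tail of _fit (sketch): remember the best iteration found by early stopping
# so that prediction can pass it as ntree_limit.
bst = self.model.get_booster()
self._best_ntree_limit = bst.best_ntree_limit
self.params_trained['n_estimators'] = bst.best_ntree_limit  # == bst.best_iteration + 1
```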
This is the new code that should be added:
```python
def _predict_proba(self, X, **kwargs):
    X = self.preprocess(X, **kwargs)

    if self.problem_type == REGRESSION:
        return self.model.predict(X, ntree_limit=self._best_ntree_limit)

    y_pred_proba = self.model.predict_proba(X, ntree_limit=self._best_ntree_limit)
    if self.problem_type == BINARY:
        if len(y_pred_proba.shape) == 1:
            return y_pred_proba
        elif y_pred_proba.shape[1] > 1:
            return y_pred_proba[:, 1]
        else:
            return y_pred_proba
    elif y_pred_proba.shape[1] > 2:
        return y_pred_proba
    else:
        return y_pred_proba[:, 1]
```
Once this is added, the correct iteration is used:
```
[757]	validation_0-logloss:0.26971
Stopping. Best iteration:
[721]	validation_0-logloss:0.26962
Saving AutogluonModels/ag-20201103_202139/models/XGBoostClassifier/model.pkl
	-0.2696 = Validation log_loss score
	29.17s = Training runtime
	0.12s = Validation runtime
```
Yes, that's a good point. The model is using the last trained iteration at the time it was early stopped, not the best iteration. I was trying not to override `_predict_proba()` while considering whether to use `best_ntree_limit` for prediction. I had a plan, but forgot to do this. 😢 That's the reason I wrote the following code:

```python
self.params_trained['best_ntree_limit'] = bst.best_ntree_limit
```

I appreciate your thoughtful review. I will add the code.
```python
bst = self.model.get_booster()
self.params_trained['n_estimators'] = bst.best_iteration + 1
```
Can actually replace with simply `self.params_trained['n_estimators'] = bst.best_ntree_limit`, as `bst.best_ntree_limit == bst.best_iteration + 1`.
Yes, they are the same. I will replace it.
```python
i = env.iteration
if i % period == 0 or i + 1 == env.begin_iteration or i + 1 == env.end_iteration:
    msg = '\t'.join([_fmt_metric(x, show_stdv) for x in env.evaluation_result_list])
    logger.log(20, '[%d]\t%s\n' % (i, msg))
```
No need for `\n` in the log messages, as `logger.log` already adds a `\n` to every message. This applies to every log message in `early_stop_custom` as well.
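For example, the log call in the snippet above would simply drop the trailing newline:

```python
logger.log(20, '[%d]\t%s' % (i, msg))  # logger.log already terminates the message with a newline
```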
I will remove all `\n` at the end of every log message.
```diff
 hyperparams = {'NN': {'num_epochs': 10, 'activation': 'relu', 'dropout_prob': ag.Real(0.0,0.5)},
-               'GBM': {'num_boost_round': 1000, 'learning_rate': ag.Real(0.01,0.1,log=True)} }
+               'GBM': {'num_boost_round': 1000, 'learning_rate': ag.Real(0.01,0.1,log=True)},
+               'XGB': {'n_estimators': 1000, 'learning_rate': ag.Real(0.01,0.1,log=True)} }
```
Not sure why we would specifically add XGBoost to this example. The example is just supposed to illustrate how users can exert more control over `fit()`, not to highlight what models are available.
Instead I think we should add a dedicated unit test that evaluates just the XGBoost model alone. @Innixma what do you think?
Ideally we want a unit test for all of our models, so I think that's something we can do after the PR to avoid delays in merging. Regarding `example_advanced_tabular.py`, I think it can be reverted to be unchanged, and a dedicated XGB test can be added in future (along with all the other models).
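A dedicated test along those lines might look like the following sketch, assuming the task-style API used in this PR's examples; the dataset path, label column, and model-name check are placeholders:

```python
from autogluon.tabular import TabularPrediction as task

def test_xgboost():
    # Placeholder dataset and label; restrict fitting to the new XGBoost model only.
    train_data = task.Dataset(file_path='train.csv')
    predictor = task.fit(
        train_data=train_data,
        label='class',
        hyperparameters={'XGB': {}},
    )
    leaderboard = predictor.leaderboard()
    assert leaderboard['model'].str.contains('XGBoost').any()
```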
@Innixma Thank you for your comments. One possible issue is how `n_jobs` is set:

```python
# possible option 1
import os
params = {'n_jobs': os.cpu_count()}  # or using multiprocessing

# possible option 2
params = {'n_jobs': 9999}
```

Another possible issue is the default parameters, as you also mentioned. After resolving the above 2 issues, training was faster than before. If you don't mind, it could be good to check whether performance improves.
@sackoh I'm open to both of those changes, thanks for the deep dive! If you could add those changes along with addressing the comments, I can do another benchmark run to see how it compares.

Note that `os.cpu_count()` worsens inference speed when the number of CPUs on the training machine is lower than on the inference machine.
@Innixma After you review, I will remove XGB from the default configs. Thank you for your reviews in advance!
Job PR-691-12 is done.
@sackoh Awesome work with the optimizations, here are the results:

With the optimizations, XGBoost has no noticeable drop in predictive accuracy, while being ~3x faster to train and ~5x faster to infer, bringing it in line with the other GBM models. With these improvements, I think we can keep XGBoost in the default config and as a default dependency (we may move it to optional in a future PR).

As a final preparation for merging, please resolve the minor conflict in `tabular/src/autogluon/tabular/trainer/model_presets/presets.py`.

Finally, if you are interested in contributing to AutoGluon in the future, here is an invite link to our developer Slack channel. Feel free to message me on Slack if you want to learn more about what we are working on. Thanks again for the high quality contribution!
…ular_xgboost

Conflicts:
	tabular/src/autogluon/tabular/trainer/model_presets/presets.py
Job PR-691-13 is done.
Innixma
left a comment
Looks great, thanks for the contribution!
@Innixma I'm pleased to be invited to the Slack channel. I'll contact you soon through the channel.
* Add xgboost model and utils to fit
* Add custom callback functions for early stopping
* Add basic params and hyperparameter spaces to tune xgboost model
* Update tabular prediction to include xgboost model for training
* Updated xgboost model fit to exclude invalid params `num_threads`, `num_gpus`
* Added XGBoost model to advanced tabular examples for test
* Modified env.iteration to best_iteration
* Removed overwritten parameter n_jobs
* Changed rabit to logger and Added print_evaluation to log iteration every 50 steps
* Modified to log every 50 or 1 iterations with callbacks
* Updated learning_rate in searchsapces to set equal with default hyperparameters
* Updated thread parameters to use all cores as default
* Updated xgboost model to use 'OneHotMergeRaresHandleUnknownEncoder'
* Updated small changes to import and setup
* Updated setup.py
* Updated way to get max_category_levels parameter
* Updated setup.py and Fixed typos after rebase
* Updated preprocess to use refit_full
* Updated try import xgboost
* Deleted `\n` from every log message
* Updated `n_jobs` to use whole parallel threads. Note that `os.cpu_count()` worse the inference speed when the number of CPUs in the training system is lower than in the inferencing system.
* Added `_get_default_auxiliary_params` and `_predict_proba`
* Added `test_xgboost` in a unittest
Issue #, if available:
Tabular: Add XGBoost Model #589
Description of changes:
This code adds an XGBoost model to the tabular predictor. I have tried to reference the existing LightGBM and CatBoost models for ease of maintenance.
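A minimal usage sketch of the new model, assuming the task-style API and the `XGB` hyperparameters key introduced in this PR; the file paths and label column are placeholders:

```python
from autogluon.tabular import TabularPrediction as task

# Placeholders: swap in real train/test files and the real label column.
train_data = task.Dataset(file_path='train.csv')
test_data = task.Dataset(file_path='test.csv')

predictor = task.fit(
    train_data=train_data,
    label='class',
    hyperparameters={'XGB': {'n_estimators': 1000, 'learning_rate': 0.03}},
)
y_pred = predictor.predict(test_data)
```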
Features
- `xgboost.train` functions like callbacks, custom eval functions, continuous training
- `scipy.sparse.csr_matrix` as training datasets

Tested
TODO
Future work
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.