
@kapsner (Contributor) commented Jan 28, 2026

This PR fixes the cause of the CRAN note that led to treeshap being removed from CRAN in December 2025.
It would be great if one of the maintainers could resubmit the package.
Best, Lorenz

@pbiecek @piotrpiatyszek @tmikolajczyk @krzyzinskim

@kapsner changed the title "fix:: added depends R>=4.1.0 to description" to "fix: added depends R>=4.1.0 to description" on Jan 29, 2026
@kapsner mentioned this pull request on Jan 29, 2026
@kapsner changed the title "fix: added depends R>=4.1.0 to description" to "fix: update xgboost api and add depends R>=4.1.0 to description" on Jan 29, 2026
@kapsner (Contributor, Author) commented Jan 29, 2026

Note:
There is still an issue with the latest xgboost API that I need to figure out.
I was able to narrow it down to the behaviour of the new xgboost::xgb.model.dt.tree function. Besides the column rename from "Quality" to "Gain", the output differs from the previous xgboost version (1.7.11.1): the "Missing" column is different, and "Gain"/"Prediction" have different values for "Leaf" nodes, which likely causes the different output of "unify_predict"/"predict_cpp".
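One small piece of this, the "Quality" to "Gain" rename, can be absorbed by normalising the tree table before it is used. The sketch below is a hypothetical helper (`normalise_gain_column` is not part of treeshap or xgboost, and it is not what this PR necessarily does); the toy tables use made-up values and only the two column names taken from the comment above. It does not address the differing leaf values, which have a separate cause.

```r
# Hypothetical helper (not part of treeshap or xgboost): rename a legacy
# "Quality" column to "Gain" so that downstream code can rely on one name.
normalise_gain_column <- function(tree_table) {
  nm <- names(tree_table)
  if ("Quality" %in% nm && !("Gain" %in% nm)) {
    names(tree_table)[nm == "Quality"] <- "Gain"
  }
  tree_table
}

# Toy tables mimicking the two layouts (made-up values, two leaf rows each).
old_layout <- data.frame(Feature = c("Leaf", "Leaf"), Quality = c(0.1, -0.2))
new_layout <- data.frame(Feature = c("Leaf", "Leaf"), Gain = c(0.1, -0.2))

# Both layouts end up with the same column names after normalisation.
print(names(normalise_gain_column(old_layout)))
print(names(normalise_gain_column(new_layout)))
```

Tables that already use "Gain" pass through unchanged, so the helper is safe to call regardless of the installed xgboost version.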

@kapsner (Contributor, Author) commented Jan 29, 2026

Edit:
I found the root cause of the different behaviour in xgboost >3.0:

It is related to how the intercept is now handled:

Since 2.0.0, XGBoost supports estimating the model intercept (named base_score) automatically based on targets upon training.

(https://xgboost.readthedocs.io/en/stable/tutorials/intercept.html#intercept)

I will propose a fix in this PR soon.
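Why an automatically estimated intercept breaks the "unified == original predictions" check can be illustrated with a toy additive model: the raw prediction is the intercept (`base_score`) plus the leaf value reached in each tree. This is a minimal base-R sketch, not xgboost's or treeshap's code; `reconstruct_prediction`, the leaf values and both intercept values are hypothetical, and it only models the reconstruction step (in a real model the leaf values themselves also change when the intercept changes).

```r
# Toy model of how an additive tree ensemble forms its raw prediction:
# intercept (base_score) plus the leaf value reached in every tree.
reconstruct_prediction <- function(leaf_values, base_score) {
  base_score + sum(leaf_values)
}

# Hypothetical leaf values reached by one observation in a 3-tree model.
leaf_values <- c(0.12, -0.05, 0.30)

# Hypothetical intercepts: one estimated from the targets during training,
# one fixed value assumed by code that re-implements prediction.
estimated_intercept <- 2.4
assumed_intercept   <- 0.5

original <- reconstruct_prediction(leaf_values, estimated_intercept)
unified  <- reconstruct_prediction(leaf_values, assumed_intercept)

# Every prediction is shifted by the same constant: the intercept gap.
shift <- original - unified
print(shift)
```

Pinning `base_score` when training the model removes this degree of freedom, which is the approach taken in the commit that follows.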

restored expected behaviour of 'predictions from unified == original predictions' by now controlling the 'base_score' parameter in order to prevent automated calculation of the intercept, as it was introduced to xgboost with v2 and later versions
renamed argument 'model' -> 'model_name' to avoid issues with the object that is also named 'model'
as split value is considered to be 'less than'
@kapsner (Contributor, Author) commented Jan 29, 2026

@mayer79 I was able to fix everything related to the failing CRAN checks and to the new xgboost API.

However, restoring the behaviour of the previous xgboost version in the unit tests required a hacky workaround (5ba814a#diff-b1f4725ec5309708f3632330acf3ef5551186dec8b62919e6209af8b0eda42daR160-R179).

In short:

  • xgboost now automatically infers the intercept, which leads to different Gain values in the text-based tree model than before. This can be prevented by setting the 'base_score' parameter to an explicit value
  • this is the first step needed to get these tests working
  • the second step is that, for those tests to pass, "Decision.type" must be set to "<" for all non-leaf nodes
  • However, when "Decision.type" defaults to "<" directly in the xgboost.unify function, the xgboost-related treeshap correctness unit tests fail

As I considered the latter more important, I implemented everything so that those tests pass and applied the hacky workaround only to the two tests referenced above (I am not sure whether they are really required, though).
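The "Decision.type" issue comes down to which child an observation lands in when its value equals the split value exactly. The sketch below is hypothetical base R (`goes_left` is not a treeshap function) and assumes, as one of the commit messages above puts it, that the split value is considered "less than"; it shows that "<" and "<=" disagree only at the boundary, which is why tests that compare predictions can flip when the decision type changes.

```r
# Hypothetical helper: does an observation go to the "Yes" (left) child
# of a split node, given the node's decision type?
goes_left <- function(x, split, decision_type = c("<", "<=")) {
  decision_type <- match.arg(decision_type)
  if (decision_type == "<") x < split else x <= split
}

split <- 1.5

# Away from the boundary both decision types route the same way ...
below <- c(goes_left(1.0, split, "<"), goes_left(1.0, split, "<="))

# ... but a value equal to the split value is routed differently.
at_boundary <- c(goes_left(1.5, split, "<"), goes_left(1.5, split, "<="))

print(below)
print(at_boundary)
```

Because the two conventions only differ on ties, models trained on data with many repeated values (for example integer-valued features) are the most likely to expose a mismatch.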

Would be great if you or one of the maintainers could have a look at those changes.

Best,
Lorenz

renamed globals.R -> zzz.R and moved all import statements there
@pbiecek (Member) left a comment

Thank you @kapsner,
I will run some extra stress tests and resubmit to CRAN.

@pbiecek merged commit 2ad5fa0 into ModelOriented:master on Feb 1, 2026
7 checks passed