Add ml_feature_importance() method for ml_model_xgboost_classification #21

Closed

@yutannihilation yutannihilation commented Oct 23, 2019

Close #16

Note that this requires updating xgboost4j-spark to at least 0.82 in order to use Booster.getScore() (cf. dmlc/xgboost@431c850). To do that, I:

  1. created a directory internal/xgboost4j-spark
  2. manually downloaded xgboost4j-spark-0.82.jar from Maven and placed it in that directory
  3. ran configure.R

However, I haven't included the updated inst/java/sparkxgb-2.3-2.11.jar because I'm not fully sure I did this correctly. Please let me know if I need to include it.

library(sparkxgb)
library(sparklyr)
library(dplyr, warn.conflicts = FALSE)

sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris)

xgb_model <- xgboost_classifier(
  iris_tbl, 
  Species ~ .,
  objective = "multi:softprob",
  num_class = 3,
  num_round = 50, 
  max_depth = 4
)

ml_feature_importances(xgb_model)
#>        feature  importance
#> 4 Petal_Length 0.671981781
#> 2  Petal_Width 0.311575675
#> 1  Sepal_Width 0.009863888
#> 3 Sepal_Length 0.006578656

spark_disconnect(sc)
#> NULL

Created on 2019-10-23 by the reprex package (v0.3.0)
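
The importances in the output above sum to 1, which is consistent with per-feature scores being divided by their total. The following is only a minimal sketch of that kind of normalization in plain R. The raw scores are made-up numbers for illustration, not values returned by Booster.getScore(), and this is not necessarily how the PR computes the result.

```r
# Hypothetical raw per-feature scores (made-up numbers, for illustration only;
# not taken from Booster.getScore() or from the model above).
raw_scores <- c(
  Sepal_Length = 2,
  Sepal_Width  = 3,
  Petal_Length = 30,
  Petal_Width  = 15
)

# Normalize so that the importances sum to 1.
importance <- raw_scores / sum(raw_scores)

# Build a data frame sorted by decreasing importance,
# similar in shape to the output shown above.
result <- data.frame(
  feature = names(importance),
  importance = unname(importance),
  stringsAsFactors = FALSE
)
result <- result[order(result$importance, decreasing = TRUE), ]

print(result)
```

Sorting in decreasing order keeps the original row names, which is why the row labels in a table like the one above need not run 1 to 4 in sequence.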

Base automatically changed from master to main February 18, 2021 16:47

yitao-li commented Feb 19, 2021

Hey @yutannihilation! Thanks for contributing to sparkxgb.

My name is Yitao. I will be taking over from Kevin as the maintainer of sparkxgb soon. Just catching up on pending PRs at the moment.

I think this change looks good. I'll create a similar PR listing you as the primary author, then add a test case for feature importances and make sure it passes with Spark 2.4 and Spark 3.0. After that, it should be good to go 👍

@yitao-li

Please see #38 for the newer version of this PR that contains the same ml_feature_importance() plus unit test coverage for this functionality. Thanks!

@yitao-li yitao-li closed this Feb 19, 2021
@yutannihilation yutannihilation deleted the feature/feature-importance branch February 19, 2021 03:30

Successfully merging this pull request may close these issues.

feature request: Feature Importance of models