Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, : invalid number of intervals #186

Open
marianeira opened this issue Jul 26, 2021 · 0 comments

It seems that lime cannot build an explainer when a variable contains only NAs and a single constant value, even though XGBoost fits such variables without trouble.

For instance, I have a variable that is highly correlated with the target; in fact, it has the highest gain in the variable importance table. Moreover, if we replace its missing values with an extreme value, its correlation with the target is 0.77.

However, it does not work with lime because its standard deviation is zero (unlike XGBoost, lime does not take the missing values into account). As a result, I can't use lime with this type of variable. Is there any solution other than removing such columns, which seem to work well in XGBoost?
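To make the failure concrete, here is a minimal base R sketch of what I think is happening. The error message shows that cut() receives unique(explainer$bin_cuts[[i]]) as its breaks. How lime computes bin_cuts is an assumption on my part (evenly spaced edges over the observed range, with NAs dropped); the names n_bins, rng and bin_cuts below are only illustrative.

```r
# var5 from the example: four NAs followed by a constant 1
var5 <- c(rep(NA, 4), rep(1, 18))

# With the NAs dropped, the column has no spread at all
spread <- sd(var5, na.rm = TRUE)

# Assumption: bin edges are evenly spaced over the observed range
n_bins <- 4
rng <- range(var5, na.rm = TRUE)
bin_cuts <- seq(rng[1], rng[2], length.out = n_bins + 1)

# All edges coincide, so unique() leaves a single number. cut() treats a
# single number as "number of intervals" and rejects it because it is < 2.
breaks <- unique(bin_cuts)
res <- tryCatch(
  cut(var5, breaks, labels = FALSE, include.lowest = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```

If this reading is right, any column with fewer than two distinct non-missing values would trigger the same error, whatever the value of quantile_bins.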

Here is a simple example of the problem. Thanks in advance.

library(dplyr)    # provides %>% and select()
library(xgboost)
library(lime)

df <- data.frame(target = c(0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2),
                 var1 = rnorm(22),
                 var2 = rnorm(22)*10,
                 var3 = c(rep(0,20),1,1),
                 var4 = c(-1,-2,5,3,1,2,2,1,1,2,1,-1,5,1,1,20,2,1,0,2,2,2),
                 var5 = c(NA,NA,NA,NA,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))

# Train XGBoost

X_train <- df %>% select(-target)

dtrain <- xgb.DMatrix(data.matrix(X_train),
                      label = df$target)

boost <- xgb.train(params = list(max_depth = 7, eta = 0.1,
                                 objective = "multi:softprob",
                                 eval_metric = "merror",  # multiclass error rate
                                 nthread = 1,
                                 num_class = 3),
                   data = dtrain,
                   nrounds = 100)
xgb.importance(feature_names = colnames(dtrain),
               model = boost)

local_obs <- X_train[c(1,2),]

# Fit lime, quantile_bins = FALSE

explainer1 <- lime(x = X_train, model = boost, quantile_bins = FALSE)
Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, :
invalid number of intervals

explanations1 <- lime::explain(local_obs, explainer1, n_labels = 2, n_features = 2)
plot_explanations(explanations1)

# Fit lime, quantile_bins = TRUE

explainer2 <- lime(x = X_train, model = boost, quantile_bins = TRUE)
Error in cut.default(x[[i]], unique(explainer$bin_cuts[[i]]), labels = FALSE, :
invalid number of intervals
In addition: Warning messages:
1: var3 does not contain enough variance to use quantile binning. Using standard binning instead.
2: var5 does not contain enough variance to use quantile binning. Using standard binning instead.

explanations2 <- lime::explain(local_obs, explainer2, n_labels = 2, n_features = 2)
plot_explanations(explanations2)
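
One workaround that might be worth trying, short of dropping the column, is to replace the NAs with an extreme value before building the explainer, as mentioned above for the correlation. The sketch below uses only base R and shows that the imputed column can then be binned. The helper n_distinct_obs and the sentinel -999 are hypothetical choices, and I have not verified how lime behaves on the imputed data.

```r
# Two of the toy columns from the example above
X_train <- data.frame(
  var3 = c(rep(0, 20), 1, 1),
  var5 = c(rep(NA, 4), rep(1, 18))
)

# Hypothetical helper: number of distinct non-missing values in a column
n_distinct_obs <- function(x) length(unique(x[!is.na(x)]))

# Columns with fewer than two distinct observed values cannot be binned
counts <- vapply(X_train, n_distinct_obs, integer(1))
degenerate <- names(X_train)[counts < 2]

# Hypothetical sentinel, chosen to lie far outside the observed range
sentinel <- -999
X_imputed <- X_train
for (col in degenerate) {
  X_imputed[[col]][is.na(X_imputed[[col]])] <- sentinel
}

# The imputed column now spans two values, so evenly spaced edges are distinct
rng <- range(X_imputed$var5)
edges <- unique(seq(rng[1], rng[2], length.out = 5))
bins <- cut(X_imputed$var5, edges, labels = FALSE, include.lowest = TRUE)
print(table(bins))
```

The model would presumably have to be retrained on the imputed data so that lime and XGBoost see the same encoding. A column that contains nothing but NAs would still be constant after imputation and would need to be dropped.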
