[Question] Setting values for linear coefficient #6390

Open
velezbeltran opened this issue Mar 27, 2024 · 5 comments

Comments

@velezbeltran

velezbeltran commented Mar 27, 2024

Summary

Hello! Thank you for the library; it has been invaluable to my work for the past couple of years!

I was wondering whether it is possible, from the Python interface, to manually set the linear model at each leaf when fitting a linear tree. That is, if we train the model with linear_tree=True, is it possible to modify the linear model at each leaf afterwards? If not, I think it would be a useful addition.

Motivation

This would be useful for computing derivatives of the tree, using the linear model at each leaf as a local approximation. That is what we were planning to use it for: we could then differentiate by modifying the linear model and setting some of its values to 0.
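A minimal sketch of the idea, assuming each leaf's linear model is represented as a dict with leaf_const, leaf_features and leaf_coeff entries: inside one leaf the model is affine, so the partial derivative of that tree's output with respect to feature j is just the coefficient on j (0 if j is not in the leaf's model). Summing over the leaves a sample lands in gives the derivative of the raw score, ignoring the jumps at split thresholds. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def leaf_partial_derivative(leaf: Dict, feature: int) -> float:
    """d(leaf output)/d(x[feature]) for a single linear leaf.

    The leaf model is leaf_const + sum(coeff * x[f]), so the derivative is the
    coefficient paired with `feature`, or 0.0 if the feature is not used.
    """
    for f, coeff in zip(leaf["leaf_features"], leaf["leaf_coeff"]):
        if f == feature:
            return float(coeff)
    return 0.0


def raw_score_partial_derivative(active_leaves: Sequence[Dict], feature: int) -> float:
    """Sum per-tree derivatives over the leaf each tree routes the sample to.

    Only valid away from split thresholds, where the ensemble is locally affine.
    """
    return sum(leaf_partial_derivative(leaf, feature) for leaf in active_leaves)


# Hand-written example: the leaves one sample falls into (one per tree).
leaves: List[Dict] = [
    {"leaf_const": 1.0, "leaf_features": [0, 3], "leaf_coeff": [0.5, -2.0]},
    {"leaf_const": 0.2, "leaf_features": [3], "leaf_coeff": [0.25]},
]
print(raw_score_partial_derivative(leaves, 3))  # -1.75
print(raw_score_partial_derivative(leaves, 1))  # 0.0
```

For objectives with a link function (e.g. binary classification), this is the derivative of the raw score, not of the transformed prediction.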

Description

Essentially, a function similar to set_leaf_output, but for the linear coefficients.

@jameslamb
Collaborator

Thanks for using LightGBM.

I've edited your post to include set_leaf_output as plain text, so that this issue can be found by search engines.

I think this is an interesting idea. Could you write some pseudo-code showing what you'd like the interface to look like? For example, would it be like this?

Booster.set_linear_leaf_coefficients(
   tree_id=1234,
   leaf_id=5,
   constant=100.5,
   beta=0.89
)

(I don't recall whether LightGBM's linear models have a constant term; I'd have to double-check.)

@aagrande

aagrande commented Mar 27, 2024

Thank you for the prompt reply @jameslamb! I collaborate with @velezbeltran.

When linear_tree=True, each leaf has:

  • leaf_const: intercept of the linear model.
  • leaf_features: indices of the numerical features in the leaf's branch.
  • leaf_coeff: slopes of the linear model, one for each feature.

So the interface may look like this:

Booster.set_leaf_linear_model(
   tree_id=1234,
   leaf_id=5,
   constant=100.5,
   features=[0, 3, 4],
   coefficients=[0.89, 0.12, 3.14]
)
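For reference, a leaf with these values predicts constant plus the dot product of coefficients with the values of the listed features, i.e. leaf_const + Σ leaf_coeff[i] * x[leaf_features[i]]. A minimal sketch in plain Python, assuming x is a full feature vector indexed by feature number; LightGBM's own handling of missing values is not modeled here, and the function name is hypothetical:

```python
from typing import Dict, Sequence


def linear_leaf_output(leaf: Dict, x: Sequence[float]) -> float:
    """Evaluate one linear leaf: intercept + sum of slope * feature value."""
    features = leaf["leaf_features"]
    coeffs = leaf["leaf_coeff"]
    if len(features) != len(coeffs):
        raise ValueError("leaf_features and leaf_coeff must have the same length")
    output = float(leaf["leaf_const"])
    for f, coeff in zip(features, coeffs):
        output += coeff * x[f]  # only the features listed for this leaf contribute
    return output


# Same values as the proposed set_leaf_linear_model() call above.
leaf = {"leaf_const": 100.5, "leaf_features": [0, 3, 4], "leaf_coeff": [0.89, 0.12, 3.14]}
x = [1.0, 9.0, 9.0, 2.0, 0.5]
print(linear_leaf_output(leaf, x))
```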

To modify the coefficients within a leaf, we need to know which features appear in the leaf's linear model. So the set method would be paired with a get method (similar to get_leaf_output and set_leaf_output):

Booster.get_leaf_linear_model(
   tree_id=1234,
   leaf_id=5
)
# Returns the intercept, features, and slopes of the leaf's linear model.

My understanding is that at the moment the only method to access the linear coefficients is via Booster.dump_model().
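Until such a getter exists, the same information can be read from the dictionary returned by Booster.dump_model(). A minimal sketch, assuming the usual layout of that dictionary: a "tree_info" list whose entries hold a nested "tree_structure", where split nodes carry "split_index", "left_child" and "right_child" and leaves carry "leaf_index" plus the linear-model fields. The helper name is hypothetical and the sample dictionary is hand-written, not real model output.

```python
from typing import Dict, List, Tuple


def get_leaf_linear_model(model_dump: Dict, tree_id: int, leaf_id: int) -> Tuple[float, List[int], List[float]]:
    """Return (intercept, features, slopes) for one leaf of a linear tree.

    Hypothetical helper that walks the dict produced by Booster.dump_model().
    """
    tree = model_dump["tree_info"][tree_id]
    stack = [tree["tree_structure"]]
    while stack:
        node = stack.pop()
        if "split_index" in node:
            # Split node: search both subtrees.
            stack.extend((node["left_child"], node["right_child"]))
        elif node.get("leaf_index", 0) == leaf_id:  # assume index 0 if a single-leaf tree omits it
            return node["leaf_const"], list(node["leaf_features"]), list(node["leaf_coeff"])
    raise KeyError(f"leaf {leaf_id} not found in tree {tree_id}")


# Hand-written stand-in for booster.dump_model() output (one tree, two leaves).
dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "tree_structure": {
                "split_index": 0,
                "left_child": {"leaf_index": 0, "leaf_const": 1.5, "leaf_features": [2], "leaf_coeff": [0.3]},
                "right_child": {"leaf_index": 1, "leaf_const": -0.5, "leaf_features": [0, 2], "leaf_coeff": [1.0, -0.1]},
            },
        }
    ]
}
print(get_leaf_linear_model(dump, tree_id=0, leaf_id=1))  # (-0.5, [0, 2], [1.0, -0.1])
```

With a trained booster, this would be called as get_leaf_linear_model(booster.dump_model(), tree_id, leaf_id). It is read-only: editing the dumped dictionary does not change the booster, which is why a setter in the library itself is needed.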

@jameslamb
Collaborator

Thanks for that, makes sense to me!

We'd have to figure out the specifics of how much validation to do, how to test this, etc., but in general I think this would be a great addition to the library: it would provide functionality for linear models similar to what set_leaf_output() provides for regular single-value leaf nodes.
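To make the validation question concrete, here is a minimal sketch of checks a hypothetical set_leaf_linear_model() could run before modifying the model. The function name and the num_features argument are assumptions; whether features outside the leaf's branch should be allowed is one of the open questions and is not checked here.

```python
import math
from typing import Sequence


def validate_leaf_linear_model(
    constant: float,
    features: Sequence[int],
    coefficients: Sequence[float],
    num_features: int,
) -> None:
    """Hypothetical argument checks for a leaf linear-model setter."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    if len(set(features)) != len(features):
        raise ValueError("features must not contain duplicates")
    for f in features:
        if not 0 <= f < num_features:
            raise ValueError(f"feature index {f} is out of range [0, {num_features})")
    if not all(math.isfinite(v) for v in [constant, *coefficients]):
        raise ValueError("constant and coefficients must be finite")


# Passes silently: the values from the proposed interface above.
validate_leaf_linear_model(100.5, [0, 3, 4], [0.89, 0.12, 3.14], num_features=5)
```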

I think we'd want to add this at the level of the C API and keep the logic on the Python side as minimal as possible.
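LightGBM's Python package already calls its C library through ctypes, so keeping the Python side minimal would mostly mean converting arguments to C types and forwarding them. A sketch of that layer, where c_set_fn stands in for a hypothetical C API entry point (e.g. one named LGBM_BoosterSetLeafLinearModel) whose argument order is also an assumption. The only Python-side check is the length match, so that mismatched arrays are never passed to C.

```python
import ctypes
from typing import Callable, Sequence


def set_leaf_linear_model(
    c_set_fn: Callable[..., int],
    handle: ctypes.c_void_p,
    tree_id: int,
    leaf_id: int,
    constant: float,
    features: Sequence[int],
    coefficients: Sequence[float],
) -> None:
    """Thin wrapper: marshal arguments to C types and forward them to c_set_fn."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have the same length")
    n = len(features)
    c_features = (ctypes.c_int * n)(*features)
    c_coeffs = (ctypes.c_double * n)(*coefficients)
    ret = c_set_fn(
        handle,
        ctypes.c_int(tree_id),
        ctypes.c_int(leaf_id),
        ctypes.c_double(constant),
        c_features,
        c_coeffs,
        ctypes.c_int(n),
    )
    if ret != 0:  # assume the usual convention of 0 meaning success
        raise RuntimeError(f"C API call failed with code {ret}")
```

Any deeper validation (tree and leaf ranges, whether the tree is linear) would then live once in C++, where all language bindings can share it.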

@guolinke @shiyu1994 @jmoralez @borchero @btrotta what do you think about this? I don't think I should be the one to decide alone whether to accept an expansion of the library's API like this.

@borchero
Collaborator

I personally cannot gauge the usefulness of this feature and believe that this is quite a niche requirement. That being said, I also don't see a reason not to expose the coefficients of the linear models via the Python API and, similarly, to allow modifying these values.

Regarding testing, I don't have a lot of concerns: it seems to me like this would essentially be about implementing "getter/setter" methods for the coefficients.

@shiyu1994
Collaborator

> I think we'd want to add this at the level of the C API and keep the logic on the Python side as minimal as possible.

I agree. I can help to implement this feature in the C API.

Projects
None yet
Development

No branches or pull requests

5 participants