
[R-package] [python-package] More convenient API for interaction_constraints #6376

Open
mayer79 opened this issue Mar 20, 2024 · 9 comments
mayer79 commented Mar 20, 2024

Interaction constraints are extremely useful in practice. The API in LightGBM has two aspects that could be improved:

  1. Currently, the Python package allows only integer positions like [[0], [1, 2]]. We could also allow feature names, as in XGBoost or in LightGBM's R-package.
  2. Features not appearing in any interaction group are silently dropped from training altogether, unlike in Scikit-Learn and XGBoost, where unmentioned features can still interact (i.e., they form their own interaction group). This applies to both the Python and R APIs. The above example could then be entered as [[0]], or as [["living_area"]].

Both aspects help to reduce user mistakes and align the API with XGBoost (and partly with Scikit-Learn's histogram gradient boosting implementation).

I can make two PRs: one for the R-package, and one for the Python package.

@jameslamb @btrotta

mayer79 commented Mar 22, 2024

@jameslamb could you also add a Python tag? There, we would need two things: translating feature names to integer positions, and adding the set of skipped features as an additional group.

mayer79 commented Mar 26, 2024

Question to @jameslamb and others: In Python, the interaction constraints are transformed into a string here, just like any other list or list-like object:

https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py

def _param_dict_to_str(data: Optional[Dict[str, Any]]) -> str:
    """Convert Python dictionary to string, which is passed to C API."""
    if data is None or not data:
        return ""
    pairs = []
    for key, val in data.items():
        if isinstance(val, (list, tuple, set)) or _is_numpy_1d_array(val):
            pairs.append(f"{key}={','.join(map(_to_string, val))}")
        elif isinstance(val, (str, Path, _NUMERIC_TYPES)) or _is_numeric(val):
            pairs.append(f"{key}={val}")
        elif val is not None:
            raise TypeError(f"Unknown type of parameter:{key}, got:{type(val).__name__}")
    return " ".join(pairs)
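For context, here is a self-contained, simplified sketch of how such nested lists serialize (mirroring the logic above; this is a rough reimplementation, not LightGBM's actual code):

```python
# Simplified sketch of the serialization behavior (not LightGBM's actual code):
def to_string(x):
    # Nested lists become bracketed groups, e.g. [1, 2] -> "[1,2]".
    if isinstance(x, list):
        return "[" + ",".join(map(str, x)) + "]"
    return str(x)


def param_dict_to_str(data):
    # Each key=value pair is joined with spaces; list values are comma-joined.
    pairs = []
    for key, val in data.items():
        if isinstance(val, (list, tuple, set)):
            pairs.append(f"{key}={','.join(map(to_string, val))}")
        else:
            pairs.append(f"{key}={val}")
    return " ".join(pairs)


print(param_dict_to_str({"interaction_constraints": [[0], [1, 2]]}))
# -> interaction_constraints=[0],[1,2]
```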

Is it an option to add a snippet like this?

if key == "interaction_constraints":
    val = _prep_interaction_constraints(val, feature_names)

_prep_interaction_constraints() would do two things:

  1. Translate feature names to integers (if strings)
  2. Add a list with omitted features

Both would need access to the list of feature names, so we would need to pass either feature_names=None or the Dataset object to _param_dict_to_str(). Not sure if this is the right way to proceed?

@jameslamb

could you also add a Python tag?

We don't actually have a python label, but probably should. I've at least edited the issue title for now.

@jameslamb jameslamb changed the title More convenient API for interaction_constraints [R-package] [python-package] More convenient API for interaction_constraints Mar 27, 2024
@jameslamb
Copy link
Collaborator

hmmmm @btrotta @guolinke if interaction_constraints is provided, should features that do not appear be excluded from training?

At first I agreed with @mayer79 's assessment that they should not be excluded, but seeing this docstring makes me think they're intentionally excluded:

// desc = any two features can only appear in the same branch only if there exists a constraint containing both features

@jameslamb

Is it an option to add a snippet like this?

I'd prefer that that be done outside of _param_dict_to_str(), similar to how categorical_feature is resolved:

# get categorical features
if isinstance(categorical_feature, list):
    categorical_indices = set()
    feature_dict = {}
    if isinstance(feature_name, list):
        feature_dict = {name: i for i, name in enumerate(feature_name)}
    for name in categorical_feature:
        if isinstance(name, str) and name in feature_dict:
            categorical_indices.add(feature_dict[name])
        elif isinstance(name, int):
            categorical_indices.add(name)
        else:
            raise TypeError(f"Wrong type({type(name).__name__}) or unknown name({name}) in categorical_feature")
    if categorical_indices:
        for cat_alias in _ConfigAliases.get("categorical_feature"):
            if cat_alias in params:
                # If the params[cat_alias] is equal to categorical_indices, do not report the warning.
                if not (isinstance(params[cat_alias], list) and set(params[cat_alias]) == categorical_indices):
                    _log_warning(f"{cat_alias} in param dict is overridden.")
                params.pop(cat_alias, None)
        params["categorical_column"] = sorted(categorical_indices)

I'd support it being a private function, so it can be shared between cv() and train().
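The name-to-index resolution at the core of that snippet can be sketched as a standalone helper (the name resolve_to_indices is illustrative, not LightGBM's API):

```python
# Standalone sketch of the categorical_feature-style name resolution shown above;
# the function name is illustrative, not part of LightGBM.
def resolve_to_indices(features, feature_name):
    """Map a mixed list of feature names and positions to sorted integer positions."""
    feature_dict = {name: i for i, name in enumerate(feature_name)}
    indices = set()
    for name in features:
        if isinstance(name, str) and name in feature_dict:
            indices.add(feature_dict[name])
        elif isinstance(name, int):
            indices.add(name)
        else:
            raise TypeError(f"Wrong type({type(name).__name__}) or unknown name({name})")
    return sorted(indices)
```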

@jameslamb

I'm generally supportive of everything you've proposed at the top of this issue, thanks for working on it and for adding that context about consistency with XGBoost and scikit-learn!

Let's just please wait to hear from @btrotta or @guolinke about whether omitting features not mentioned in interaction_constraints was intentional; there might be relevant context in #2884 and #3130.

guolinke commented Mar 27, 2024

Sorry, I don't remember about that 🤣, I think it was not intentional. We can align with XGBoost.

@jameslamb

😂 ok thank you, I don't remember either. Agree then, let's move forward @mayer79

mayer79 commented Mar 27, 2024

Is it an option to add a snippet like this?

I'd prefer that that be done outside of _param_dict_to_str(), similar to how categorical_feature is resolved: [...]

I'd support it being a private function, so it can be shared between cv() and train().

Thanks for the hint. I will soon draft the function, and then we can determine where to place it :-).
