Features not transformed by BinningProcess #194

Open
jnsofini opened this issue Sep 25, 2022 · 2 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@jnsofini
Contributor

It would be great to be able to mark certain features as special so that the binning process leaves them untransformed. This would let users keep columns such as snapshot-date and ids (which are not used in calculating WoE) in their raw form, so that the output can be easily mapped back to the raw data in downstream processing.

@guillermo-navas-palencia
Owner

guillermo-navas-palencia commented Sep 25, 2022

Hi @jnsofini.

I think users can perform the operation you described by leveraging sklearn.compose.ColumnTransformer; see the example below. Given the versatility that ColumnTransformer already provides, I am not entirely sure it is worth incorporating similar functionality. Happy to discuss this further.

```python
import numpy as np
import pandas as pd

from optbinning import BinningProcess
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Add id column
df['id'] = np.arange(df.shape[0])

# Use column transformer and pass the features to apply binning.
# The remaining column (id) will be passed through.
features = df.columns[:-1].tolist()

binning_process = BinningProcess(variable_names=features)

ct = ColumnTransformer(
    transformers=[('binning_process', binning_process, features)],
    remainder='passthrough'
)

Xt = ct.fit_transform(df, y)

# Convert to dataframe
pd.DataFrame(data=Xt, columns=df.columns)
```
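One caveat about the last step: passing `columns=df.columns` only lines up because `id` happens to be the last column. `ColumnTransformer` puts the transformed columns first and appends the passthrough columns after them, so if the untouched columns sit anywhere else the names would be misassigned. Below is a minimal sketch of building the column names explicitly. To keep it runnable without optbinning, it uses `KBinsDiscretizer` as a stand-in for `BinningProcess`; the toy data and the names `x1`, `x2`, `passthrough`, and `out` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer

# Toy frame with the id column FIRST, so output order differs from input order.
df = pd.DataFrame({
    "id": np.arange(6),
    "x1": [0.1, 0.4, 0.5, 0.7, 0.9, 1.2],
    "x2": [10.0, 12.0, 15.0, 11.0, 19.0, 14.0],
})
features = ["x1", "x2"]

# KBinsDiscretizer is only a stand-in here (assumption); swap in BinningProcess
# and pass y to fit_transform for the real use case.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")

ct = ColumnTransformer(
    transformers=[("binning", binner, features)],
    remainder="passthrough",
)
Xt = ct.fit_transform(df)

# Transformed columns come first; passthrough columns follow in their original order.
passthrough = [c for c in df.columns if c not in features]
out = pd.DataFrame(Xt, columns=features + passthrough, index=df.index)

# The stacked array has a single dtype, so restore the original dtype of the id column.
out["id"] = out["id"].astype(df["id"].dtype)
```

Keeping the original index and restoring the `id` dtype makes it straightforward to join `out` back onto the raw data afterwards.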

@guillermo-navas-palencia added the enhancement and question labels Sep 25, 2022
@guillermo-navas-palencia
Owner
