GitHub - microsoft/augmented-interpretable-models: Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.

Augmenting Interpretable Models with LLMs during Training

This repo contains code to reproduce the experiments in the Aug-imodels paper (Nature Communications, 2023). To use Aug-imodels through a simple scikit-learn interface, install the imodelsX library. Below is a quickstart example.

Installation: pip install imodelsx

from imodelsx import AugLinearClassifier, AugTreeClassifier, AugLinearRegressor, AugTreeRegressor
import datasets
import numpy as np

# set up data
dset = datasets.load_dataset('rotten_tomatoes')['train']
dset = dset.select(np.random.choice(len(dset), size=300, replace=False))
dset_val = datasets.load_dataset('rotten_tomatoes')['validation']
dset_val = dset_val.select(np.random.choice(len(dset_val), size=300, replace=False))

# fit model
m = AugLinearClassifier(
    checkpoint='textattack/distilbert-base-uncased-rotten-tomatoes',
    ngrams=2, # use bigrams
)
m.fit(dset['text'], dset['label'])

# predict
preds = m.predict(dset_val['text'])
print('acc_val', np.mean(preds == np.array(dset_val['label'])))

# interpret
print('Total ngram coefficients: ', len(m.coefs_dict_))
print('Most positive ngrams')
for k, v in sorted(m.coefs_dict_.items(), key=lambda item: item[1], reverse=True)[:8]:
    print('\t', k, round(v, 2))
print('Most negative ngrams')
for k, v in sorted(m.coefs_dict_.items(), key=lambda item: item[1])[:8]:
    print('\t', k, round(v, 2))
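
The coefficients in m.coefs_dict_ can be read as per-ngram contributions to an additive score. The following is a minimal sketch of that idea, not the library's implementation. It assumes naive whitespace tokenization and that every n-gram up to length `max_n` is summed; imodelsX may tokenize and select n-grams differently, and a fitted model may also carry an intercept. The helper names (`extract_ngrams`, `additive_score`) and the toy coefficients are hypothetical.

```python
from typing import Dict, List


def extract_ngrams(text: str, max_n: int) -> List[str]:
    """Return all n-grams of length 1..max_n (assumption: whitespace tokenization)."""
    tokens = text.lower().split()
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


def additive_score(text: str, coefs: Dict[str, float], max_n: int = 2) -> float:
    """Sum the coefficient of every n-gram in `text`; unseen n-grams contribute 0."""
    return sum(coefs.get(ng, 0.0) for ng in extract_ngrams(text, max_n))


# Hypothetical coefficients, made up for illustration (not real model output).
toy_coefs = {"great": 1.5, "boring": -2.0, "not great": -1.0}

score = additive_score("not great and boring", toy_coefs)
print("score:", score)
```

With a fitted model, one could pass `m.coefs_dict_` in place of `toy_coefs` to see roughly which n-grams drive a given prediction, keeping in mind that the tokenization here is only an approximation.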

Reference:

@misc{ch2022augmenting,
    title={Augmenting Interpretable Models with LLMs during Training},
    author={Chandan Singh and Armin Askari and Rich Caruana and Jianfeng Gao},
    year={2022},
    eprint={2209.11799},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}

Repository contents:

augdistill
auggam
auglm
augtree
docs
.gitignore
LICENSE
SECURITY.md
readme.md
