Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
intervention
interpretability
mechanistic-interpretability
activation-intervention
activation-patching
Updated Jun 3, 2024 - Python