Research

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun
Preprint on arXiv. [pdf] [code]

Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha, Shiyue Zhang, Peter Hase, Mohit Bansal
ICLR 2023. [pdf] [code]

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Archiki Prasad, Peter Hase, Xiang Zhou, Mohit Bansal
EACL 2023. [pdf] [code]

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs
Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
EACL 2023. [pdf] [code]

Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations
Swarnadeep Saha, Peter Hase, Nazneen Rajani, Mohit Bansal
EMNLP 2022. [pdf] [code]

VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Zhuofan Ying*, Peter Hase*, Mohit Bansal (* equal contribution)
NeurIPS 2022. [pdf] [code]

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data
Peter Hase, Mohit Bansal
ACL 2022 Workshop on Learning with Natural Language Supervision. [pdf v2] [pdf v1] [code]

Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions
Prateek Yadav, Peter Hase, Mohit Bansal
Preprint on arXiv. [pdf] [code]

The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
Peter Hase, Harry Xie, Mohit Bansal
NeurIPS 2021. [pdf] [code]

FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging
Han Guo, Nazneen Fatema Rajani, Peter Hase, Mohit Bansal, Caiming Xiong
EMNLP 2021. [pdf] [code]

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?
Peter Hase, Shiyue Zhang, Harry Xie, Mohit Bansal
Findings of EMNLP 2020. [pdf] [code]

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
Peter Hase, Mohit Bansal
ACL 2020. [pdf] [code]

Interpretable Image Recognition with Hierarchical Prototypes
Peter Hase, Chaofan Chen, Oscar Li, Cynthia Rudin
AAAI-HCOMP 2019. [pdf] [code]

Shall I Compare Thee to a Machine-Written Sonnet? An Approach to Algorithmic Sonnet Generation
John Benhardt*, Peter Hase*, Liuyi Zhu*, Cynthia Rudin (* equal contribution)
Preprint on arXiv. [pdf] [code]