About Me

I am a fifth-year PhD student in the UNC-NLP lab at the University of North Carolina at Chapel Hill, where I am advised by Mohit Bansal. My work at UNC is supported by a Google PhD Fellowship and previously by a Royster Fellowship. Before this, I graduated with a bachelor’s degree from Duke University, where my thesis advisor was Cynthia Rudin. At Duke I was supported by a Trinity Scholarship.

My research interests center on interpretable machine learning and natural language processing. Below are some of the main problems I’ve worked on (publications here):

  1. Mechanistic Interpretability
  2. Natural Language Explanations
  3. Model Editing
  4. Scalable Oversight
  5. Supervised and Decomposable Reasoning
  6. XAI Methods & Evaluation

Broadly, I am interested in explaining model behavior and improving model safety. I see language models as a good object of study because we lack complete explanations for their behavior, and human language provides a rich means of interacting with models. I find work on clarifying concepts and developing strong evaluation procedures especially valuable.

Email: peter@cs.unc.edu

Google Scholar Page

News

  • 2024 - Invited talk at Stanford NLP Seminar on “Controlling and Editing Knowledge in Large Language Models” [slides]
  • 2024 - Invited talks at OpenAI and CHAI (UC Berkeley) on “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks” [slides]
  • 2024 - New paper out! “The Unreasonable Effectiveness of Easy Training Data for Hard Tasks” [pdf] [code]
  • 2024 - Paper accepted to ICLR with a spotlight: “Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks” [pdf] [code]
  • 2023 - Serving as an Area Chair for EACL 2024 in the Interpretability and Analysis of Models for NLP track
  • 2023 - New paper out! “Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks” [pdf] [code]
  • 2023 - Three papers accepted to NeurIPS 2023! They cover (1) localization and model editing, (2) mechanistic interpretability for vision models, and (3) LMs explaining tasks to weaker agents (teaching).
  • 2023 - Named an Outstanding Area Chair at ACL 2023 (1-1.5% of the pool of reviewers and chairs)
  • 2023 - New paper out! “Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind” [pdf] [code]
  • 2023 - New paper out! “Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects” [pdf] [code]
  • 2023 - Started summer internship at AI2! Supervised by Sarah Wiegreffe and Peter Clark
  • 2023 - New paper out! “Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models” [pdf] [code]
  • 2022 - Serving as an Area Chair for ACL 2023 in the Interpretability and Analysis of Models for NLP track
  • 2022 - Serving as an Area Chair for the AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI
  • 2022 - Work accepted to EMNLP 2022: “Are Hard Examples also Harder to Explain? A Study with Human and Model-Generated Explanations” [pdf]
  • 2022 - Work accepted to NeurIPS 2022: “VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives” [pdf]
  • 2022 - Serving as an Area Chair for EMNLP 2022 in the Interpretability, Interactivity and Analysis of Models for NLP track
  • 2022 - Started summer internship at Google Research! Supervised by Asma Ghandeharioun and Been Kim
  • 2022 - Invited talk at the University of Oxford on Explainable Machine Learning in NLP
  • 2022 - Paper accepted to ACL 2022 Workshop on Natural Language Supervision! “When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data” [pdf] [code]
  • 2022 - Invited talk at NEC Laboratories Europe, on Explainable Machine Learning in NLP
  • 2022 - Invited talk at the National Institute of Standards and Technology, on Evaluating Explainable AI
  • 2022 - Invited talk at the Allen Institute for AI, on Detecting, Updating, and Visualizing Language Model Beliefs
  • 2022 - Invited talk at Uber AI, on The OOD Problem and Search Methods in Explainable ML
  • 2021 - New preprint on arXiv! “Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs” [pdf] [code]
  • 2021 - Paper accepted to NeurIPS 2021! “The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations” [pdf] [code]
  • 2021 - Awarded a Google PhD Fellowship for Natural Language Processing!
  • 2021 - Invited talk at CHAI, UC Berkeley, on Evaluating Explainable AI
  • 2021 - Paper accepted to EMNLP 2021: “FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging” [pdf] [code]
  • 2021 - Named an Outstanding Reviewer for ACL-IJCNLP 2021
  • 2021 - New paper on arXiv! “Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals” [pdf] [code]
  • 2021 - Started summer internship at FAIR, supervised by Srini Iyer
  • 2021 - New blog post on the Alignment Forum: “Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers” [link]
  • 2021 - New preprint on arXiv: “When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data” [pdf] [code]
  • 2020 - New preprint on arXiv! “FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging” [pdf] [code]
  • 2020 - Recognized as an Outstanding Reviewer for EMNLP 2020
  • 2020 - Paper accepted to Findings of EMNLP 2020: “Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?” [pdf] [code]
  • 2020 - Paper accepted to ACL 2020: “Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?” [pdf] [code]
  • 2019 - Paper accepted to AAAI-HCOMP 2019: “Interpretable Image Recognition with Hierarchical Prototypes” [pdf] [code]
  • 2019 - Joined the UNC NLP lab
  • 2019 - Graduated with a B.S. from the Department of Statistical Science at Duke University
  • 2019 - Awarded a Royster PhD Fellowship from UNC Chapel Hill