Bio
I do research in AI safety and interpretability.
I am currently an AI Institute Fellow at Schmidt Sciences and a Postdoctoral Researcher at the Stanford NLP Group, working with Chris Potts. Previously, I was at Anthropic, AI2, Google, Meta, and the University of North Carolina at Chapel Hill, where I did my PhD.
My main research areas include:
- Interpretability
- Model Editing & Unlearning
- Scalable Oversight
Broadly, I am interested in explaining and controlling the behavior of machine learning models. I see language models as a good object of study because we lack complete explanations for their behavior and because human language provides a rich means of interacting with them. I find work on clarifying concepts and developing strong evaluation procedures especially valuable.
For an up-to-date list of papers, see my Google Scholar page.
Email: peter@cs.unc.edu
