About Me

I do research in AI safety and interpretability.

I am currently a Visiting Scientist at Schmidt Sciences and a Visiting Researcher at the Stanford NLP Group, working with Chris Potts. Previously, I have worked at Anthropic, AI2, Google, Meta, and the University of North Carolina at Chapel Hill, where I did my PhD.

Below are some of the main research areas I am interested in:

  1. Interpretability
  2. Model Editing & Unlearning
  3. Scalable Oversight

Broadly, I am interested in explaining and controlling the behavior of machine learning models. Language models are a good object of study because we lack complete explanations for their behavior, and human language provides a rich means of interacting with them. I find work on clarifying concepts and developing strong evaluation procedures especially valuable.

Email: peter@cs.unc.edu

Google Scholar Page

News