Talks

2026

  • Tsinghua University — LLM Interpretability: Faithful Reasoning and Controllable Knowledge [slides]
  • University of Oxford — Ibid.
  • Imperial College London — Ibid.
  • ELLIS-Lisbon — Ibid.
  • Together AI — Ibid.
  • NEC Europe — Ibid.
  • Stanford University, CS338 (Aligning Superintelligence), guest lecture — Scalable Reward Learning [slides]

2025

  • ICML 2025 Workshop on Machine Unlearning for Generative AI — Beyond Retain and Forget Sets: Unlearning as Rational Belief Revision [slides]
  • University of Chicago, CS and DSI Joint Colloquium — AI Safety Through Interpretable and Controllable Language Models [slides]

2024

  • TTIC, Young Researcher Seminar Series — AI Safety Through Interpretable and Controllable Language Models [slides]
  • Harvard University — Controlling and Editing Knowledge in Large Language Models [slides]
  • Pacific Northwest National Laboratory — Ibid.
  • Stanford NLP Seminar — Ibid.
  • OpenAI — The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [slides]
  • CHAI, UC Berkeley — Ibid.

2023

  • Brown University — Interpretable and Controllable Language Models [slides]
  • Princeton University — Ibid.
  • New York University — Ibid.
  • University of Pennsylvania — Ibid.

2022

  • University of Oxford — Explainable Machine Learning in NLP: Methods and Evaluation [slides]
  • NEC Laboratories Europe — Ibid. [slides]
  • National Institute of Standards and Technology (NIST) — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]
  • Allen Institute for AI — Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs [slides]
  • Uber AI — The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations [slides]

2021

  • CHAI, UC Berkeley — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]