Talks

2026

Tsinghua University — LLM Interpretability: Faithful Reasoning and Controllable Knowledge [slides]
University of Oxford — Ibid.
Imperial College London — Ibid.
ELLIS-Lisbon — Ibid.
Together AI — Ibid.
NEC Laboratories Europe — Ibid.
Stanford University, CS338 (Aligning Superintelligence), guest lecture — Scalable Reward Learning [slides]

ICML 2025 Workshop on Machine Unlearning for Generative AI — Beyond Retain and Forget Sets: Unlearning as Rational Belief Revision [slides]
University of Chicago, CS and DSI Joint Colloquium — AI Safety Through Interpretable and Controllable Language Models [slides]

TTIC, Young Researcher Seminar Series — AI Safety Through Interpretable and Controllable Language Models [slides]
Harvard University — Controlling and Editing Knowledge in Large Language Models [slides]
Pacific Northwest National Laboratory — Ibid.
Stanford NLP Seminar — Ibid.
OpenAI — The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [slides]
CHAI, UC Berkeley — Ibid.

University of Oxford — Explainable Machine Learning in NLP: Methods and Evaluation [slides]
NEC Laboratories Europe — Ibid. [slides]
National Institute of Standards and Technology (NIST) — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]
Allen Institute for AI — Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs [slides]
Uber AI — The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations [slides]

CHAI, UC Berkeley — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]