Talks
2026
- Tsinghua University — LLM Interpretability: Faithful Reasoning and Controllable Knowledge [slides]
- University of Oxford — Ibid.
- Imperial College London — Ibid.
- ELLIS-Lisbon — Ibid.
- Together AI — Ibid.
- NEC Laboratories Europe — Ibid.
- Stanford University, CS338 (Aligning Superintelligence), guest lecture — Scalable Reward Learning [slides]
2025
- ICML 2025 Workshop on Machine Unlearning for Generative AI — Beyond Retain and Forget Sets: Unlearning as Rational Belief Revision [slides]
- University of Chicago, CS and DSI Joint Colloquium — AI Safety Through Interpretable and Controllable Language Models [slides]
2024
- TTIC, Young Researcher Seminar Series — AI Safety Through Interpretable and Controllable Language Models [slides]
- Harvard University — Controlling and Editing Knowledge in Large Language Models [slides]
- Pacific Northwest National Laboratory — Ibid.
- Stanford NLP Seminar — Ibid.
- OpenAI — The Unreasonable Effectiveness of Easy Training Data for Hard Tasks [slides]
- CHAI, UC Berkeley — Ibid.
2023
- Brown University — Interpretable and Controllable Language Models [slides]
- Princeton University — Ibid.
- New York University — Ibid.
- University of Pennsylvania — Ibid.
2022
- University of Oxford — Explainable Machine Learning in NLP: Methods and Evaluation [slides]
- NEC Laboratories Europe — Ibid. [slides]
- National Institute of Standards and Technology (NIST) — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]
- Allen Institute for AI — Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs [slides]
- Uber AI — The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations [slides]
2021
- CHAI, UC Berkeley — Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? [slides]
