Anna Hedström
- Postdoc
- anna.hedstroem@ai.ethz.ch
- OAT X19.1
- My research explores the intersection of evaluation-centric interpretability and alignment for the control and safety of large language models. I aim to develop principled methods that transform mechanistic understanding of models into signals for steering and post-training, enabling preventative safeguards and mitigation of emergent misalignment.
