Master's Student at Saarland University
Focusing on Mechanistic Interpretability and Scalable Oversight.
I am a Master's student in Data Science and Artificial Intelligence aiming to reduce existential risk from advanced artificial intelligence.
I am currently interested in reverse-engineering transformer models to understand how they internally represent truth. I am also interested in working on reinforcement learning environments.
I am currently looking for research engineering internships or PhD positions starting in Fall 2026.
Investigating whether activations in intermediate layers can predict deceptive outputs in RLHF-tuned models.
Reproduction of Anthropic's work on superposition in toy models, visualized using Python and D3.js.
A deep dive into the limitations of voluntary compute reporting and proposals for hardware-level verification.