Working on
AI Alignment & Safety

Master's Student at Saarland University

Focusing on Mechanistic Interpretability and Scalable Oversight.

About Me

I am a Master's student in Data Science and Artificial Intelligence aiming to reduce existential risk from advanced artificial intelligence.

My current focus is reverse-engineering transformer models to understand how they internally represent truth. I am also interested in working on reinforcement learning environments.

I am currently looking for research engineering internships or PhD positions for Fall 2026.

Interests

  • Mechanistic Interpretability
  • Robustness
  • Model Governance
  • Evals
  • Deep Learning

Selected Research

ArXiv 2024

Probing Large Language Models for Deception

Investigating whether activations in intermediate layers can predict deceptive outputs in RLHF-tuned models.

Project 2023

Toy Models of Superposition

Reproduction of Anthropic's work on superposition in toy models, visualized using Python and D3.js.

LessWrong 2023

Critique of Current Governance Frameworks

A deep dive into the limitations of voluntary compute reporting and proposals for hardware-level verification.

Recent Writing