╔════════════════════════════════════════╗
║ research -- thinking, reasoning models ║
╚════════════════════════════════════════╝
I study how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.
My work focuses on the post-training stack for LLMs: supervised fine-tuning (SFT), preference optimization, reinforcement learning methods such as RLVR (reinforcement learning with verifiable rewards), and inference-time compute strategies that improve reasoning without requiring larger models.
I’m also interested in the interpretability of reasoning models: understanding the internal mechanisms that support multi-step reasoning and diagnosing failures such as shortcut reasoning, reward hacking, and unfaithful chain-of-thought.
I am currently building and open-sourcing implementations of reasoning-focused training pipelines, and contributing to LLM infrastructure and post-training frameworks.
* I love SpaceX rockets *
