Zhengyan Shi

Hi, welcome to my personal page. I am a Senior Researcher at Microsoft Research (MSR). My current research focuses on teaching language models (LMs) to reason and code. I build learning loops in which LMs not only act but also reason within scalable, self-evolving environments. By allowing models to plan and interact with diverse environments, I explore how LMs can learn from experience and past interactions to continually refine themselves.


I obtained my PhD in Computer Science at University College London (UCL). Before that, I completed an MSc in Data Science (Statistics) with Distinction at UCL and a BSc in Mathematics with First Class Honours from the University of Liverpool and Xi'an Jiaotong-Liverpool University. I have also held research internships at Cohere (London) and Amazon (London & Seattle). Central to my work is the ambition to leverage LMs efficiently and robustly to solve general tasks; the selected publications below reflect the main directions of this effort.

Google Scholar  /  Twitter  /  GitHub  /  LinkedIn  /  Email


Research (Selected)


BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills

Atharv Sonwane, Isadora White, Hyunji Lee, Matheus Pereira, Lucas Caccia, Minseon Kim, Zhengyan Shi, Chinmay Singh, Alessandro Sordoni, Marc-Alexandre Côté, Xingdi Yuan

arXiv preprint, 2025

Presents BugPilot, a method that instructs coding agents to add features that inadvertently break tests, yielding realistic, high-quality synthetic data at scale for training code agents.


Gistify! Codebase-Level Understanding via Runtime Execution

Hyunji Lee, Minseon Kim, Chinmay Singh, Matheus Pereira, Atharv Sonwane, Isadora White, Elias Stengel-Eskin, Mohit Bansal, Zhengyan Shi, Alessandro Sordoni, Marc-Alexandre Côté, Xingdi Yuan, Lucas Caccia

arXiv preprint, 2025

Introduces Gistify, a task requiring coding LLMs to distill large codebases into minimal executable files, highlighting challenges in codebase understanding for current state-of-the-art code agents.


Instruction Tuning With Loss Over Instructions

Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

Advances in Neural Information Processing Systems (NeurIPS), 2024

We show that, in certain scenarios, applying the loss to the instruction as well as the output, rather than to the output only, an approach we refer to as Instruction Modelling, can substantially improve the performance of instruction tuning on both various NLP tasks and open-ended generation benchmarks. Remarkably, in the most favourable case, our approach boosts model performance on AlpacaEval 1.0 by over 100%.
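
To make the loss difference concrete, here is a minimal sketch (not the paper's implementation), assuming the common convention in causal-LM training that tokens labelled -100 are excluded from the cross-entropy loss:

    from typing import List

    IGNORE_INDEX = -100  # tokens with this label contribute no loss

    def build_labels(input_ids: List[int], prompt_len: int,
                     instruction_modelling: bool) -> List[int]:
        """Return per-token labels for a (prompt + response) token sequence."""
        labels = list(input_ids)
        if not instruction_modelling:
            # Standard instruction tuning: mask the prompt so the loss
            # covers the response tokens only.
            labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
        # Instruction Modelling: keep the prompt labels, so the loss
        # covers prompt + response.
        return labels

    # Example: a 4-token prompt followed by a 3-token response.
    ids = [11, 12, 13, 14, 21, 22, 23]
    print(build_labels(ids, 4, instruction_modelling=False))  # [-100, -100, -100, -100, 21, 22, 23]
    print(build_labels(ids, 4, instruction_modelling=True))   # [11, 12, 13, 14, 21, 22, 23]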


Rethinking Semi-supervised Learning with Language Models

Zhengyan Shi, Francesco Tonolini, Nikolaos Aletras, Emine Yilmaz, Gabriella Kazai, Yunlong Jiao

Association for Computational Linguistics (Findings of ACL), 2023

Shows that Task-adaptive Pre-training (TAPT) is a simple yet effective method for semi-supervised learning, often achieving state-of-the-art performance. Highlights that TAPT remains effective even with only a few hundred unlabelled samples, contrary to the common belief that continued pre-training requires large amounts of unlabelled data.
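
As an illustration of the TAPT step, here is a minimal sketch using the Hugging Face transformers and datasets libraries; the model name, hyperparameters, and toy texts are illustrative assumptions, not the paper's configuration:

    from datasets import Dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # Unlabelled in-domain texts (placeholders; a few hundred can already help).
    texts = ["an unlabelled sentence from the target task.",
             "another unlabelled sentence from the same domain."]

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    model = AutoModelForMaskedLM.from_pretrained("roberta-base")

    dataset = Dataset.from_dict({"text": texts}).map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
        remove_columns=["text"],
    )
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                               mlm_probability=0.15)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="tapt-checkpoint", num_train_epochs=10),
        train_dataset=dataset,
        data_collator=collator,
    )
    trainer.train()  # continued masked-LM pre-training on task text (the TAPT step)
    # The adapted checkpoint is then fine-tuned on the labelled task data as usual.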


StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts

Zhengyan Shi, Qiang Zhang, Aldo Lipani

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2022

Introduces StepGame, a new benchmark for testing multi-hop spatial reasoning in texts. This dataset challenges models to perform robust spatial reasoning across multiple steps, providing a valuable tool for advancing natural language understanding in complex spatial scenarios.


Recent News

2024/10 New preprint on likelihood over-optimisation in direct alignment algorithms is now available on arXiv.
2024/09 Paper on "Instruction Tuning With Loss Over Instructions" accepted to NeurIPS 2024!
2024/01 Paper on "DePT: Decomposed Prompt Tuning" accepted to ICLR 2024!
2023/09 Paper on prompt-based fine-tuning accepted to NeurIPS 2023!

Academic Services

Program Committee: NeurIPS (2023, 2024), ICML (2024), ICLR (2025), AAAI (2023, 2024), COLM (2024), ACL ARR (Feb. 2023 - Jan. 2024), ACL (2023), EMNLP (2022, 2023), EACL (2023), COLING (2023, 2024), ECML/PKDD (2022), KDD (2023), SIGIR (2022, 2023, 2024), ECIR (2024), SDM (2024)
