Shuqing Shi

Shuqing Shi 史述青

Email: shuqing.shi[at]kcl[dot]ac[dot]uk

I am a Ph.D. student in Computer Science at King's College London, supervised by Dr. Yali Du. Previously, I received my M.Sc. and B.Eng. from the University of Electronic Science and Technology of China (UESTC).

My research focuses on building AI agents that coordinate and cooperate effectively in complex multi-agent environments. I'm particularly interested in one fundamental question: how can heterogeneous agents learn to cooperate efficiently and scalably in complex environments?

Research Interests

  • Multi-Agent Reinforcement Learning: Developing algorithms for coordination and cooperation in complex environments.
  • Cooperative Game Theory: Understanding coalition formation and value distribution among agents.
  • Causal Inference: Applying counterfactual reasoning for better credit assignment and generalization.

📰 News

Jan 2026 Two papers accepted to ICLR 2026: BRIDGE and SocialJax!
Oct 2025 Recognized as NeurIPS 2025 Top Reviewer (Top 8–10%)! 🎉
Sep 2025 Our paper on causal policy learning received NeurIPS 2025 Spotlight (Top 3%)! 🌟
Sep 2025 Paper on LLM-based agents accepted to NeurIPS 2025 Datasets & Benchmarks Track.
Sep 2024 Paper on stochastic cooperative games accepted to NeurIPS 2024.

📚 Publications

* Equal contribution

Preprints
Under Review

Resolving Complex Social Dilemmas by Aligning Preferences with Counterfactual Regret

Shuqing Shi, Yudi Zhang, Joel Z. Leibo, Yali Du

We propose a novel approach to resolve complex social dilemmas by aligning agent preferences using counterfactual regret minimization. Our method enables agents to learn cooperative strategies in mixed-motive scenarios where traditional approaches fail.
2026
ICLR 2026

BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games

Shuqing Shi, Nam Phuong Tran, Hao Liang, Debmalya Mandal, Long Tran-Thanh, Yali Du

In The Fourteenth International Conference on Learning Representations (ICLR), 2026

Coalition formation is fundamental to multi-agent cooperation, yet existing approaches typically treat it as a static problem. We propose BRIDGE, a bi-level reinforcement learning framework that jointly learns when to form coalitions (high-level) and how to coordinate within them (low-level).
@inproceedings{shi2026bridge,
  title     = {BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games},
  author    = {Shi, Shuqing and Tran, Nam Phuong and Liang, Hao and Mandal, Debmalya and Tran-Thanh, Long and Du, Yali},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026}
}
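As a rough illustration of the bi-level structure described in the entry above, a generic bi-level objective can be written as follows (a sketch of the general pattern only, not BRIDGE's actual formulation; the coalition structure $\mathcal{C}_t$, high-level policy $\pi^{H}$, and per-coalition low-level policies $\pi^{L}_{C}$ are placeholder symbols):

$$
\max_{\pi^{H}} \;\; \mathbb{E}_{\mathcal{C}_t \sim \pi^{H}}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\pi^{L}_{C} \in \arg\max_{\pi}\; \mathbb{E}\Big[\sum_{t=0}^{T} \gamma^{t}\, r_{C}(s_t, a_{C,t}) \,\Big|\, \mathcal{C}_t\Big] \;\;\; \forall\, C \in \mathcal{C}_t,
$$

i.e., the high level decides when and how coalitions form, and each low-level policy coordinates the agents within its coalition given that structure.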
ICLR 2026

SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Zihao Guo*, Shuqing Shi*, Richard Willis, Tristan Tomilin, Joel Z. Leibo, Yali Du

In The Fourteenth International Conference on Learning Representations (ICLR), 2026

We present SocialJax, a JAX-based evaluation suite for multi-agent reinforcement learning in sequential social dilemmas. Built for speed, SocialJax runs thousands of episodes in seconds on a single GPU while providing standard implementations of classic social dilemmas.
@inproceedings{guo2026socialjax,
  title     = {SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas},
  author    = {Guo, Zihao and Shi, Shuqing and Willis, Richard and Tomilin, Tristan and Leibo, Joel Z. and Du, Yali},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026}
}
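The speed figures in the entry above come from JAX's ability to vectorize and JIT-compile whole rollouts on an accelerator. The toy snippet below illustrates that general pattern on a made-up two-player matrix-game payoff; it is only a sketch of the vectorization idea and does not use SocialJax's actual API.

```python
import jax
import jax.numpy as jnp

# Payoff tensor for a toy 2-player, 2-action social dilemma
# (indices are the two players' actions: 0 = cooperate, 1 = defect).
PAYOFF = jnp.array([[[3.0, 3.0], [0.0, 5.0]],
                    [[5.0, 0.0], [1.0, 1.0]]])

def rollout(key, horizon=128):
    """One episode of uniformly random joint actions; returns per-player return."""
    actions = jax.random.randint(key, (horizon, 2), 0, 2)
    rewards = PAYOFF[actions[:, 0], actions[:, 1]]   # shape (horizon, 2)
    return rewards.sum(axis=0)

# Vectorize over thousands of independent episodes, then JIT-compile the batch.
batched_rollout = jax.jit(jax.vmap(rollout))
keys = jax.random.split(jax.random.PRNGKey(0), 4096)
returns = batched_rollout(keys)    # shape (4096, 2): per-player returns
print(returns.mean(axis=0))
```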
2025
NeurIPS 2025 🌟 Spotlight (Top 3%)

Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems

Hao Liang*, Shuqing Shi*, Yudi Zhang, Biwei Huang, Yali Du

In The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025

We propose a principled approach that combines causal inference with locality principles for provably generalizable and scalable policy learning in networked systems. Our method achieves state-of-the-art performance while providing theoretical guarantees.
@inproceedings{liang2025causality,
  title     = {Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems},
  author    = {Liang, Hao and Shi, Shuqing and Zhang, Yudi and Huang, Biwei and Du, Yali},
  booktitle = {The Thirty-Ninth Annual Conference on Neural Information Processing Systems},
  year      = {2025}
}
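For background on the locality side of the entry above (a standard way locality is formalized in networked multi-agent RL, shown generically and not necessarily matching the paper's exact assumptions): each agent $i$ sits on an interaction graph, conditions only on its $\kappa$-hop neighborhood $\mathcal{N}_i^{\kappa}$, and the global reward decomposes into local terms,

$$
\pi_i\big(a_i \mid s_{\mathcal{N}_i^{\kappa}}\big),
\qquad
r(s, a) \;=\; \frac{1}{n} \sum_{i=1}^{n} r_i\big(s_{\mathcal{N}_i^{\kappa}}, a_{\mathcal{N}_i^{\kappa}}\big),
$$

so each agent's learning problem scales with its neighborhood size rather than with the full network.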
NeurIPS D&B 2025

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

Chandler Smith, Shuqing Shi, et al.

In The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2025

We evaluate the generalization capabilities of LLM-based agents in mixed-motive scenarios using the Concordia framework, providing insights into how large language models perform in multi-agent social interactions.
@inproceedings{smith2025evaluating,
  title     = {Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia},
  author    = {Smith, Chandler and others},
  booktitle = {NeurIPS Datasets and Benchmarks Track},
  year      = {2025}
}
2024
NeurIPS 2024

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

Nam P. Tran, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

In The Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024

We study the problem of learning the expected core in strictly convex stochastic cooperative games, providing theoretical analysis and practical algorithms for stable coalition value distribution under uncertainty.
@inproceedings{tran2024learning,
  title     = {Learning the Expected Core of Strictly Convex Stochastic Cooperative Games},
  author    = {Tran, Nam P. and Shi, Shuqing and Mandal, Debmalya and Du, Yali and Tran-Thanh, Long},
  booktitle = {The Thirty-Eighth Annual Conference on Neural Information Processing Systems},
  year      = {2024}
}
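For context on the entry above: the core of a (deterministic) cooperative game $(N, v)$ is the set of allocations that distribute the grand-coalition value such that no coalition can profitably deviate; in the stochastic setting the expected core is, roughly, the core of the game with coalition values replaced by their expectations (notation here is the textbook definition, not necessarily the paper's exact formulation):

$$
\mathcal{C}(N, v) \;=\; \Big\{ x \in \mathbb{R}^{N} \;:\; \sum_{i \in N} x_i = v(N), \;\; \sum_{i \in S} x_i \ge v(S) \;\; \forall\, S \subseteq N \Big\},
\qquad
\mathcal{C}_{\mathbb{E}} \;=\; \mathcal{C}\big(N, \mathbb{E}[v]\big).
$$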
2022
IJCNN 2022

Solving Poker Games Efficiently: Adaptive Memory Based Deep Counterfactual Regret Minimization

Shuqing Shi, Xiaobin Wang, Dong Hao, Zhiyou Yang, Hong Qu

In International Joint Conference on Neural Networks (IJCNN), 2022

We propose an adaptive memory-based approach to deep counterfactual regret minimization for solving large-scale poker games more efficiently, reducing memory requirements while maintaining solution quality.
@inproceedings{shi2022solving,
  title     = {Solving Poker Games Efficiently: Adaptive Memory Based Deep Counterfactual Regret Minimization},
  author    = {Shi, Shuqing and Wang, Xiaobin and Hao, Dong and Yang, Zhiyou and Qu, Hong},
  booktitle = {International Joint Conference on Neural Networks},
  year      = {2022}
}
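As background on the regret-minimization machinery in the entry above (the standard CFR regret-matching update, stated generically rather than as the paper's adaptive-memory variant): with cumulative counterfactual regret $R^{T}(I, a)$ at information set $I$ and $R^{T,+}(I, a) = \max\big(R^{T}(I, a), 0\big)$, the next strategy is

$$
\sigma^{T+1}(I, a) \;=\;
\begin{cases}
\dfrac{R^{T,+}(I, a)}{\sum_{b \in A(I)} R^{T,+}(I, b)} & \text{if } \sum_{b \in A(I)} R^{T,+}(I, b) > 0, \\[1.5ex]
\dfrac{1}{|A(I)|} & \text{otherwise.}
\end{cases}
$$

Deep CFR replaces the tabular regrets with neural networks trained on sampled regret targets stored in memory buffers, which is where memory-reduction techniques come into play.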

💼 Experience

The Chinese University of Hong Kong, Shenzhen

Feb 2023 – Jul 2023
Research Assistant · Supervised by Prof. Guiliang Liu

Worked on counterfactual policy evaluation under out-of-distribution (OOD) shifts: by intervening on the action-value function in factorized spaces, the approach achieved more accurate value estimates on OOD samples.

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

Dec 2022 – Feb 2023
Visiting Student · Supervised by Prof. Zhiqiang Xu

Studied an online RL setting in which reward-weighted methods reweight the policy distribution, enabling efficient discovery of globally optimal policies.
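A common instance of this kind of reward-weighted reweighting (shown as a generic reward-weighted-regression-style update, not necessarily the exact scheme used in the project) is

$$
\pi_{k+1}(a \mid s) \;\propto\; \pi_{k}(a \mid s)\,\exp\!\big(R(s, a)/\beta\big),
$$

where $\beta > 0$ is a temperature, so higher-return actions receive exponentially larger weight in the next policy fit.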

🏆 Awards

🎯 NeurIPS 2025 Top Reviewer (Top 8–10%) Oct 2025
🌟 NeurIPS 2025 Spotlight (Top 3%) Sep 2025
Outstanding Student Scholarship (UESTC) Oct 2020
Outstanding Student Scholarship (UESTC) Oct 2018
Outstanding Student Scholarship (UESTC) Oct 2017

🎓 Academic Services