Homepage - Yuwei Zhang

Yuwei Zhang

Ph.D. Student, Halıcıoğlu Data Science Institute, UC San Diego

About Me

I am a Ph.D. student at the Halıcıoğlu Data Science Institute, UC San Diego, advised by Prof. Jingbo Shang. My research focuses on LLM post-training, continual adaptation, and memory.

I am especially interested in how language models can improve after pretraining: learning from limited task data, rare success signals, long-horizon feedback, and persistent context while retaining prior knowledge. My recent work studies knowledge injection, on-policy self-distillation / reinforcement learning, and memory-based context management.

Beyond research, I am passionate about sports, especially hiking, basketball, and motor racing.

Finally, I want to express my sincere gratitude to my advisors, collaborators, and mentors for their invaluable guidance and support throughout my research and life.

Curriculum Vitae

Education

University of California San Diego

Halıcıoğlu Data Science Institute

Ph.D. in Data Science

2023 - present

University of California San Diego

Electrical Engineering

M.S. in Electrical Engineering, Machine Learning & Data Science

2021 - 2023

Nankai University

Physics

B.S. in Physics

2016 - 2020

Experience

Google DeepMind

Research Scientist Intern

Summer 2026

Amazon Rufus (Foundation Models Team)

Applied Scientist Intern

Summer 2025 - Spring 2026

AMD Research

Research Scientist Intern

Spring 2025

Tencent AI Lab

Research Scientist Intern

Summer 2024

Amazon AWS

Applied Scientist Intern

Summer 2023

News

2026

We released HERO, our work on agentic self-distillation.

Jun 13

I advanced to Ph.D. candidacy! The presentation slides are available here.

Jun 10

We released the paper, code, and blog post for RESD, our work on on-policy self-distillation.

May 12

CoMem, our work on agent context management, was accepted to ICML 2026.

May 01

WikiDYK, our work on knowledge injection, was accepted to ACL 2026 as an oral presentation.

Apr 07

Selected Publications (view all)

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

Haoran Liu*, Yuwei Zhang*, Xiyao Li, Bohan Lyu, Jingbo Shang (* equal contribution)

Preprint 2026

HERO turns next environment observations into hindsight reflection signals for multi-turn agent self-distillation, improving task success and reducing unnecessary tool-use turns under limited training budgets.

[Paper]

CoMem: Context Management with A Decoupled Long-Context Model

Yuwei Zhang, Chengyu Dong, Shuowei Jin, Changlong Yu, Hejie Cui, Hongye Jin, Xinyang Zhang, Hamed Bonab, Colin Lockard, Jianshu Chen, Zhenyu Shi, Jingbo Shang, Xian Li, Bing Yin

ICML 2026

CoMem decouples agent reasoning from memory summarization so long-horizon agents can preserve most long-context performance while reducing latency through asynchronous context management.

[Paper] [Code]

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Yuwei Zhang, Sha Li, Changlong Yu, Qin Lu, Shuowei Jin, Chengyu Dong, Haoran Liu, Ilgee Hong, Xintong Li, Zhenyu Shi, Bing Yin, Jingbo Shang

Preprint 2026

RESD turns failed rollouts into reflection-based supervision and reusable playbook knowledge, letting models improve efficiently even when successful rollouts are rare.

[Paper] [Project Page] [Code]

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection

Yuwei Zhang, Wenhao Yu, Shangbin Feng, Yifan Zhu, Letian Peng, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang

ACL 2026 Oral

WikiDYK benchmarks real-world knowledge injection from fresh Wikipedia facts and shows bidirectional language models memorize injected knowledge more reliably than causal LMs.

[Paper] [Code]

