Publications - Yuwei Zhang

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

Haoran Liu*, Yuwei Zhang*, Xiyao Li, Bohan Lyu, Jingbo Shang (* equal contribution)

Preprint 2026

HERO turns next environment observations into hindsight reflection signals for multi-turn agent self-distillation, improving task success and reducing unnecessary tool-use turns under limited training budgets.

[Paper]

CoMem: Context Management with A Decoupled Long-Context Model

Yuwei Zhang, Chengyu Dong, Shuowei Jin, Changlong Yu, Hejie Cui, Hongye Jin, Xinyang Zhang, Hamed Bonab, Colin Lockard, Jianshu Chen, Zhenyu Shi, Jingbo Shang, Xian Li, Bing Yin

ICML 2026

CoMem decouples agent reasoning from memory summarization so long-horizon agents can preserve most long-context performance while reducing latency through asynchronous context management.

2026

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

CoMem: Context Management with A Decoupled Long-Context Model

ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation

Learning with Rare Success but Rich Feedback via Reflection-Enhanced Self-Distillation

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection

2025

MaPPO: Maximum a posteriori preference optimization with prior knowledge

Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval

Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

2024

Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation

Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Can your model tell a negation from an implicature? Unravelling challenges with intent encoders

2023

ClusterLLM: Large Language Models as a Guide for Text Clustering

Toward Unsupervised Realistic Visual Question Answering

2022

Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pretraining and Isotropization

New Intent Discovery with Pre-training and Contrastive Learning

Development and validation of a screening model for lung cancer using machine learning: a large-scale, multi-center study of biomarkers in breath

Automated Calculation of Liquid Crystal Sensing Images Based on Deep Learning

Research on assistant diagnosis of fundus optic neuropathy based on deep learning

2021

Effectiveness of Pre-training for Few-shot Intent Classification

KuraNet: Systems of Coupled Oscillators that Learn to Synchronize
