To address the challenge of enabling large-scale models to reason, act, and learn collectively, we introduce MrlX, a novel asynchronous co-evolutionary framework designed for Large Language Model (LLM) agents. This framework establishes a symbiotic relationship between an on-policy explorer and an off-policy adapter, facilitating rapid and stable capability enhancement through continuous mutual improvement. We demonstrate the efficacy of MrlX through two case studies: a doctor-patient co-training simulation for diagnostic interviewing and a multi-agent research pipeline. In both scenarios, the multi-agent approach significantly outperforms single-agent baselines, showcasing the framework’s potential in complex, collaborative tasks.
At the heart of this architecture are two distinct roles: an on-policy explorer, which generates fresh trajectories with its current policy, and an off-policy adapter, which learns from the experience its partner produces.
This architecture creates a highly efficient and stable learning system where the two agents act as stepping stones for one another. The result is a synergistic, upward spiral of capabilities driven by continuous co-evolution.
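To make the explorer/adapter interplay concrete, the sketch below shows one possible shape of such an asynchronous loop. It is a minimal illustration only: all names here (`TrajectoryBuffer`, `explorer_loop`, `adapter_loop`, `update_on_policy`, `update_off_policy`) are assumptions for exposition, not the actual MrlX API.

```python
import queue
import threading

# Minimal sketch of an MrlX-style asynchronous co-evolution loop.
# All names below are illustrative assumptions, not the real MrlX interfaces.

class TrajectoryBuffer:
    """Shared buffer: trajectories produced by one agent become
    off-policy training data for its partner."""

    def __init__(self, maxsize: int = 1024):
        self._q = queue.Queue(maxsize=maxsize)

    def put(self, traj):
        self._q.put(traj)

    def get(self):
        return self._q.get()


def explorer_loop(explorer, env, buffer: TrajectoryBuffer):
    """On-policy explorer: rolls out with its current policy and
    immediately updates on its own fresh trajectories."""
    while True:
        traj = explorer.rollout(env)        # interact with the environment/partner
        explorer.update_on_policy([traj])   # e.g. a PPO/GRPO-style step
        buffer.put(traj)                    # share experience with the adapter


def adapter_loop(adapter, buffer: TrajectoryBuffer, batch_size: int = 8):
    """Off-policy adapter: consumes the partner's trajectories
    asynchronously, so neither agent blocks on the other."""
    batch = []
    while True:
        batch.append(buffer.get())
        if len(batch) >= batch_size:
            adapter.update_off_policy(batch)
            batch.clear()


def run_async(explorer, adapter, env):
    """Launch both loops on separate threads so training overlaps."""
    buffer = TrajectoryBuffer()
    threads = [
        threading.Thread(target=explorer_loop, args=(explorer, env, buffer), daemon=True),
        threading.Thread(target=adapter_loop, args=(adapter, buffer), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads
```

The key design point is the decoupling: the explorer never waits for the adapter's gradient step, and the adapter trains on whatever experience has accumulated, which is what keeps the mutual improvement both rapid and stable.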
We have implemented this multi-agent coordinated training framework and validated it across multiple domains. Two illustrative case studies are presented below.
In the context of clinical diagnosis, where effective patient interviewing is pivotal, conventional self-play paradigms that only update the clinician agent are insufficient. Such approaches miss the mutual gains achievable when the patient simulator also evolves. Our work demonstrates that co-training both clinician and patient agents via joint multi-agent reinforcement learning markedly outperforms clinician-only training. This underscores the necessity of co-evolving all interacting agents to enhance the performance of LLM-driven diagnostic interviewing.
For this demo, we adopt data and reward definitions from DoctorAgent-RL.
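As a rough illustration of how both agents can be credited from the same episode, the sketch below outlines a single consultation rollout. `doctor`, `patient`, `ready_to_diagnose`, `diagnosis_reward`, and the message format are hypothetical placeholders; the actual data and reward definitions follow DoctorAgent-RL and are not reproduced here.

```python
# Hypothetical sketch of one doctor-patient co-training episode.
# All object and function names are illustrative placeholders.

def run_consultation(doctor, patient, case, diagnosis_reward, max_turns: int = 8):
    """Roll out one simulated interview and return a trajectory per agent."""
    dialogue = [{"role": "system", "content": case.chief_complaint}]
    doctor_traj, patient_traj = [], []

    for _ in range(max_turns):
        question = doctor.generate(dialogue)        # clinician asks a question
        dialogue.append({"role": "doctor", "content": question})
        doctor_traj.append((list(dialogue), question))

        answer = patient.generate(dialogue, case)   # simulator answers from the case record
        dialogue.append({"role": "patient", "content": answer})
        patient_traj.append((list(dialogue), answer))

        if doctor.ready_to_diagnose(dialogue):      # stop once enough evidence is gathered
            break

    diagnosis = doctor.diagnose(dialogue)
    reward = diagnosis_reward(diagnosis, case)      # placeholder for the DoctorAgent-RL reward

    # Both agents are credited from the same episode, so a better patient
    # simulator directly improves the clinician's learning signal.
    return (doctor_traj, reward), (patient_traj, reward)
```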
As illustrated by the reward curves below, the joint training methodology produces significant improvements for both agents compared to the clinician-only baseline. This highlights the value of co-evolving patient simulators to achieve superior diagnostic interviewing capabilities in LLMs.
Results from the doctor-patient co-training show that mutual asynchronous evolution of the participating interactive agents yields measurable gains over single-agent training. Building on this finding, we next explore a domain with distinct challenges—context-length explosion and frequent multi-tool usage—addressed through MrlX’s multi-agent research pipeline.
Project Link: MrlX-DeepResearch
In complex reasoning tasks requiring external tool usage, single-agent LLM systems face a context-length explosion: the continuous history of tool interactions, verbose outputs, and intermediate reasoning steps rapidly accumulates and overwhelms the model's finite context window. Our work adopts a multi-agent architecture in which distinct LLMs specialize in separate roles while being trained separately and concurrently, enabling them to interact and dynamically generate training samples for one another. This co-evolutionary process allows each agent to deepen its expertise on its specific task, a clear departure from single-model approaches that merely simulate multiple roles within one model.
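One common way to contain this context blow-up is to let a planner agent keep only condensed notes while a separate tool agent absorbs the verbose tool traffic; the sketch below illustrates that division of labor. All names (`planner`, `tool_agent`, `next_step`, `run_tools`, `summarize`) are illustrative assumptions rather than the MrlX-DeepResearch implementation.

```python
# Hypothetical sketch of role separation in a deep-research pipeline.
# Names are illustrative assumptions, not the MrlX-DeepResearch API.

def research(question: str, planner, tool_agent, max_steps: int = 10) -> str:
    """The planner keeps a short context; verbose tool work is delegated."""
    notes = []  # condensed findings only, never raw tool output

    for _ in range(max_steps):
        step = planner.next_step(question, notes)
        if step.is_final:
            return step.answer

        # The tool agent absorbs the long tool interactions in its own
        # context window and hands back only a compact summary.
        raw = tool_agent.run_tools(step.query)
        notes.append(tool_agent.summarize(raw, focus=step.query))

    return planner.finalize(question, notes)
```

Under this split, each role's trajectories stay short enough to train on, and the rollouts of one agent naturally become training samples for the other.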
The results demonstrate that our multi-agent setup with distinct, specialized LLMs (green line) converges faster and achieves higher rewards than a single-agent baseline configuration (blue line).
Because the single-agent baseline configuration differs slightly from the multi-agent setup, their initial rewards are not directly comparable. A more rigorous ablation study will be presented in a forthcoming paper.
The case studies confirm that asynchronous co-evolution of specialized LLM agents delivers measurable gains in stability, convergence speed, and task performance over single-agent approaches. Looking forward, MrlX will be extended to more diverse multi-agent ecosystems beyond LLMs, and adapted for real-time dynamic environments with potential self-organizing role allocation mechanisms.
@misc{mrlx2025,
  title        = {MrlX: A Multi-Agent Reinforcement Learning Framework},
  author       = {AQ Team},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/AQ-MedAI/MrlX}},
}