Alibaba – Pierre-Marcel De Mussac

Three releases in under a week, one direction: multi-agent ensembles, 35-hour autonomous runs, and 6-layer memory just became the default.

In less than a week, three independent releases landed that look unrelated on the surface. Anthropic shipped Dynamic Workflows in Claude Code on May 28. Alibaba launched Qwen3.7-Plus on Bailian on June 2. A community developer named Claudio Drews open-sourced Memory OS, a 6-layer memory stack for Hermes Agent, on June 1.

Read them separately and you get three product announcements. Read them together and you get the same argument from three different teams: the single-agent ReAct loop is over.

Plural in space

Anthropic’s Dynamic Workflows lets Claude write its own orchestration scripts. The scripts fan out up to 16 concurrent subagents, with a 1,000-agent cap per run. Critic subagents try to refute the findings of the others, and the run keeps iterating until the answers converge. The proof point Anthropic surfaced was Jarred Sumner closing 99.8% of an existing test suite across roughly 750,000 lines of Rust in 11 days from first commit to merge.

That is not a “more agents” story. It is a “verify by refutation” story. Most multi-agent frameworks use parallelism for speed or for ensembling. Anthropic is using parallelism for adversarial checking: agents are explicitly tasked with looking for reasons the previous findings are wrong before anything is reported back up. The structural change is that the unit of work is now a multi-agent ensemble that decomposes-then-verifies, not a single agent that reasons-then-acts.
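To make the “verify by refutation” shape concrete, here is a minimal Python sketch of a refute-until-converged loop. It is not Anthropic’s implementation or API: `run_worker`, `run_critic`, and the toy refutation rule are hypothetical stand-ins, and the sketch assumes “converge” means that no critic refutes any remaining finding. Only the 16-concurrent and 1,000-agent limits come from the release.

```python
import asyncio
from dataclasses import dataclass

MAX_CONCURRENT = 16        # concurrent-subagent limit cited in the article
MAX_AGENTS_PER_RUN = 1000  # per-run agent cap cited in the article


@dataclass
class Finding:
    subtask: str
    answer: str
    attempt: int


async def run_worker(subtask: str, attempt: int) -> Finding:
    # Hypothetical subagent; a real one would call a model with the subtask.
    await asyncio.sleep(0)
    return Finding(subtask, f"answer-{subtask}-v{attempt}", attempt)


async def run_critic(finding: Finding) -> bool:
    # Hypothetical critic: returns True if it refutes the finding.
    # Toy rule so the sketch runs: first attempts are refuted, retries survive.
    await asyncio.sleep(0)
    return finding.attempt == 0


async def refute_until_converged(subtasks: list[str]) -> dict[str, Finding]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)  # at most 16 subagents in flight

    async def bounded(coro):
        async with sem:
            return await coro

    accepted: dict[str, Finding] = {}
    pending = list(subtasks)
    agents_used = 0
    attempt = 0
    while pending:  # loop ends once no critic has refuted anything left
        cost = 2 * len(pending)  # one worker plus one critic per pending subtask
        if agents_used + cost > MAX_AGENTS_PER_RUN:
            raise RuntimeError("agent budget exhausted before convergence")
        agents_used += cost
        findings = await asyncio.gather(
            *(bounded(run_worker(t, attempt)) for t in pending))
        verdicts = await asyncio.gather(*(bounded(run_critic(f)) for f in findings))
        for finding, refuted in zip(findings, verdicts):
            if not refuted:
                accepted[finding.subtask] = finding
        # Refuted subtasks go back out for another attempt.
        pending = [f.subtask for f, refuted in zip(findings, verdicts) if refuted]
        attempt += 1
    return accepted
```

Calling `asyncio.run(refute_until_converged(["parse", "lint"]))` sends both subtasks out, has both first answers refuted, and accepts the second attempts. The budget check is the design point: refutation rounds multiply agent count, so a hard cap has to be enforced per round, not per agent.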

Persistent in time

Qwen3.7-Plus on Bailian made a much louder claim and a much quieter design choice. The loud claim: a 35-hour autonomous run without measurable degradation, chaining over 1,000 tool calls in a single session. The quiet design choice: Bailian itself adds an Agentic RL layer that uses real-world execution feedback to refine accuracy over time, a continual-learning feature operating above the base model.

The 35-hour number is vendor-published and needs harness disclosure before anyone bets infrastructure on it. But the framing matters even if the exact number does not survive scrutiny. The implicit assertion is that agents need to extend in time, not just in fan-out. The work that justifies an “agent” is the work that does not fit inside one human session, one chat history, one ReAct loop. If reproducible, this changes what kinds of long-running tasks are economically viable to delegate to AI.
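If you build persistence yourself, the simplest version is checkpointing state after every tool call, so a crashed or timed-out session resumes instead of restarting. The sketch below assumes a hypothetical `call_tool` and a JSON checkpoint file on local disk; it says nothing about how Qwen3.7-Plus or Bailian actually sustain long runs.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def call_tool(step: int, total: int) -> int:
    # Hypothetical tool call; a real agent would hit an API, shell, or model here.
    return total + step


def save_checkpoint(path: Path, state: dict) -> None:
    # Write to a temp file in the same directory, then atomically swap it in,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "total": 0}  # fresh run


def run(path: Path, total_steps: int, stop_after: Optional[int] = None) -> dict:
    state = load_checkpoint(path)  # resume wherever the last session stopped
    executed = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break  # simulate a crash or session timeout
        state["total"] = call_tool(state["next_step"], state["total"])
        state["next_step"] += 1
        executed += 1
        save_checkpoint(path, state)  # persist after every tool call
    return state
```

Running `run(path, 1000, stop_after=400)` and then `run(path, 1000)` finishes the remaining 600 steps from the checkpoint rather than starting over. Checkpointing every call is the conservative choice; batching saves I/O at the cost of replaying a few steps after a failure.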

Layered in memory

Memory OS is the smallest release of the three and the most architecturally honest. A single community developer published a 6-layer memory stack built on top of Hermes Agent: workspace files injected into the system prompt, session history with full-text search, structured facts with trust scoring, LLM-powered session extraction, a Qdrant vector database with hybrid retrieval, and an auto-curated wiki that gets re-ingested back into vectors. Apache 2.0, no benchmarks published yet, brand-new repo.
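The article does not say how Memory OS fuses the results of its hybrid retrieval. One common way to merge a dense-vector ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below in plain Python. Treat it as an illustration of the technique, not as Memory OS’s or Qdrant’s code; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default that damps
    the advantage of the very top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["fact-7", "wiki-2", "sess-9"]   # hypothetical vector-search order
keyword = ["sess-9", "fact-7", "doc-1"]  # hypothetical full-text order
print(reciprocal_rank_fusion([dense, keyword]))
```

Documents that rank reasonably well in both lists (here `fact-7` and `sess-9`) rise above documents that appear in only one, which is the point of combining semantic and lexical recall.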

The code is early-stage and you probably should not bet production on it today. But the design lesson is the point. Most agent builders end up assembling some version of this stack ad-hoc, then collapsing it into one “vector DB and pray” pattern when they ship. Memory OS is an explicit articulation that memory has at least six distinct purposes (system context, session recall, fact storage, semantic search, structured extraction, knowledge accumulation) and that each one deserves its own mechanism. The layering is the takeaway whether or not you ever touch the code.
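A rough sketch of the “one mechanism per purpose” idea, covering three of the six layers: workspace files that are always injected, session history searched on demand, and structured facts gated by a trust score. The class, its methods, the 0.7 threshold, and the “higher trust wins” rule are all hypothetical; this is not Memory OS’s API, and the substring search stands in for real full-text search.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    key: str
    value: str
    trust: float  # 0.0-1.0; how much the agent should rely on this fact


@dataclass
class LayeredMemory:
    # Hypothetical layers, each with its own storage and retrieval rule.
    workspace: dict[str, str] = field(default_factory=dict)  # always injected
    sessions: list[str] = field(default_factory=list)        # searched on demand
    facts: dict[str, Fact] = field(default_factory=dict)     # gated by trust

    def search_sessions(self, query: str, limit: int = 3) -> list[str]:
        # Case-insensitive substring match, newest first; stands in for full-text search.
        q = query.lower()
        return [s for s in reversed(self.sessions) if q in s.lower()][:limit]

    def upsert_fact(self, key: str, value: str, trust: float) -> None:
        # Keep the higher-trust version when two sources disagree.
        current = self.facts.get(key)
        if current is None or trust >= current.trust:
            self.facts[key] = Fact(key, value, trust)

    def build_context(self, query: str, min_trust: float = 0.7) -> str:
        parts = [f"[workspace:{name}]\n{text}" for name, text in self.workspace.items()]
        parts += [f"[fact] {f.key} = {f.value}"
                  for f in self.facts.values() if f.trust >= min_trust]
        parts += [f"[session] {s}" for s in self.search_sessions(query)]
        return "\n".join(parts)
```

Because each layer keeps its own retrieval rule, a low-trust fact can be stored without leaking into the prompt, and session recall stays query-driven instead of dumping full history into context.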

The connecting thread

Plural in space. Persistent in time. Layered in memory.

These three properties are what builders end up needing to assemble when they push past the demo phase. The single-agent ReAct loop with simple chat history was the v1 abstraction, and it has been the implicit unit of work in agent platforms for about two years. The turn of June 2026 is when three different teams quietly confirmed that v1 is being replaced.

The interesting thing about reading the releases together is that none of them claim to be doing the same thing. Anthropic is not pitching memory infrastructure. Alibaba is not pitching adversarial verification. Memory OS is not pitching long-running agents. Each is solving a specific bottleneck the v1 abstraction hit. But the bottlenecks were always the same three: a single agent cannot cover ground broadly enough to be sure, cannot operate long enough to finish complex tasks, and cannot remember enough to stay coherent across resets.

You can extend the unit of work along any of the three axes. Anthropic extended on the space axis. Alibaba extended on the time axis. Memory OS extended on the memory axis. The agents in production a year from now will be the ones that extend along all three, and they will mostly stop being called “agents.” They will be called platforms, systems, fleets, or pipelines. The word “agent” will keep referring to the v1 single-loop thing for a while out of inertia, but the work is moving past it.

What changes Monday morning

If you are shipping single-agent workflows that are hitting walls: pick the axis you are hitting first. If your agent is making confident wrong decisions, you need adversarial verification (Anthropic’s pattern). If your agent runs out of context or time on real tasks, you need persistence (Alibaba’s pattern, even if you build it yourself). If your agent loses coherence across sessions, you need layered memory (Memory OS’s pattern).

If you are building an agent platform: the v1 ReAct primitive is no longer enough as a default. The next default has all three properties, and the design questions are how to compose them rather than which to pick.

And if you are a researcher: the convergence happening across teams without any apparent coordination is the signal worth tracking. Three different teams, in under a week, argued the same thing without referencing each other. That usually means the field is finishing one phase and starting another.
