All Posts note 5 min read

Multi-Agent Conversational AI

What production conversational AI takes once you stop treating it as request-response: instant talk, slow thinking, one hard invariant, and patience for users who don't wait their turn.

Contents

Most AI products are a search box that learned to type. You send a message, you wait, a wall of text appears. That’s a transaction. People feel the difference from a conversation in the first ten seconds.

Watch someone talk to an agent they actually trust. They interrupt. They split one thought across five messages. They ask a question, then add the context that mattered a beat later. They go quiet for three minutes and come back mid-sentence. In our product, a third of all user messages arrive before the agent has finished answering the one before. People don’t wait their turn.

The tail is what breaks naive systems. Most people fire two or three messages in a row, plenty fire five or more, and one fired thirty-six before the agent could get a word in. Request-response holds none of it. We published the principles a while back. This is what survived contact with real users, across tens of thousands of conversations and sessions that run a full hour.

Talking and thinking are different jobs#

The useful work is slow. A real answer can mean several tool calls, several model turns, cross-referencing everything you know about someone. Done inside the conversation, that’s a minute of the user watching a typing indicator. Nobody waits a minute.

So the part that talks and the part that thinks are different processes. One stays present: it acknowledges, handles the back-and-forth, and never blocks. The other does the slow work beside it and hands the result over when it’s ready. The user feels a fast, attentive conversation with a depth that arrives on its own.

They don’t talk to each other directly. They share a Terra Kernel, a small in-session document store: the analysis process writes its result into a slot, the conversation process subscribes and picks it up on its next turn. No messages passed between agents, no choreography to fall out of sync, just a shared document with pub/sub.

That was a deliberate choice. Two agents coordinating by sending each other messages is a distributed-systems problem you do not want inside a live conversation: ordering, retries, partial failure, all of it in the hot path. A shared store collapses that into a single question the conversation process can answer instantly: what’s in the slot right now.

The store is small and strict, and the strictness is the point. Reads never block: an agent asks for a slot and gets the current document or nothing, never a wait. A writer locks the slot, does its work, and commits in one atomic step, and because the lock belongs to whoever holds it, a crash mid-write releases it on the way down. No deadlock, no half-written state, no cleanup code. The whole session sits under one supervisor: kill the store and every agent restarts with it, kill one agent and only that one comes back. None of that is code we wrote. It’s what running agents as real processes buys you, and most of why we do.

How a slot reads back is the elegant part. The kernel wraps its content with the slot’s state, locked or idle, and the time it was last written. So when the conversation agent reads the analysis slot, the model can see for itself whether the work is still in flight or already settled. Coordination that usually hides under the runtime becomes something the model reasons about directly.

The split pays off again on interruption. Most of what people send mid-thought is more context, not a new question. If the slow work lived inside the conversation, every “oh, and also” would cancel it and start over. Two processes means the thinking ignores the chatter and finishes.

One invariant carries the whole system#

Three things have to stay identical, always: what the user saw, what you persisted, and what the model sees on the next turn.

Break it in either direction and the illusion collapses. Save more than the user saw, and the agent references a sentence they never read. Save less, and it forgets something they’re certain they said. Add streaming, interruptions, and that background work all landing at once, and keeping the three in lockstep becomes most of the engineering. It sounds trivial written down. It is the hardest thing in the system, and most of what makes an interruption feel seamless instead of broken is holding this line under pressure.

Impatient users get answered, not dropped#

When a third of messages arrive before the last answer is done, the naive design, cancel the response and start over, cancel again, burns a minute producing nothing. Five messages, five false starts, zero replies.

So a burst isn’t five interruptions, it’s one input. Let the messages land, read them together, answer all of it at once. The person who fired off five quick lines gets a single reply that addresses every one, which is what they wanted when they sent them.

The patterns are easy to say and brutal to ship#

You can write the whole approach in three lines. Split talking from thinking. Hold one invariant. Batch the impatient. The sentences are simple.

Everything underneath them is not. What counts as a complete thought worth delivering. When silence means “keep going” and when it means “stop.” How to recover when an interruption, a background result, and a five-message burst all land in the same second. Those thousand judgments are where the years went, and they don’t fit in an article or a framework. That part is the product.

Watching it run#

A system this concurrent is only as trustworthy as your ability to see inside it. Terra emits a telemetry event for everything an agent does: state changes, invocations, tool calls, kernel reads and writes. We batch those events, stream them over NATS, and land them in ClickHouse, where a conversation from three months ago can be pulled apart event by event. When something looks wrong in production, you don’t guess. You query.

agent telemetry · production
telemetry eventsTerra agentsstate · invoke · tool · kernelover NATSbatchedClickHousequery · replaya conversation, months later
Every agent action becomes an event, batched to ClickHouse, queryable months later.

The full stack#

Rune is the memory: an append-only graph of how a user’s life evolves across sessions. Terra is the runtime: stateful agent processes, context aging, shared state. This is the layer that makes them feel like a person, instant, interruptible, and patient with the way people actually type.

Three problems, three layers. Rune remembers. Terra runs. This orchestrates. Together they’re a conversation worth coming back to a hundred times.


Built at Anuvaya. We’re building AI in India.

Third of three. Previously: Rune, long-term memory for AI agents and Terra, an agent framework extracted from production.

Written by

Nitesh Kumar Niranjan

Founder, Anuvaya Labs

Nitesh works across product, engineering, and design at Anuvaya Labs.