Rune - Long-Term Memory for AI Agents

The simplest memory an LLM has is its context window. We solved that layer: a context engine that keeps it fresh and consistent within a session.

But what does the agent know about you on session 98?

The industry answer is extraction. Pull facts from conversations (“user is planning a trip,” “user left a job under bad circumstances”), turn them into memory units, and retrieve them next session. As the category matured, the stack got better: rerankers, entity linking, profiles, graph relationships, version chains, automatic forgetting. Useful improvements. But the primitive stayed the same: extract a memory now, decide how it should evolve, and trust retrieval to reconstruct the right story later.

Some systems compress aggressively. Some version. Some forget. Some build graphs around the extracted facts. All of them are trying to solve the same problem: make remembered information manageable at read time.

These systems solve real problems well. If your agent mainly needs preference recall, profile building, or cheap personalization, this stack is sensible.

Our objection is narrower. For long-lived advisory relationships, continuity lives in the path between facts.

And compression is exactly where that path starts disappearing. Once a history is flattened into summaries, merged memories, or a current profile, the connective tissue is gone.

If you’ve used Claude Code or Codex, you’ve watched one version of that tradeoff in real time. The conversation compacts. Earlier work flattens into a summary. Details vanish. Fine for a coding session. For a user who comes back hundreds of times over years, each compression pass bets that what it discards won’t matter later.

We made that bet for a year. Five versions. Lost it often enough to stop.

The wrong question

Our first attempts tried to decide what mattered at write time. But importance isn’t a property of information in isolation. It’s a property of how information connects to everything else. A user mentions leaving a job in session 3. Twelve sessions later, they’re hesitant about a new opportunity. You understand why without being told. Compressed memory misses that connection. Write-time selection has false negatives, and in conversation, false negatives compound.

Later versions tried the inverse: keep everything at write time, then discard what seems irrelevant. That broke across users. The discard logic that’s right for one profile corrupts memory for another.

Same destination every time: retrieval had to become precise enough that keeping everything remained viable.

That is the boundary of our claim. Factual memory is the wrong atom for the kind of continuity we need.

Structure, not compression

Rune is an append-only knowledge graph. Five node types: entities (people, places, things), states (life domains like career, marriage, health), relationships (how two entities connect), events (something that happened), and assertions (exact words worth preserving).

The topology does the heavy lifting. States and relationships are hubs. Events are joints between them. Assertions are leaves: no edge ever starts from one.

This is the key difference. Rune stores states and transitions, with causality as a first-class property.
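As a rough illustration of the five node types and their topological roles, here is a minimal Python sketch. The `Node` class, its `seq` field, and the `ROLE` table are hypothetical names chosen for this example, not Rune's actual schema, and the sample labels are invented.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    ENTITY = "entity"              # people, places, things
    STATE = "state"                # life domains: career, marriage, health
    RELATIONSHIP = "relationship"  # how two entities connect
    EVENT = "event"                # something that happened
    ASSERTION = "assertion"        # exact words worth preserving


# Topological role per type, as described above. The text assigns no role
# to entities, so none is assumed here.
ROLE = {
    NodeType.STATE: "hub",
    NodeType.RELATIONSHIP: "hub",
    NodeType.EVENT: "joint",
    NodeType.ASSERTION: "leaf",
}


@dataclass(frozen=True)  # frozen: a node cannot be modified once created
class Node:
    seq: int        # hypothetical append-order identifier
    type: NodeType
    label: str


# Illustrative nodes only; the labels are invented for this example.
career = Node(1, NodeType.STATE, "career")
exit_event = Node(2, NodeType.EVENT, "left previous role")
quote = Node(3, NodeType.ASSERTION, "I couldn't stay another month")
```

Freezing the dataclass is one simple way to make "never modified after creation" a property of the type rather than a convention.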

Figure (node types: entity, state, relationship, event, assertion): States and events form the causal spine; assertions hang as leaves; one exit event branches into multiple life domains.

Walk any chain and you get a coherent causal story: a new job sits inside a sequence that includes the previous role, the exit, and whatever else shaped the transition. Compare this to a star topology:

Figure (two panels: star, flat; chain, causal): A star pins facts to a central entity but has no path between them. A chain is traversable: walk it and you reconstruct the story.

Stars are easy to build. They’re also flat. You can’t traverse them to reconstruct a narrative. With chains, you can.

A fact graph tells you what is true, what was updated, and what extends what. A causal graph tells you what changed, what it changed from, what else it touched, and which exact words anchored the transition. For us, that distinction is the whole system.

Preferences live here too. How someone wants to be engaged (direct answers, no sugarcoating) is stored as assertions on the agent-user relationship chain. The same structure that tracks “got promoted” tracks “prefers bluntness.”

LLM reads, deterministic code writes

The LLM is powerful but unreliable. So we split the pipeline.

A reader agent examines the conversation and proposes a structured extraction plan. A deterministic builder executes it against hard constraints. The LLM never touches the graph.

The constraint system is the fence. A static edge-validity table defines which node types can connect: assertions can never source edges, entities go through relationships, states need events as joints. Semantic rules add what topology alone can’t enforce: one root state per topic, no duplicate relationships between the same pair. When a single event affects multiple domains, shadow nodes keep the DAG intact. If any constraint fails mid-operation, everything rolls back. No half-written graphs.
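To make the fence concrete, here is a minimal Python sketch of a static edge-validity table combined with all-or-nothing application. Everything in it is an assumption for illustration: the entries in `VALID_EDGES` only encode the three rules named above (the real table is not given), and `Graph`, `ConstraintError`, and `apply_atomically` are hypothetical names. Semantic rules and shadow nodes are left out.

```python
class ConstraintError(Exception):
    """Raised when an operation would violate a graph constraint."""


# Assumed table of allowed (source type, target type) pairs.
VALID_EDGES = {
    ("entity", "relationship"),    # entities go through relationships
    ("relationship", "entity"),
    ("state", "event"),            # states need events as joints
    ("event", "state"),
    ("state", "assertion"),        # assertions may be targets,
    ("relationship", "assertion"), # but no pair has "assertion" as source
    ("event", "assertion"),
}


class Graph:
    def __init__(self):
        self.nodes = {}   # sequence number -> node type
        self.edges = []   # (source seq, target seq)

    def add_node(self, seq, node_type):
        if seq in self.nodes:
            raise ConstraintError(f"node {seq} already exists")
        self.nodes[seq] = node_type

    def add_edge(self, src, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise ConstraintError(f"edge ({src}, {dst}) references a missing node")
        pair = (self.nodes[src], self.nodes[dst])
        if pair not in VALID_EDGES:
            raise ConstraintError(f"edge {pair[0]} -> {pair[1]} is not allowed")
        self.edges.append((src, dst))


def apply_atomically(graph, operations):
    """Apply every operation, or none of them if any constraint fails."""
    saved_nodes, saved_edges = dict(graph.nodes), list(graph.edges)
    try:
        for name, *args in operations:
            if name == "node":
                graph.add_node(*args)
            elif name == "edge":
                graph.add_edge(*args)
            else:
                raise ConstraintError(f"unknown operation {name!r}")
    except ConstraintError:
        graph.nodes, graph.edges = saved_nodes, saved_edges  # roll back
        return False
    return True


g = Graph()
apply_atomically(g, [("node", 1, "state"), ("node", 2, "event"), ("edge", 1, 2)])
# This batch fails on its last edge (assertion as source), so node 3 and the
# valid edge 2 -> 3 are rolled back along with it.
ok = apply_atomically(g, [("node", 3, "assertion"), ("edge", 2, 3), ("edge", 3, 1)])
```

Copying the whole graph before each batch is only workable at toy scale; a production builder would more plausibly rely on a database transaction or an undo log.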

The plan is the contract between the two sides. The LLM proposes operations using identifiers it invented. The builder maps those to sequence numbers. Bad references get rejected; the error feeds back for self-correction. The LLM proposes. Deterministic code enforces. The graph stays clean.
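The contract can be sketched as a resolution step that runs before anything is written. The plan shape, `resolve_plan`, and `PlanError` below are hypothetical; the sketch assumes a plan refers to its own new nodes by invented string identifiers and to existing nodes by sequence number.

```python
class PlanError(Exception):
    """Raised when a plan contains a reference the builder cannot resolve."""


def resolve_plan(plan, existing_seqs):
    """Map plan-local identifiers to new sequence numbers.

    Assumed plan shape: {"nodes": [{"id": str, "type": str}],
                         "edges": [{"src": ref, "dst": ref}]}
    where a ref is either a plan-local id (str) or the sequence number (int)
    of a node already in the graph.
    """
    next_seq = max(existing_seqs, default=0) + 1
    seq_of = {}
    for node in plan["nodes"]:
        if node["id"] in seq_of:
            raise PlanError(f"duplicate identifier {node['id']!r}")
        seq_of[node["id"]] = next_seq
        next_seq += 1

    def resolve(ref):
        if isinstance(ref, int) and ref in existing_seqs:
            return ref
        if ref in seq_of:
            return seq_of[ref]
        raise PlanError(f"unknown reference {ref!r}")

    edges = [(resolve(e["src"]), resolve(e["dst"])) for e in plan["edges"]]
    return seq_of, edges


# A plan that points at "promo", an identifier it never declared.
plan = {
    "nodes": [{"id": "new_job", "type": "event"}],
    "edges": [{"src": 4, "dst": "new_job"}, {"src": "new_job", "dst": "promo"}],
}
try:
    resolve_plan(plan, existing_seqs={1, 2, 3, 4})
    feedback = None
except PlanError as err:
    feedback = str(err)  # would be sent back to the reader agent for a retry
```

Because resolution happens before any write, a rejected plan leaves the graph untouched and the error message is all the reader agent needs to correct itself.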

Any AI system where an LLM mutates structured state benefits from this separation.

Append-only

You never update or delete nodes. Events evolve states. History is always there. The LLM can’t corrupt what exists, only add to it.

This handles a tension most memory systems ignore: discovery order versus causal order. Information arrives piecemeal, out of sequence, across sessions. Writes happen when we learn something. Reads reconstruct what actually happened and when. A user reveals in session twelve that they hadn’t spoken to their partner for six months after a career setback. Session three stays intact. New paths connect nodes that already exist.

Current state is read from the graph. The past stays exactly where it happened.
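The gap between discovery order and causal order can be illustrated with a small append-only chain. This is a simplification, not Rune's mechanism: it uses a hypothetical `occurred` date to order events at read time, where Rune connects nodes through graph paths. The class names, fields, and dates are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int          # append order, i.e. discovery order
    learned_in: int   # session in which the user revealed it
    occurred: str     # ISO date of when it happened (hypothetical field)
    new_state: str    # state the domain moved into


class StateChain:
    """Append-only record of how one life domain evolved."""

    def __init__(self, domain):
        self.domain = domain
        self._events = []

    def append(self, learned_in, occurred, new_state):
        event = Event(len(self._events) + 1, learned_in, occurred, new_state)
        self._events.append(event)  # the only write: no update, no delete
        return event

    def causal_history(self):
        # ISO dates sort lexicographically; seq breaks ties deterministically.
        return sorted(self._events, key=lambda e: (e.occurred, e.seq))

    def current_state(self):
        history = self.causal_history()
        return history[-1].new_state if history else None


partner = StateChain("relationship with partner")
partner.append(learned_in=3, occurred="2024-09-01", new_state="reconciled")
# Revealed later, but it happened earlier: appended last, read first.
partner.append(learned_in=12, occurred="2024-02-01", new_state="not speaking")
```

Here `current_state()` still returns the most recent state in causal terms, even though the last record appended describes something older.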

Retrieval is graph traversal, not search. It narrows deterministically, first by entity and then by domain, and then walks the relevant chain. Reads land in around 20 milliseconds. That speed is the whole reason we built a graph: the structure answers most questions outright, so we almost never fall back to vector search.
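A toy version of that read path, under stated assumptions: the `index`, `next_of`, and `labels` dictionaries are hypothetical stand-ins for the graph store, and the chain is linear, whereas real chains can branch across domains.

```python
def retrieve_chain(index, next_of, labels, entity, domain):
    """Narrow by entity, then by domain, then walk the chain from its root."""
    node = index.get(entity, {}).get(domain)   # two exact-match lookups
    chain, seen = [], set()
    while node is not None and node not in seen:  # `seen` guards against cycles
        seen.add(node)
        chain.append(labels[node])
        node = next_of.get(node)
    return chain


# Hypothetical data: one user, one career chain of alternating states and events.
index = {"user": {"career": "s1"}}
next_of = {"s1": "e1", "e1": "s2", "s2": "e2", "e2": "s3"}
labels = {
    "s1": "state: previous role",
    "e1": "event: left the job",
    "s2": "state: between jobs",
    "e2": "event: accepted new offer",
    "s3": "state: new job",
}

story = retrieve_chain(index, next_of, labels, "user", "career")
```

Nothing in this path scores or ranks candidates: the same inputs always yield the same chain, which is what makes the read deterministic.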

Figure (structured graph read latency, production): Live memory reads in production. p90 stays under 100ms; the structure answers most queries without semantic search.

So what does this enable?

Session 98 carries the full structured history of sessions 1 through 97. Every relationship, event, preference: navigable. No nightly compression deciding what you’re allowed to remember. In production our power users are already there: relationships that cross 100 sessions, the deepest past 850.

A human professional who has advised thousands of people carries pattern recognition no single client could produce. They’ve seen this before. They know what to ask next. Rune is built for the same arc. Each user’s graph stays their own; the system’s read on the domain compounds with every conversation it has ever had.

How we measure this

Rune runs in production today, with over 147,000 graph records built from real conversations.

Figure (conversations remembered since launch): Conversations Rune has built memory from since it launched in March 2025.

We don’t chase fact-recall benchmarks. They reward remembering that a user likes coffee. Our problem is harder and differently shaped: can the system reconstruct why a career change three months ago connects to a relationship that started unraveling a year before that?

So we hold Rune to three questions. Did extraction build the right structure from the conversation? Can retrieval reconstruct the causal chain later? Does that hold as the sessions pile up?

We answer them by replay against real conversation data: build memory through session n-1, replay session n, measure whether what matters actually surfaces. The target is the messy, nonlinear, contradictory way people talk about their lives across months, not a synthetic dataset.
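The replay loop can be sketched as follows. This is a sketch under assumptions, not the actual harness: `build_memory` and `retrieve` are placeholder callables, `expected` is a hypothetical hand-labeled map of what should surface in each session, and the score (fraction of expected items surfaced) is our choice for the example, since no metric is specified above.

```python
def replay(sessions, expected, build_memory, retrieve):
    """Build memory through session n-1, replay session n, score what surfaces."""
    scores = []
    for n in range(1, len(sessions)):
        memory = build_memory(sessions[:n])          # sessions 0..n-1 only
        surfaced = set(retrieve(memory, sessions[n]))
        wanted = expected.get(n, set())
        if wanted:  # only score sessions that have labeled expectations
            scores.append(len(surfaced & wanted) / len(wanted))
    return scores


# Toy stand-ins: memory is a set of (topic, fact) pairs; retrieval is by topic.
def build_memory(past_sessions):
    return {fact for session in past_sessions for fact in session["facts"]}


def retrieve(memory, session):
    return [fact for fact in memory if fact[0] == session["topic"]]


sessions = [
    {"topic": "career", "facts": [("career", "left previous job")]},
    {"topic": "health", "facts": [("health", "started running")]},
    {"topic": "career", "facts": [("career", "hesitant about new offer")]},
]
expected = {2: {("career", "left previous job")}}

scores = replay(sessions, expected, build_memory, retrieve)
```

The important property is the slice `sessions[:n]`: memory never sees the session it is being tested on, so a high score reflects what was carried forward rather than what was just said.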

Rune is built for the moment where one missed session distorts the entire present.

The previous five versions failed that test in different ways. Rune cleared it.


Nitesh Kumar Niranjan
