Rune — Long-Term Memory for AI Agents
The simplest memory an LLM has is its context window. We solved that layer: a context engine that keeps it fresh and consistent within a session.
But what does the agent know about you on session 98?
The industry answer is extraction. Pull facts from conversations ("user is planning a trip," "user left a job under bad circumstances"), turn them into memory units, and retrieve them next session. As the category matured, the stack got better: rerankers, entity linking, profiles, graph relationships, version chains, automatic forgetting. Useful improvements. But the primitive stayed the same: extract a memory now, decide how it should evolve, and trust retrieval to reconstruct the right story later.
conversation ──► extract facts ──► memory units ──► store/index
│
┌─────────────────────────┼─────────────────────────┐
▼ ▼ ▼
rerank link / version profile / expire
│ │ │
└─────────────────────────┴──────── retrieve ───────┘
│
▼
inject into context
Some systems compress aggressively. Some version. Some forget. Some build graphs around the extracted facts. All of them are trying to solve the same problem: make remembered information manageable at read time.
These systems solve real problems well. If your agent mainly needs preference recall, profile building, or cheap personalization, this stack is sensible.
Our objection is narrower. For long-lived advisory relationships, continuity lives in the path between facts.
And compression is exactly where that path starts disappearing. Once a history is flattened into summaries, merged memories, or a current profile, the connective tissue is gone.
If you've used Claude Code or Codex, you've watched one version of that tradeoff in real time. The conversation compacts. Earlier work flattens into a summary. Details vanish. Fine for a coding session. For a user who comes back hundreds of times over years, each compression pass bets that what it discards won't matter later.
We made that bet for a year. Five versions. Lost it often enough to stop.
The wrong question
Our first attempts tried to decide what mattered at write time. But importance isn't a property of information in isolation. It's a property of how information connects to everything else. A user mentions leaving a job in session 3. Twelve sessions later, they're hesitant about a new opportunity. You understand why without being told. Compressed memory misses that connection. Write-time selection has false negatives, and in conversation, false negatives compound.
Later versions tried the inverse: keep everything, discard what seems irrelevant. That broke across users. The discard logic that's right for one profile corrupts memory for another.
Same destination every time: retrieval had to become precise enough that keeping everything remained viable.
That is the boundary of our claim. Factual memory is the wrong atom for the kind of continuity we need.
Structure, not compression
Rune is an append-only knowledge graph. Five node types: entities (people, places, things), states (life domains like career, marriage, health), relationships (how two entities connect), events (something that happened), and assertions (exact words worth preserving).
The topology does the heavy lifting. States and relationships are hubs. Events are joints between them. Assertions are leaves; nothing ever points to a leaf.
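A minimal sketch of that taxonomy, assuming hypothetical names (`NodeType`, `Node`, `seq`) that are illustrative, not Rune's real schema:

```python
from dataclasses import dataclass
from enum import Enum, auto

class NodeType(Enum):
    ENTITY = auto()        # people, places, things
    STATE = auto()         # life domains: career, marriage, health
    RELATIONSHIP = auto()  # how two entities connect
    EVENT = auto()         # something that happened
    ASSERTION = auto()     # exact words worth preserving; always a leaf

@dataclass(frozen=True)
class Node:
    seq: int          # sequence number assigned by the deterministic builder
    kind: NodeType
    payload: dict     # free-form content: labels, quotes, timestamps
```

Nodes are frozen because the graph is append-only: once written, a node is never mutated, only connected to.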
This is the key difference. Rune stores states and transitions, with causality as a first-class property.
┌─ assertion: "I'm done with that place"
│
entity ──► state ──► event ──► state ──► event ──► state
(user) (career: │ (career: │ (career:
stable) │ searching) │ new role)
│ │
│ └─ assertion: "finally feel settled"
│
└──► state
(finance:
tight)
Walk any chain and you get a coherent causal story: a new job sits inside a sequence that includes the previous role, the exit, and whatever else shaped the transition. Compare this to a star topology:
star (flat): chain (causal):
assertion entity ── state ── event ── state
│ │
fact ── user ── fact └── state
│ │
assertion assertion
Stars are easy to build. They're also flat. You can't traverse them and reconstruct a narrative. Chains can.
A fact graph tells you what is true, what was updated, and what extends what. A causal graph tells you what changed, what it changed from, what else it touched, and which exact words anchored the transition. For us, that distinction is the whole system.
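The narrative reconstruction above can be sketched as a backward walk over a chain. This is an illustrative traversal under assumed field names (`via` for the event that produced a state, `prev` for the state that event transitioned from), not Rune's actual storage layout:

```python
def causal_story(graph: dict, state_id: str) -> list:
    """Walk backwards from a current state to its root, collecting
    the alternating state/event chain, then emit it in causal order."""
    story = []
    node = graph[state_id]
    while node is not None:
        story.append(node["label"])
        via = node.get("via")            # event that produced this state
        if via is None:
            break                        # reached the root state
        story.append(graph[via]["label"])
        prev = graph[via].get("prev")    # state the event transitioned from
        node = graph[prev] if prev else None
    story.reverse()
    return story

# Toy graph mirroring the career chain in the diagram above.
graph = {
    "s1": {"label": "career: stable", "via": None},
    "e1": {"label": "left job", "prev": "s1"},
    "s2": {"label": "career: searching", "via": "e1"},
    "e2": {"label": "accepted offer", "prev": "s2"},
    "s3": {"label": "career: new role", "via": "e2"},
}
```

Starting from `"s3"`, the walk recovers the full sequence from "career: stable" through "left job" and "accepted offer" to "career: new role". The same walk on a star topology is impossible: every hop lands back on the hub.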
Preferences live here too. How someone wants to be engaged (direct answers, no sugarcoating) is stored as assertions on the agent-user relationship chain. The same structure that tracks "got promoted" tracks "prefers bluntness."
LLM reads, deterministic code writes
The LLM is powerful but unreliable. So we split the pipeline.
conversation ──► reader agent (LLM)
│
│ queries existing graph
│ reasons about what's new
▼
extraction plan ◄── typed contract
│
▼
builder (deterministic)
│
│ validates against constraints
│ rolls back on failure
▼
graph
│
└── error? ──► feed back to reader for retry
A reader agent examines the conversation and proposes a structured extraction plan. A deterministic builder executes it against hard constraints. The LLM never touches the graph.
The constraint system is the fence. A static edge-validity table defines which node types can connect: assertions can never source edges, entities go through relationships, states need events as joints. Semantic rules add what topology alone can't enforce: one root state per topic, no duplicate relationships between the same pair. When a single event affects multiple domains, shadow nodes keep the DAG intact. If any constraint fails mid-operation, everything rolls back. No half-written graphs.
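The edge-validity table can be sketched as a static allowlist plus a hard rule for leaves. The rule set here is illustrative, assembled from the constraints named above, not Rune's real table:

```python
# Allowed (source type, destination type) pairs. Illustrative only.
VALID_EDGES = {
    ("entity", "relationship"),   # entities connect through relationships
    ("entity", "state"),          # an entity anchors its life-domain states
    ("relationship", "state"),
    ("relationship", "assertion"),
    ("state", "event"),           # states need events as joints
    ("event", "state"),
    ("state", "assertion"),       # assertions are leaves
    ("event", "assertion"),
}

def validate_edge(src_type: str, dst_type: str) -> None:
    """Reject any edge the static table does not permit."""
    # Assertions can never source an edge: leaves have no outgoing edges.
    if src_type == "assertion":
        raise ValueError("assertions cannot source edges")
    if (src_type, dst_type) not in VALID_EDGES:
        raise ValueError(f"illegal edge: {src_type} -> {dst_type}")
```

Because the table is static data rather than LLM judgment, an invalid proposal fails the same way every time, which is what makes the feedback-and-retry loop reliable.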
The plan is the contract between the two sides. The LLM proposes operations using identifiers it invented. The builder maps those to sequence numbers. Bad references get rejected; the error feeds back for self-correction. The LLM proposes. Deterministic code enforces. The graph stays clean.
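A sketch of that contract, under assumed plan shapes (`add_node` / `add_edge` operations keyed by LLM-invented ids); the real builder also enforces the edge-validity and semantic rules, omitted here for brevity:

```python
def apply_plan(graph: dict, plan: list) -> dict:
    """Execute a reader plan atomically: map invented ids to sequence
    numbers, and commit all operations or none of them."""
    # Stage on a deep-enough copy so failure leaves the graph untouched.
    staged = {seq: dict(node, edges=list(node["edges"]))
              for seq, node in graph.items()}
    id_map = {}                           # LLM-invented id -> sequence number
    next_seq = max(staged, default=0) + 1
    for op in plan:
        if op["op"] == "add_node":
            id_map[op["id"]] = next_seq
            staged[next_seq] = {"kind": op["kind"], "edges": []}
            next_seq += 1
        elif op["op"] == "add_edge":
            if op["src"] not in id_map or op["dst"] not in id_map:
                # Bad reference: reject the whole plan; the error text
                # feeds back to the reader agent for self-correction.
                raise KeyError(f"unknown reference in {op}")
            staged[id_map[op["src"]]]["edges"].append(id_map[op["dst"]])
    return staged
```

The LLM never sees sequence numbers and never touches `graph`; it only ever emits plans, and a rejected plan leaves no trace.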
Any AI system where an LLM mutates structured state benefits from this separation.
Append-only
You never update or delete nodes. Events evolve states. History is always there. The LLM can't corrupt what exists, only add to it.
This handles a tension most memory systems ignore: discovery order versus causal order. Information arrives piecemeal, out of sequence, across sessions. Writes happen when we learn something. Reads reconstruct what actually happened and when. A user reveals in session twelve that they hadn't spoken to their partner for six months after a career setback. Session three stays intact. New paths connect nodes that already exist.
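One way to sketch the two orderings: every record carries both when it happened and when it was learned, and causal reads are just a re-sort of the same immutable log. Field names here are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    label: str
    happened_at: str   # causal order: when it actually occurred
    learned_at: int    # discovery order: session in which it surfaced

# Appended in discovery order; the session-12 revelation arrives late
# but slots between events that were already on the log.
log = [
    Event("left job", "2023-01", learned_at=3),
    Event("started new role", "2023-09", learned_at=5),
    Event("six months of silence with partner", "2023-02", learned_at=12),
]

# Causal reads re-sort the immutable log; nothing written in session 3
# is ever rewritten to accommodate what session 12 revealed.
timeline = sorted(log, key=lambda e: e.happened_at)
```

The append in session 12 changes no existing record; it only gives causal reads a new node to place.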
Rune keeps the full timeline intact: current state is read from the graph, while the past remains exactly where it happened.
Retrieval narrows deterministically, entity then domain, before any vector search runs. The expensive part operates on a handful of nodes. Query latency stays under 100 milliseconds because the structure is good, not because the search engine is.
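A sketch of that narrowing, with `score` standing in for embedding similarity; the filter fields and ranking function are assumptions, not Rune's API:

```python
def retrieve(nodes: list, entity: str, domain: str, score, k: int = 3) -> list:
    """Deterministic narrowing first (entity, then domain), then
    similarity ranking over the handful of survivors."""
    pool = [n for n in nodes if n["entity"] == entity]
    pool = [n for n in pool if n["domain"] == domain]
    # The expensive part (vector search, stubbed by `score`) now
    # operates on a few nodes, not the whole graph.
    return sorted(pool, key=score, reverse=True)[:k]

nodes = [
    {"entity": "user", "domain": "career", "text": "left job"},
    {"entity": "user", "domain": "career", "text": "accepted new role"},
    {"entity": "user", "domain": "finance", "text": "budget tight"},
    {"entity": "partner", "domain": "career", "text": "promotion"},
]
hits = retrieve(nodes, "user", "career", score=lambda n: len(n["text"]))
```

The latency claim follows from the order of operations: the cheap structural filters run first, so the vector index is never asked to rank the full corpus.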
So what does this enable?
Session 98 carries the full structured history of sessions 1 through 97. Every relationship, event, preference: navigable. No nightly compression deciding what you're allowed to remember.
A human professional who has advised ten thousand people over twenty years carries pattern recognition that no single client interaction could produce. They've seen this situation before. They know what question to ask next. Rune is designed for the same arc: per-user graphs for individual memory, cross-user pattern recognition for accumulated expertise. The agent gets better at its domain with every conversation it has ever had.
How we measure this
Standard memory benchmarks test fact recall: did the system remember that the user likes coffee? Some test knowledge updates, temporal reasoning, or multi-session recall. Useful benchmarks. Our problem is harder: can the system reconstruct why a career change three months ago is connected to a relationship that started unraveling a year before that?
We validate against real conversation data. Build memory through session n-1, replay session n, measure whether the system surfaces what matters. The target is the actual messy, nonlinear, contradictory way people talk about their lives across months, not synthetic datasets or curated fact-recall examples.
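The replay loop above can be sketched as a harness; `build_memory`, `surface`, and `relevant` are placeholders for the system under test and the labeling of what mattered, not Rune's evaluation code:

```python
def evaluate(sessions: list, build_memory, surface, relevant) -> float:
    """For each session n, build memory from sessions 1..n-1, replay
    session n, and score how much of what mattered was surfaced."""
    hits = total = 0
    for n in range(1, len(sessions)):
        memory = build_memory(sessions[:n])     # everything before session n
        surfaced = surface(memory, sessions[n]) # what the system brings up
        expected = relevant(sessions[n])        # what actually mattered
        hits += len(expected & surfaced)
        total += len(expected)
    return hits / total if total else 1.0
```

The point of the harness shape is that it makes no assumption about how memory is stored: a compression-based stack and a graph-based one are scored on identical replays.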
The previous five versions failed that test in different ways. Rune cleared it. It is built for the moment where one missed session distorts the entire present.