Terra: An Agent Framework Extracted from Production

On December 3, 2024, we opened a private beta and one of our agents had its first real conversation. It’s been in production ever since: more than a million tool calls across tens of thousands of people.

LangChain and AutoGPT made agents look easy. Production made them look like toys: the moment you need an agent to survive a crash mid-response, age tool results out of context, or coordinate with another agent in real time, the five-line demo is gone. The JS/TS ecosystem is home for us, but for this problem Elixir was the better fit. Agents are stateful, long-running processes that need to handle concurrent inputs, recover from crashes, and coordinate with each other. That’s what OTP was built for. The tradeoff: Elixir had no AI agent tooling, so we built our own.

Terra is what we extracted from eighteen months of production. After enough iterations on different agent architectures, patterns emerged that were reusable: a lifecycle model, a context aging pipeline, a multi-agent coordination layer. We pulled them into a framework so we could spin up new agent structures quickly instead of rewiring the same plumbing each time.

We’re open-sourcing it: github.com/anuvaya/terra.

[Figure: cumulative conversations served in production since the first one in December 2024.]

Agents are processes

A conversational AI agent is a stateful, long-running process. It starts, transitions through states (greeting, active, idle), handles concurrent inputs (user messages arriving while the LLM is mid-response), needs to survive crashes without losing conversation state, and eventually terminates. Two agents might need to coordinate through shared state in real time.

Thousands of our sessions run long, some a full hour of back-and-forth: exactly what a process model is for.

gen_statem is OTP’s state machine behaviour. It gives us explicit state transitions, and because it is an ordinary OTP process it also gets supervision (automatic restart on crash, via a supervisor) and transparent distribution (agents addressable across nodes). We didn’t pick it to be contrarian. We picked it because the problem description and the tool description are the same sentence.

You implement five callbacks: init, context, handle_input, handle_stream_event, handle_response. Terra handles the runtime: streaming, tool execution, state transitions, crash recovery. The framework disappears. You think about your agent’s logic, not about plumbing.
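To make the lifecycle concrete, here is a minimal sketch of the greeting, active, and idle states on raw gen_statem. It is not Terra's implementation: the module name LifecycleSketch, the `idle_after` option, and the event shapes are all assumptions made for illustration.

```elixir
defmodule LifecycleSketch do
  # Hypothetical stand-in for an agent process; not Terra's implementation.
  @behaviour :gen_statem

  def start_link(opts \\ []), do: :gen_statem.start_link(__MODULE__, opts, [])
  def user_input(pid, text), do: :gen_statem.call(pid, {:input, text})
  def current_state(pid), do: :gen_statem.call(pid, :current_state)

  @impl true
  def callback_mode, do: :handle_event_function

  @impl true
  def init(opts) do
    # idle_after (ms) is an assumed option controlling the active -> idle timeout.
    data = %{history: [], idle_after: Keyword.get(opts, :idle_after, 30_000)}
    {:ok, :greeting, data}
  end

  @impl true
  def handle_event({:call, from}, :current_state, state, _data) do
    {:keep_state_and_data, [{:reply, from, state}]}
  end

  # Any input moves the agent to :active and (re)arms the idle timer.
  def handle_event({:call, from}, {:input, text}, _state, data) do
    data = %{data | history: [text | data.history]}
    actions = [{:reply, from, :ok}, {:state_timeout, data.idle_after, :go_idle}]
    {:next_state, :active, data, actions}
  end

  # No input arrived within idle_after milliseconds.
  def handle_event(:state_timeout, :go_idle, :active, data) do
    {:next_state, :idle, data}
  end
end
```

Started under a supervisor, a process like this is restarted automatically after a crash; restoring its conversation state is a separate concern.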

Tools

An agent’s tools are JSON Schema, and writing JSON Schema by hand is misery. Terra gives you a pipeline builder instead:

tool("get_weather")
|> desc("Get current weather for a city")
|> param(:city, :string, required: true, desc: "City name")

That produces the nested schema the provider expects. Objects, enums, and arrays read the same way, top to bottom, no braces to balance.
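For readers curious how a pipeline like this can turn into a schema, here is a rough sketch. The function names mirror the ones above, but the internals, the `to_schema/1` function, and the `"input_schema"` key (borrowed from Anthropic's tool format) are assumptions, not Terra's actual code.

```elixir
defmodule SchemaSketch do
  # Hypothetical builder; function names mirror the post, internals are guesses.
  def tool(name), do: %{name: name, description: nil, params: []}
  def desc(tool, text), do: %{tool | description: text}

  def param(tool, name, type, opts \\ []) do
    %{tool | params: tool.params ++ [{name, type, opts}]}
  end

  # Render to a JSON-Schema-shaped map (plain maps, ready for any JSON encoder).
  def to_schema(tool) do
    properties =
      Map.new(tool.params, fn {name, type, opts} ->
        prop = %{"type" => Atom.to_string(type)}
        prop = if opts[:desc], do: Map.put(prop, "description", opts[:desc]), else: prop
        {Atom.to_string(name), prop}
      end)

    required =
      for {name, _type, opts} <- tool.params, opts[:required], do: Atom.to_string(name)

    %{
      "name" => tool.name,
      "description" => tool.description,
      "input_schema" => %{
        "type" => "object",
        "properties" => properties,
        "required" => required
      }
    }
  end
end
```

Each step returns a plain map, which is what lets the builder compose with `|>` and keeps the final rendering in one place.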

Tools live in a registry that both declares them and runs them:

defmodule WeatherTools do
  use Terra.ToolRegistry
  import Terra.Tool

  def tools(_state), do: [tool("get_weather") |> param(:city, :string, required: true)]

  def execute("get_weather", %{city: city}, state) do
    {:ok, %{temp: 72, city: city}, state}
  end
end

tools/1 receives the agent’s state, so the toolset can shift with the conversation: a tool can appear, disappear, or change shape based on where the agent is. execute/3 runs the call. You list the registries in the agent’s init, and Terra handles serialization, dispatch, and results.

Execution is eager. The moment a tool call finishes streaming, Terra runs it, so the result is in hand by the time the model’s turn ends. The agent never makes a second round-trip to fetch what the model just asked for.
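A minimal sketch of the eager idea, assuming a simplified event stream: each completed tool call is started in a Task while the rest of the stream is still being consumed, and results are collected once the turn ends. The event shapes and the EagerSketch module are hypothetical, not Terra's internals.

```elixir
defmodule EagerSketch do
  # Hypothetical event shapes: {:text, chunk}, {:tool_call, id, name, input}, :turn_end.
  # `execute` is any 2-arity function (name, input) -> result.
  def run_turn(events, execute) do
    {text, tasks} =
      Enum.reduce(events, {[], []}, fn
        {:text, chunk}, {text, tasks} ->
          {[chunk | text], tasks}

        {:tool_call, id, name, input}, {text, tasks} ->
          # Start the tool as soon as its call is complete; keep consuming the stream.
          task = Task.async(fn -> execute.(name, input) end)
          {text, [{id, task} | tasks]}

        :turn_end, acc ->
          acc
      end)

    # By the time the stream is drained, most tools have already finished.
    results = for {id, task} <- Enum.reverse(tasks), into: %{}, do: {id, Task.await(task)}
    %{text: text |> Enum.reverse() |> Enum.join(), tool_results: results}
  end
end
```

The tool's latency overlaps with the remainder of the model's streaming output instead of being added after it.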

Context aging

A tool result from eight turns ago is noise, not signal. Most agent frameworks leave this to the developer. Terra treats it as a first-class concern.

Each tool defines its own aging configuration:

tool("get_weather")
|> aging(expiry: 4, pruning: 8)
|> result_template("Weather in <%= @input[:city] %>: <%= @result %>")
|> expiry_message("Weather for <%= @input[:city] %> expired. Re-fetch if needed.")

As turn distance grows, each tool result ages through three states: active, expired, and pruned.

Before every LLM invocation, age_tools walks the conversation history in reverse. Each tool result transitions through active (full result via EEx template), expired (short summary), and pruned (removed entirely). The corresponding tool_use blocks in assistant messages are cleaned up too, so the model never sees a tool call without its result.
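The reverse walk can be sketched as follows. This is an illustration under stated assumptions rather than Terra's age_tools: the message shapes, the `config` map, and the fixed expiry text are hypothetical, and turn distance is counted as the number of user messages that came after a result.

```elixir
defmodule AgingSketch do
  # Hypothetical message shapes (every assistant message carries :tool_calls):
  #   %{role: :user, content: text}
  #   %{role: :assistant, content: text, tool_calls: [id]}
  #   %{role: :tool, id: id, name: name, content: text}
  # `config` maps tool name => %{expiry: n, pruning: m}, measured in user turns.
  def age_tools(history, config) do
    {aged, _turns, _pruned} =
      history
      |> Enum.reverse()
      |> Enum.reduce({[], 0, MapSet.new()}, fn msg, {acc, turns, pruned} ->
        case msg do
          %{role: :user} ->
            {[msg | acc], turns + 1, pruned}

          %{role: :tool, id: id, name: name} ->
            %{expiry: expiry, pruning: pruning} = Map.fetch!(config, name)

            cond do
              # Pruned: drop the result and remember its id.
              turns >= pruning -> {acc, turns, MapSet.put(pruned, id)}
              # Expired: keep a short summary in place of the full result.
              turns >= expiry -> {[%{msg | content: "#{name} result expired."} | acc], turns, pruned}
              # Active: keep the full result.
              true -> {[msg | acc], turns, pruned}
            end

          %{role: :assistant, tool_calls: calls} ->
            # Drop calls whose results were pruned so no call is left dangling.
            kept = Enum.reject(calls, &MapSet.member?(pruned, &1))
            {[%{msg | tool_calls: kept} | acc], turns, pruned}
        end
      end)

    aged
  end
end
```

Walking in reverse matters because a result always follows the assistant message that requested it: by the time the walk reaches that assistant message, it already knows which results were pruned.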

Thinking blocks get the same treatment. Extended thinking output is large and only useful for recent reasoning. prune_thinking(1) keeps only the most recent turn’s thinking, stripping the rest. Context stays focused.

Multi-agent sessions

A Session is a supervisor that starts a Kernel (shared document store) and one or more agents. The Kernel holds named slots that any agent can read, write, or subscribe to. When one agent writes to a slot, every subscriber is notified.

This is how we wire a conversation agent and an analysis agent together: the analysis agent writes its results to a slot, the conversation agent subscribes and picks them up on its next turn. No shared mutable state. No message-passing choreography. Just a document store with pub/sub.
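A document store with pub/sub is small enough to sketch in one GenServer. The KernelSketch module, its function names, and the `{:slot_updated, slot, value}` message are assumptions for illustration; Terra's Kernel API may look different.

```elixir
defmodule KernelSketch do
  # Hypothetical slot store: named slots, each with a value and a subscriber list.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def read(kernel, slot), do: GenServer.call(kernel, {:read, slot})
  def write(kernel, slot, value), do: GenServer.call(kernel, {:write, slot, value})
  def subscribe(kernel, slot), do: GenServer.call(kernel, {:subscribe, slot, self()})

  @impl true
  def init(:ok), do: {:ok, %{values: %{}, subs: %{}}}

  @impl true
  def handle_call({:read, slot}, _from, state) do
    {:reply, Map.get(state.values, slot), state}
  end

  def handle_call({:write, slot, value}, _from, state) do
    # Notify every subscriber of this slot, then store the new value.
    for pid <- Map.get(state.subs, slot, []), do: send(pid, {:slot_updated, slot, value})
    {:reply, :ok, %{state | values: Map.put(state.values, slot, value)}}
  end

  def handle_call({:subscribe, slot, pid}, _from, state) do
    subs = Map.update(state.subs, slot, [pid], &[pid | &1])
    {:reply, :ok, %{state | subs: subs}}
  end
end
```

In this shape an analysis agent would call `write/3`, and a conversation agent that had called `subscribe/2` would find the update in its mailbox before its next turn.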

The pattern generalizes. A critic reviews the conversation agent’s drafts before they ship. A coach feeds tone guidance back in real time. A reflector keeps a running digest of the session. Anything that has to share state with the live conversation, turn by turn, fits the same shape.

The supervisor is rest_for_one: if the Kernel crashes, all agents restart. If an agent crashes, only that agent restarts. OTP handles this. We didn’t have to build it.
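One way to get exactly that restart behavior is sketched below. Under plain rest_for_one, a crashed child also takes down every sibling started after it, so this sketch assumes the agents sit under their own one_for_one supervisor, started after the Kernel. That nesting is an assumption, not a description of Terra's tree, and the children are plain Agent processes used as stand-ins.

```elixir
defmodule SessionSketch do
  # Hypothetical session tree. Children are plain Agent processes standing in
  # for the Kernel and the agents.
  use Supervisor

  def start_link(agent_names), do: Supervisor.start_link(__MODULE__, agent_names)

  @impl true
  def init(agent_names) do
    agents =
      for name <- agent_names do
        Supervisor.child_spec({Agent, fn -> %{name: name} end}, id: {:agent, name})
      end

    children = [
      # Started first: if it crashes, everything after it restarts.
      Supervisor.child_spec({Agent, fn -> %{} end}, id: :kernel),
      # Agents get their own one_for_one supervisor so a crash restarts only that agent.
      %{
        id: :agents,
        type: :supervisor,
        start: {Supervisor, :start_link, [agents, [strategy: :one_for_one]]}
      }
    ]

    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

Child order is what encodes the dependency: agents depend on the Kernel, so the Kernel comes first.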

Bring your own provider

Terra ships with Anthropic, OpenAI, and Google. But the provider boundary is one function: stream/2. Implement it and Terra drives any model, hosted or local, with no other code changes. Switching providers is a config change, not a rewrite.

The messy parts stay contained per provider: SSE parsing, error handling, retries, response normalization. The agent layer never sees a provider-specific type.
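A one-function boundary maps naturally onto an Elixir behaviour. The sketch below is hedged accordingly: the post names stream/2 but not its arguments or return type, so the `(messages, opts)` signature, the event shapes, and all three module names are assumptions.

```elixir
defmodule ProviderSketch do
  # Hypothetical contract: argument and return types are assumptions.
  @callback stream(messages :: [map()], opts :: keyword()) :: Enumerable.t()
end

defmodule EchoProvider do
  # Toy provider that "streams" the last message back word by word.
  @behaviour ProviderSketch

  @impl true
  def stream(messages, _opts) do
    %{content: content} = List.last(messages)

    content
    |> String.split(" ")
    |> Stream.map(&{:text, &1})
    |> Stream.concat([:done])
  end
end

defmodule Driver do
  # The caller depends only on the behaviour, so the provider is just a value.
  def complete(provider, messages, opts \\ []) do
    provider.stream(messages, opts)
    |> Enum.flat_map(fn
      {:text, word} -> [word]
      :done -> []
    end)
    |> Enum.join(" ")
  end
end
```

Because `Driver.complete/3` takes the provider module as an argument, swapping `EchoProvider` for a real one would change a single value rather than any calling code.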

What we kept out

Terra has opinions about what’s not its job:

Persistence. Terra manages in-memory conversation state. How you persist it is your problem. We use Postgres. You might use something else.
Transport. Terra doesn’t know about HTTP, WebSockets, or NATS. It’s a process you send messages to and receive callbacks from. Your transport layer wraps it.
Prompt engineering. The context/2 callback builds the window. What goes in it is your decision. Terra provides the pipeline (aging, documents, system prompt), not the content.

The framework should be less interesting than what you build with it.

Built at Anuvaya. We’re open-sourcing Terra because the infrastructure layer shouldn’t be proprietary. The agent logic on top of it should be.

Second of three. Previously: Rune, long-term memory for AI agents. Next: multi-agent conversational AI.

Nitesh Kumar Niranjan
