Three-Tier Memory: How Agentica Keeps Agents Grounded Across Sessions
The question "does the agent remember?" turns out to have at least three distinct answers depending on what you mean by remember. Remembering what was said three messages ago is a different problem from remembering what was discussed last week, which is a different problem again from remembering that a particular user always prefers numerical summaries over narrative ones. Conflating these three memory horizons into a single mechanism is why most production agent deployments feel brittle.
Agentica uses a three-tier architecture that gives each memory horizon its own storage substrate, access pattern, and lifecycle. Understanding why each tier exists — and what breaks without it — is the starting point for building agents that don't degrade over long-horizon tasks.
L1: Short-Term Thread State
L1 memory is the agent's working memory: the current conversation thread, the messages exchanged so far, the intermediate results of tool calls, and any ephemeral state that needs to survive across turns within a single session. This lives in LangGraph's checkpoint layer, backed by Postgres in Agentica's deployment.
The critical property of L1 memory is atomic persistence. Every turn is checkpointed before the response is returned to the user. If the process crashes mid-turn, the state is recoverable. This sounds obvious but is frequently omitted in prototype systems, leading to state corruption that's hard to diagnose and impossible to reproduce reliably.
L1 memory also has a size problem that most implementations discover too late. LLM context windows are finite. A naive implementation that feeds the entire conversation history into every prompt works fine for short sessions but becomes expensive and eventually impossible for long-running threads. The standard solution — truncating old messages — loses important context. Agentica's implementation maintains a message window cap with a sanitization pass that strips large metadata payloads (response metadata from LLM providers can easily add 40-50KB per message) before checkpointing.
L2: Episodic Memory
L2 memory handles what happened in previous sessions. The naive approach is to simply prepend a summary of previous conversations to the current prompt. This works for very short interaction histories but fails at scale: summaries become long, and long summaries increase latency and cost while still losing the specific details that often matter most.
Agentica's L2 implementation uses rolling context compression. After each session, a summarization agent processes the full thread and produces a structured brief — not a prose summary, but a typed data structure capturing decisions made, facts established, open questions, and user preferences observed. When a new session starts, the relevant briefs are retrieved and injected as a compact context preamble.
The key design insight is that episodic memory should be structured, not narrative. A structured brief is searchable, filterable, and composable. Multiple briefs from different past sessions can be merged algorithmically without redundancy. Narrative summaries don't have these properties.
L3: Semantic Memory
L3 memory is the agent's long-term knowledge base: facts extracted across all sessions, indexed by semantic similarity, and retrievable across any conversation. This is where learned user preferences, established domain facts, and cross-session insights live.
The implementation challenge for L3 is extraction quality. Not everything that happens in a conversation is worth persisting to long-term memory. Persisting too aggressively pollutes the memory store with noise; persisting too conservatively loses the accumulated knowledge that makes long-horizon collaboration valuable. Agentica uses a post-session extraction agent that scores candidate facts on novelty, generalizability, and confidence before writing to the L3 store.
L3 retrieval uses the same hybrid approach as document retrieval: dense similarity for conceptual matches, sparse for specific identifiers. The retrieval query is constructed from the current session context, pulling relevant long-term memories into the working prompt without requiring the user to re-establish context that was already established in a previous session.
The Lifecycle of a Memory
A complete picture of memory in Agentica looks like this: a message arrives and is appended to L1 state (checkpointed). At the end of a session, the thread is summarized into an L2 episodic brief and key facts are extracted to L3. When a new session starts, L2 and L3 are queried to populate the context preamble, and L1 is initialized fresh for the new thread.
This architecture means an agent working with a user over months accumulates genuine institutional knowledge — not just a longer and longer conversation history, but structured, retrievable understanding of that user's context, preferences, and ongoing work. That's the difference between a chatbot that resets with every session and a genuine AI collaborator.
Deploy Strategic Intelligence
Schedule a technical briefing on multi-agent deployment patterns.
Similar Research
View All LogsLangGraph in Production: State Management Patterns We Learned the Hard Way
LangGraph's checkpoint system is powerful but has real footguns. After running thousands of production conversations, here are the state management patterns that matter — and the ones that will silently corrupt your agent's context.
When to Use an Agent vs. a Single LLM Call
Agents add latency, cost, and complexity. They're not the right tool for every problem. Here's a decision framework for when multi-step agentic reasoning genuinely outperforms a well-crafted single prompt — and when it doesn't.