The Agent Memory Problem — And How We Started Solving It

Every powerful technology has a quiet flaw that practitioners discover before everyone else does.

With relational databases it was the impedance mismatch — the painful gap between how developers thought about objects and how databases stored rows. With microservices it was distributed state — systems that worked beautifully in isolation fell apart the moment they needed to share information. With early machine learning it was data drift — models that performed well at launch quietly degraded as the world changed around them.

With AI agents, the quiet flaw is memory.

Or more precisely, the lack of it.

What memory means for an agent

An agent, at its core, is a reasoning loop. It perceives a situation, decides on an action, executes it, observes the result, and decides what to do next. This loop can be remarkably capable. Modern agents can browse the web, write and execute code, query databases, draft documents, and coordinate with other agents. The reasoning within a single session has become genuinely impressive.

But the loop is stateless across sessions. When the conversation ends, the context window evaporates. Everything the agent learned, inferred, and decided during that session — gone. The next session starts from nothing.
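To make the statelessness concrete, here is a minimal sketch. The names `Session`, `MemoryStore`, and `run_session` are hypothetical and exist only for illustration, and the list called `context` stands in for a real model's context window. It shows one simple way to picture the gap, not how any particular framework handles sessions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """One conversation. `context` stands in for the model's context window."""
    context: list = field(default_factory=list)

    def observe(self, fact: str) -> None:
        self.context.append(fact)


class MemoryStore:
    """Hypothetical durable store that outlives any single session."""

    def __init__(self) -> None:
        self._facts = []

    def save(self, facts: list) -> None:
        self._facts.extend(facts)

    def load(self) -> list:
        return list(self._facts)


def run_session(facts: list, store: Optional[MemoryStore] = None) -> list:
    """Run a fresh session and return everything it knows when it ends."""
    # Without a store, every session starts from an empty context.
    seed = store.load() if store is not None else []
    session = Session(context=seed)
    for fact in facts:
        session.observe(fact)
    if store is not None:
        store.save(facts)  # persist only what this session added
    return session.context


# No store: the second session has no idea what the first one learned.
run_session(["user prefers Python"])
print(run_session(["user asks for a script"]))

# With a store: the second session starts with the earlier fact in context.
store = MemoryStore()
run_session(["user prefers Python"], store)
print(run_session(["user asks for a script"], store))
```

The only difference between the two runs is the deliberate step of saving and reloading. Nothing in the loop itself carries anything forward.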

This is fine for one-shot tasks. Ask an agent to summarize a document, it does it, session ends. No memory needed.

It's a serious problem for anything ongoing. A research agent that built a deep understanding of a topic over a long session has to rebuild that understanding from scratch next time. An orchestration agent that successfully executed a complex multi-step workflow has no recollection of how it did it. A user who established important context — preferences, constraints, prior decisions — has to re-establish it every single session.

The agent isn't stupid. It's amnesiac.

And unlike human amnesia, which is relatively rare, agent amnesia is universal and the default. Every agent, in every framework, in every deployment, forgets when the session ends, unless you build something deliberate to prevent it.

How the problem became visible in early 2024

In early 2024, the teams building production conversational AI systems were starting to hit this wall in ways that couldn't be ignored.

The context window had gotten longer — models that could handle 8K tokens were being replaced by models handling 32K, then 100K and beyond. For a while, longer context felt like the answer. If the window is big enough, you don't need memory — just keep everything in context.

That reasoning breaks down quickly in practice.

Longer context windows are expensive. Injecting an entire conversation history on every call means each call costs more than the last, which becomes cost-prohibitive at scale. More importantly, longer context doesn't solve the cross-session problem: the window still resets. And there's a subtler issue: models don't attend to all parts of a long context equally. Information buried in the middle of a 100K-token context is retrieved less reliably than information near the beginning or end. The "lost in the middle" problem is real and measurable.
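A quick back-of-the-envelope sketch shows why resending the full history gets expensive. The figures here (100 turns, 500 tokens per turn, a 1,000-token summary) are illustrative assumptions, not measurements, and the arithmetic ignores output tokens and any prompt caching a provider might offer.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every call resends the whole history so far."""
    # Call k carries k turns of history, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


def cumulative_with_summary(turns: int, tokens_per_turn: int, summary_tokens: int) -> int:
    """Total input tokens when each call sends a fixed-size summary plus the current turn."""
    return turns * (summary_tokens + tokens_per_turn)


full_history = cumulative_input_tokens(turns=100, tokens_per_turn=500)
summarized = cumulative_with_summary(turns=100, tokens_per_turn=500, summary_tokens=1000)

print(full_history)  # 2525000
print(summarized)    # 150000
```

Under these assumptions, the full-history approach grows with the square of the number of turns, while the summary approach grows linearly. The exact ratio depends entirely on the numbers you plug in.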

So longer context windows were a palliative, not a cure. The underlying problem — agents need a durable, structured way to remember things across sessions, at scale, without destroying cost efficiency — remained unsolved.

Retrieval-augmented generation (RAG) helped for knowledge access. You could put documents in a vector store and retrieve the relevant chunks. But RAG addresses a different problem: accessing external knowledge. It doesn't address the episodic continuity problem, meaning the agent's own history, its prior decisions, and what it learned about a specific user or task over time.

The more carefully you looked at production agent deployments in early 2024, the clearer it became: memory wasn't a feature to add later. It was a foundational architectural problem that nobody had fully solved.

The work that followed

That recognition led to serious engineering effort.

At AWS, working on the infrastructure that powers Amazon Bedrock and Amazon Lex — systems handling conversational AI workloads at a scale most organizations won't encounter for years — our team was thinking hard about this problem. Not abstractly, but concretely: what does it actually take to generate and maintain long-term memory for orchestration agent sessions? What needs to be stored? In what form? When should it be generated — at session end, incrementally during the session, asynchronously? How do you retrieve it efficiently without blowing up latency or cost? How do you handle the case where new information contradicts what was previously remembered?
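To give the contradiction question some shape, here is one simple way to think about it, sketched in code. This is not the approach described in the patent filing or in any AWS service. `MemoryRecord` and `LongTermMemory` are hypothetical names, and the policy shown (newer facts supersede older ones, which are kept but no longer served) is just one assumption among many possible designs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRecord:
    key: str          # what the memory is about, e.g. "preferred_language"
    value: str        # the remembered content
    session_id: int   # which session produced it
    superseded: bool = False


class LongTermMemory:
    """Illustrative append-only memory: new facts supersede old ones rather than overwrite them."""

    def __init__(self) -> None:
        self.records = []

    def recall(self, key: str) -> Optional[str]:
        # Serve the most recent record that has not been superseded.
        for record in reversed(self.records):
            if record.key == key and not record.superseded:
                return record.value
        return None

    def remember(self, key: str, value: str, session_id: int) -> None:
        # Mark any live record that contradicts the new value as superseded.
        for record in self.records:
            if record.key == key and not record.superseded and record.value != value:
                record.superseded = True  # kept for audit, no longer served
        # Skip the append if the same value is already the live memory.
        if self.recall(key) != value:
            self.records.append(MemoryRecord(key, value, session_id))


memory = LongTermMemory()
memory.remember("preferred_language", "Python", session_id=1)
memory.remember("preferred_language", "Rust", session_id=2)  # contradicts session 1
print(memory.recall("preferred_language"))  # Rust
print(len(memory.records))                  # 2
```

Keeping the superseded record around is a deliberate choice in this sketch: it makes it possible to ask later why the agent changed its mind. A real system would also have to decide when the newer fact should not win.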

These questions don't have obvious answers. The more you dig into them, the more you realize that memory for agents isn't a database design problem — it's a systems problem with components that interact in non-obvious ways.

In September 2024, that work culminated in a patent application filed with the United States Patent and Trademark Office. Application number 18/901,452. Assigned to Amazon Technologies, Inc. I was one of sixteen named inventors on it.

The title: "Generating Long-Term Memory for Orchestration Agent Sessions."

I'm not sharing this to establish credentials. I'm sharing it because the work behind that filing — the thinking, the design decisions, the failure modes we tried to anticipate — is directly relevant to what any team building production agent systems needs to understand today. And because the patent filing captures something important: as of late 2024, this problem was serious enough that a major cloud provider was investing significant engineering and IP effort into solving it.

That's a signal worth paying attention to.

Where the field stands now

The patent represents one approach — a production-oriented, enterprise-scale approach to a specific slice of the memory problem. But the field has continued to evolve, and the thinking has gotten richer.

There are now distinct schools of thought on how agent memory should work. Some treat it as a retrieval problem — better vector stores, better embedding models, better chunking strategies. Some treat it as a context engineering problem — smarter summarization, hierarchical compression, selective retention. Some, most interestingly, treat it as a cognitive architecture problem — asking not just how to store and retrieve memories, but how agents should decide what to remember, when to consolidate, and how to reason about the reliability of their own recollections.

Each approach has merits. Each has failure modes. And in production, the right answer is almost always a combination of multiple approaches, carefully layered.

Next issue: the complete map

Next week, we go deep on the full landscape of agent memory — every type, every tradeoff, and where the frontier is heading.

In-context and working memory. Episodic memory across sessions. Semantic memory and consolidation. Procedural memory and learned workflows. Embedding-based vector retrieval and its real limitations. Graph-based associative memory. And the most compelling emerging architecture in this space right now — one that proposes treating memory not as a storage problem but as a learned cognitive skill, implemented as a dedicated small model whose only job is to decide what's worth remembering and for how long.

It's the single most comprehensive treatment of agent memory I've put together. And it's grounded in something most newsletter writers on this topic don't have — direct experience building production memory systems, and documented, time-stamped thinking about this problem from before the current wave of hype arrived.

The agent amnesia problem is solvable. Next week, we'll show you how.

Until next time,

Learn to use AI. Use AI to learn.

Subscribe at whattheagent.com. Next week's issue on agent memory is the one to forward to your team.

👀 Also Watching

The "lost in the middle" research — the empirical case that longer context windows don't eliminate the need for structured memory. The data is more compelling than most people realize.
Amazon Bedrock's session management evolution — how the production implementation has developed alongside the underlying research.
The emerging category of "memory-as-a-service" — mem0, Zep, Letta all betting that memory is infrastructure, not a feature. Worth watching which model wins.

Santosh Ameti's Agentic AI Newsletter