Claude doesn't remember you
Open a new Claude conversation. No history. No context. No idea what you were working on yesterday.
This is a deliberate design choice, not an oversight. Anthropic ships Claude as a stateless model. Every conversation is fresh. If you want persistence, you build it.
Most developers hit this wall when they're embedding Claude in something real -- an agent, a workflow, an assistant with custom behavior. The one-chat demo is fine. The multi-session production system is where it falls apart.
Here's what the options actually are.
Option 1: Context stuffing
The simplest thing that works: load relevant information into the system prompt at the start of each conversation.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const systemPrompt = `
You are a coding assistant for the Acme billing service.

Key context:
- The billing service uses an outbox pattern. Never publish events inside a request transaction.
- All external HTTP calls go through /lib/http/client.ts. Do not use fetch or axios directly.
- The tenancy boundary is org_id, not workspace_id.

User preferences:
- Prefers TypeScript over JavaScript
- Wants brief explanations, not verbose walkthroughs
`;

const response = await anthropic.messages.create({
  model: "claude-opus-4-5",
  max_tokens: 1024, // required by the Messages API
  system: systemPrompt,
  messages: [{ role: "user", content: userMessage }],
});
```
This works fine for static, stable context. Architecture conventions, coding preferences, fixed project facts.
It breaks down when context is large, when it changes frequently, or when different conversations need different subsets of what you know. Stuffing 20,000 tokens of context into every prompt is expensive, and it can degrade response quality: models attend unevenly across long prompts, and details buried in the middle tend to get lost.
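If you do stuff context, at least cap it. Here's a minimal sketch of trimming a priority-ordered context list to a token budget; the rough four-characters-per-token estimate and helper names are assumptions, not part of any SDK:

```typescript
// Rough token estimate (~4 characters per token for English text).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep only as many context lines as fit within a token budget.
// Lines are assumed to be ordered most-important-first, so
// lower-priority lines are the ones dropped.
function trimContext(lines: string[], maxTokens: number): string {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) break;
    kept.push(line);
    used += cost;
  }
  return kept.join("\n");
}
```

For a real budget, use the model's tokenizer or a token-counting endpoint instead of a character heuristic.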
Option 2: Rolling conversation history
Store conversation turns in a database. On each new message, inject the last N turns as prior messages.
```typescript
async function chat(userId: string, userMessage: string) {
  const history = await db.getRecentTurns(userId, 20);

  const response = await anthropic.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 1024,
    system: systemPrompt,
    messages: [
      ...history,
      { role: "user", content: userMessage },
    ],
  });

  // Content blocks are a union type; narrow before reading .text
  const reply = response.content[0].type === "text" ? response.content[0].text : "";
  await db.saveTurn(userId, userMessage, reply);
  return response;
}
```
This gives conversational continuity within a session and partial continuity across sessions. It's not real memory -- it's replaying transcripts. As history grows, you hit token limits and cost problems. Old turns get truncated. The model sees recent conversation but not things said three weeks ago.
Useful as a layer. Not sufficient on its own.
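A common mitigation is to cap history by an approximate token budget rather than a fixed turn count, so long replies don't silently blow past limits. A sketch, using the same rough four-characters-per-token estimate; the types and names are illustrative:

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Keep the most recent turns that fit within a rough token budget,
// preserving chronological order in the returned array.
function capHistory(turns: Turn[], maxTokens: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  // Walk backwards so the newest turns win when space runs out.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = Math.ceil(turns[i].content.length / 4);
    if (used + cost > maxTokens) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  return kept;
}
```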
Option 3: MCP memory servers
The Model Context Protocol lets you connect tools to Claude that it can call during a conversation. A memory server exposes remember and recall tools, and Claude can invoke them at appropriate moments.
```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@memory-crystal/mcp-server"],
      "env": {
        "MC_API_KEY": "your-api-key"
      }
    }
  }
}
```
With this configured, Claude can call recall("billing service conventions") mid-conversation and get back relevant stored memories. It can call remember("user prefers short explanations") to save preferences for later.
The advantage over context stuffing: the memory store grows without bound but only relevant chunks get injected. The model retrieves what it needs rather than receiving everything upfront.
The limitation: Claude has to decide when to call memory tools. In practice, it doesn't always do this reliably without careful prompting. You can work around it by doing recall yourself in your wrapper code and injecting results into the conversation.
Option 4: Programmatic memory with API injection
This is the pattern that gives you the most control. You manage the memory store directly. Claude never sees the memory tools -- it just receives well-prepared context.
```typescript
import MemoryCrystal from "@memory-crystal/sdk";

const mc = new MemoryCrystal({ apiKey: process.env.MC_API_KEY });

async function chat(userId: string, userMessage: string) {
  // Pull relevant memories for this message
  const memories = await mc.recall({
    query: userMessage,
    tags: [userId],
    limit: 6,
  });

  // Build context block
  const memoryContext = memories.length > 0
    ? `Relevant context:\n${memories.map((m) => `- ${m.content}`).join("\n")}\n\n`
    : "";

  // Send to Claude with injected memory
  const response = await anthropic.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 1024,
    system: baseSystemPrompt + memoryContext,
    messages: [{ role: "user", content: userMessage }],
  });

  const reply = response.content[0].type === "text" ? response.content[0].text : "";

  // Save anything worth keeping from this turn
  await saveIfImportant(userId, userMessage, reply);
  return reply;
}
```
For the saveIfImportant function, you can either run heuristics (did the user express a preference? did a decision get made?) or run a lightweight LLM call to extract structured memories from the conversation.
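As a sketch of the heuristic route -- the patterns, helper names, and the `mc.remember` payload shape are illustrative assumptions, not a library API:

```typescript
// Assumed shape of the memory client from earlier in the article.
declare const mc: { remember(entry: object): Promise<void> };

// Illustrative signals. A lightweight LLM extraction call could
// replace these regexes for higher recall.
const PREFERENCE_PATTERNS = [/i prefer/i, /i'd rather/i, /always use/i, /never use/i];
const DECISION_PATTERNS = [/we (chose|decided|went with)/i, /let's go with/i];

function classifyTurn(userMessage: string): "preference" | "decision" | null {
  if (PREFERENCE_PATTERNS.some((p) => p.test(userMessage))) return "preference";
  if (DECISION_PATTERNS.some((p) => p.test(userMessage))) return "decision";
  return null;
}

async function saveIfImportant(userId: string, userMessage: string, reply: string) {
  const kind = classifyTurn(userMessage);
  if (!kind) return; // nothing worth persisting this turn

  await mc.remember({
    store: kind === "preference" ? "semantic" : "episodic",
    category: kind === "preference" ? "fact" : "decision",
    title: userMessage.slice(0, 80),
    content: `User: ${userMessage}\nAssistant: ${reply}`,
    tags: [userId, kind],
  });
}
```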
How Memory Crystal fits in
Memory Crystal is a purpose-built memory backend for this pattern. It separates episodic memory (what happened, when) from semantic memory (stable facts and preferences), with a retrieval pipeline that scores by semantic similarity, recency, salience, and entity graph proximity.
That retrieval scoring is the part that's annoying to build yourself. Simple vector similarity returns "sort of related" results. Memory Crystal returns "relevant to this specific conversation right now" results.
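To make the idea concrete, here is what a blended retrieval score looks like in principle. This is a made-up formula with made-up weights, not Memory Crystal's actual pipeline, and it omits entity graph proximity:

```typescript
interface ScoredMemory {
  similarity: number; // cosine similarity to the query, in [0, 1]
  ageDays: number;    // days since the memory was written
  salience: number;   // importance assigned at save time, in [0, 1]
}

// Blend similarity with exponential recency decay and stored salience.
// The 0.6 / 0.25 / 0.15 weights are illustrative and sum to 1.
function retrievalScore(m: ScoredMemory, halfLifeDays = 30): number {
  const recency = Math.pow(0.5, m.ageDays / halfLifeDays); // halves every 30 days
  return 0.6 * m.similarity + 0.25 * recency + 0.15 * m.salience;
}
```

Tuning those weights against real probe queries is exactly the work a purpose-built backend saves you.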
```typescript
// Save a user preference
await mc.remember({
  store: "semantic",
  category: "fact",
  title: "User prefers concise explanations",
  content: "When explaining code or architecture, keep it brief. No verbose walkthroughs unless explicitly asked.",
  tags: [userId, "preference"],
});

// Save a decision with full context
await mc.remember({
  store: "episodic",
  category: "decision",
  title: "Chose Redis for session cache",
  content: "Evaluated Redis vs Memcached. Chose Redis for its persistence options and richer data types. Memcached considered but ruled out due to lack of replication support.",
  tags: [userId, "architecture", "session-cache"],
});
```
On next session, mc.recall({ query: "session caching approach", tags: [userId] }) pulls that decision back with full context.
Tradeoff summary
| Approach | Pros | Cons |
|----------|------|------|
| Context stuffing | Simple, predictable | Doesn't scale, expensive, static |
| Rolling history | Conversational continuity | Token limits, no cross-session durability |
| MCP server | Claude can self-manage memory | Relies on model to call tools correctly |
| Programmatic injection | Full control, best reliability | More code to write |
For most production use cases, you want programmatic injection backed by a real memory store. Context stuffing as a layer for truly static facts. Rolling history for recent conversational context. MCP as an optional enhancement on top.
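That layering can be sketched as a single context-assembly step: static facts in the system prompt, retrieved memories injected on top, rolling history as the message list. The types and names here are assumptions, not a published API:

```typescript
interface Memory {
  content: string;
}

interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

// Assemble the request: base prompt + memory block in `system`,
// rolling history + the new message in `messages`.
function buildRequest(
  basePrompt: string,
  memories: Memory[],
  history: ChatTurn[],
  userMessage: string
) {
  const memoryBlock = memories.length
    ? `\n\nRelevant context:\n${memories.map((m) => `- ${m.content}`).join("\n")}`
    : "";
  return {
    system: basePrompt + memoryBlock,
    messages: [...history, { role: "user" as const, content: userMessage }],
  };
}
```

The return value maps directly onto the `system` and `messages` parameters of a Messages API call.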
What to store per conversation
You can't store everything: as the store fills with noise, retrieval precision degrades. Focus on:
- Explicit user preferences ("I prefer X over Y")
- Decisions made with reasoning ("we chose X because...")
- Project facts the model would otherwise rediscover each time
- Task outcomes worth knowing next session
Skip: casual conversation, questions that got answered in-context, things that are easily derivable from the codebase.
Practical checklist
- Decide which memory approach fits your use case (most need programmatic injection)
- Set up memory capture on conversation end or at identified save points
- Scope memories by user ID or session to avoid cross-contamination
- Cap retrieval injection to 6-10 memories per turn
- Add TTLs to volatile memories (preferences change, facts go stale)
- Test retrieval precision with probe questions: "what are my code style preferences?"
- Monitor for stale memory surfacing and tune importance thresholds
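The TTL item from the checklist can be sketched as a recall-time filter that drops expired entries before injection. Field names here are assumptions, not a particular store's schema:

```typescript
interface StoredMemory {
  content: string;
  createdAt: number; // epoch milliseconds
  ttlDays?: number;  // undefined means the memory never expires
}

// Drop memories past their time-to-live instead of surfacing
// stale preferences or outdated facts.
function filterExpired(memories: StoredMemory[], now: number): StoredMemory[] {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return memories.filter(
    (m) => m.ttlDays === undefined || now - m.createdAt < m.ttlDays * DAY_MS
  );
}
```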
Claude with well-designed persistent memory behaves noticeably differently. It doesn't re-ask setup questions. It remembers how you like things done. It builds on prior context instead of ignoring it.
That's the gap between a demo and a real assistant.