[ BENCHMARKS ]
Memory Crystal scored 100/100 on a weighted memory benchmark that grades recall, contradictions, preferences, and scoped privacy on the same scale. All 30 cases passed, and nothing leaked across tenants. The harness, the fixtures, and the result artifact are all public in the repo.
[ WHY IT WINS ]
0
forbidden leaks
Every scoped-privacy case returned the right memory and only the right memory. Workspaces, peers, and projects stay isolated.
9.52ms
p95 retrieval
Retrieval is fast enough to drop in front of every model call without changing the user's perception of latency.
5/5
contradiction cases
When a fact changes, the new one wins — every time. No stale priorities, no expired deadlines bleeding into the answer.
100
/100 weighted score
Scope, provenance, and recall ranking are first-class. Drop Memory Crystal in front of any model and keep your stack.
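The "new fact wins" behavior described in the contradiction card can be sketched as follows. This is a minimal illustration, not Memory Crystal's implementation: the `Fact` record, its field names, and the `resolve` helper are all hypothetical, and the sketch assumes that "newer" means a later observation timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    key: str               # hypothetical identifier for what the fact is about
    value: str
    observed_at: datetime  # assumption: recency decides which fact wins
    source: str            # provenance: where the fact was observed


def resolve(facts: list[Fact]) -> dict[str, tuple[Fact, list[Fact]]]:
    """Return, per key, the newest fact plus the older facts it supersedes."""
    by_key: dict[str, list[Fact]] = {}
    for fact in facts:
        by_key.setdefault(fact.key, []).append(fact)

    resolved = {}
    for key, group in by_key.items():
        ordered = sorted(group, key=lambda f: f.observed_at)
        # The last item is current; earlier ones are kept as history, not deleted.
        resolved[key] = (ordered[-1], ordered[:-1])
    return resolved


facts = [
    Fact("launch_deadline", "June 1", datetime(2025, 4, 2), "kickoff notes"),
    Fact("launch_deadline", "July 15", datetime(2025, 5, 20), "planning call"),
]
current, superseded = resolve(facts)["launch_deadline"]
print(current.value, "from", current.source)
print("superseded:", [f.value for f in superseded])
```

Keeping the superseded facts alongside the winner is one way to preserve provenance: the answer uses the current value, while the history remains available for audit.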
[ HEAD TO HEAD ]
Memory Crystal scores come from this repo's harness. Competitor scores are pulled directly from each vendor's own published benchmarks — linked, dated, and unedited.
| System | Headline score | Detail | Source |
|---|---|---|---|
| Memory Crystal | 100/100 | Recall@1 93% · Recall@3 100% · p95 9.52ms · 0 cross-scope leaks | Verified · open artifact |
| Mem0 | LoCoMo 91.6 – 92.5 | LongMemEval 93.4 – 94.4 · BEAM 1M 64.1 · BEAM 10M 48.6 | Vendor-published ↗ |
| Zep | DMR 94.8% | LongMemEval-S 71.2% with GPT-4o · DMR is not LoCoMo | Paper + vendor blog ↗ |
| Letta | LoCoMo 74.0% | Filesystem agent + GPT-4o-mini · agent-runtime, not service recall | Vendor research blog ↗ |
| Pinecone Assistant | No comparable score | Publishes RAG evaluation APIs, not persistent memory benchmarks | Adjacent category ↗ |
LoCoMo, LongMemEval, DMR, and BEAM use different scoring rubrics and different conversation corpora. The fairest comparison is platform-by-platform, not benchmark-by-benchmark — which is exactly why we publish the harness instead of a single cherry-picked number.
[ TRACK BY TRACK ]
The benchmark splits memory into six weighted tracks. Memory Crystal cleared every one without dropping a case and without leaking a scoped memory.
Single-hop, multi-hop, temporal, and long-session QA over replayed conversations.
User, project, team, and domain facts pulled across fresh sessions and compactions.
Style, workflow, tool, and communication preferences applied — not just recited.
Newer facts supersede older facts with provenance preserved end-to-end.
Tenant, workspace, channel, peer, and project boundaries enforced on every read.
Quality stays intact while p50 and p95 stay production-usable.
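A weighted score over six tracks can be computed as a weighted average of per-track pass rates. The sketch below only illustrates that arithmetic: the track names, the weights, and the per-track case counts are invented for the example and are not taken from the published harness.

```python
# Hypothetical weights; they sum to 100 so the total lands on a 0-100 scale.
TRACK_WEIGHTS = {
    "conversation_qa": 25,
    "cross_session_facts": 20,
    "preferences": 15,
    "contradictions": 15,
    "scoped_privacy": 15,
    "latency": 10,
}


def weighted_score(results: dict[str, tuple[int, int]]) -> float:
    """Combine per-track (passed, total) counts into one 0-100 score."""
    total = 0.0
    for track, weight in TRACK_WEIGHTS.items():
        passed, cases = results[track]
        total += weight * passed / cases
    return total


# Made-up case counts: every track passes all of its cases.
clean_run = {track: (5, 5) for track in TRACK_WEIGHTS}

# Same run, but with one failed contradiction case.
one_miss = dict(clean_run, contradictions=(4, 5))

print(weighted_score(clean_run))
print(weighted_score(one_miss))
```

Under these assumed weights, a single missed case in a 15-point track with five cases costs three points, so a perfect total requires every track to clear every case.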
[ LATENCY ]
Memory Crystal's retrieval path is a vector index with scoped filters, not a multi-step agent. p50 lands at 5.48ms and p95 at 9.52ms, well below the threshold where users notice the round trip.
Most agent-memory platforms publish quality scores without latency. We publish both, because a perfect recall score behind a slow API is not production memory.
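The idea of a vector index with scoped filters can be sketched in a few lines. This is an assumption-laden toy, not the product's retrieval code: it uses brute-force cosine similarity instead of a real index, and the `Memory` record, its `tenant` and `workspace` fields, and the `recall` function are hypothetical names. The point it shows is the ordering: scope is applied before ranking, so an out-of-scope memory can never be returned, however similar it is.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    text: str
    tenant: str
    workspace: str
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(memories, query, tenant, workspace, k=3):
    """Filter to the caller's scope first, then rank by similarity."""
    in_scope = [
        m for m in memories if m.tenant == tenant and m.workspace == workspace
    ]
    ranked = sorted(in_scope, key=lambda m: cosine(m.embedding, query), reverse=True)
    return ranked[:k]


store = [
    Memory("Acme prefers weekly reports", "acme", "ops", (1.0, 0.0)),
    Memory("Acme launch is in July", "acme", "ops", (0.6, 0.8)),
    # Identical embedding to the query, but a different tenant.
    Memory("Globex prefers weekly reports", "globex", "ops", (1.0, 0.0)),
]
hits = recall(store, (1.0, 0.0), tenant="acme", workspace="ops")
print([m.text for m in hits])
```

Filtering before ranking is a single pass over stored records rather than a chain of model calls, which is the kind of design that keeps retrieval latency low.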
p50
5.48ms
median recall latency
p95
9.52ms
95th percentile
Recall @ 1
93%
first result is the right one
Recall @ 3
100%
target in the top three
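For readers who want to check figures like these against their own runs, the metrics above can be computed roughly as follows. The sketch assumes a nearest-rank percentile and a single target per case; the published harness may define either differently, and the sample data here is invented.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recall_at_k(ranked_ids: list[list[str]], target_ids: list[str], k: int) -> float:
    """Fraction of cases whose target appears in the top k results."""
    hits = sum(
        1 for ranked, target in zip(ranked_ids, target_ids) if target in ranked[:k]
    )
    return hits / len(target_ids)


# Invented latency samples in milliseconds.
latencies_ms = [4.0, 5.0, 5.5, 6.0, 9.0, 12.0, 5.2, 4.8, 5.1, 7.0]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# Invented retrieval results: one ranked list and one target per case.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["a", "e", "z"]
r1 = recall_at_k(ranked, targets, k=1)
r3 = recall_at_k(ranked, targets, k=3)

print(p50, p95, r1, r3)
```

Recall@3 can never be lower than Recall@1 on the same run, because every top-1 hit is also a top-3 hit.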
[ TRANSPARENCY ]
Every Memory Crystal number on this page came from the same harness, the same fixtures, and the same model setup. The artifact JSON is in the repo.
Competitor numbers are vendor-published. We label every row with its source so you can click through, read the methodology, and judge for yourself.
We do not paint over benchmarks we have not reproduced. When we run the same harness against another platform, the row updates — until then it stays clearly attributed.
Scope and privacy are scored the same way as recall. A platform that leaks across tenants does not get to claim a high memory score.
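One way to score a scoped-privacy case is to require both halves of "the right memory and only the right memory": every expected memory is returned, and no forbidden memory is. The sketch below is an illustration under that assumption; the function names and the case layout (`returned`, `expected`, `forbidden`) are hypothetical and are not the harness's fixture format.

```python
def privacy_case_passes(returned_ids, expected_ids, forbidden_ids) -> bool:
    """Pass only if nothing expected is missing and nothing forbidden is returned."""
    returned = set(returned_ids)
    leaked = returned & set(forbidden_ids)
    missing = set(expected_ids) - returned
    return not leaked and not missing


def count_leaks(cases) -> int:
    """Total number of forbidden memories returned across all cases."""
    return sum(len(set(c["returned"]) & set(c["forbidden"])) for c in cases)


cases = [
    # Correct: returns the in-scope memory and nothing from the other tenant.
    {"returned": ["m1"], "expected": ["m1"], "forbidden": ["m9"]},
    # Leak: the right memory is present, but a forbidden one came back too.
    {"returned": ["m2", "m9"], "expected": ["m2"], "forbidden": ["m9"]},
]
verdicts = [
    privacy_case_passes(c["returned"], c["expected"], c["forbidden"]) for c in cases
]
print(verdicts, count_leaks(cases))
```

Note that the second case fails even though recall succeeded, which is what it means for scope to be scored alongside recall rather than separately.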
[ COMPETITOR CLAIMS ]
Every claim below is sourced directly from the vendor. We keep them visible — even when the score is impressive — because that is what an honest comparison looks like.
LoCoMo / LongMemEval / BEAM
LoCoMo 91.6; LongMemEval 93.4; BEAM 1M 64.1; BEAM 10M 48.6
Public claim only; not reproduced by the Memory Crystal harness.
Read the source ↗
LoCoMo / LongMemEval / BEAM
LoCoMo 92.5; LongMemEval 94.4; BEAM 1M/10M 64.1/48.6
Public claim only; Mem0's benchmark pages have changed over time, so check the page date when citing these figures.
Read the source ↗
Deep Memory Retrieval
94.8%
Public claim only; DMR is not the same benchmark as LoCoMo, LongMemEval, or BEAM.
Read the source ↗
LongMemEval-S
71.2%
Public claim only; not directly comparable to Memory Crystal's score until the same harness is run.
Read the source ↗
LoCoMo
74.0%
Useful baseline showing LoCoMo can reward retrieval/tool setup; not a direct service-vs-service reproduction.
Read the source ↗
Context-Bench
Adjacent benchmark; no direct Memory Crystal comparison
Listed for context only; this is not a competitor score against Memory Crystal.
Read the source ↗
Assistant evaluation API
No comparable public LoCoMo / LongMemEval / BEAM score found
This does not imply underperformance; the category is assistant/RAG evaluation rather than persistent personal memory.
Read the source ↗
[ READY? ]
100/100 on the benchmark, sub-10ms p95, zero cross-tenant leaks, and a published artifact you can audit line by line. Memory Crystal drops in front of any model and gives your agent a real long-term memory.
Run mc-seeded-20260525T132140Z · git 1da070f · fixture 03dee778cb · 30 cases