
My AI Forgot Who I Am for the 47th Time. So I Built It a Memory Server.


I have ADHD. I’ve mentioned this before. My AI agent also has ADHD — not by design, but because every MCP-compatible agent starts each session with the memory of a goldfish.

Every morning I’d open Claude Code and repeat myself. “I use TypeScript.” “Tests are in vitest.” “Don’t push to origin from the KB repo.” “My name is Nikita, not ‘the user.’” By the third week I was spending more time re-teaching my agent than actually coding.

Something had to give. So I built mnemon-mcp — a persistent memory server that gives any MCP client structured long-term recall. One SQLite file, zero cloud, nothing leaves your machine.

Here’s what I learned building it.

[Figure: Four memory layers visualized as stacked geological strata]


The Problem Nobody Talks About

Every AI agent framework has a memory story. Most of them are bad.

The standard approach: dump everything into a vector database and pray that cosine similarity finds the right context. Or worse — a flat JSON file that grows until the model’s context window chokes on it.

I tried three existing solutions before giving up:

| Solution | What went wrong |
|---|---|
| Flat JSON memory | 800 entries, 60% irrelevant noise in every context load |
| Cloud memory service | $19/month to store MY data on SOMEONE ELSE’s server |
| Vector-only search | “Never push without tests” matched “unit tests for push notifications” |

The fundamental issue: not all knowledge is the same kind of knowledge.

“I debugged auth on March 5” is an event — it should fade. “Never push without tests” is a rule — it should never fade. “My teammate’s name is Zhenya” is a fact — it should be stable until corrected. “Summary of Clean Code Chapter 3” is a reference — you pull it when you need it.

Dumping all four into one bucket and hoping search figures it out is like storing your diary, your address book, your shopping list, and your bookshelf in one pile on the floor. It works until it doesn’t. For me, it stopped working around entry 200.

The Insight: Memory Has Layers

The human brain doesn’t store everything the same way. Episodic memory (what happened), semantic memory (what you know), and procedural memory (how to do things) are distinct systems with different access patterns and decay rates.

I borrowed that model and added a fourth layer for reference material:

| Layer | What it stores | How it’s accessed | Lifetime |
|---|---|---|---|
| Episodic | Events, sessions, journal entries | By date or period | Decays (30-day half-life) |
| Semantic | Facts, preferences, relationships | By topic or entity | Stable |
| Procedural | Rules, workflows, conventions | Loaded at startup | Rarely changes |
| Resource | Book notes, reference material | On demand | Decays slowly (90 days) |
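The Lifetime column comes down to one multiplier per layer. Below is a minimal TypeScript sketch. It assumes exponential half-life decay and also treats the resource layer's 90 days as a half-life. `Layer`, `HALF_LIFE_DAYS`, and `decayFactor` are hypothetical names, not mnemon-mcp's actual code.

```typescript
// Hypothetical sketch of layer-dependent time decay.
// Assumption: exponential decay with a per-layer half-life.
type Layer = "episodic" | "semantic" | "procedural" | "resource";

// Half-life in days per layer; null means the layer never decays.
// Assumption: the resource layer's "90 days" is read as a half-life.
const HALF_LIFE_DAYS: Record<Layer, number | null> = {
  episodic: 30,
  semantic: null,
  procedural: null,
  resource: 90,
};

// Returns a multiplier in (0, 1]: 1 for a brand-new memory, 0.5 after one half-life.
function decayFactor(layer: Layer, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[layer];
  if (halfLife === null) return 1; // facts and rules don't expire
  return Math.pow(0.5, Math.max(0, ageDays) / halfLife); // negative ages clamp to "new"
}

console.log(decayFactor("episodic", 30)); // 0.5
console.log(decayFactor("episodic", 60)); // 0.25
console.log(decayFactor("resource", 90)); // 0.5
console.log(decayFactor("procedural", 365)); // 1
```

Under this assumption, a two-month-old episode keeps a quarter of its weight. A procedural rule from last year keeps all of it.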

This isn’t a new idea. Cognitive science has known this for decades. But somehow every AI memory system I found was either flat (one bucket) or graph-based (everything relates to everything, good luck searching).

What I Actually Built

mnemon-mcp is an MCP server. It speaks JSON-RPC over stdio. Any MCP-compatible client — OpenClaw, Claude Code, Cursor, Windsurf — connects to it and gets 7 tools:

| Tool | What it does |
|---|---|
| memory_add | Store a memory with layer, entity, confidence, importance |
| memory_search | Full-text search with filters by layer, entity, date, scope |
| memory_update | Update in place or create a versioned replacement |
| memory_delete | Delete a memory; re-activates the predecessor if it is part of a chain |
| memory_inspect | Layer statistics, or the version history of a single memory |
| memory_export | Export to JSON, Markdown, or Claude-md |
| memory_health | Diagnostics and optional garbage collection |
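Under the hood, each tool call is a JSON-RPC 2.0 request. The sketch below builds one for `memory_add`. It assumes MCP's `tools/call` method and the stdio transport's framing of one message per line. The argument names (`content`, `layer`, `importance`) are hypothetical, so check the server's tool schema for the real ones. A real client also performs the `initialize` handshake before calling any tool.

```typescript
// Sketch: one MCP tool call as a JSON-RPC 2.0 request.
// "tools/call" with { name, arguments } is MCP's method; the argument
// names passed to memory_add below are hypothetical.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The stdio transport sends one JSON message per line. JSON.stringify without
// an indent argument escapes newlines inside strings, so the only raw "\n" is
// the delimiter appended here.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = frame(
  toolCall(2, "memory_add", {
    content: "Never push to origin from the KB repo",
    layer: "procedural",
    importance: 0.9,
  }),
);
console.log(line); // what the client would write to the server's stdin
```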

The backend is SQLite with FTS5. No Postgres. No Redis. No Docker. One file at ~/.mnemon-mcp/memory.db that you can back up by copying it.

```shell
npm install -g mnemon-mcp
```

That’s the whole setup. I spent three months building it so you could spend 10 seconds installing it. The ROI math doesn’t work out, but I have ADHD — we don’t do ROI math.

Fact Versioning

Knowledge changes. Your team migrated from React 17 to React 19. You don’t want to delete “team uses React 17” — that might be useful context later. You want to chain them:

```text
v1: "Team uses React 17"  →  superseded_by: v2
v2: "Team uses React 19"  →  supersedes: v1 (active)
```

Search returns only the latest version. memory_inspect reveals the full chain. Deleting the active version with memory_delete re-activates its predecessor. Nothing is lost.
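Here is a minimal in-memory sketch of superseding chains that makes the three behaviors concrete. Search sees only the active head, inspect walks the whole chain, and deleting the head re-activates its predecessor. `MemoryStore` and its methods are hypothetical; the real server keeps this state in SQLite.

```typescript
// Hypothetical in-memory model of superseding chains (not mnemon-mcp's code).
interface Memory {
  id: number;
  content: string;
  supersedes: number | null; // predecessor in the chain
  supersededBy: number | null; // successor; null means this version is active
}

class MemoryStore {
  private memories = new Map<number, Memory>();
  private nextId = 1;

  add(content: string): Memory {
    const m: Memory = { id: this.nextId++, content, supersedes: null, supersededBy: null };
    this.memories.set(m.id, m);
    return m;
  }

  // Versioned update: keep the old entry and link it to a new active one.
  supersede(oldId: number, content: string): Memory {
    const old = this.memories.get(oldId);
    if (!old || old.supersededBy !== null) throw new Error(`memory ${oldId} is not active`);
    const next = this.add(content);
    next.supersedes = old.id;
    old.supersededBy = next.id;
    return next;
  }

  // Search only ever sees active versions.
  active(): Memory[] {
    return Array.from(this.memories.values()).filter((m) => m.supersededBy === null);
  }

  // Full chain, oldest first (what an inspect call would show).
  history(id: number): Memory[] {
    let cur: Memory | undefined = this.memories.get(id);
    // Walk back to the root of the chain.
    while (cur !== undefined && cur.supersedes !== null) cur = this.memories.get(cur.supersedes);
    const chain: Memory[] = [];
    // Walk forward to the active head.
    while (cur !== undefined) {
      chain.push(cur);
      cur = cur.supersededBy === null ? undefined : this.memories.get(cur.supersededBy);
    }
    return chain;
  }

  // Unlink the entry; deleting the head makes its predecessor active again.
  delete(id: number): void {
    const m = this.memories.get(id);
    if (!m) return;
    if (m.supersedes !== null) {
      const prev = this.memories.get(m.supersedes);
      if (prev) prev.supersededBy = m.supersededBy;
    }
    if (m.supersededBy !== null) {
      const next = this.memories.get(m.supersededBy);
      if (next) next.supersedes = m.supersedes;
    }
    this.memories.delete(id);
  }
}

const store = new MemoryStore();
const v1 = store.add("Team uses React 17");
const v2 = store.supersede(v1.id, "Team uses React 19");
console.log(store.active().map((m) => m.content).join(", ")); // Team uses React 19
store.delete(v2.id);
console.log(store.active().map((m) => m.content).join(", ")); // Team uses React 17
```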

This turns out to be important more often than you’d think. An agent correcting a fact isn’t the same as an agent deleting one.

Stemming: Because Languages Are Hard

I write code in English and everything else in Russian. So the search engine needed to handle both.

A Snowball stemmer runs at both index time and query time: "running" matches "runs", and "книги" ("books") matches "книга" ("book"). Stop words are filtered in both languages.
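Stemming works because of symmetry: documents and queries pass through the same analyzer, so both sides land on the same stems. This sketch uses a four-word lookup table as a stand-in for a real Snowball stemmer, plus an illustrative stop-word list. `analyze`, `STAND_IN_STEMS`, and `STOP_WORDS` are hypothetical names.

```typescript
// Hypothetical analyzer sketch. The lookup table is a stand-in for a real
// Snowball stemmer; it only knows the example words from the text.
const STAND_IN_STEMS: Record<string, string> = {
  running: "run",
  runs: "run",
  "книги": "книг",
  "книга": "книг",
};

// Illustrative stop words for both languages (not the real list).
const STOP_WORDS = new Set(["the", "a", "and", "и", "в", "на"]);

function stem(word: string): string {
  return STAND_IN_STEMS[word] ?? word; // unknown words pass through unchanged
}

// Lowercase, split on whitespace, punctuation and hyphens (so "рэп-архив"
// becomes two tokens), drop stop words, then stem.
function analyze(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[\s\-.,:;!?()"']+/)
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t))
    .map(stem);
}

// The same function runs at index time and at query time.
const indexed = analyze("Nikita runs the test suite and reads книги");
const query = analyze("running книга");
console.log(query.every((t) => indexed.indexOf(t) !== -1)); // true
```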

Getting Russian morphology right in FTS5 was one of those problems that sound trivial and aren’t. Russian has 6 grammatical cases, 3 genders, and diminutive forms that change the stem entirely. Snowball handles 90% of it. The other 10% is why I drink tea at 2 AM on Phangan while staring at a regex.

The Tuning Saga: 36.9 → 70.5

I built an eval framework with 50 golden test cases: real queries against real memories. It measures Recall@5, MRR, and nDCG@5.
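For reference, here are the standard per-query definitions of the three metrics. They assume binary relevance: a memory is either in the golden set or not. The post doesn't describe how these are folded into the 0 to 100 score, so the sketch stops at the individual metrics. MRR is the mean of `reciprocalRank` over all queries. All names are hypothetical.

```typescript
// Standard retrieval metrics for one query (binary relevance assumed).
// `ranked` is the result list in rank order; `relevant` is the golden set.

// Fraction of the relevant memories that appear in the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// 1 / rank of the first relevant result, or 0 if none; MRR averages this over queries.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const idx = ranked.findIndex((id) => relevant.has(id));
  return idx === -1 ? 0 : 1 / (idx + 1);
}

// DCG of the top k, normalized by the DCG of an ideal ranking.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2); // rank 1 has discount log2(2) = 1
  });
  let idcg = 0;
  for (let i = 0; i < Math.min(k, relevant.size); i++) idcg += 1 / Math.log2(i + 2);
  return idcg === 0 ? 0 : dcg / idcg;
}

const ranked = ["m7", "m3", "m9", "m1", "m4"];
const relevant = new Set(["m3", "m4"]);
console.log(recallAtK(ranked, relevant, 5)); // 1
console.log(reciprocalRank(ranked, relevant)); // 0.5
console.log(ndcgAtK(ranked, relevant, 5)); // below 1: relevant hits sit at ranks 2 and 5
```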

First score: 36.9 out of 100.

That’s not “needs improvement.” That’s “your search engine is actively guessing.”

[Figure: Search quality improvement, 36.9 → 70.5 across 4 optimization waves]

| Change | Impact |
|---|---|
| AND → OR fallback when AND returns too few results | +8 pts |
| Decay only for episodic/resource (not semantic/procedural) | +5 pts |
| Importance weighting: 0.3 + 0.7 × importance | +4 pts |
| Stop words: removed “серия” (“series”) forms that were killing habit queries | +3 pts |
| Hyphen tokenization: “рэп-архив” (“rap archive”) → two tokens | +2 pts |
| Stem prefix minimum: 3 → 2 chars (fixes “Юле”, a case form of the name Юля, → “юл”) | +2 pts |
| Progressive AND relaxation: top-3 longest stems first | +1.5 pts |
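Several of these changes come down to how the FTS5 `MATCH` expression is built. The sketch rests on four assumptions. The input is already-stemmed tokens. Each stem becomes a quoted FTS5 prefix term (`"stem"*`). The prefix minimum means shorter stems are dropped. Progressive relaxation means falling back to AND over the three longest stems. All names are hypothetical, and the real server may interpret these changes differently.

```typescript
// Hypothetical FTS5 MATCH-expression builder; input is already-stemmed tokens.
const MIN_PREFIX_LEN = 2; // assumption: stems shorter than this are dropped

// Quote the stem (FTS5 escapes a double quote by doubling it) and mark it as a prefix token.
function term(stem: string): string {
  return `"${stem.replace(/"/g, '""')}"*`;
}

// Deduplicate and drop stems below the prefix minimum.
function usable(stems: string[]): string[] {
  return Array.from(new Set(stems)).filter((s) => s.length >= MIN_PREFIX_LEN);
}

function andQuery(stems: string[]): string {
  return usable(stems).map(term).join(" AND ");
}

function orQuery(stems: string[]): string {
  return usable(stems).map(term).join(" OR ");
}

// Relaxed AND: keep only the `keep` longest (usually most specific) stems.
function relaxedAndQuery(stems: string[], keep = 3): string {
  const longest = usable(stems)
    .sort((a, b) => b.length - a.length)
    .slice(0, keep);
  return longest.map(term).join(" AND ");
}

console.log(andQuery(["юл", "рэп", "архив"])); // "юл"* AND "рэп"* AND "архив"*
console.log(orQuery(["юл", "рэп"])); // "юл"* OR "рэп"*
```

With `MIN_PREFIX_LEN` at 3, the stem “юл” would be dropped, and a query about Юля would lose its only name token.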

Final score: 70.5 out of 100. Recall@5 went from 0.390 to 0.780 — doubled.

The remaining 9 failures are mostly temporal queries (“what happened last week?”) that need date-aware search I haven’t built yet. PRs welcome.

What FTS5 Taught Me

Every one of these was counterintuitive:

BM25 scores are corpus-dependent. When I deleted superseded entries from the index, the remaining entries’ scores shifted because the statistical background changed. So I kept superseded entries in the FTS index as “dead” documents for stability. My search index intentionally contains stale data. This is correct.
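To see why deleting rows moves everyone else's score, here is the textbook Okapi BM25 formula in TypeScript. It uses k1 = 1.2 and b = 0.75, the constants documented for FTS5's `bm25()`. This is a sketch, not FTS5's code: FTS5 also negates the result so better matches sort first. Three corpus statistics feed every score: N, the per-term document counts, and the average document length. Removing a document changes all three. The corpus and names below are illustrative.

```typescript
// Textbook Okapi BM25 over pre-tokenized documents (sketch, not FTS5's source).
type Doc = string[];

function bm25(query: string[], doc: Doc, corpus: Doc[], k1 = 1.2, b = 0.75): number {
  const N = corpus.length;
  const avgdl = corpus.reduce((sum, d) => sum + d.length, 0) / N; // average doc length
  let score = 0;
  for (const q of query) {
    const n = corpus.filter((d) => d.indexOf(q) !== -1).length; // docs containing q
    // Clamp to a tiny positive value so very common terms never go negative.
    const idf = Math.max(Math.log((N - n + 0.5) / (n + 0.5)), 1e-6);
    const f = doc.filter((t) => t === q).length; // term frequency in this doc
    score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return score;
}

const v2: Doc = ["team", "uses", "react", "19"];
const live: Doc[] = [
  v2,
  ["never", "push", "without", "tests"],
  ["nikita", "lives", "on", "phangan"],
  ["tests", "run", "in", "vitest"],
  ["prefers", "typescript", "over", "javascript"],
];
// Same corpus plus the superseded "dead" row.
const withDead: Doc[] = [...live, ["team", "uses", "react", "17"]];

console.log(bm25(["react"], v2, live).toFixed(3), bm25(["react"], v2, withDead).toFixed(3)); // 1.099 0.588
```

The live row scores differently depending on whether the dead row is present. Dropping superseded rows from the index would therefore silently re-rank everything that shares terms with them.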

OR is a terrible default. The fix: AND first, with OR results added as a supplement (at a 0.9× score penalty) when AND returns too few results. It took three rewrites to learn what sounds obvious in retrospect.
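Here is a sketch of the AND-first strategy. It assumes a search callback whose scores are higher-is-better. FTS5's raw `bm25()` is negative and smaller-is-better, so multiplying it by 0.9 directly would reward OR hits; flip the sign first. The `MIN_RESULTS` threshold and all names are hypothetical. Only the 0.9 penalty comes from the text.

```typescript
// Hypothetical AND-first search with an OR supplement.
interface Hit {
  id: string;
  score: number; // higher is better in this sketch
}

type SearchFn = (mode: "AND" | "OR", stems: string[]) => Hit[];

const OR_PENALTY = 0.9; // from the text
const MIN_RESULTS = 5; // assumption: the "too few results" threshold

function searchWithFallback(search: SearchFn, stems: string[], limit = MIN_RESULTS): Hit[] {
  const andHits = search("AND", stems);
  if (andHits.length >= limit) return andHits.slice(0, limit);

  // Supplement with OR hits that AND didn't already find, at a penalty.
  const seen = new Set(andHits.map((h) => h.id));
  const orHits = search("OR", stems)
    .filter((h) => !seen.has(h.id))
    .map((h) => ({ id: h.id, score: h.score * OR_PENALTY }));

  return [...andHits, ...orHits].sort((a, b) => b.score - a.score).slice(0, limit);
}

// Fake backend: AND finds one rule; OR also finds two looser matches.
const fake: SearchFn = (mode) =>
  mode === "AND"
    ? [{ id: "rule-tests", score: 10 }]
    : [
        { id: "rule-tests", score: 10 },
        { id: "push-notifs", score: 9 },
        { id: "ci-notes", score: 11 },
      ];

console.log(searchWithFallback(fake, ["push", "test"]).map((h) => h.id).join(", "));
// rule-tests, ci-notes, push-notifs
```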

Access count = popularity bias. My first version boosted frequently accessed memories. On a single-user KB, that creates a feedback loop — popular memories get more popular. Removed it.

Decay is layer-dependent. Applying time decay to “never push without tests” killed factual recall. Decay applies to events and references. Facts and rules don’t expire because Tuesday was two weeks ago.

What I Got Wrong

1. I built search before import. The import pipeline was an afterthought. It should have been designed first — the shape of your data determines your search quality. Rebuilt it twice.

2. I ignored snippets. FTS5 has a snippet() function for highlighted results. But since I index stemmed content, snippets return stems instead of words. “книг” instead of “книги”. Shipped it broken. It haunts me.

3. I over-engineered scoring. First version: frequency boosts, recency bonuses, confidence multiplier. Final version: bm25 × (0.3 + 0.7 × importance) × decay(layer). Simpler is always better. Every time.
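The final formula is short enough to write out. This sketch makes three assumptions. The BM25 term is already sign-flipped so that higher is better. Importance lies in [0, 1] and is clamped here. The decay argument is the layer multiplier, fixed at 1 for semantic and procedural memories. `finalScore` is a hypothetical name.

```typescript
// Hypothetical rendering of: score = bm25 × (0.3 + 0.7 × importance) × decay(layer)
// Assumptions: bm25 is sign-flipped (higher = better), importance is meant to be in [0, 1],
// and `decay` is the per-layer multiplier (1 for semantic/procedural).
function finalScore(bm25: number, importance: number, decay: number): number {
  const imp = Math.min(1, Math.max(0, importance)); // clamp to [0, 1]
  return bm25 * (0.3 + 0.7 * imp) * decay;
}

// Same text relevance, different metadata:
const rule = finalScore(8, 0.9, 1); // important procedural rule, never decays
const oldEpisode = finalScore(8, 0.5, 0.25); // average episode, two half-lives old
console.log(rule > oldEpisode); // true
```

Even at importance 0, a memory keeps 30% of its text relevance, so a strong keyword match on a minor fact can still surface.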

4. 268 memories is not 10,000. My eval results look good at current scale. I have no idea how this performs at 10K entries. If you import 10K memories and everything breaks, I want to know.

How It Compares

| | mnemon-mcp | mem0 | basic-memory |
|---|---|---|---|
| Architecture | SQLite FTS5 | Cloud + Qdrant | Markdown + vector |
| Memory structure | 4 typed layers | Flat | Flat |
| Fact versioning | Superseding chains | Partial | No |
| Stemming | EN + RU | EN only | EN only |
| Cloud required | No | Yes | No |
| Cost | Free | $19–249/mo | Free |
| Setup | npm install -g | Docker + API keys | pip + deps |

Try It

```shell
npm install -g mnemon-mcp
```

Add to your MCP client config:

```json
{
  "mcpServers": {
    "mnemon-mcp": {
      "command": "mnemon-mcp"
    }
  }
}
```

Your agent now remembers.

MIT licensed. 4 production dependencies. 182 tests. Works everywhere Node 22+ runs.

If you use it, break it, or hate it — open an issue. The best bug reports come from people who actually needed the thing to work.


This post was written in Claude Code, which uses mnemon-mcp as its memory server. The agent that wrote it remembered my writing style, my ADHD references, and the fact that I live on Phangan — without being told. That’s the whole point.

