My AI Forgot Who I Am for the 47th Time. So I Built It a Memory Server.

I have ADHD. I’ve mentioned this before. My AI agent also has ADHD — not by design, but because every MCP-compatible agent starts each session with the memory of a goldfish.

Every morning I’d open Claude Code and repeat myself. “I use TypeScript.” “Tests are in vitest.” “Don’t push to origin from the KB repo.” “My name is Nikita, not ‘the user.’” By the third week I was spending more time re-teaching my agent than actually coding.

Something had to give. So I built mnemon-mcp — a persistent memory server that gives any MCP client structured long-term recall. One SQLite file, zero cloud, nothing leaves your machine.

Here’s what I learned building it.

Four memory layers visualized as stacked geological strata

Table of contents

The Problem Nobody Talks About
The Insight: Memory Has Layers
What I Actually Built
- Fact Versioning
- Stemming: Because Languages Are Hard
The Tuning Saga: 36.9 → 70.5
- What FTS5 Taught Me
What I Got Wrong
How It Compares
Try It

The Problem Nobody Talks About

Every AI agent framework has a memory story. Most of them are bad.

The standard approach: dump everything into a vector database and pray that cosine similarity finds the right context. Or worse — a flat JSON file that grows until the model’s context window chokes on it.

I tried three existing solutions before giving up:

| Solution | What went wrong |
| --- | --- |
| Flat JSON memory | 800 entries, 60% irrelevant noise in every context load |
| Cloud memory service | $19/month to store MY data on SOMEONE ELSE’s server |
| Vector-only search | “Never push without tests” matched “unit tests for push notifications” |

The fundamental issue: not all knowledge is the same kind of knowledge.

“I debugged auth on March 5” is an event — it should fade. “Never push without tests” is a rule — it should never fade. “My teammate’s name is Zhenya” is a fact — it should be stable until corrected. “Summary of Clean Code Chapter 3” is a reference — you pull it when you need it.

Dumping all four into one bucket and hoping search figures it out is like storing your diary, your address book, your shopping list, and your bookshelf in one pile on the floor. It works until it doesn’t. For me, it stopped working around entry 200.

The Insight: Memory Has Layers

The human brain doesn’t store everything the same way. Episodic memory (what happened), semantic memory (what you know), and procedural memory (how to do things) are distinct systems with different access patterns and decay rates.

I borrowed that model and added a fourth layer for reference material:

| Layer | What it stores | How it’s accessed | Lifetime |
| --- | --- | --- | --- |
| Episodic | Events, sessions, journal entries | By date or period | Decays (30-day half-life) |
| Semantic | Facts, preferences, relationships | By topic or entity | Stable |
| Procedural | Rules, workflows, conventions | Loaded at startup | Rarely changes |
| Resource | Book notes, reference material | On demand | Decays slowly (90 days) |
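To make the access-pattern column concrete, here is a minimal sketch of how a client could route memories by layer. The type names, field names, and the `startupContext` helper are hypothetical illustrations, not mnemon-mcp's actual schema.

```typescript
// Hypothetical data model for the four layers; names are illustrative only.
type Layer = "episodic" | "semantic" | "procedural" | "resource";
type Access = "by-date" | "by-topic" | "at-startup" | "on-demand";

interface Memory {
  id: string;
  layer: Layer;
  content: string;
}

// One access pattern per layer, mirroring the table above.
const ACCESS: Record<Layer, Access> = {
  episodic: "by-date",
  semantic: "by-topic",
  procedural: "at-startup",
  resource: "on-demand",
};

// Only memories from the layer marked "at-startup" are preloaded into a session;
// everything else waits for a search.
function startupContext(memories: Memory[]): Memory[] {
  return memories.filter((m) => ACCESS[m.layer] === "at-startup");
}

const sample: Memory[] = [
  { id: "1", layer: "episodic", content: "Debugged auth on March 5" },
  { id: "2", layer: "procedural", content: "Never push without tests" },
  { id: "3", layer: "semantic", content: "Teammate's name is Zhenya" },
  { id: "4", layer: "resource", content: "Summary of Clean Code Chapter 3" },
];

const preloaded = startupContext(sample);
```

With the sample data, only the rule about tests ends up in the startup context; the event, the fact, and the book summary stay out of the window until something asks for them.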

This isn’t a new idea. Cognitive science has known this for decades. But somehow every AI memory system I found was either flat (one bucket) or graph-based (everything relates to everything, good luck searching).

What I Actually Built

mnemon-mcp is an MCP server. It speaks JSON-RPC over stdio. Any MCP-compatible client — OpenClaw, Claude Code, Cursor, Windsurf — connects to it and gets 7 tools:

| Tool | What it does |
| --- | --- |
| `memory_add` | Store a memory with layer, entity, confidence, importance |
| `memory_search` | Full-text search with filters by layer, entity, date, scope |
| `memory_update` | Update in-place or create a versioned replacement |
| `memory_delete` | Delete a memory; re-activates predecessor if part of a chain |
| `memory_inspect` | Layer statistics or single memory history trace |
| `memory_export` | Export to JSON, Markdown, or Claude-md |
| `memory_health` | Diagnostics and optional garbage collection |
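For a feel of what travels over stdio, here is a sketch of an MCP `tools/call` request for `memory_add`. The envelope follows JSON-RPC 2.0 as MCP uses it; the argument names (`content`, `layer`, `importance`, `confidence`) are my guesses from the table above, not the server's documented schema.

```typescript
// Envelope of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Argument names are assumptions for illustration only.
const request = buildToolCall(1, "memory_add", {
  content: "Tests are in vitest",
  layer: "procedural",
  importance: 0.9,
  confidence: 1.0,
});

// The stdio transport sends one JSON message per line.
const wire = JSON.stringify(request) + "\n";
```

In practice the MCP client does this for you, after an `initialize` handshake that this sketch leaves out.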

The backend is SQLite with FTS5. No Postgres. No Redis. No Docker. One file at ~/.mnemon-mcp/memory.db that you can back up by copying it.

npm install -g mnemon-mcp

That’s the whole setup. I spent three months building it so you could spend 10 seconds installing it. The ROI math doesn’t work out, but I have ADHD — we don’t do ROI math.

Fact Versioning

Knowledge changes. Your team migrated from React 17 to React 19. You don’t want to delete “team uses React 17” — that might be useful context later. You want to chain them:

v1: "Team uses React 17"  →  superseded_by: v2
v2: "Team uses React 19"  →  supersedes: v1 (active)

Search returns only the latest version. memory_inspect reveals the full chain. memory_delete re-activates the predecessor. Nothing is lost.

This turns out to be important more often than you’d think. An agent correcting a fact isn’t the same as an agent deleting one.
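The chain behavior can be sketched with an in-memory store. This is a simplified model of superseding chains, with a hypothetical `FactStore` class and field names of my choosing; the real server keeps this in SQLite and may differ in detail.

```typescript
interface Fact {
  id: number;
  content: string;
  supersedes: number | null;   // id of the previous version, if any
  supersededBy: number | null; // id of the next version, if any
}

class FactStore {
  private facts = new Map<number, Fact>();
  private nextId = 1;

  add(content: string): Fact {
    const fact: Fact = { id: this.nextId++, content, supersedes: null, supersededBy: null };
    this.facts.set(fact.id, fact);
    return fact;
  }

  // Create a new version and link it to the old one instead of overwriting.
  supersede(oldId: number, content: string): Fact {
    const old = this.facts.get(oldId);
    if (!old) throw new Error(`No fact with id ${oldId}`);
    if (old.supersededBy !== null) throw new Error(`Fact ${oldId} is already superseded`);
    const next = this.add(content);
    next.supersedes = old.id;
    old.supersededBy = next.id;
    return next;
  }

  // Remove a version and relink its neighbors. Deleting the newest version
  // clears the predecessor's supersededBy, which makes it active again.
  delete(id: number): void {
    const fact = this.facts.get(id);
    if (!fact) return;
    const prev = fact.supersedes !== null ? this.facts.get(fact.supersedes) : undefined;
    const next = fact.supersededBy !== null ? this.facts.get(fact.supersededBy) : undefined;
    if (prev) prev.supersededBy = next ? next.id : null;
    if (next) next.supersedes = prev ? prev.id : null;
    this.facts.delete(id);
  }

  // What search would see: only versions nothing has superseded.
  active(): Fact[] {
    return Array.from(this.facts.values()).filter((f) => f.supersededBy === null);
  }

  // Full chain for one fact, newest first.
  history(id: number): Fact[] {
    const chain: Fact[] = [];
    let current = this.facts.get(id);
    while (current) {
      chain.push(current);
      current = current.supersedes !== null ? this.facts.get(current.supersedes) : undefined;
    }
    return chain;
  }
}
```

Adding "Team uses React 17", superseding it with "Team uses React 19", and then deleting the newer version leaves the React 17 fact active again, which is the property that makes a correction reversible.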

Stemming: Because Languages Are Hard

I write code in English and everything else in Russian. So the search engine needed to handle both.

I run a Snowball stemmer at both index time and query time, so "running" matches "runs" and "книги" matches "книга". Stop words are filtered in both languages.

Getting Russian morphology right in FTS5 was one of those problems that sounds trivial and isn’t. Russian has 6 grammatical cases, 3 genders, and diminutive forms that change the stem entirely. Snowball handles 90% of it. The other 10% is why I drink tea at 2 AM on Phangan while staring at a regex.
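The important part is that indexing and querying go through the same normalizer. Here is a toy sketch of that idea. The suffix stripper below is NOT the Snowball algorithm, only a stand-in that handles the examples from this section, and the stop-word list and function names are made up for illustration.

```typescript
// Toy suffix list: just enough for the examples, nowhere near real morphology.
const SUFFIXES = ["ing", "s", "и", "а"];
const STOP_WORDS = new Set(["the", "a", "of", "и", "в", "на"]);

// Stand-in for a real stemmer. Expects a lowercase word.
function toyStem(word: string): string {
  for (const suffix of SUFFIXES) {
    const hasSuffix = word.slice(-suffix.length) === suffix;
    if (hasSuffix && word.length - suffix.length >= 3) {
      let stem = word.slice(0, word.length - suffix.length);
      // Collapse a doubled final letter: "runn" -> "run".
      if (stem.charAt(stem.length - 1) === stem.charAt(stem.length - 2)) {
        stem = stem.slice(0, -1);
      }
      return stem;
    }
  }
  return word;
}

// Shared pipeline: split on whitespace, hyphens and punctuation, lowercase,
// drop stop words, stem.
function normalize(text: string): string[] {
  return text
    .split(/[\s\-.,!?;:"'()]+/)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase())
    .filter((token) => !STOP_WORDS.has(token))
    .map(toyStem);
}

// A document matches when every query stem appears among its indexed stems.
function matches(query: string, document: string): boolean {
  const indexed = new Set(normalize(document));
  return normalize(query).every((stem) => indexed.has(stem));
}
```

Because both sides are reduced the same way, `matches("running", "He runs daily")` and `matches("книги", "Это книга")` both hold in this sketch. If only the index were stemmed, neither query would find anything.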

The Tuning Saga: 36.9 → 70.5

I built an eval framework with 50 golden test cases: real queries against real memories. I measured Recall@5, MRR (mean reciprocal rank), and nDCG@5.
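For reference, these are the textbook definitions of the three metrics with binary relevance. This is a sketch, not the eval framework itself: the `EvalCase` shape is hypothetical, ranked ids are assumed to be unique, and how the metrics combine into a single 0 to 100 score is not shown here.

```typescript
interface EvalCase {
  ranked: string[];      // result ids returned by search, best first
  relevant: Set<string>; // ids the golden case marks as correct
}

// Share of the relevant ids that show up in the top k.
function recallAtK(c: EvalCase, k: number): number {
  if (c.relevant.size === 0) return 0;
  const hits = c.ranked.slice(0, k).filter((id) => c.relevant.has(id)).length;
  return hits / c.relevant.size;
}

// 1 / rank of the first relevant result, 0 when none is returned.
function reciprocalRank(c: EvalCase): number {
  const index = c.ranked.findIndex((id) => c.relevant.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// Discounted gain of the top k, divided by the gain of a perfect ordering.
function ndcgAtK(c: EvalCase, k: number): number {
  let dcg = 0;
  c.ranked.slice(0, k).forEach((id, i) => {
    if (c.relevant.has(id)) dcg += 1 / Math.log2(i + 2);
  });
  let ideal = 0;
  for (let i = 0; i < Math.min(k, c.relevant.size); i++) {
    ideal += 1 / Math.log2(i + 2);
  }
  return ideal === 0 ? 0 : dcg / ideal;
}

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

// MRR is the mean reciprocal rank over all golden cases.
function mrr(cases: EvalCase[]): number {
  return mean(cases.map(reciprocalRank));
}
```

A case whose only correct answer lands in second place counts as a full hit for Recall@5 but only half a point for MRR, which is why the metrics are tracked separately.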

First score: 36.9 out of 100.

That’s not “needs improvement.” That’s “your search engine is actively guessing.”

Search quality improvement: 36.9 → 70.5 across 4 optimization waves

| Change | Impact |
| --- | --- |
| AND → OR fallback when AND returns too few results | +8 pts |
| Decay only for episodic/resource (not semantic/procedural) | +5 pts |
| Importance weighting: `0.3 + 0.7 × importance` | +4 pts |
| Stop words: removed “серия” forms killing habit queries | +3 pts |
| Hyphen tokenization: “рэп-архив” → two tokens | +2 pts |
| Stem prefix minimum: 3→2 chars (fixes “Юле”→“юл”) | +2 pts |
| Progressive AND relaxation: top-3 longest stems first | +1.5 pts |

Final score: 70.5 out of 100. Recall@5 went from 0.390 to 0.780 — doubled.

The remaining 9 failures are mostly temporal queries (“what happened last week?”) that need date-aware search I haven’t built yet. PRs welcome.
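The biggest single win, the AND → OR fallback, can be sketched as a small wrapper around whatever runs the FTS5 query. Assumptions in this sketch: `run` is a hypothetical callback that executes a MATCH expression and returns hits with a "higher is better" score (SQLite's own `bm25()` is lower-is-better, so it would need flipping first), the `minResults` threshold of 3 is my placeholder, and progressive relaxation is left out.

```typescript
interface Hit {
  id: string;
  score: number; // higher is better in this sketch
}

// Hypothetical callback that runs one FTS5 MATCH expression.
type SearchFn = (matchExpr: string) => Hit[];

// FTS5 strings are wrapped in double quotes; embedded quotes are doubled.
function quote(stem: string): string {
  return '"' + stem.replace(/"/g, '""') + '"';
}

function buildMatch(stems: string[], op: "AND" | "OR"): string {
  return stems.map(quote).join(` ${op} `);
}

// Try AND first. If it returns too few hits, add OR-only hits at a penalty.
function searchWithFallback(
  stems: string[],
  run: SearchFn,
  minResults = 3,
  orPenalty = 0.9,
): Hit[] {
  if (stems.length === 0) return [];
  const strict = run(buildMatch(stems, "AND"));
  if (strict.length >= minResults || stems.length === 1) return strict;

  const seen = new Set(strict.map((hit) => hit.id));
  const extra = run(buildMatch(stems, "OR"))
    .filter((hit) => !seen.has(hit.id))
    .map((hit) => ({ id: hit.id, score: hit.score * orPenalty }));

  return strict.concat(extra).sort((a, b) => b.score - a.score);
}
```

Hits that satisfy every term keep their full score, and hits that only match some terms are let in afterwards with a discount, so they fill gaps without outranking an exact match of similar strength.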

What FTS5 Taught Me

Every one of these was counterintuitive:

BM25 scores are corpus-dependent. When I deleted superseded entries from the index, the remaining entries’ scores shifted because the statistical background changed. So I kept superseded entries in the FTS index as “dead” documents for stability. My search index intentionally contains stale data. This is correct.
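A quick worked example of why the scores move. This uses the classic Okapi BM25 inverse document frequency term; the document counts are invented for illustration, and the full BM25 score also depends on average document length, which shifts in the same way.

```typescript
// Classic BM25 IDF: rarer terms across the corpus get a higher weight.
// totalDocs: documents in the index; docsWithTerm: documents containing the term.
function idf(totalDocs: number, docsWithTerm: number): number {
  return Math.log((totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

// Invented numbers: a term that appears in 10 entries.
const withDeadDocs = idf(300, 10);    // index still holds superseded entries
const afterDeleting = idf(268, 10);   // same term once 32 entries are removed

// Same query, same matching documents, different weight for the term.
const shifted = withDeadDocs !== afterDeleting;
```

Nothing about the matching documents changed, yet the term's weight dropped because the corpus shrank around it. Keeping superseded entries in the index holds those statistics steady.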

OR is a terrible default. AND first, OR as supplement when AND returns too few results, at a 0.9x score penalty. Three rewrites to learn what sounds obvious in retrospect.

Access count = popularity bias. My first version boosted frequently accessed memories. On a single-user KB, that creates a feedback loop — popular memories get more popular. Removed it.

Decay is layer-dependent. Applying time decay to “never push without tests” killed factual recall. Decay applies to events and references. Facts and rules don’t expire because Tuesday was two weeks ago.
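Put together with the importance weighting from the tuning table, layer-dependent decay can be sketched like this. Assumptions: decay is modeled as an exponential half-life, the resource layer's "90 days" is treated as a half-life, `relevance` is a higher-is-better text score, and `importance` lies in [0, 1]. The function names are mine.

```typescript
type Layer = "episodic" | "semantic" | "procedural" | "resource";

// Half-life in days per layer; null means the layer never decays.
const HALF_LIFE_DAYS: Record<Layer, number | null> = {
  episodic: 30,
  semantic: null,
  procedural: null,
  resource: 90, // assumption: the table's "90 days" read as a half-life
};

// Multiplier in (0, 1]: halves every half-life, stays at 1 for stable layers.
function decay(layer: Layer, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[layer];
  if (halfLife === null) return 1;
  return Math.pow(0.5, Math.max(0, ageDays) / halfLife);
}

// relevance × (0.3 + 0.7 × importance) × decay
function finalScore(
  relevance: number,
  importance: number,
  layer: Layer,
  ageDays: number,
): number {
  const clamped = Math.min(1, Math.max(0, importance));
  return relevance * (0.3 + 0.7 * clamped) * decay(layer, ageDays);
}
```

Under these assumptions a 30-day-old journal entry keeps half its score, while a year-old rule like "never push without tests" keeps all of it. The `0.3` floor means even a zero-importance memory can still surface if the text match is strong.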

What I Got Wrong

1. I built search before import. The import pipeline was an afterthought. It should have been designed first — the shape of your data determines your search quality. Rebuilt it twice.

2. I ignored snippets. FTS5 has a snippet() function for highlighted results. But since I index stemmed content, snippets return stems instead of words. “книг” instead of “книги”. Shipped it broken. It haunts me.

3. I over-engineered scoring. First version: frequency boosts, recency bonuses, confidence multiplier. Final version: bm25 × (0.3 + 0.7 × importance) × decay(layer). Simpler is always better. Every time.

4. 268 memories is not 10,000. My eval results look good at current scale. I have no idea how this performs at 10K entries. If you import 10K memories and everything breaks, I want to know.

How It Compares

| | mnemon-mcp | mem0 | basic-memory |
| --- | --- | --- | --- |
| Architecture | SQLite FTS5 | Cloud + Qdrant | Markdown + vector |
| Memory structure | 4 typed layers | Flat | Flat |
| Fact versioning | Superseding chains | Partial | No |
| Stemming | EN + RU | EN only | EN only |
| Cloud required | No | Yes | No |
| Cost | Free | $19–249/mo | Free |
| Setup | `npm install -g` | Docker + API keys | pip + deps |

Try It

npm install -g mnemon-mcp

Add to your MCP client config:

{
  "mcpServers": {
    "mnemon-mcp": {
      "command": "mnemon-mcp"
    }
  }
}

Your agent now remembers.

Landing page: aisatisfy.me/mnemon
npm: mnemon-mcp on npm
GitHub: nikitacometa/mnemon-mcp

MIT licensed. 4 production dependencies. 182 tests. Works everywhere Node 22+ runs.

If you use it, break it, or hate it — open an issue. The best bug reports come from people who actually needed the thing to work.

This post was written in Claude Code, which uses mnemon-mcp as its memory server. The agent that wrote it remembered my writing style, my ADHD references, and the fact that I live on Phangan — without being told. That’s the whole point.