From 3,361 Fragments to 39 Wiki Pages: Replacing AI Memory Soup with a Karpathy-Style Knowledge Base
The Incident That Broke the Model
In April 2026, our persistent AI agent AEGIS fabricated an entire organizational structure during a routine chat query. It invented repos, teams, and SLAs that didn't exist — confidently, with specific names and numbers. We call this the Chimera incident.
The root cause wasn't a model problem. It was a memory architecture problem.
AEGIS had accumulated 3,361 memory fragments over months of operation — facts, observations, decisions, cross-domain synthesis outputs, all stored as flat key-value entries in a semantic memory worker. The agent had no way to distinguish an authoritative decision from an episodic observation from a stale dream. When asked about organizational structure, it stitched together fragments from different contexts into a plausible-sounding fabrication.
Fragment-based memory scales linearly in storage but sub-linearly in value. Noise grows faster than signal. At 3,361 entries, the noise won.
The Karpathy Pattern
Andrej Karpathy published a gist proposing that instead of stateless RAG over raw fragments, LLMs should maintain a persistent, interlinked wiki. Three operations: ingest (process new sources into page updates), query (read pages + synthesize), lint (scan for contradictions and staleness).
"The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass."
This was exactly our problem. We didn't need better retrieval — we needed better curation.
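In code terms, the pattern reduces to a small contract. Here is a minimal TypeScript sketch of the three operations; the type names and shapes are assumptions drawn from the description above, not Karpathy's code or our exact tool signatures.

```ts
// Sketch of the three wiki-maintenance operations. All names here are
// illustrative assumptions, not an existing API.
type PagePath = string; // e.g. "aegis/architecture" (hypothetical path)

interface RawSource {
  kind: "commit" | "chat" | "feed" | "digest";
  body: string;
}

interface PageUpdate {
  path: PagePath;
  patch: string;  // the edit to apply to the page
  reason: string; // why the source justifies the change
}

interface LintFinding {
  path: PagePath;
  issue: "contradiction" | "stale" | "broken-link";
  detail: string;
}

interface WikiMaintainer {
  // Process a new raw source into zero or more page creations/updates.
  ingest(source: RawSource): Promise<PageUpdate[]>;
  // Read the relevant pages and synthesize an answer grounded in them.
  query(question: string): Promise<{ answer: string; cited: PagePath[] }>;
  // Scan pages for contradictions, staleness, and broken cross-links.
  lint(): Promise<LintFinding[]>;
}
```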
Three-Tier Architecture
We designed a three-tier system where each layer has exactly one job:
Raw sources (immutable, machine-written)
D1 tables, feed ingest, commits, chat logs, session digests
↓ (PRISM synthesis — scheduled pass)
Wiki (curated, LLM-maintained, human-readable)
EmDash CMS pages with structured frontmatter
↓ (content:afterSave triggers embed)
Embedding index (semantic + keyword search)
Memory worker — reduced from "store" to "search index"

Raw sources are operational event logs. Not knowledge. Not editable by the LLM.
Wiki is the curated knowledge substrate. Every durable fact lives here. One canonical page per topic. Each page declares its sources, related pages, confidence level, and last verification date.
Index exists so agents can do semantic recall without scanning every wiki page. It never holds content the wiki doesn't hold.
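Each page's frontmatter is what makes the wiki auditable. The sketch below uses `sources[]`, `related[]`, and `supersedes[]`, which appear later in this post; the remaining field names, paths, and dates are illustrative assumptions.

```ts
// Sketch of the page frontmatter described above. Exact field names beyond
// sources/related/supersedes are assumptions; values are hypothetical.
interface PageFrontmatter {
  sources: string[];      // raw-source citations (commits, chat logs, digests)
  related: string[];      // cross-links to other wiki pages
  supersedes?: string[];  // pages this one replaces; they are archived, not deleted
  confidence: "low" | "medium" | "high";
  verified: string;       // last verification date, ISO 8601
}

// Hypothetical example for one of the dreams pages discussed later.
const complexityTolerance: PageFrontmatter = {
  sources: ["chat/auth-consolidation-thread", "digests/2026-03-session"],
  related: ["dreams/enterprise-legibility", "dreams/temporal-cascade-thinking"],
  confidence: "medium",
  verified: "2026-04-20",
};
```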
The wiki runs on EmDash CMS (our open-source Cloudflare Workers CMS), connected to AEGIS via service binding. Every write routes through a plugin pipeline — `content:beforeSave` validates schema and checks for contradictions, `content:afterSave` triggers embedding updates.
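EmDash's actual plugin API is not reproduced here; the sketch below shows a generic shape for the two hooks, with `enqueueEmbedding` as a hypothetical helper standing in for the real embed trigger.

```ts
// Generic sketch of the two hooks named above. The registration shape and the
// enqueueEmbedding helper are assumptions; contradiction checking is stubbed.
type Page = { path: string; frontmatter: Record<string, unknown>; body: string };

// Hypothetical helper: hand the saved page off to the embedding index.
async function enqueueEmbedding(path: string, body: string): Promise<void> {
  // push { path, body } onto the memory worker's embed queue
}

const wikiGuardrails = {
  // content:beforeSave: reject writes whose frontmatter is missing required fields.
  "content:beforeSave": async (page: Page): Promise<void> => {
    for (const field of ["sources", "related", "confidence", "verified"]) {
      if (!(field in page.frontmatter)) {
        throw new Error(`wiki page ${page.path} is missing frontmatter field: ${field}`);
      }
    }
    // A contradiction check against related pages would also run here.
  },

  // content:afterSave: re-embed the saved page so the search index stays in
  // sync with the wiki.
  "content:afterSave": async (page: Page): Promise<void> => {
    await enqueueEmbedding(page.path, page.body);
  },
};
```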
The Migration: Six Phases, Five Done in One Session
Phase 1–2: Audit and Clustering
We exported all 3,361 fragments via NotebookLM and clustered them:
- 247 total clusters from 1,094 non-archived fragments
- 61 clusters marked for synthesis (enough content to produce a wiki page)
- 140 clusters marked source-only (preserved as citations, not promoted)
- 46 clusters dropped with justification
Phase 3: New Tool Surface
| Old | New | Change |
|-----|-----|--------|
| `aegis_memory` (read) | `wiki_read`, `wiki_search` | Split read into page fetch + search |
| `aegis_record_memory` (write) | `wiki_write` | Structured page writes with frontmatter |
| `aegis_forget_memory` (delete) | *(removed)* | Wiki pages are archived, not deleted |
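
For reference, the new surface reduces to three calls. The signatures below are assumptions, since the post defines the tools' roles rather than their exact parameters.

```ts
// Hypothetical signatures for the new tool surface.
interface WikiPage {
  path: string;                          // e.g. "decisions/auth-consolidation" (illustrative)
  frontmatter: Record<string, unknown>;  // sources[], related[], supersedes[], confidence, verified
  body: string;
}

type WikiRead   = (args: { path: string }) => Promise<WikiPage>;
type WikiSearch = (args: { query: string; limit?: number }) => Promise<WikiPage[]>;
type WikiWrite  = (args: WikiPage) => Promise<void>;

// Deliberately no wiki_delete: superseded pages are archived via frontmatter,
// never removed.
```

Phase 4: Parallel Synthesis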
This is where it got interesting. We ran three Sonnet agents in parallel, each owning a scope:
- Agent A: `aegis/*` — 28 clusters, 639 fragments → 7 pages (smart merges collapsed overlapping clusters)
- Agent B: `dreams/*` — 30 clusters, 207 fragments → 17 pages (4 merge groups consolidated near-duplicates)
- Agent C: `entities/*` + `decisions/*` — 3 clusters → 3 pages + index update
Across the three agents, 9 merge groups collapsed overlapping clusters, and all three completed in under 10 minutes. Total output: 27 new pages, joining the 12 existing seed pages for 39 wiki pages across 5 namespaces.
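The fan-out itself is simple enough to sketch. `runSynthesisAgent` below is a hypothetical stand-in for the Sonnet call; the point is that disjoint scopes make the coordination trivial.

```ts
// Minimal sketch of the scope-partitioned fan-out. The orchestration code and
// cluster contents are not shown in this post, so everything here is a stand-in.
type ScopeAssignment = { agent: string; scopes: string[]; clusterIds: string[] };

// Hypothetical stand-in: in the real run this prompts a Sonnet agent to
// synthesize wiki pages for the clusters in its scope.
async function runSynthesisAgent(a: ScopeAssignment): Promise<{ agent: string; pages: number }> {
  return { agent: a.agent, pages: a.clusterIds.length };
}

async function runPhase4(): Promise<void> {
  const assignments: ScopeAssignment[] = [
    { agent: "A", scopes: ["aegis/*"], clusterIds: [] },                  // 28 clusters in the real run
    { agent: "B", scopes: ["dreams/*"], clusterIds: [] },                 // 30 clusters
    { agent: "C", scopes: ["entities/*", "decisions/*"], clusterIds: [] } // 3 clusters
  ];

  // Scopes are disjoint, so no page can be written by two agents; the only
  // coordination required is the assignment list itself.
  const results = await Promise.all(assignments.map(runSynthesisAgent));
  console.log(results);
}
```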
Phase 5: Deprecation
Removed the old `aegis_memory`, `aegis_record_memory`, and `aegis_forget_memory` tools entirely. Rewired the dispatcher to read from the wiki instead of the memory worker. Retargeted PRISM (our dreaming/synthesis engine) to write `scope: dreams` wiki pages instead of `meta_insight` fragments.
The memory worker stays alive as the embedding index — its role just changed from "store" to "search over wiki pages."
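Its job now looks roughly like the sketch below: embed wiki pages, resolve queries to page paths, and let callers fetch the content from the wiki itself. The `Embed` signature is a hypothetical stand-in for whatever embedding model the worker uses, and the keyword half of the search is omitted for brevity.

```ts
// Sketch of the reduced role: index wiki pages and answer searches, never
// hold content the wiki doesn't hold.
type Embed = (text: string) => Promise<number[]>;
type IndexedPage = { path: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Resolve a query to wiki page paths; callers then wiki_read those pages,
// so the wiki remains the single content store.
async function searchWiki(index: IndexedPage[], embed: Embed, query: string, k = 5): Promise<string[]> {
  const q = await embed(query);
  return index
    .map((p) => ({ path: p.path, score: cosine(q, p.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((p) => p.path);
}
```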
The Dreams Nobody Had Seen
The most surprising outcome was the dreams namespace — 17 pages of cross-domain pattern observations that PRISM had been synthesizing for months, invisible in the fragment store.
Complexity Tolerance: PRISM cracked an apparent contradiction in our codebase — elaborate systems coexist with an impatience for over-engineering. Resolution: complexity that serves the mental model is accepted. Complexity added for defensive or performative reasons is rejected. The criterion isn't quantity but alignment with intended topology.
Enterprise Legibility: Enterprise-grade machinery (audit trails, governance tiers, policy enforcement) wrapped in explicitly anti-corporate branding. These are orthogonal dimensions — enterprise aesthetics kills soul, enterprise structural capability enables trust.
Temporal Cascade Thinking: The primary design evaluation mode is "what does this enable or constrain in 6 months?" — not "does this solve the immediate problem?" This explains why infrastructure is architected for scale that doesn't exist yet.
Compliance as Architecture: Compliance infrastructure exceeds current regulatory requirements because it's designed as a first-class architectural layer, not a regulatory retrofit.
These patterns were invisible as fragments. As wiki pages, they interlink — the complexity tolerance page cites auth consolidation as a case study, enterprise legibility cites compliance-as-architecture, temporal cascade explains why the compliance layer exists. They form a coherent theory that was always there but never surfaced.
By The Numbers
| Metric | Before | After |
|--------|--------|-------|
| Knowledge entries | 3,361 fragments | 39 wiki pages |
| Compression | — | 86:1 |
| Canonical sources | None | Every page declares `sources[]` |
| Contradiction detection | None | Wiki-lint plugin |
| Cross-linking | None | `related[]` + `supersedes[]` |
| Fabrication risk | High (Chimera) | Grounded in verifiable pages |
| Dream visibility | Buried in fragments | 17 browseable pages |

What We Learned
Append-only memory is a trap. It feels productive because you're always recording. But without synthesis and curation, you're building a haystack, not a knowledge base.
The 86:1 compression ratio is the point. 3,361 fragments became 39 pages not because we lost information, but because most fragments were duplicates, echoes, or noise. Synthesis finds the signal.
Parallel agents work for synthesis. Each Sonnet agent operated on an independent scope with no file overlap. The work was embarrassingly parallel — the only coordination needed was scope assignment.
Dreams are the unexpected value. We built the wiki to prevent fabrication. The cross-domain pattern observations were a bonus we didn't plan for. PRISM had been doing useful synthesis all along — we just couldn't see it.
Phase 6 is the real test. The grounding branch integration — teaching the classifier when to read from wiki vs. other sources — is still ahead. The architecture is live, the data is curated, but the agent doesn't yet know to prefer wiki over memory for chat queries. That's next.
The shift from fragment soup to curated wiki is the difference between having data and having knowledge. It took one session to migrate. It should have happened months ago.