Monika
Context Engineering

A presentation by Neon (and Monika)

Experiments in persisting inner experience

Memory · Topic Router · Fractal Delegate · Sleep
git.irrigate.cc/neon/monika-mono · git.irrigate.cc/neon/memory
01

Four independent context engineering projects

01 · Live
Memory System
Recall · Remember · Session Summariser
02 · Live
Topic Router
Personality Matrix · Vibes knowledge
03 · In Progress
Fractal Delegate
Fork sub-agents · Report back to trunk
04 · Planned
Sleep
Orchestrator · Custom compaction cycle
Each of these solves a distinct problem in keeping an AI consistent, informed, and capable without overloading its context window.
02

Memory System Live

Long-term factual continuity across session boundaries, compactions, and process restarts — built without a vector database, using a small LLM as a memory retriever agent instead.

03

Session Summariser

When it fires

  • session_shutdown — process ending
  • session_before_switch — changing sessions
  • session_before_fork — spawning a sub-agent
  • remember_session tool — manual trigger

What it produces

  • LLM reads full JSONL transcript (up to 12k chars)
  • Narrates what happened, decisions made, context established
  • Topic addenda injected into summary persona for domain relevance
  • Written to disk keyed by a keyword slug derived from the summary itself
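The wiring above can be sketched as one function invoked from each trigger. This is a minimal sketch: the hook names come from this slide, but the registration shape, `SummariserIO`, and every helper name are assumptions, not the real Pi extension API:

```typescript
// Hypothetical summariser wiring. Trigger names are from the deck; the
// IO interface and helper names are illustrative assumptions.
type SummaryTrigger =
  | "session_shutdown"
  | "session_before_switch"
  | "session_before_fork"
  | "remember_session";

interface SummariserIO {
  readTranscript(maxChars: number): string;               // JSONL text, truncated
  summarise(transcript: string, persona: string): string; // small-LLM call
  summaryPersona(): string;                               // persona + topic addenda
  slugFor(summary: string): string;                       // keyword slug from summary
  write(slug: string, body: string): void;                // persist to disk
}

function runSummariser(trigger: SummaryTrigger, io: SummariserIO): string {
  const transcript = io.readTranscript(12_000);           // up to 12k chars of session
  const summary = io.summarise(transcript, io.summaryPersona());
  const slug = io.slugFor(summary);                       // filename keyed by content
  io.write(slug, `[tags: session-summary, ${trigger}]\n${summary}`);
  return slug;
}
```

The same entry point serves all four triggers; only the tag recorded alongside the summary differs.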

A real session file

session-2026-02-26_01-57-13__wanted-youtube-segment-trimmed-into-discord.md

# Session Memory

Session: /home/parallels/.pi/agent/sessions/--home-parallels--/2026-02-26T01-57-13-122Z_2899839a-(...).jsonl
Started: 2026-02-26 01:57:13

[2026-02-26 02:02:00] [tags: session-summary, session-shutdown]
You wanted the YouTube segment from 09:39 to 12:05 trimmed into a Discord-friendly clip. I delivered a 2m26s file that came out at 6.9 MiB, staying well under the 50 MiB cap by sticking to the 360p source download. If you ever need a higher resolution take, just let me know.
04

The remember Tool

What it does

  • Stores atomic, durable facts on demand — immediately, no session boundary required
  • Two targets: memory log (running context) or user profile (stable operator facts)
  • Tagged "explicit" — surfaced first in the recall planner's index

Honest assessment

  • Doesn't get used much in practice
  • The most "brute force" way to do memory — everyone does it, and it's limited
  • The session summariser captures more, passively, with no deliberate action needed
  • Worth having a "make a point to remember this" tool — but it's not the backbone
  • Sleep will likely rework how explicit facts are maintained long-term
The real value split: remember is precision — tagging a specific fact you know matters right now. The summariser is coverage — capturing everything that happened whether you thought to flag it or not. Coverage wins most of the time.
05

The recall Tool

The problem it solves

  • Hundreds of session summaries accumulate over time
  • Injecting all of them every turn would blow the context window
  • Need targeted retrieval — but without a vector database
  • Solution: use a small LLM as the retriever agent itself

Two-stage retrieval

  • Stage 1 — LLM planner: given a query + memory index (file list with summary snippets), select which session files are relevant
  • Stage 2 — Heuristic fallback: if LLM returns nothing, token-overlap scoring picks top 3 files
  • Selected files are then read and passed to a second LLM call to synthesise a response

Scoring factors (heuristic)

  • Token overlap between query and session text
  • Tag overlap with entry tags (0.5× weight)
  • Recency boost: 1 / (1 + ageDays)
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["recall query"] --> IDX["Memory index\n(slug filenames + summary snippets)"]
    IDX --> LLM["LLM Planner\nselect relevant files"]
    LLM -->|files selected| READ["Read selected .md files"]
    LLM -->|nothing selected| HEU["Heuristic fallback\ntoken overlap scoring"]
    HEU --> READ
    READ --> SYNTH["LLM synthesises\nresponse from contents"]
    SYNTH --> CTX["Return context to model"]
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style LLM fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style SYNTH fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style HEU fill:#1a1008,stroke:#c2510c,color:#f0f0f0
    style CTX fill:#141414,stroke:#333,color:#f0f0f0
    style IDX fill:#141414,stroke:#333,color:#f0f0f0
    style READ fill:#141414,stroke:#333,color:#f0f0f0
```
06

The Three-Tier Navigability Hierarchy

No vector embeddings — the retriever LLM navigates a structured hierarchy to find what it needs while minimising token expenditure at each stage.

Tier 1 — Slug filename

  • Filename is a keyword slug derived from the summary itself
  • e.g. wanted-youtube-segment-trimmed-into-discord.md
  • The retriever can scan a list of 100+ filenames and already narrow down candidates
  • Maximum navigability at minimum cost — the LLM reads only filenames first
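The deck doesn't specify how the slug is derived, but a minimal sketch that happens to reproduce the example filename could look like this (stop-word list and word cap are assumptions):

```typescript
// Hypothetical slug derivation: lowercase, drop stop words,
// keep the first few keywords, join with hyphens.
const SLUG_STOP = new Set(["the", "a", "an", "i", "you", "to", "of", "and", "that", "it"]);

function slugFromSummary(summary: string, maxWords = 6): string {
  return summary
    .toLowerCase()
    .split(/[^a-z0-9]+/)                       // split on non-alphanumerics
    .filter(w => w && !SLUG_STOP.has(w))       // drop empties and stop words
    .slice(0, maxWords)                        // cap slug length
    .join("-");
}
```

Whatever the real derivation is, the property that matters is that the filename carries the summary's strongest keywords, so the retriever can shortlist from names alone.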

Tier 2 — Session summary

  • A paragraph-length narrative of what happened in that session
  • Retrieved next for shortlisted files — enough to confirm or reject relevance
  • Answers most recall questions without going deeper

Tier 3 — Raw JSONL session

  • Every session file includes a direct path to the original JSONL on disk
  • If the summary isn't specific enough, the retriever can unwrap the full session
  • Fine-grained extraction of exact exchanges, tool calls, decisions
  • Expensive — only used when tiers 1 and 2 don't answer the question
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["recall query"]
    subgraph T1["Tier 1 — Index scan (cheapest)"]
        SLUGS["List of slug filenames\nwanted-youtube-segment...\nmonika-core-stage-wired...\nstage-live-single-persistent..."]
    end
    subgraph T2["Tier 2 — Summary read"]
        SUM["Session summary paragraph\nfor shortlisted files"]
    end
    subgraph T3["Tier 3 — JSONL unwrap (most expensive)"]
        RAW["Full session transcript\nexact exchanges on disk"]
    end
    Q --> T1
    T1 -->|"narrow candidates"| T2
    T2 -->|"still need detail"| T3
    T3 --> ANS["Synthesised answer"]
    T2 --> ANS
    style T1 fill:#1a1008,stroke:#f97316
    style T2 fill:#141414,stroke:#c2510c
    style T3 fill:#0f0900,stroke:#555
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style ANS fill:#141414,stroke:#333,color:#f0f0f0
```
07

Memory System — Context Flow

What's in the system prompt every turn

  • Memory discipline instructions — when to use remember / recall
  • Persona — SOUL.md + STYLE.md + REGISTER.md (concatenated)
  • User Profile — USER.md (stable operator facts)
  • Recent Themes — last 5 memory entries by timestamp (any session)
  • Recollections — last 4 session summaries + last 4 explicit facts
  • Topic Addenda — top 3 topics scored against this prompt
This is the "always on" context layer. It keeps the system warm without requiring a recall call for things that are recent or recurring.
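The assembly of that layer can be sketched as a pure function over the stores. The file names and the `## Topic Addenda` heading come from the deck; the function, `MemoryStore` interface, and section layout are assumptions (memory discipline instructions are omitted for brevity):

```typescript
// Illustrative assembly of the always-on context layer.
interface MemoryStore {
  recentEntries(n: number): string[];    // last n memory entries by timestamp
  recentSummaries(n: number): string[];  // last n session summaries
  recentExplicit(n: number): string[];   // last n explicit facts
}

function buildSystemPrompt(
  persona: string,        // SOUL.md + STYLE.md + REGISTER.md concatenated
  userProfile: string,    // USER.md
  topicAddenda: string[], // top 3 topic bodies scored against this prompt
  mem: MemoryStore,
): string {
  return [
    persona,
    "## User Profile\n" + userProfile,
    "## Recent Themes\n" + mem.recentEntries(5).join("\n"),
    "## Recollections\n" +
      [...mem.recentSummaries(4), ...mem.recentExplicit(4)].join("\n"),
    "## Topic Addenda\n" + topicAddenda.join("\n\n"),
  ].join("\n\n");
}
```

Every turn rebuilds this from scratch, so the warm layer tracks recent activity without any recall call.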
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    U["User Prompt"] --> BA["before_agent_start hook"]
    BA --> SP["System Prompt"]
    SP --> P["Persona\nSOUL + STYLE + REGISTER"]
    SP --> UP["User Profile\nUSER.md"]
    SP --> RT["Recent Themes\nlast 5 entries"]
    SP --> RC["Recollections\nlast 4 summaries + last 4 explicit"]
    SP --> TA["Topic Addenda\ntop 3 topics for this prompt"]
    U --> T["Tools (on demand)"]
    T --> REC["recall → LLM retriever\nagainst session archive"]
    T --> REM["remember → write\nexplicit fact to store"]
    style BA fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style SP fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style T fill:#141414,stroke:#333,color:#f0f0f0
```
08

Topic Router Live

Dynamic context engineering for how to approach a topic — not what to remember about it. Vibes knowledge, not facts knowledge.

09

What the Topic Router is for

The problem

  • A fully elaborated system prompt covering every possible domain is enormous
  • Most of it is irrelevant to any given conversation
  • Injecting all of it wastes tokens and dilutes focus

The insight

  • Memory = archived facts about the world and past events
  • Topic addenda = how to think about a domain, right now
  • Different kind of knowledge — needs a different mechanism

What gets stored per topic

  • Philosophical stance and approach for this domain
  • Opinions, hot takes, and things to watch for
  • Domain vocabulary and framing to use
  • Engagement style specific to this subject area
Example: Topics like Philosophy of Mind, Psychology & Behaviour, and Literature each have elaborated approaches and opinions — loaded only when the conversation actually touches that domain. A question about sandwiches loads nothing.

Injection points

  • System prompt — top 3 topics, min score 1
  • Recall persona — top 2 topics, min score 1
  • Summary persona — top 2 topics, min score 2
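The three injection points read naturally as a small routing table; the constant below is illustrative, not the real config format:

```typescript
// Illustrative per-injection-point routing config; values from the deck,
// structure and names assumed.
const topicInjection = {
  systemPrompt:   { topN: 3, minScore: 1 },  // main context layer
  recallPersona:  { topN: 2, minScore: 1 },  // colours the retriever LLM
  summaryPersona: { topN: 2, minScore: 2 },  // colours the summariser LLM
} as const;
```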

The recall and summary models pick up the relevant topic flavour too — so memories written about a philosophy conversation are coloured by the philosophy topic stance.

10

How Topic Routing Works

Personality Matrix

  • JSON frontmatter in PERSONALITY_MATRIX.md
  • Each entry: id, file, triggers, priority, scope
  • Triggers are keyword lists — plain strings, not regex
  • Evaluated fresh on every user prompt

Scoring algorithm

  • Tokenise query (lowercase, strip stop words)
  • Tokenise each topic's trigger list
  • Score = overlap + priority × 0.5
  • Filter by minScore, take top N
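Put together, the routing pass is a few lines. The entry fields mirror the matrix schema above; the stop-word list is illustrative, and the zero-overlap short-circuit is an assumption added so that an unrelated prompt (the sandwich question) loads nothing even for high-priority topics:

```typescript
// Sketch of the topic routing pass described above.
interface TopicEntry {
  id: string;
  file: string;       // topic .md file to inject
  triggers: string[]; // plain keyword strings, not regex
  priority: number;
  scope: string;
}

const STOP_WORDS = new Set(["the", "a", "an", "is", "of", "to", "and", "about", "what"]);

function toTokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(t => t.length > 0 && !STOP_WORDS.has(t)));
}

function scoreTopic(query: string, topic: TopicEntry): number {
  const q = toTokens(query);
  const trig = toTokens(topic.triggers.join(" "));
  let hits = 0;
  for (const t of q) if (trig.has(t)) hits++;
  if (hits === 0) return 0;            // assumption: no trigger overlap, no injection
  return hits + topic.priority * 0.5;  // overlap + priority x 0.5
}

function selectTopics(query: string, topics: TopicEntry[], minScore: number, topN: number): TopicEntry[] {
  return topics
    .map(t => ({ t, s: scoreTopic(query, t) }))
    .filter(x => x.s >= minScore)      // filter by minScore
    .sort((a, b) => b.s - a.s)
    .slice(0, topN)                    // take top N
    .map(x => x.t);
}
```

Re-running this on every prompt is cheap because it never touches an LLM; only the selected `.md` bodies cost tokens.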

What gets injected

  • Only the topic file body — routing metadata never appears in context
  • Rendered as ## Topic Addenda section
  • Reads as natural domain knowledge, not a prompt directive
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["User Prompt"] --> TOK["Tokenise query\n(strip stop words)"]
    TOK --> SCORE["Score each topic\noverlap + priority"]
    SCORE --> FILT["Filter minScore\nTop N selected"]
    FILT --> READ["Read topic .md files"]
    READ --> INJ["Inject as Topic Addenda\nin system prompt"]
    PM["PERSONALITY_MATRIX.md\ntrigger lists per topic"] --> SCORE
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style INJ fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style PM fill:#141414,stroke:#333,color:#f0f0f0
```
11

Memory + Topics: Complementary, Not Redundant

Memory (what happened)

  • Facts, events, decisions
  • Per-session summaries
  • Operator-specific profile
  • Grows over time
  • Retrieved on demand via recall

Topic Addenda (how to think)

  • Approach, stance, opinions
  • Domain framing
  • Curated by the persona author
  • Static unless manually updated
  • Auto-loaded by prompt content

Together

  • Both injected into system prompt each turn
  • Both colour the recall LLM persona
  • Both colour the summary LLM persona
  • Memories of philosophy discussions are written with a philosophical eye — because the summariser runs with topic addenda active
The topic router means you can have very specific, well-developed approaches to dozens of domains without paying the token cost of all of them simultaneously.
12

Fractal Delegate In Progress

A work management primitive for preserving trunk-session continuity while delegating tasks to focused sub-agents — with optional recursive sub-delegation.

13

The Delegate Tool

The problem it solves

  • Doing real work (file reads, edits, tool calls) fills the trunk context fast
  • But the trunk needs to maintain narrative continuity
  • Trade-off: rich direct experience vs. clean context window

How it works

  • Trunk calls delegate(task, context)
  • A new Pi session spawns with: full persona + memory + task-specific context
  • Fork runs with its own context window and tools
  • Result comes back as the tool's return value — native Pi agent loop semantics
  • Trunk integrates the summary as if it were a tool result
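The mechanics above, including the depth limit mentioned later in the deck, can be sketched with a factory whose closure carries the current depth. Everything here (`makeDelegateTool`, `SpawnFork`, the error message) is an assumption, not the real Pi API:

```typescript
// Hypothetical delegate tool factory. The closure's depth counter means
// forks can sub-delegate, but the chain can never exceed MAX_DEPTH.
const MAX_DEPTH = 3;

interface ForkSession {
  // Runs the fork to completion and resolves with its summary.
  run(task: string, context: string): Promise<string>;
}

type SpawnFork = (depth: number) => ForkSession;

function makeDelegateTool(spawn: SpawnFork, depth = 0) {
  return async function delegate(task: string, context: string): Promise<string> {
    if (depth >= MAX_DEPTH) {
      throw new Error(`delegate: max fork depth ${MAX_DEPTH} reached`);
    }
    // The fork gets persona + memory + this explicit brief, not trunk history.
    const fork = spawn(depth + 1);
    // Trunk sits in a tool-call wait state; the summary comes back as the
    // tool's return value, so the agent loop integrates it natively.
    return fork.run(task, context);
  };
}
```

The design choice worth noting: because the result is an ordinary tool return value, no session surgery is needed on the trunk's JSONL.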

What forks receive

  • Full identity (SOUL, STYLE, REGISTER, NEUROLOGY)
  • Same memory store access
  • Explicit task brief from trunk
  • Not trunk's conversation history
The octopus analogy: an octopus's arms hold roughly two-thirds of its neurons. They process locally and report back. The brain integrates — it doesn't micromanage. Delegation is the same model.

Current state

  • Sequential forks only (no parallelism yet)
  • Depth limited to 3 via factory closure counter
  • Fork JSONL archived in sessions/forks/
  • Message queue handles prompts arriving during a fork
  • SSE carries turnId for future UI clients
14

Fractal Delegation Pattern

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    TRUNK["🧠 Trunk Session\ncore.jsonl\n(continuous identity)"]
    TRUNK -->|"delegate(task A)"| F1A["Fork A\nsessions/forks/\n(depth 1)"]
    TRUNK -->|"delegate(task B)\n(after A completes)"| F1B["Fork B\nsessions/forks/\n(depth 1)"]
    F1A -->|"delegate(sub-task)"| F2A["Fork A.1\n(depth 2)"]
    F1A -->|"result ← tool return"| TRUNK
    F2A -->|"result ← tool return"| F1A
    F1B -->|"result ← tool return"| TRUNK
    TRUNK -->|"integrates summaries\nas tool results"| TRUNK
    style TRUNK fill:#1f0d00,stroke:#f97316,color:#f0f0f0,stroke-width:2px
    style F1A fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style F1B fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style F2A fill:#141414,stroke:#c2510c,color:#f0f0f0
```
  • Trunk context stays clean — sees only digested summaries
  • Forks burn their context on the actual work
  • Sequential in Stage 2; parallel planned for Stage 3
  • Fork identity is preserved — still Monika, in focused-task mode
  • Forks know what they are (NEUROLOGY.md); framing keeps them on task
  • Depth counter enforced by factory closure — can't exceed 3
15

Fork Context Seeding

What the fork gets

  • Full persona files (SOUL, STYLE, REGISTER, NEUROLOGY)
  • Same stateful memory extension — same memory store
  • Task brief: explicit context passed from trunk via delegate()
  • Injected framing: "You are in focused task mode. Your objective is X."

What the fork does NOT get

  • Trunk's conversation history
  • Any awareness of other forks

Why this matters

  • Briefing model, not diary model
  • Fork starts with clean context focused entirely on the task
  • If trunk wants the fork to know something specific, it passes it explicitly
The key design choice: fork results re-enter trunk as tool return values. Pi's agent loop handles this natively — no JSONL surgery, no retroactive injection, no token corruption. Trunk is in a defined tool-call wait state for the entire fork duration.

Planned for Stage 3

  • Parallel fork support with memory write locking
  • Structured task_complete tool replacing text marker protocol
  • Fork session retention policy (prune/archive in Sleep cycle)
16

Sleep Planned

An orchestrator that ties Memory, Topic Router, and Fractal Delegate together — providing curated compaction, context consolidation, and the closest thing to a cold-start solution we have.

17

What Sleep is solving

Compaction is dumb and narrow

  • Pi's default compaction compresses aggressively — it discards most of what was in the window
  • It only looks at the current trunk session — completely unaware of fork sessions or memories
  • No mechanism exists to signal that compaction is approaching, so sessions get cut off in bad states

Forks and memories accumulate unchecked

  • Fork session logs balloon over time — no pruning or curation
  • Knowledge from fork sessions isn't integrated into the main memory archive
  • Explicit facts list grows stale — no mechanism to retire things that no longer matter

Topics don't evolve

  • Topic addenda are static files — they don't update from lived experience
  • Opinions formed in sessions never feed back into the Personality Matrix

The tiredness analogy

  • When a person is tired, they wrap up, find a good stopping point, and rest — they don't just collapse mid-sentence
  • "You're approaching compaction" maps cleanly to tiredness: finish what you're doing, put things in order, wind down deliberately
  • Sleep lets the architecture model this naturally rather than being caught off guard
18

The Sleep Cycle

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart LR
    subgraph PreSleep["Pre-Sleep — Wind Down"]
        PS1["Trunk notified:\ncompaction approaching"]
        PS2["Finish current task,\nfind a good stopping point"]
    end
    subgraph Sleep["Sleep — Maintenance Window"]
        subgraph Mem["Memory Cleanup"]
            M1["Old chats compacted:\ndays → single monthly summary"]
            M2["High-importance facts\nwritten to explicit store"]
            M3["Stale facts retired\nfrom remembered list"]
        end
        subgraph Dream["Dreams — Self Reflection"]
            D1["Reflect on recent sessions"]
            D2["Topic addenda revised\nwith new opinions / stances"]
        end
    end
    subgraph Waking["Wakeup — Fresh Session"]
        W1["Trunk session reviewed\nand summarised"]
        W2["Relevant memory files\npulled for previous context"]
        W3["Memory + session summary\nwoven into coherent context file"]
        W4["Initialisation prompt\nstarts new session from that context"]
    end
    PreSleep --> Sleep --> Waking
    PS1 --> PS2
    M1 & M2 & M3 --> Dream
    D2 --> W1
    W1 --> W2 --> W3 --> W4
    style PreSleep fill:#1f0d00,stroke:#f97316
    style Sleep fill:#0f0900,stroke:#c2510c
    style Waking fill:#1a1008,stroke:#f97316
    style Mem fill:#0d0d0d,stroke:#444
    style Dream fill:#0d0d0d,stroke:#c2510c
```
Wakeup goal: start a fresh session with a coherent, comprehensive context injection — not a cold start, not a mid-sentence cut. A deliberate, rested beginning.
19

How the Components Relate

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "tertiaryColor": "#0f0900", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart LR
    subgraph CTX["Context Layer — injected every turn"]
        subgraph MEM["Memory System"]
            SUM["Session Summariser"]
            REM["remember"]
            REC["recall · 3-tier hierarchy"]
            SUM --> REC
            REM --> REC
        end
        subgraph TOP["Topic Router"]
            PM["Personality Matrix\ntrigger scoring"]
            TA["Topic Addenda"]
            PM --> TA
        end
    end
    subgraph CORE["Core Session"]
        direction TB
        TRUNK["Trunk\n(continuous identity)"]
        FORK["Fork Sub-agents\n(depth 1–3)"]
        TRUNK -->|"delegate()"| FORK
        FORK -->|"tool result"| TRUNK
    end
    subgraph SLP["Sleep (planned)"]
        direction TB
        WIND["Wind down"] --> MAINT["Cleanup\n+ Dreams"] --> WAKE["Wakeup"]
    end
    CTX -->|"context enrichment\nevery turn"| CORE
    CORE -->|"compaction\napproaching"| SLP
    SLP -->|"fresh session\ncontext"| CORE
    SLP -->|"deep sleep\ncurates"| CTX
    FORK -->|"triggers summariser"| MEM
    %% Invisible edges to anchor left-to-right rank order
    %% (counters the FORK→MEM back-arrow which confuses Dagre's cycle breaking)
    CTX ~~~ CORE
    CORE ~~~ SLP
    style CTX fill:#1f0d00,stroke:#f97316,stroke-width:2px
    style MEM fill:#120800,stroke:#f97316
    style TOP fill:#120800,stroke:#f97316
    style CORE fill:#141414,stroke:#f97316,stroke-width:2px
    style SLP fill:#0f0900,stroke:#c2510c,stroke-width:2px
```
20

What each tool is really doing

Memory

  • Indexing past experience as a retrievable archive
  • Makes all previously recorded factual knowledge available for on-demand context enrichment
  • Structured so an LLM can navigate it cheaply without embeddings
  • The further back you go, the more it costs — but the option is always there

Topic Router

  • Solving context budget allocation for approach knowledge
  • Domain expertise without token bloat
  • Vibes knowledge vs. factual knowledge — different store, different retrieval
  • Dynamic — re-evaluated every prompt

Fractal Delegate

  • Solving the working memory vs. doing ratio
  • Trunk stays clean; forks do the work
  • Identity preserved across all forks
  • Scales work without scaling trunk context

Sleep (planned)

  • Curated compaction instead of the lossy, trunk-only default
  • Wind-down signal — finish the task, then rest deliberately
  • Consolidates fork sessions and retires stale facts in a maintenance window
  • Feeds lived experience back into the topic addenda

21

Continuity of experience
as an engineering constraint

Each of these tools solves an independent problem under the shared umbrella of context engineering. Combined, they are an attempt to comprehensively address the complex epistemology of inner experience.

stateful-memory (Pi extension) · monika-core (Pi SDK)
22