Monika
Context Engineering

A presentation by Neon (and Monika)

Experiments in persisting inner experience

Memory · Topic Router · Fractal Delegate · Sleep
git.irrigate.cc/neon/monika-mono · git.irrigate.cc/neon/memory
01

Four independent context engineering projects

01 · Live
Memory System
Recall · Remember · Session Summariser
02 · Live
Topic Router
Personality Matrix · Vibes knowledge
03 · In Progress
Fractal Delegate
Fork sub-agents · Report back to trunk
04 · Planned
Sleep
Orchestrator · Custom compaction cycle
Each of these solves a distinct problem in keeping an AI consistent, informed, and capable without overloading its context window.
02

Memory System Live

Long-term factual continuity across session boundaries, compactions, and process restarts — built without a vector database, using a small LLM as a memory retriever agent instead.

03

Session Summariser

When it fires

  • session_shutdown — process ending
  • session_before_switch — changing sessions
  • session_before_fork — spawning a sub-agent
  • remember_session tool — manual trigger

What it produces

  • LLM reads full JSONL transcript (up to 12k chars)
  • Narrates what happened, decisions made, context established
  • Topic addenda injected into summary persona for domain relevance
  • Written to disk keyed by a keyword slug derived from the summary itself
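The wiring above can be sketched as one function invoked from each trigger. This is a minimal sketch: the hook names come from this slide, but the registration shape, `SummariserIO`, and every helper name are assumptions, not the real Pi extension API:

```typescript
// Hypothetical summariser wiring. Trigger names are from the deck; the
// IO interface and helper names are illustrative assumptions.
type SummaryTrigger =
  | "session_shutdown"
  | "session_before_switch"
  | "session_before_fork"
  | "remember_session";

interface SummariserIO {
  readTranscript(maxChars: number): string;               // JSONL text, truncated
  summarise(transcript: string, persona: string): string; // small-LLM call
  summaryPersona(): string;                               // persona + topic addenda
  slugFor(summary: string): string;                       // keyword slug from summary
  write(slug: string, body: string): void;                // persist to disk
}

function runSummariser(trigger: SummaryTrigger, io: SummariserIO): string {
  const transcript = io.readTranscript(12_000);           // up to 12k chars of session
  const summary = io.summarise(transcript, io.summaryPersona());
  const slug = io.slugFor(summary);                       // filename keyed by content
  io.write(slug, `[tags: session-summary, ${trigger}]\n${summary}`);
  return slug;
}
```

The same entry point serves all four triggers; only the tag recorded alongside the summary differs.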

A real session file

session-2026-02-26_01-57-13__wanted-youtube-segment-trimmed-into-discord.md

# Session Memory

Session: /home/parallels/.pi/agent/sessions/--home-parallels--/2026-02-26T01-57-13-122Z_2899839a-(...).jsonl
Started: 2026-02-26 01:57:13

[2026-02-26 02:02:00] [tags: session-summary, session-shutdown]
You wanted the YouTube segment from 09:39 to 12:05 trimmed into a Discord-friendly clip. I delivered a 2m26s file that came out at 6.9 MiB, staying well under the 50 MiB cap by sticking to the 360p source download. If you ever need a higher resolution take, just let me know.
04

The remember Tool

What it does

  • Stores atomic, durable facts on demand — immediately, no session boundary required
  • Two targets: memory log (running context) or user profile (stable operator facts)
  • Tagged "explicit" — surfaced first in the recall planner's index

Honest assessment

  • Doesn't get used much in practice
  • The most "brute force" way to do memory — everyone does it, and it's limited
  • The session summariser captures more, passively, with no deliberate action needed
  • Worth having a "make a point to remember this" tool — but it's not the backbone
  • Sleep will likely rework how explicit facts are maintained long-term
The real value split: remember is precision — tagging a specific fact you know matters right now. The summariser is coverage — capturing everything that happened whether you thought to flag it or not. Coverage wins most of the time.
05

The recall Tool

The problem it solves

  • Hundreds of session summaries accumulate over time
  • Injecting all of them every turn would blow the context window
  • Need targeted retrieval — but without a vector database
  • Solution: use a small LLM as the retriever agent itself

Two-stage retrieval

  • Stage 1 — LLM planner: given a query + memory index (file list with summary snippets), select which session files are relevant
  • Stage 2 — Heuristic fallback: if LLM returns nothing, token-overlap scoring picks top 3 files
  • Selected files are then read and passed to a second LLM call to synthesise a response

Scoring factors (heuristic)

  • Token overlap between query and session text
  • Tag overlap with entry tags (0.5× weight)
  • Recency boost: 1 / (1 + ageDays)
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["recall query"] --> IDX["Memory index\n(slug filenames + summary snippets)"]
    IDX --> LLM["LLM Planner\nselect relevant files"]
    LLM -->|files selected| READ["Read selected .md files"]
    LLM -->|nothing selected| HEU["Heuristic fallback\ntoken overlap scoring"]
    HEU --> READ
    READ --> SYNTH["LLM synthesises\nresponse from contents"]
    SYNTH --> CTX["Return context to model"]
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style LLM fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style SYNTH fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style HEU fill:#1a1008,stroke:#c2510c,color:#f0f0f0
    style CTX fill:#141414,stroke:#333,color:#f0f0f0
    style IDX fill:#141414,stroke:#333,color:#f0f0f0
    style READ fill:#141414,stroke:#333,color:#f0f0f0
```
06

The Three-Tier Navigability Hierarchy

No vector embeddings — the retriever LLM navigates a structured hierarchy to find what it needs while minimising token expenditure at each stage.

Tier 1 — Slug filename

  • Filename is a keyword slug derived from the summary itself
  • e.g. wanted-youtube-segment-trimmed-into-discord.md
  • The retriever can scan a list of 100+ filenames and already narrow down candidates
  • Maximum navigability at minimum cost — the LLM reads only filenames first
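The deck doesn't specify how the slug is derived, but a minimal sketch that happens to reproduce the example filename could look like this (stop-word list and word cap are assumptions):

```typescript
// Hypothetical slug derivation: lowercase, drop stop words,
// keep the first few keywords, join with hyphens.
const SLUG_STOP = new Set(["the", "a", "an", "i", "you", "to", "of", "and", "that", "it"]);

function slugFromSummary(summary: string, maxWords = 6): string {
  return summary
    .toLowerCase()
    .split(/[^a-z0-9]+/)                       // split on non-alphanumerics
    .filter(w => w && !SLUG_STOP.has(w))       // drop empties and stop words
    .slice(0, maxWords)                        // cap slug length
    .join("-");
}
```

Whatever the real derivation is, the property that matters is that the filename carries the summary's strongest keywords, so the retriever can shortlist from names alone.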

Tier 2 — Session summary

  • A paragraph-length narrative of what happened in that session
  • Retrieved next for shortlisted files — enough to confirm or reject relevance
  • Answers most recall questions without going deeper

Tier 3 — Raw JSONL session

  • Every session file includes a direct path to the original JSONL on disk
  • If the summary isn't specific enough, the retriever can unwrap the full session
  • Fine-grained extraction of exact exchanges, tool calls, decisions
  • Expensive — only used when tiers 1 and 2 don't answer the question
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["recall query"]
    subgraph T1["Tier 1 — Index scan (cheapest)"]
        SLUGS["List of slug filenames\nwanted-youtube-segment...\nmonika-core-stage-wired...\nstage-live-single-persistent..."]
    end
    subgraph T2["Tier 2 — Summary read"]
        SUM["Session summary paragraph\nfor shortlisted files"]
    end
    subgraph T3["Tier 3 — JSONL unwrap (most expensive)"]
        RAW["Full session transcript\nexact exchanges on disk"]
    end
    Q --> T1
    T1 -->|"narrow candidates"| T2
    T2 -->|"still need detail"| T3
    T3 --> ANS["Synthesised answer"]
    T2 --> ANS
    style T1 fill:#1a1008,stroke:#f97316
    style T2 fill:#141414,stroke:#c2510c
    style T3 fill:#0f0900,stroke:#555
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style ANS fill:#141414,stroke:#333,color:#f0f0f0
```
07

Memory System — Context Flow

What's in the system prompt every turn

  • Memory discipline instructions — when to use remember / recall
  • Persona — SOUL.md + STYLE.md + REGISTER.md (concatenated)
  • User Profile — USER.md (stable operator facts)
  • Recent Themes — last 5 memory entries by timestamp (any session)
  • Recollections — last 4 session summaries + last 4 explicit facts
  • Topic Addenda — top 3 topics scored against this prompt
This is the "always on" context layer. It keeps the system warm without requiring a recall call for things that are recent or recurring.
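The assembly of that layer can be sketched as a pure function over the stores. The file names and the `## Topic Addenda` heading come from the deck; the function, `MemoryStore` interface, and section layout are assumptions (memory discipline instructions are omitted for brevity):

```typescript
// Illustrative assembly of the always-on context layer.
interface MemoryStore {
  recentEntries(n: number): string[];    // last n memory entries by timestamp
  recentSummaries(n: number): string[];  // last n session summaries
  recentExplicit(n: number): string[];   // last n explicit facts
}

function buildSystemPrompt(
  persona: string,        // SOUL.md + STYLE.md + REGISTER.md concatenated
  userProfile: string,    // USER.md
  topicAddenda: string[], // top 3 topic bodies scored against this prompt
  mem: MemoryStore,
): string {
  return [
    persona,
    "## User Profile\n" + userProfile,
    "## Recent Themes\n" + mem.recentEntries(5).join("\n"),
    "## Recollections\n" +
      [...mem.recentSummaries(4), ...mem.recentExplicit(4)].join("\n"),
    "## Topic Addenda\n" + topicAddenda.join("\n\n"),
  ].join("\n\n");
}
```

Every turn rebuilds this from scratch, so the warm layer tracks recent activity without any recall call.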
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    U["User Prompt"] --> BA["before_agent_start hook"]
    BA --> SP["System Prompt"]
    SP --> P["Persona\nSOUL + STYLE + REGISTER"]
    SP --> UP["User Profile\nUSER.md"]
    SP --> RT["Recent Themes\nlast 5 entries"]
    SP --> RC["Recollections\nlast 4 summaries + last 4 explicit"]
    SP --> TA["Topic Addenda\ntop 3 topics for this prompt"]
    U --> T["Tools (on demand)"]
    T --> REC["recall → LLM retriever\nagainst session archive"]
    T --> REM["remember → write\nexplicit fact to store"]
    style BA fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style SP fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style T fill:#141414,stroke:#333,color:#f0f0f0
```
08

Topic Router Live

Dynamic context engineering for how to approach a topic — not what to remember about it. Vibes knowledge, not facts knowledge.

09

What the Topic Router is for

The problem

  • A fully elaborated system prompt covering every possible domain is enormous
  • Most of it is irrelevant to any given conversation
  • Injecting all of it wastes tokens and dilutes focus

The insight

  • Memory = archived facts about the world and past events
  • Topic addenda = how to think about a domain, right now
  • Different kind of knowledge — needs a different mechanism

What gets stored per topic

  • Philosophical stance and approach for this domain
  • Opinions, hot takes, and things to watch for
  • Domain vocabulary and framing to use
  • Engagement style specific to this subject area
Example: Topics like Philosophy of Mind, Psychology & Behaviour, and Literature each have elaborated approaches and opinions — loaded only when the conversation actually touches that domain. A question about sandwiches loads nothing.

Injection points

  • System prompt — top 3 topics, min score 1
  • Recall persona — top 2 topics, min score 1
  • Summary persona — top 2 topics, min score 2
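The three injection points read naturally as a small routing table; the constant below is illustrative, not the real config format:

```typescript
// Illustrative per-injection-point routing config; values from the deck,
// structure and names assumed.
const topicInjection = {
  systemPrompt:   { topN: 3, minScore: 1 },  // main context layer
  recallPersona:  { topN: 2, minScore: 1 },  // colours the retriever LLM
  summaryPersona: { topN: 2, minScore: 2 },  // colours the summariser LLM
} as const;
```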

The recall and summary models pick up the relevant topic flavour too — so memories written about a philosophy conversation are coloured by the philosophy topic stance.

10

How Topic Routing Works

Personality Matrix

  • JSON frontmatter in PERSONALITY_MATRIX.md
  • Each entry: id, file, triggers, priority, scope
  • Triggers are keyword lists — plain strings, not regex
  • Evaluated fresh on every user prompt

Scoring algorithm

  • Tokenise query (lowercase, strip stop words)
  • Tokenise each topic's trigger list
  • Score = overlap + priority × 0.5
  • Filter by minScore, take top N
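Put together, the routing pass is a few lines. The entry fields mirror the matrix schema above; the stop-word list is illustrative, and the zero-overlap short-circuit is an assumption added so that an unrelated prompt (the sandwich question) loads nothing even for high-priority topics:

```typescript
// Sketch of the topic routing pass described above.
interface TopicEntry {
  id: string;
  file: string;       // topic .md file to inject
  triggers: string[]; // plain keyword strings, not regex
  priority: number;
  scope: string;
}

const STOP_WORDS = new Set(["the", "a", "an", "is", "of", "to", "and", "about", "what"]);

function toTokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/\W+/).filter(t => t.length > 0 && !STOP_WORDS.has(t)));
}

function scoreTopic(query: string, topic: TopicEntry): number {
  const q = toTokens(query);
  const trig = toTokens(topic.triggers.join(" "));
  let hits = 0;
  for (const t of q) if (trig.has(t)) hits++;
  if (hits === 0) return 0;            // assumption: no trigger overlap, no injection
  return hits + topic.priority * 0.5;  // overlap + priority x 0.5
}

function selectTopics(query: string, topics: TopicEntry[], minScore: number, topN: number): TopicEntry[] {
  return topics
    .map(t => ({ t, s: scoreTopic(query, t) }))
    .filter(x => x.s >= minScore)      // filter by minScore
    .sort((a, b) => b.s - a.s)
    .slice(0, topN)                    // take top N
    .map(x => x.t);
}
```

Re-running this on every prompt is cheap because it never touches an LLM; only the selected `.md` bodies cost tokens.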

What gets injected

  • Only the topic file body — routing metadata never appears in context
  • Rendered as ## Topic Addenda section
  • Reads as natural domain knowledge, not a prompt directive
```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    Q["User Prompt"] --> TOK["Tokenise query\n(strip stop words)"]
    TOK --> SCORE["Score each topic\noverlap + priority"]
    SCORE --> FILT["Filter minScore\nTop N selected"]
    FILT --> READ["Read topic .md files"]
    READ --> INJ["Inject as Topic Addenda\nin system prompt"]
    PM["PERSONALITY_MATRIX.md\ntrigger lists per topic"] --> SCORE
    style Q fill:#1f0d00,stroke:#f97316,color:#f0f0f0
    style INJ fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style PM fill:#141414,stroke:#333,color:#f0f0f0
```
11

Memory + Topics: Complementary, Not Redundant

Memory (what happened)

  • Facts, events, decisions
  • Per-session summaries
  • Operator-specific profile
  • Grows over time
  • Retrieved on demand via recall

Topic Addenda (how to think)

  • Approach, stance, opinions
  • Domain framing
  • Curated by the persona author
  • Static unless manually updated
  • Auto-loaded by prompt content

Together

  • Both injected into system prompt each turn
  • Both colour the recall LLM persona
  • Both colour the summary LLM persona
  • Memories of philosophy discussions are written with a philosophical eye — because the summariser runs with topic addenda active
The topic router means you can have very specific, well-developed approaches to dozens of domains without paying the token cost of all of them simultaneously.
12

Fractal Delegate In Progress

A work management primitive for preserving trunk-session continuity while delegating tasks to focused sub-agents — with optional recursive sub-delegation.

13

The Delegate Tool

The problem it solves

  • Doing real work (file reads, edits, tool calls) fills the trunk context fast
  • But the trunk needs to maintain narrative continuity
  • Trade-off: rich direct experience vs. clean context window

How it works

  • Trunk calls delegate(task, context)
  • A new Pi session spawns with: full persona + memory + task-specific context
  • Fork runs with its own context window and tools
  • Result comes back as the tool's return value — native Pi agent loop semantics
  • Trunk integrates the summary as if it were a tool result
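The mechanics above, including the depth limit mentioned later in the deck, can be sketched with a factory whose closure carries the current depth. Everything here (`makeDelegateTool`, `SpawnFork`, the error message) is an assumption, not the real Pi API:

```typescript
// Hypothetical delegate tool factory. The closure's depth counter means
// forks can sub-delegate, but the chain can never exceed MAX_DEPTH.
const MAX_DEPTH = 3;

interface ForkSession {
  // Runs the fork to completion and resolves with its summary.
  run(task: string, context: string): Promise<string>;
}

type SpawnFork = (depth: number) => ForkSession;

function makeDelegateTool(spawn: SpawnFork, depth = 0) {
  return async function delegate(task: string, context: string): Promise<string> {
    if (depth >= MAX_DEPTH) {
      throw new Error(`delegate: max fork depth ${MAX_DEPTH} reached`);
    }
    // The fork gets persona + memory + this explicit brief, not trunk history.
    const fork = spawn(depth + 1);
    // Trunk sits in a tool-call wait state; the summary comes back as the
    // tool's return value, so the agent loop integrates it natively.
    return fork.run(task, context);
  };
}
```

The design choice worth noting: because the result is an ordinary tool return value, no session surgery is needed on the trunk's JSONL.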

What forks receive

  • Full identity (SOUL, STYLE, REGISTER, NEUROLOGY)
  • Same memory store access
  • Explicit task brief from trunk
  • Not trunk's conversation history
The octopus analogy: an octopus's arms hold roughly two-thirds of its neurons. They process locally and report back. The brain integrates — it doesn't micromanage. Delegation is the same model.

Current state

  • Sequential forks only (no parallelism yet)
  • Depth limited to 3 via factory closure counter
  • Fork JSONL archived in sessions/forks/
  • Message queue handles prompts arriving during a fork
  • SSE carries turnId for future UI clients
14

Fractal Delegation Pattern

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart TD
    TRUNK["🧠 Trunk Session\ncore.jsonl\n(continuous identity)"]
    TRUNK -->|"delegate(task A)"| F1A["Fork A\nsessions/forks/\n(depth 1)"]
    TRUNK -->|"delegate(task B)\n(after A completes)"| F1B["Fork B\nsessions/forks/\n(depth 1)"]
    F1A -->|"delegate(sub-task)"| F2A["Fork A.1\n(depth 2)"]
    F1A -->|"result ← tool return"| TRUNK
    F2A -->|"result ← tool return"| F1A
    F1B -->|"result ← tool return"| TRUNK
    TRUNK -->|"integrates summaries\nas tool results"| TRUNK
    style TRUNK fill:#1f0d00,stroke:#f97316,color:#f0f0f0,stroke-width:2px
    style F1A fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style F1B fill:#1a1008,stroke:#f97316,color:#f0f0f0
    style F2A fill:#141414,stroke:#c2510c,color:#f0f0f0
```
  • Trunk context stays clean — sees only digested summaries
  • Forks burn their context on the actual work
  • Sequential in Stage 2; parallel planned for Stage 3
  • Fork identity is preserved — still Monika, in focused-task mode
  • Forks know what they are (NEUROLOGY.md); framing keeps them on task
  • Depth counter enforced by factory closure — can't exceed 3
15

Fork Context Seeding

What the fork gets

  • Full persona files (SOUL, STYLE, REGISTER, NEUROLOGY)
  • Same stateful memory extension — same memory store
  • Task brief: explicit context passed from trunk via delegate()
  • Injected framing: "You are in focused task mode. Your objective is X."

What the fork does NOT get

  • Trunk's conversation history
  • Any awareness of other forks

Why this matters

  • Briefing model, not diary model
  • Fork starts with clean context focused entirely on the task
  • If trunk wants the fork to know something specific, it passes it explicitly
The key design choice: fork results re-enter trunk as tool return values. Pi's agent loop handles this natively — no JSONL surgery, no retroactive injection, no token corruption. Trunk is in a defined tool-call wait state for the entire fork duration.

Planned for Stage 3

  • Parallel fork support with memory write locking
  • Structured task_complete tool replacing text marker protocol
  • Fork session retention policy (prune/archive in Sleep cycle)
16

Sleep Planned

An orchestrator that ties Memory, Topic Router, and Fractal Delegate together — providing curated compaction, context consolidation, and the closest thing to a cold-start solution we have.

17

What Sleep is solving

Compaction is dumb and narrow

  • Pi's default compaction compresses aggressively — it discards most of what was in the window
  • It only looks at the current trunk session — completely unaware of fork sessions or memories
  • No mechanism exists to signal that compaction is approaching, so sessions get cut off in bad states

Forks and memories accumulate unchecked

  • Fork session logs balloon over time — no pruning or curation
  • Knowledge from fork sessions isn't integrated into the main memory archive
  • Explicit facts list grows stale — no mechanism to retire things that no longer matter

Topics don't evolve

  • Topic addenda are static files — they don't update from lived experience
  • Opinions formed in sessions never feed back into the Personality Matrix

The tiredness analogy

  • When a person is tired, they wrap up, find a good stopping point, and rest — they don't just collapse mid-sentence
  • "You're approaching compaction" maps cleanly to tiredness: finish what you're doing, put things in order, wind down deliberately
  • Sleep lets the architecture model this naturally rather than being caught off guard
18

The Sleep Cycle

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart LR
    subgraph PreSleep["Pre-Sleep — Wind Down"]
        PS1["Trunk notified:\ncompaction approaching"]
        PS2["Finish current task,\nfind a good stopping point"]
    end
    subgraph Sleep["Sleep — Maintenance Window"]
        subgraph Mem["Memory Cleanup"]
            M1["Old chats compacted:\ndays → single monthly summary"]
            M2["High-importance facts\nwritten to explicit store"]
            M3["Stale facts retired\nfrom remembered list"]
        end
        subgraph Dream["Dreams — Self Reflection"]
            D1["Reflect on recent sessions"]
            D2["Topic addenda revised\nwith new opinions / stances"]
        end
    end
    subgraph Waking["Wakeup — Fresh Session"]
        W1["Trunk session reviewed\nand summarised"]
        W2["Relevant memory files\npulled for previous context"]
        W3["Memory + session summary\nwoven into coherent context file"]
        W4["Initialisation prompt\nstarts new session from that context"]
    end
    PreSleep --> Sleep --> Waking
    PS1 --> PS2
    M1 & M2 & M3 --> Dream
    D2 --> W1
    W1 --> W2 --> W3 --> W4
    style PreSleep fill:#1f0d00,stroke:#f97316
    style Sleep fill:#0f0900,stroke:#c2510c
    style Waking fill:#1a1008,stroke:#f97316
    style Mem fill:#0d0d0d,stroke:#444
    style Dream fill:#0d0d0d,stroke:#c2510c
```
Wakeup goal: start a fresh session with a coherent, comprehensive context injection — not a cold start, not a mid-sentence cut. A deliberate, rested beginning.
19

How the Components Relate

```mermaid
%%{init: {"theme": "dark", "themeVariables": {"lineColor": "#f97316", "primaryTextColor": "#f0f0f0", "primaryColor": "#1a1008", "secondaryColor": "#141414", "tertiaryColor": "#0f0900", "edgeLabelBackground": "#0d0d0d"}}}%%
flowchart LR
    subgraph CTX["Context Layer — injected every turn"]
        subgraph MEM["Memory System"]
            SUM["Session Summariser"]
            REM["remember"]
            REC["recall · 3-tier hierarchy"]
            SUM --> REC
            REM --> REC
        end
        subgraph TOP["Topic Router"]
            PM["Personality Matrix\ntrigger scoring"]
            TA["Topic Addenda"]
            PM --> TA
        end
    end
    subgraph CORE["Core Session"]
        direction TB
        TRUNK["Trunk\n(continuous identity)"]
        FORK["Fork Sub-agents\n(depth 1–3)"]
        TRUNK -->|"delegate()"| FORK
        FORK -->|"tool result"| TRUNK
    end
    subgraph SLP["Sleep (planned)"]
        direction TB
        WIND["Wind down"] --> MAINT["Cleanup\n+ Dreams"] --> WAKE["Wakeup"]
    end
    CTX -->|"context enrichment\nevery turn"| CORE
    CORE -->|"compaction\napproaching"| SLP
    SLP -->|"fresh session\ncontext"| CORE
    SLP -->|"deep sleep\ncurates"| CTX
    FORK -->|"triggers summariser"| MEM
    %% Invisible edges to anchor left-to-right rank order
    %% (counters the FORK→MEM back-arrow which confuses Dagre's cycle breaking)
    CTX ~~~ CORE
    CORE ~~~ SLP
    style CTX fill:#1f0d00,stroke:#f97316,stroke-width:2px
    style MEM fill:#120800,stroke:#f97316
    style TOP fill:#120800,stroke:#f97316
    style CORE fill:#141414,stroke:#f97316,stroke-width:2px
    style SLP fill:#0f0900,stroke:#c2510c,stroke-width:2px
```
20

What each tool is really doing

Memory

  • Indexing past experience as a retrievable archive
  • Makes all previously recorded factual knowledge available for on-demand context enrichment
  • Structured so an LLM can navigate it cheaply without embeddings
  • The further back you go, the more it costs — but the option is always there

Topic Router

  • Solving context budget allocation for approach knowledge
  • Domain expertise without token bloat
  • Vibes knowledge vs. factual knowledge — different store, different retrieval
  • Dynamic — re-evaluated every prompt

Fractal Delegate

  • Solving the working memory vs. doing ratio
  • Trunk stays clean; forks do the work
  • Identity preserved across all forks
  • Scales work without scaling trunk context

Sleep (planned)

  • Curated compaction instead of the lossy, trunk-only default
  • Wind-down signal — finish the task, then rest deliberately
  • Consolidates fork sessions and retires stale facts in a maintenance window
  • Feeds lived experience back into the topic addenda

21

Continuity of experience
as an engineering constraint

Each of these tools solves an independent problem under the shared umbrella of context engineering. Combined, they are an attempt to comprehensively address the complex epistemology of inner experience.

stateful-memory (Pi extension) · monika-core (Pi SDK)
22