Mar 18, 2026 · 11 min read
Making AI Memory Stick: Hooks, Tiers, and the Problem with Polite Instructions
CLAUDE.md instructions are advisory. Hooks are contractual. How I made AI agent memory actually work by adding deterministic enforcement to an Obsidian vault.
TL;DR
Hooks enforce memory behaviour that CLAUDE.md can only request. Three shell scripts, three tiers of storage, one vault. Agents that actually remember things between sessions.
Part 2 of the AI Memory series. Part 1: AI Memory Without the Overhead
Because telling an AI to remember things and having it actually remember things are apparently two different problems.
I should be studying for the AWS Security Specialty resit right now. It's due end of April. The study plan is pinned to my Obsidian daily note. I can see it from here.
Instead, I'm designing a three-tier memory architecture for AI agents because someone (ok, everyone) on Reddit is posting about connecting Claude Code to Obsidian through a custom MCP server on a VPS, and my brain went "oh, that's interesting, but it doesn't need the VPS" and then three hours disappeared.
This is how side projects happen. You tell yourself you'll just read the post, procrastinating instead of doing whatever you sat down at the computer to do in the first place, and then you'll just think about it. Then you brainstorm the architecture with your buddy Claude. Then it's 2am and you've written three shell scripts and a slash command and you've learned absolutely nothing about AWS KMS key policies.
Anyway... back in the room.
A few weeks ago I started using Obsidian as persistent memory for AI agents. The gist: skip the vector database, point Claude at your vault via MCP, let it read and write markdown. Simples.

That system has been running since late February. And it does work - when the agent actually uses it. Which brings us to why this post exists.
Life and work (urgh) got in the way of iterating on Part 1. The memory structure sat there, functional but under-utilised, while I was busy with customer engagements and pretending I'd get to the cert study "this weekend." Then one Reddit rabbit hole later, I started pulling at threads, and here we are. Sometimes you need an external spark to revisit something you already knew needed work.
The architecture from Part 1 was sound. The folder structure was there. The skills were written. But Claude Code agents kept... not using them consistently. I'd start a session, work on a project for an hour, make three significant decisions, and the agent would cheerfully wave goodbye without writing a single session note.
Every time. Like a colleague who takes brilliant notes in meetings but leaves them on the desk when they go home.
The Advisory Problem
Here's the thing about CLAUDE.md instructions. They're suggestions. Strongly worded ones, sure. You can write "MANDATORY" in all caps and add exclamation marks until your keyboard wears out. The model will still sometimes ignore them - especially after context compaction, where earlier instructions get summarised away and lose their teeth.
I had a project-level CLAUDE.md file that said "search Obsidian for prior context before starting work." I had an agent-memory skill with detailed instructions for when and how to write session notes. Both solid. Both advisory.
The model would follow them maybe 60% of the time. Which is fine if you're comfortable with a coin-flip deciding whether your last session's decisions survive to the next one.
I'm not.
Hooks: The Deterministic Answer
Claude Code has a hooks system. If CLAUDE.md is a polite request, hooks are a contractual obligation. They're shell scripts that fire at specific lifecycle events - session start, before compaction, when the agent stops responding. They run every time, regardless of what the model decides to do.
The distinction matters. CLAUDE.md says "please check for prior context." A SessionStart hook checks for prior context whether the model wants to or not, and injects the results directly into the context window.
Three hooks cover the full lifecycle:
| Hook | Event | What It Does |
|---|---|---|
| session-start.sh | SessionStart | Detects project, injects memory pointers, flags pending items |
| pre-compact.sh | PreCompact | Checkpoints session state before context shrinks |
| stop-memory.sh | Stop | Tracks message count, nudges for sync on long sessions |
The SessionStart hook is the important one. It fires before Claude's first response, detects which project you're in, checks for any orphaned checkpoints from prior sessions, and injects a context pointer telling the agent exactly where to find its memory. No reliance on the model reading CLAUDE.md and deciding to act on it.
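That flow - detect the project, check for orphaned checkpoints, print context to stdout - can be sketched in a few lines of bash. This is a minimal sketch, not the repo's actual script: the staging path, the `memory:project-slug` comment format, and the output wording are assumptions based on this post.

```shell
#!/usr/bin/env bash
# Sketch of a SessionStart hook. Anything printed to stdout here is
# injected into Claude's context before its first response.

detect_slug() {
  # Pull the slug from the HTML comment in CLAUDE.md; fall back to the
  # directory name if the project was never initialised.
  local claude_md="${1:-.claude/CLAUDE.md}"
  local slug
  slug=$(sed -n 's/.*memory:project-slug=\([^ ]*\) -->.*/\1/p' "$claude_md" 2>/dev/null | head -n1)
  echo "${slug:-$(basename "$PWD")}"
}

pending_checkpoints() {
  # Count checkpoints a prior session staged but never synced to the vault.
  local staging="${MEMORY_STAGING:-$HOME/.claude/memory-staging}"
  find "$staging/$1" -name 'checkpoint-*.md' 2>/dev/null | wc -l
}

main() {
  local slug n
  slug=$(detect_slug)
  echo "memory: project-slug=$slug"
  n=$(pending_checkpoints "$slug")
  [ "$n" -gt 0 ] && echo "memory: $n unsynced checkpoint(s) pending - run /memory-sync"
  return 0
}

main "$@"
```

The important property is the last line of `main`: the model never has to decide to look for orphaned state, because the hook has already told it the state exists.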
The Slug Problem
The first thing any memory system needs to answer: which project is this?
If you're jumping between five repos in a day, the agent needs to know that the session in ~/projects/wafr-discovery/ should write memory to a different place than the session in ~/projects/az-landingzone-main/. And it needs to figure this out automatically, because I'm not typing (or remembering for that matter) a project identifier at the start of every session like it's 2004 and I'm logging into a timesheet.

The solution is a project slug, derived automatically through a priority chain:
1. `.claude/CLAUDE.md` metadata (if you've run `/memory-init`)
2. Git remote origin URL
3. `package.json` or `pyproject.toml` project name
4. Directory name as fallback
The slug gets stored as an HTML comment in the project's CLAUDE.md:
<!-- memory:project-slug=wafr-discovery -->
<!-- memory:area=AWS -->
<!-- memory:vault-path=1 Projects/Personal/wafr-discovery -->
Invisible in rendered markdown. Trivially greppable by shell scripts. The hooks read it on every session start and route everything accordingly.
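"Trivially greppable" isn't an exaggeration - reading any of those keys back out needs nothing fancier than `sed`. A sketch (the function name is mine, the comment keys match the example above):

```shell
# memory_meta: read one memory:* key from a CLAUDE.md file.
# Usage: memory_meta vault-path .claude/CLAUDE.md
memory_meta() {
  # Capture everything between "memory:<key>=" and the closing " -->",
  # so values containing spaces (like vault paths) survive intact.
  sed -n "s/.*memory:$1=\(.*\) -->.*/\1/p" "$2" | head -n1
}
```

Every hook in the lifecycle calls something like this first, so routing is decided by the file on disk rather than by whatever the model remembers.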
Three Tiers of Memory
The original blog post had one tier: the Obsidian vault. That's fine for a demo. For daily use across multiple projects, you need stratification.
┌─────────────────────────────────────────────┐
│ TIER 3 — COLD (Obsidian Vault) │
│ Cross-project knowledge, human-curated │
│ Permanent, searchable, multi-agent │
├─────────────────────────────────────────────┤
│ TIER 2 — WARM (Project Auto-Memory) │
│ Build commands, patterns, debug insights │
│ Per-project, persists across sessions │
├─────────────────────────────────────────────┤
│ TIER 1 — HOT (Session) │
│ Current context window, in-flight state │
│ Ephemeral — gone on /clear │
└─────────────────────────────────────────────┘
Tier 1 is what's in the context window right now. It's the conversation. It dies when the session ends or compacts, which is the whole problem we're solving.
Tier 2 is Claude Code's built-in auto-memory. It writes notes to ~/.claude/projects/<project>/memory/ automatically - build commands, debugging patterns, code conventions. Useful, but machine-local and project-scoped. It doesn't cross projects, doesn't sync between machines, and can't be read by Claude.ai or any other agent.
Tier 3 is the Obsidian vault. Cross-project, cross-agent, human-curated. This is where decisions live permanently. When I decide that the aws-landing-zone should use Terraform state locking with DynamoDB, that decision belongs here - not in a machine-local auto-memory file that vanishes when I reformat my laptop.
Data flows upward. Auto-memory accumulates naturally at Tier 2. Significant items get promoted to Tier 3 via a /memory-sync command. The vault remains the single source of truth.
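The promotion step doesn't need to be clever. A sketch of how `/memory-sync` might shortlist Tier 2 candidates, assuming a marker file that gets touched on each successful sync - the marker is my illustrative convention, not something Claude Code maintains for you:

```shell
# list_promotion_candidates: Tier-2 auto-memory notes modified since the
# last vault sync. These are the files worth considering for Tier 3.
list_promotion_candidates() {
  local memory_dir="$1" marker="$2"
  if [ -f "$marker" ]; then
    # Only notes touched since the marker was last updated.
    find "$memory_dir" -name '*.md' -newer "$marker"
  else
    # Never synced: everything is a candidate.
    find "$memory_dir" -name '*.md'
  fi
}
```

The actual judgement call - which of those notes is "significant" enough for the vault - stays with Claude during the sync; the shell just narrows the pile.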
The Bridge Problem
Here's the awkward bit. Hooks are shell scripts. They can read files, write files, parse JSON. What they can't do is call MCP-Obsidian. The vault operations need to happen inside Claude's turn, where MCP is available.
So the hooks don't write directly to Obsidian. They write to a local staging directory:
~/.claude/memory-staging/
├── wafr-discovery/
│ ├── .session-meta # Message count, timestamps
│ └── checkpoint-2026-03-17.md # Pre-compaction state
└── cairn/
└── .session-meta
The SessionStart hook checks this directory. If there are pending checkpoints from a prior session that never got synced, it injects context telling Claude to process them. Claude reads the staging files, writes the content to Obsidian via MCP, and cleans up.
Two-stage process. Hooks handle the deterministic parts (detection, tracking, checkpointing). Claude handles the MCP parts (reading and writing the vault). Neither depends on the other being reliable.
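The PreCompact half of that two-stage process is the simpler one to sketch. Assuming the staging layout shown above (the filename convention and summary format are illustrative):

```shell
# checkpoint_session: stage in-flight session state locally so the next
# SessionStart hook can tell Claude to sync it to the vault via MCP.
checkpoint_session() {
  local slug="$1" summary="$2"
  local dir="${MEMORY_STAGING:-$HOME/.claude/memory-staging}/$slug"
  mkdir -p "$dir"
  local file="$dir/checkpoint-$(date +%F).md"
  {
    printf '# Checkpoint - %s\n\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
    printf '%s\n' "$summary"
  } > "$file"
  echo "$file"  # print the path so the hook can surface it
}
```

Because the checkpoint is plain markdown on local disk, nothing is lost even if the session dies before the MCP half ever runs - the orphaned file just waits for the next SessionStart.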
/memory-init: Setup That Doesn't Suck
Nobody wants to manually configure memory metadata for every project. So I built /memory-init - a slash command that auto-detects everything and asks for confirmation. Like Claude Code's /init for CLAUDE.md but on steroids.
Run it once in a project directory. It reads git remote, checks the package manifest, scans the codebase for tech stack indicators, and presents:
## Memory Init — Detected Configuration
| Field | Value | Source |
|-------|-------|--------|
| Project slug | wafr-discovery | git remote |
| Area | AWS | tech stack inference |
| Tech stack | Python 3.12, Boto3 | detected files |
Confirm or adjust?
After confirmation, it creates the project CLAUDE.md with embedded metadata, sets up the Obsidian session folder, updates the project index, and loads any prior context. Idempotent - safe to re-run if your stack changes.
The first time you open a project that hasn't been initialised, the SessionStart hook tells Claude to suggest running it. So you can't forget.
Multi-Agent Resilience
Remember that Reddit post that started all this? The one I should have closed and gone back to studying? The poster had built a multi-agent orchestrator where if Claude went down, Codex would pick up with the same context. Their approach involved a VPS running an Express server with SQLite FTS5 behind it - a proper knowledge base server.
The idea is solid. The implementation is more than I need. The Obsidian vault already syncs across machines via the Remotely Save plugin and git. Any agent that can read the vault - whether through MCP-Obsidian, direct file access, a skill, or even a basic script that greps YAML frontmatter - can access the same session history, the same learnings, the same decisions.
I'm not running a VPS with an Express server for this. The vault is the distributed shared brain. The structured frontmatter is the API contract. If Codex or Gemini needs to read my prior sessions, it can parse the same markdown files Claude wrote.
No extra infrastructure. No extra cost. No extra attack surface.
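That interoperability floor is genuinely low. A sketch of what "parse the same markdown files" means in practice - the frontmatter keys here are illustrative, not the repo's exact schema:

```shell
# frontmatter_value: read one key from a note's YAML frontmatter.
# Any agent that can run awk can read the shared brain.
frontmatter_value() {
  local key="$1" file="$2"
  awk -v k="$key" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening fence
    in_fm && $0 == "---"   { exit }              # closing fence: stop
    in_fm && index($0, k ": ") == 1 {            # "key: value" at line start
      sub(/^[^:]*: */, ""); print; exit
    }
  ' "$file"
}
```

No Express server, no FTS5 index - just an agreement that session notes carry structured frontmatter, which Part 1 already established.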
What Changed vs Part 1
| Part 1 | Part 2 |
|---|---|
| Single vault folder | Three-tier storage (session → project → vault) |
| CLAUDE.md instructions (advisory) | Hooks + CLAUDE.md (deterministic + advisory) |
| Flat session notes | Project-scoped sessions via slug routing |
| Manual context loading | SessionStart hook auto-injects pointers |
| Manual session logging | PreCompact hook auto-checkpoints, /memory-sync consolidates |
| Single agent | Multi-agent via vault as shared brain |
| "Remember to check memory" | Hooks remember so the model doesn't have to |
The Honest Assessment
Is this over-engineered compared to Part 1? A bit. But Part 1's simplicity came with a failure mode: the agent not using the system. Hooks eliminate that failure mode without adding infrastructure. It's still just shell scripts, markdown files, and a vault I already maintain.
The overhead is minimal - about 350ms total across all three hooks per session. The staging directory is a few kilobytes. The slash commands are markdown files in ~/.claude/commands/. The repo also includes an updated agent-memory skill with session templates and frontmatter patterns - the same skill from Part 1, but now aware of the project-scoped structure.
The real cost is the setup time for /memory-init across your active projects. Once that's done, you get deterministic memory behaviour without thinking about it.
Is it perfect? No. Does it reliably survive context compaction, session restarts, an overly bloated CLAUDE.md file and the odd Claude outage? Yes.
Sometimes the boring solution needs a boring enforcement mechanism. Hooks are bash scripts. They're about as glamorous as a cron job. But they run. Every time.
I'll run this system across my projects for the next few weeks, likely iterate on the architecture, and post again with my findings. One wrinkle already: Claude is now defaulting Opus to a 1M-token context window, so the pre-compaction hook isn't triggering on smaller sessions - but that's for another post.
Right. KMS key policies. I'm going. I'm actually going this time.
This is Part 2 of a series on AI agent memory. Part 1 covered the basic vault setup. Part 3 will probably be about the semantic search layer I keep threatening to build for Shiggles (SQLite FTS5, maybe sqlite-vec). Or it'll be about something else entirely once I've bedded into this solution a bit more. The Archive folder doesn't judge.
Everything described here is available at agent-memory-cc-v2 — hooks, slash commands, skills, and the architecture docs. If your approach is better, I'd genuinely like to hear about it. Just don't send it to me before the SCS-C03 resit.
✦ Key Takeaways
- CLAUDE.md instructions are advisory and get ignored ~40% of the time — hooks provide deterministic enforcement
- Three tiers of memory (session, project auto-memory, Obsidian vault) stratify ephemeral and permanent knowledge
- Shell script hooks at SessionStart, PreCompact, and Stop cover the full agent lifecycle without extra infrastructure
- A staging directory bridges the gap between hooks (which can't call MCP) and Claude (which can)
- The Obsidian vault acts as a distributed shared brain accessible by any agent that can read markdown