Coding agents are good at local edits and bad at remembering why those edits are dangerous. They re-read the same files every session, rediscover the same constraints, and produce clean, plausible changes that quietly break a rule nobody wrote down. The knowledge that would have stopped them — the invariants, naming contracts, migration scars, cross-repo edges, and “this looks safe but is not” facts — lives in people’s heads, old PRs, and team habits, exactly where an agent never looks.
Agents Remember turns that knowledge into durable, git-verified infrastructure: Markdown notes that live beside the code, prove which commit they were last checked against, and are trusted only when Git confirms they still match. One idea carries the whole system — a source file’s memory lives at a deterministic mirror path — and almost everything else on this page is what that single idea makes cheap, verifiable, and safe.
This page walks through what that buys:
A top-level instruction file helps, but it does not reappear when the agent is deep inside a file deciding what to change. So the rule that mattered is out of context at the exact moment it is violated.
The common alternative — pour the codebase into a vector store and retrieve “relevant” chunks — trades one problem for a worse one. An embedding index is a second representation of the code that has no built-in relationship to the commit it was built from. It drifts the moment the code changes, it answers with ranked guesses rather than addresses, and nothing tells the agent when a hit is stale. It is confidently wrong, confidently fast, and it becomes a parallel source of truth that no one reviews.
Agents Remember takes the opposite stance. The durable memory of record is plain Markdown under Git: versioned, diffable, reviewable, and bound to the code by construction. Semantic and graph search still exist — but as opt-in accelerators for finding a note, never as the thing you trust. What you trust is the note, and whether Git says it still matches the code.
Durable memory is stored as onboarding units — Markdown notes derived from source paths. In the default repo-local mode, a source file maps to a mirrored note:
src/foo/bar.ts
ar-memory/onboarding/src/foo/bar.ts.md
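
Because the note path is derived from the source path, the mapping can be sketched as pure path arithmetic in both directions. This is a minimal illustration, not the project's implementation: the `ar-memory/onboarding` root is taken from the example above, while the function names `note_path_for` and `source_path_for` are hypothetical.

```python
from pathlib import PurePosixPath

# Root taken from the example above; assumed here to be a fixed default.
MEMORY_ROOT = PurePosixPath("ar-memory/onboarding")


def note_path_for(source_path: str) -> str:
    """Map a repo-relative source path to its mirrored note path."""
    source = PurePosixPath(source_path)
    if source.is_absolute() or ".." in source.parts:
        raise ValueError(f"expected a repo-relative path, got {source_path!r}")
    # Append ".md" to the full file name, so "bar.ts" becomes "bar.ts.md".
    return str(MEMORY_ROOT / source.parent / (source.name + ".md"))


def source_path_for(note_path: str) -> str:
    """Invert the mapping: recover the source path from a note path."""
    relative = PurePosixPath(note_path).relative_to(MEMORY_ROOT)
    if relative.suffix != ".md":
        raise ValueError(f"not a note path: {note_path!r}")
    # Drop only the trailing ".md", leaving the original extension intact.
    return str(relative.with_suffix(""))
```

Neither direction needs an index or a search: given one path, the other is computed.
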
Each note explains what the code cannot make obvious on its own. A short metadata header pins the note to the exact source commit it was last verified against; the body holds the file’s purpose, the logic worth knowing, the local conventions it follows, the invariants and boundaries it must keep — including what it must not do — and durable to-dos that belong to the file rather than to any one task. Claims are kept honest by citation: pointers to domain docs, to other files in the repo, and to cross-repo edges each carry exact line ranges, and an append-only update history records why the knowledge changed over time. What a note leaves out matters just as much — it does not restate the implementation, duplicate type signatures, or hold task-specific planning. It captures the judgment, not the code.
Because the path is the address, three things that are normally search problems become arithmetic:
This also changes how memory scales. The cost of reading memory is proportional to the files in scope for a task, not to the size of the repository. A large repo can accumulate a great deal of memory; the agent still only reads the repo overview, the relevant route-local overview if one exists, and the notes for the files it is touching.
Durable memory comes in three shapes, matched to scope. File-level notes are
the one-to-one core. Route overviews (overview.md) orient a whole area — a
package, a module, or the repository itself — describing the shape of a region
rather than a single file. And entity catalogs (entities.md) track the
recurring entities and naming contracts that span many files at once. Overviews
and catalogs add structure on top; they never replace the file-specific facts
beneath them.
None of this needs Docker, a server process, or any provider. By-path memory is always on. Everything that follows layers on top of it.
A note is only useful if you know whether it is still true. Two mechanics make that knowable without asking a model to guess.
Drift is a Git fact. Each file-level note records the source commit it was
last verified against. Checking it is a plain git diff for that one path —
comparing the verification commit against the current committed state — fully
reproducible, no LLM judgment. The check is honest in the ways that matter: local
staged or unstaged edits flip a note to drifted before a commit lands, and if
the verification commit was rebased away, the note auto-invalidates rather than
claiming false confidence. Notes usually sit beside the source as a mirrored
file, but can also live inline in the source itself — and an inline note is
verified against a digest taken with the note removed, so editing the note
never registers as drift; only code changes do.
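
A single-path drift check along these lines can be sketched with two ordinary Git commands. The sketch makes assumptions of its own: the function name `check_drift`, the label `"invalidated"`, and the use of `git merge-base --is-ancestor` to detect a verification commit that is no longer part of the current history are illustrative choices, not the tool's documented behavior. `git diff <commit> -- <path>` compares that commit with the working tree, so uncommitted edits count as changes.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes git arguments and returns the process exit code.
GitRunner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Run git in the current working directory and return its exit code."""
    return subprocess.run(["git", *args], capture_output=True).returncode


def check_drift(source_path: str, verified_commit: str,
                git: GitRunner = run_git) -> str:
    """Return 'invalidated', 'drifted', or 'up to date' for one source path."""
    # Exit code 0 means the verification commit is an ancestor of HEAD.
    # Anything else (not an ancestor, or an unknown object) is treated as
    # "rebased away" in this sketch, so no confidence is claimed.
    if git(["merge-base", "--is-ancestor", verified_commit, "HEAD"]) != 0:
        return "invalidated"
    # With --quiet, git diff exits 0 when the path is unchanged and 1 when
    # it differs between the verification commit and the working tree.
    code = git(["diff", "--quiet", verified_commit, "--", source_path])
    if code == 0:
        return "up to date"
    if code == 1:
        return "drifted"
    raise RuntimeError(f"git diff failed with exit code {code}")
```

The runner is injectable so the decision logic can be exercised without a repository; in normal use the default `run_git` is enough.
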
Before planning against a note, the agent classifies its trust level:
- up to date: no source change since verification
- drifted: source changed; the note needs review
- missing: the source file exists but has no note yet
- missing verification: not enough metadata to trust
- orphaned: the source file no longer exists
- unsupported: the storage or file shape cannot be verified safely

Drifted memory can still be read directionally when that trust level is explicit, since old context is often useful, but it never silently becomes current truth. Finding is not trusting.
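
The six trust levels above can be modeled as a small classifier over facts gathered from the file system and Git. This is a sketch under stated assumptions: the `NoteFacts` fields, the function names, and the order in which the checks run are hypothetical; the source names the levels but not their precedence.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trust(Enum):
    UP_TO_DATE = "up to date"
    DRIFTED = "drifted"
    MISSING = "missing"
    MISSING_VERIFICATION = "missing verification"
    ORPHANED = "orphaned"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class NoteFacts:
    """Hypothetical inputs, collected elsewhere from disk and Git."""
    supported: bool                 # storage and file shape can be verified
    source_exists: bool
    note_exists: bool
    verified_commit: Optional[str]  # read from the note's metadata header
    source_changed: bool            # outcome of the Git drift check


def classify(facts: NoteFacts) -> Trust:
    # The order of these checks is an assumption of this sketch.
    if not facts.supported:
        return Trust.UNSUPPORTED
    if not facts.source_exists and not facts.note_exists:
        raise ValueError("neither source nor note exists; nothing to classify")
    if not facts.source_exists:
        return Trust.ORPHANED
    if not facts.note_exists:
        return Trust.MISSING
    if not facts.verified_commit:
        return Trust.MISSING_VERIFICATION
    return Trust.DRIFTED if facts.source_changed else Trust.UP_TO_DATE


def is_current_truth(trust: Trust) -> bool:
    """Only an up-to-date note may be planned against as current truth."""
    return trust is Trust.UP_TO_DATE
```

A drifted note still classifies cleanly and can be read, but `is_current_truth` returns `False` for it, which keeps "finding" and "trusting" as separate decisions.
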
Broader units verify by scope, not by a single file. A file note can be
pinned to one source file’s commit, but an overview or an entity catalog cannot —
each describes more than one file by design, so a single-file check would be
meaningless. Each is held honest in the way its scope demands. An overview is
checked against its whole route: a directory-scoped git diff flags it the moment
anything under that area is added, removed, moved, or changed, and a
script-regenerated index beside it tracks what the area now covers and how to
route agents to it — structure the model never hand-maintains. An entity catalog
is checked by a deterministic fingerprint over the set of files that define each
entity, so editing any one of them flags that entity for review. Same
Git-anchored honesty, applied at the right altitude.
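
One way to picture a deterministic fingerprint over a set of defining files is a hash of sorted paths and content digests. The source does not specify the actual algorithm, so the choice of SHA-256, the record layout, and the name `entity_fingerprint` are all assumptions made for illustration.

```python
import hashlib
from typing import Mapping


def entity_fingerprint(files: Mapping[str, bytes]) -> str:
    """Hash a set of files, keyed by repo-relative path, into one hex digest."""
    outer = hashlib.sha256()
    # Sorting the paths makes the result independent of discovery order.
    for path in sorted(files):
        content_digest = hashlib.sha256(files[path]).hexdigest()
        # A NUL separator keeps path and digest from running together.
        outer.update(f"{path}\x00{content_digest}\n".encode("utf-8"))
    return outer.hexdigest()
```

Storing this digest next to an entity's catalog entry and recomputing it later gives a yes-or-no answer: if any defining file was edited, added, or removed, the digests differ and the entity is flagged for review.
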
The ledger keeps memory and code in sync. Teams that want memory in a
separate repository can use external memory. The risk there is obvious: two Git
repos can drift apart. A ledger (memory.md) closes that gap by mapping each
verified code commit to the memory commit that describes it, newest
first:
| Code commit | Memory commit |
| ----------- | ------------- |
| a1b2c3… | f9e8d7… |
| 9f8e7d… | 4c5b6a… |
This is a lookup table with real consequences. It lets a branch carry the memory version that matches its code state, it lets you restore memory to the exact version that was synchronized with an earlier code commit — invaluable for recovering from a bad state — and it lets the system refuse to hand over memory that does not match the code in front of you, rather than guess. Repo-local internal memory is the simpler default and needs no ledger; the ledger is what gives external memory its operational power.
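
The ledger's "refuse rather than guess" behavior can be sketched as a parser plus a strict lookup. The table layout follows the example above; the function names are hypothetical, and the commit identifiers used in any demonstration are placeholders rather than real hashes.

```python
from typing import List, Tuple

LedgerRows = List[Tuple[str, str]]


def parse_ledger(markdown: str) -> LedgerRows:
    """Parse ledger rows into (code_commit, memory_commit) pairs, in file order."""
    rows: LedgerRows = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column table row
        # Skip the header row and the dashed separator row.
        if cells[0].lower() == "code commit" or set(cells[0]) <= set("-: "):
            continue
        rows.append((cells[0], cells[1]))
    return rows


def memory_commit_for(ledger: LedgerRows, code_commit: str) -> str:
    """Return the memory commit synchronized with a code commit, or refuse."""
    for code, memory in ledger:
        if code == code_commit:
            return memory
    # No matching row: raise instead of falling back to the newest entry.
    raise LookupError(f"no memory commit recorded for code commit {code_commit}")
```

Raising on a miss is the design point: a caller that wants memory for an unrecorded code commit has to handle that case explicitly instead of silently receiving the latest memory.
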
By-path is the core, but an agent often needs to find the right file before it can read its note. Agents Remember offers three retrieval substrates, matched to what the agent already knows, and a router that picks among them.
The router chooses the shape of the missing context first, then the cheapest substrate that fits:
- Semantics when the concept is known but the structure or location is not.
- Relationship when an anchor is known but its impact paths are not.
- Intent when an anchor is known but its hidden contracts, invariants, or branch-valid truths are not; answered by onboarding plus bounded source confirmation.

These compose. A triage task can start from an anchor in a ticket, use the relationship graph to map nearby structure, then switch to intent to learn the code truths that make a change safe. And the whole stack degrades gracefully: the semantic and graph providers are opt-in and Dockerized, so with neither running the router still works on by-path and intent alone.
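
The routing rule, including its graceful degradation, can be sketched as a function of the missing context and the providers that happen to be running. The `Gap` enum, the provider names `"semantic"` and `"graph"`, and the choice to fall back to intent when a provider is absent are assumptions of this sketch, not the router's documented interface.

```python
from enum import Enum
from typing import FrozenSet


class Gap(Enum):
    """The shape of the context the agent is missing."""
    LOCATION = "concept known, structure or location unknown"
    IMPACT = "anchor known, impact paths unknown"
    CONTRACTS = "anchor known, hidden contracts unknown"


def route(gap: Gap, running_providers: FrozenSet[str]) -> str:
    """Pick a retrieval substrate for a gap, given the available providers."""
    if gap is Gap.LOCATION and "semantic" in running_providers:
        return "semantics"
    if gap is Gap.IMPACT and "graph" in running_providers:
        return "relationship"
    # Intent needs no provider: it reads onboarding notes and confirms
    # against a bounded amount of source, so it is always available.
    return "intent"
```

With an empty provider set every gap routes to intent, which mirrors the claim that the stack keeps working without Docker.
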
The discipline that holds it together: provider output is candidate routing evidence, not proof. Source files, verified onboarding, drift checks, and approved promotion remain the truth controls. The retrieval layer can never quietly become a second, drifting source of truth — because it is not the source of truth at all.
When work needs isolation, a task can run in its own Git worktree off a private branch, so the main branch is never touched until the change is reviewed and ready. With external memory this becomes a dual worktree: one worktree off the code repo and one off the memory repo. Code and memory changes happen on private branches together, and the memory main branch stays uncorrupted while a feature or refactor is still in flight.
The ledger makes this safe rather than chaotic:
This is what “memory as a first-class citizen” means in practice: memory is protected by the same branch, review, fast-forward, and rollback machinery as code, because it is code — versioned Markdown moving through Git.
Per-task isolation is only worth using if it is cheap. The expensive part of a fresh worktree is normally the providers: re-crawling a code graph and re-embedding a semantic index can take minutes. Agents Remember avoids that almost entirely by cloning the parent’s already-built indexes into the worktree instead of rebuilding them. Each worktree still gets its own isolated, namespaced provider stack — it just starts pre-warmed.
The clone is a real copy of the index, path-corrected for its new home, not a shortcut that re-reads the source:
A subtle detail makes the clone actually stick. git checkout stamps every file
with a fresh modification time, which would make the semantic watcher think the
whole tree changed and re-embed it — undoing the clone. So the worktree’s file
timestamps are synced back to their source values, and seeding happens before
any watcher starts. A full re-index exists only as an explicit fallback when a
clone genuinely cannot be reused.
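
The timestamp step can be illustrated with the standard library alone. This is a sketch of the idea rather than the tool's code: `sync_mtimes` is a hypothetical name, and the byte-for-byte comparison before copying a timestamp is an extra safeguard added here so that a genuinely different file is never made to look unchanged.

```python
import filecmp
import os
from pathlib import Path


def sync_mtimes(source_root: Path, worktree_root: Path) -> int:
    """Copy modification times from source files onto identical worktree files."""
    synced = 0
    for source_file in source_root.rglob("*"):
        relative = source_file.relative_to(source_root)
        # Skip directories and Git's own metadata.
        if not source_file.is_file() or ".git" in relative.parts:
            continue
        target = worktree_root / relative
        if not target.is_file():
            continue
        # Only restore the timestamp when the contents match exactly.
        if not filecmp.cmp(source_file, target, shallow=False):
            continue
        stat = source_file.stat()
        os.utime(target, ns=(stat.st_atime_ns, stat.st_mtime_ns))
        synced += 1
    return synced
```

Running this before any file watcher starts matters as much as the copy itself: a watcher that observes the fresh checkout timestamps first would already have queued the whole tree for re-embedding.
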
The practical result: spin up an isolated worktree and its code-graph and semantic indexes are ready in seconds, not minutes. Because isolation is that cheap, it stops being a special occasion — providers can be created alongside a worktree and thrown away with it.
Memory is only as good as the discipline that keeps it honest, and that discipline is the second half of the product. Every session runs through one lifecycle:
request → trust → reframe & research → decide → build → close
Before any code changes, the agent resolves context, checks drift and provider state, reframes the request into the best version of the task, gathers evidence, and waits for the developer to agree on that framing. Only then does it decide how to build:
Two principles run through the whole arc. The first is evidence-first reasoning: for anything where correctness depends on interpretation, the agent makes its evidence model visible — what it read, what it searched, and what would prove the plan correct — rather than asserting conclusions. The second is that approval is not one switch but several. Implementation approval is not commit approval; commit, push, PR, merge, cleanup, and memory carryover are each their own explicit gate. The blast radius of any single “looks good” stays bounded to one action.
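
The "several switches, not one" idea can be sketched as a set of independent gates. The gate names follow the actions listed above; the `Approvals` class and the rule that an approval is consumed by the single action it authorizes are assumptions made for this illustration.

```python
from enum import Enum
from typing import Set


class Gate(Enum):
    IMPLEMENT = "implement"
    COMMIT = "commit"
    PUSH = "push"
    PR = "pr"
    MERGE = "merge"
    CLEANUP = "cleanup"
    MEMORY_CARRYOVER = "memory carryover"


class Approvals:
    """Tracks which individual actions the developer has approved."""

    def __init__(self) -> None:
        self._granted: Set[Gate] = set()

    def grant(self, gate: Gate) -> None:
        self._granted.add(gate)

    def require(self, gate: Gate) -> None:
        """Pass only if this exact gate was approved, then consume the approval."""
        if gate not in self._granted:
            raise PermissionError(f"{gate.value} has not been approved")
        # One approval covers one action; a later action must be approved again.
        self._granted.discard(gate)
```

Approving `Gate.IMPLEMENT` leaves `require(Gate.COMMIT)` failing, which is the bounded blast radius the paragraph describes.
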
Crucially, onboarding records approved current state, not plans. Task notes, proposals, and in-progress thinking stay in task files; durable memory is updated only after a change is approved and lands, and is then re-verified against the new commit. That separation is what lets an agent reason from onboarding without wondering whether it describes reality or merely an intention.
The split of responsibility is the design thesis. The model does the judgment — framing the problem, surfacing assumptions, comparing options, asking for the right approvals. The deterministic work is offloaded to an MCP server so the same repo state always yields the same result, and so the agent cannot improvise its way around a gate.
That server is a single process exposing a focused set of typed tools, and the boundary between them is the trust boundary:
Every tool result is shape-checked against a declared model and carries an accurate token count, so the agent gets predictable, budget-aware payloads it does not have to parse defensively. A one-call context packet replaces a dozen exploratory probes at the start of a task. The point of all of this is auditability: context resolution, drift classification, commit sequencing, and Git surgery live in reviewable code, not in a prompt — which is what makes the memory layer reproducible rather than “an LLM with good intentions.”
Every memory repo ships a committed system/ folder: the repository’s own
operating manual, versioned right next to its memory. It holds the storage and
path rules that decide which files get notes, a registry of where the project’s
documentation lives, the exact lint / type-check / test / build commands for the
repo, its coding guidelines, its branch-and-landing (git-workflow) flow, and the
template for how a change should be reported.
These files are primitives, not skills. The generic skills and the session
lifecycle read them at the moment each one matters: the build step runs the checks
from tools.md before every commit, the landing step follows git-workflow.md
before any push, and a finished change is reported in the shape the repo’s
template defines. A repository tailors how an agent works on it — its commands,
its branch policy, its code-shape budgets, its report format — by editing plain
Markdown, never by writing a custom skill for a specific harness.
Two properties make that pay off. First, it travels with the memory: because
system/ is committed in the memory repo, every contributor who resolves that
memory inherits the same instructions automatically — their agent picks up the
project’s conventions with no per-person configuration. Second, it is portable
across harnesses: the same skills-and-instructions contract installs on Claude
Code, Codex, Cursor, and the rest, so the same system/ folder yields the same
behavior whichever coding agent a contributor brings. Workspace-wide defaults can
live in the coordinator’s own system/ folder; where the two overlap, the
repository’s own rules win.
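
The precedence rule in the last sentence amounts to a layered lookup. As a rough sketch, assuming rules are keyed by file name (for example `tools.md` or `git-workflow.md`) and that a repository file replaces a workspace file of the same name wholesale, which the source does not spell out:

```python
from typing import Dict


def effective_rules(workspace_defaults: Dict[str, str],
                    repo_rules: Dict[str, str]) -> Dict[str, str]:
    """Merge two system/ folders; on overlap the repository's file wins."""
    # Later entries override earlier ones, so repo_rules takes precedence.
    return {**workspace_defaults, **repo_rules}
```

Files that exist only at the workspace level still apply, so a repository overrides only what it chooses to define.
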
Agents Remember does not ask every coding harness to pretend it has the same plugin surface. The source repo ships starter packages for Claude Code, Codex, Cursor, Antigravity, VS Code with GitHub Copilot, Hermes, Pi.dev, and OpenClaw. Each package carries the files that harness actually loads: MCP registration templates, Agents Remember authority settings, skill folders, startup hooks, rules, or always-on instruction files.
The first-run path is deliberately short. Copy the starter package, replace the
workspace and repository placeholders, restart the harness once, then invoke
c-13-install-and-onboard. That skill verifies the MCP server, runs or checks
runtime_install(), initializes or adopts memory, bootstraps onboarding when
needed, and starts provider indexing when providers are enabled. The core
by-path memory path still works without Docker or providers; the starter package
just makes sure the harness can actually see the server and skills before setup
moves on.
This is also why the setup is portable. A contributor can use Claude Code while
another uses Codex or Cursor, and both resolve the same committed memory,
system/ rules, and MCP tool surface. The harness changes how the first-action
directive is loaded; it does not change what the repository asks the agent to do.
The main loop is by-path memory, drift checks, and approval-gated updates. Real repositories also need the edge-case machinery that keeps that loop from becoming fragile:
The whole system hangs off one invariant and compounds from there:
- … git diff.
- The system/ folder carries each repo’s own rules with its memory, so the same generic skills behave the way that project wants, across harnesses.

The memory layer is intentionally small: Markdown, Git metadata, and deterministic paths. The workflow layer around it can be heavier, but it exists for one reason: to protect that small, trustworthy memory from stale or speculative content. That is the bet: that durable, verifiable, by-path memory beside the code, kept honest by Git, is what lets a coding agent finally remember what your team already knows.