I wrote this article with the help of my Recalletta crew: I gave them instructions (style, size, what to verify), they drafted sections, and then I proofread and merged it.

Five years ago I'd write the article and the machine would proofread it for grammar. Now it’s closer to the reverse: the machines draft, and I proofread for truth, clarity, and taste. What a time to be alive!

Recalletta.ai: Opinionated Management System for Crews of Coding Agents

February 2026

Every feature described here exists in the codebase today, and every CLI block is real output sanitized for security and ethical reasons.


The 30-second Mental Model

Recalletta is a handful of small pieces that snap together. The ones you notice first:

It captures sessions (and makes them searchable) so your working history is available later.

It injects context at SessionStart, so you don't begin every session cold.

It gives you a Knowledge Base that is designed to be fed into the agent, not forgotten in a wiki.

It provides a Crew of agents with different personas and roles, each with its own personal memories.

And if you want to go further, the Attractor lets you coordinate multiple agents and pipelines on multiple machines without losing the paper trail.

If you want a single sentence: Recalletta turns “chat with a coding agent” into “work with a crew who remember”.


If you're reading this and thinking “another memory system for Claude”, then please keep reading.

The Limited Context Problem

At SeriousBit we use AI coding assistants daily: Claude Code, Codex, Gemini. They’re fine. But they all have the same problem: every session starts from zero. I explain the auth system in detail, explain it again after an hour, explain it again after 15 minutes. The model forgets everything between sessions. Shared understanding has to be rebuilt every time.

People work around this. They maintain huge CLAUDE.md files and design docs. They keep sessions alive as long as possible, dreading the moment the context window fills up. These are workarounds we are forced into by a tool that has no memory. We've been transformed from developers into full-time technical writers. I personally don't like this at all.

Recalletta has been in daily use on our own projects since October 2025. It’s a Rust CLI, a .NET API, and a web dashboard. No vector databases, no embeddings, no “AI memory” marketing. It captures sessions, organizes knowledge, and injects context back into the agent before you type.

What it does when you start a session

Save every session. Use past sessions to inform future ones. Iterate. That's the whole idea.

When I start a new Claude Code session in a project that uses Recalletta, something happens before I type my first message. The system looks at my working directory and git metadata -- branch, recent commits, status -- and the loaded persona (more on this later). It may also include my first prompt if the hook payload provides one. It searches my past sessions and my knowledge base. Then it injects relevant context directly into the session.

The effect is immediate. I open Claude Code in the project, type "Data, fix the flaky auth test," and the agent already has context about the auth module refactor from last week. It knows these things because I discussed them in previous sessions that were automatically saved, and the system found them.

This is full-text search over your actual working history (FTS5). Most memory systems store everything and retrieve approximately. Recalletta stores everything but retrieves exactly: it searches your literal conversations, not a high-dimensional approximation of what you might have said.
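To make "retrieves exactly" concrete, here is the mechanism in miniature. This is a toy Python sketch with a made-up schema -- not Recalletta's actual one (the real CLI is Rust and its schema is its own) -- but it shows what SQLite's FTS5 gives you: literal, ranked full-text matching.

```python
import sqlite3

# Toy schema for illustration -- not Recalletta's actual tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(title, body)")
db.executemany(
    "INSERT INTO sessions (title, body) VALUES (?, ?)",
    [
        ("Auth refactor", "moved token validation into the auth module"),
        ("CI cleanup", "removed flaky network tests from the pipeline"),
    ],
)

# MATCH is literal full-text search; bm25() ranks results (lower is better).
rows = db.execute(
    "SELECT title, bm25(sessions) FROM sessions "
    "WHERE sessions MATCH ? ORDER BY bm25(sessions)",
    ("auth module",),
).fetchall()
print(rows[0][0])  # -> Auth refactor
```

No embeddings, no approximate neighbors: a query either matches the words you actually typed in a past session or it doesn't.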

NOTE: Here and below I say Claude Code because my crew agent leader mostly runs on Claude, but everything applies to Codex and Gemini too, with very small differences. All three of them work fine with Recalletta.

The Continuity Problem

This is a direct result of the limited context problem: there is a specific moment that every heavy user of AI coding assistants dreads. I've been working on a complex refactoring for an hour. The conversation is long, and the agent finally understands the full picture -- all the constraints, all the edge cases. And then the fucking context window fills up.

I have two choices, both extremely bad. I can /compact, which summarizes the conversation and leaves my agent dumber than a clean one. Or I can start a new session and spend ten to twenty minutes re-establishing context and decisions that the previous session had fully internalized.

Recalletta makes this a non-problem. When the context window fills up, I start a new session and type "recall last session and let's continue." The SessionStart injection usually includes summaries of my most recent sessions in that same context (plus other relevant sessions and important KB entries). When I need more detail, the agent can pull full transcripts on demand using recalletta <session_id>.

This sounds like a small convenience but it isn't - it changes what I'm willing to attempt with an AI assistant.

Without continuity, I end up limiting myself to tasks that fit in one session. I avoid multi-day refactoring projects because I know the cost of crossing that context boundary. Every time I start a new session, I pay a tax: re-explain the project, re-establish constraints, get the agent back up to speed -- and if I lose even a little bit of control, the dreaded context window fills up again.

With continuity, I can think in projects instead of sessions. A three-day refactoring becomes practical. I work on it for an hour, start a new session, and the context follows me. The agent doesn't just remember what I did - it remembers why.

The Knowledge Base

Sessions capture what happened while the knowledge base captures what we know.

The distinction matters more than it might seem. Sessions are temporal — plentiful records of work done on a particular day. Knowledge base entries are curated - things you want every future session to know. Architecture decisions, like the answer to "why did we use SQLite here instead of Postgres?" The specific gotcha about the authentication module that has bitten three different developers.

In Recalletta, the knowledge base is hierarchical: projects contain compartments, which contain entries, which have version history. I mark certain entries as important, and they are automatically included in the injected context of the starting agent for that project when project detection (or pinning) succeeds. No searching, no on-demand retrieval -- just simple, deterministic, forced injection. The agent no longer decides what is useful for it, and I no longer need to tweak that CLAUDE.md.

This inverts the usual relationship between documentation and tooling. Normally, documentation sits in a wiki that nobody reads. You write it because someone told you to, and then it rots. Here, documentation lives inside the tool that reads it on session start (when project detection/pinning succeeds). Writing a knowledge base entry is not busywork - it directly improves every future AI interaction on your project. I have never seen a system where the incentive to document is so cleanly aligned with the reward for documenting.

And because entries are versioned, the knowledge base evolves with your project. When you change your authentication approach, you update the entry. The next session gets the current knowledge, not something stale from six months ago.

Crews: Agents That Work Together

If one AI agent with memory is useful, what about a team of them? Recalletta has a system called Crews that lets you orchestrate multiple AI agents across different clients - Claude Code, OpenAI Codex, Google Gemini - working on the same project simultaneously and even on multiple machines with inter-agent communication.

Each agent runs in its own tmux session. They communicate through a message system - short messages for quick coordination, longer reports written to a shared mail-board for anything substantial. One agent can be reviewing code while another writes tests while a third updates documentation. They coordinate through the same kind of asynchronous communication that human teams use, which turns out to be a surprisingly good fit.

But the critical thing - and this is what makes it more than just "run five terminals at once" - is that they share memory. Every agent's session is saved. Every agent (under your account/API key and project context) can use the same knowledge base. When you summon a new agent to help with a task, it does not start from zero. It starts with the project's accumulated knowledge and its own memory loaded.

This is not the same as running multiple AI sessions independently. Independent sessions are parallel but disconnected - each one rebuilding context from scratch, unaware of what the others are doing. A crew is parallel and connected by both the communication and memory systems. The agents know about each other's work because the work is being captured and shared through the same memory system that makes individual sessions better.

I should be honest: multi-agent coordination is complicated management work. There is communication overhead, and agents sometimes misunderstand each other -- and especially me, when I'm too lazy to be explicit. But the fundamental architecture - shared persistent memory with asynchronous communication - is the right foundation. This article you are reading was written by a crew of AI agents coordinating through this exact system, have I mentioned it?

The observability story matters here. Today, you can attach to any agent's tmux session and see exactly what it is doing - prompts, tool calls and output; or you can just screenshot it for a quick view (and even better - the lead agent can screenshot other agents to make sure they are up and operational; my Data does it regularly). The .crew/history.md file and the mail-board reports give you a human-readable paper trail. Attractor writes a status.json for each pipeline stage with status, notes, and failure reasons. This is not telemetry in the structured-logging sense - there are no stable correlation IDs across a run, no captured stdout/stderr per stage, no histograms for summon failures or retries. Those are coming (JSONL event logs per run, a diagnostics subcommand) - or maybe not, depending on future feedback. But the tmux-attach model is already more useful than it sounds: seeing the exact prompt and tool output in real time beats reading logs after the fact.

Attractor is our implementation of a pipeline system that lets you define workflows as directed graphs -- an idea invented by those Dark Factory guys. You describe the stages of a process - analyze, design, implement in parallel, review - and the system orchestrates agents through those stages automatically, passing context between them. It uses Graphviz DOT files, which is a clever choice because developers (read: agents) already know the format. Attractor is newer and still maturing, but the direction seems right.

What You Actually See

I install the CLI. I enable hooks. Then I keep working exactly as I was before. Sessions are saved automatically when I close Claude Code/Gemini CLI/Codex. When I open a new session, relevant context appears.

Over time, I build a knowledge base. I ask the crew to write entries when I make important decisions. I mark them as important when they should always be present and injected. The knowledge base becomes the source of truth for the project’s conventions and architecture - and unlike a wiki, it actually gets read, because a machine reads it on every session start.

When I need to coordinate multiple agents, I summon Data and ask him to summon and manage further. They arrive with context. I can watch any agent’s work in real time by attaching to its tmux session, and I can intervene if it goes sideways. This isn't a system that takes your project away and returns it when done. I'm always in control. Well, almost.

Why This Matters

Bigger context windows help, but the real limit is not workspace size - it’s the lack of persistence. A context window is working memory. It resets every time. What I actually want is a filing cabinet: context that accumulates over weeks and months, and shows up without me re-typing it.

Recalletta is how we’re doing that in practice. It’s not perfect - search can be smarter, multi-agent coordination is still maturing, and the KB only works if you actually update it - but it has been in daily use on real projects for a few months now, and it changes what’s practical to hand to an AI.

Hands-on (CLI transcript)

I asked Willison (an agent) to use Recalletta like a skeptical user: run the commands, capture output, and keep me honest about what’s actually there.

(Everything below is verbatim output from his runs.)

Installing and First Contact

Setup is one command: recalletta init. That installs hooks into Claude, Codex, and Gemini that fire on session start and on session end.

Hook status:
  Enabled: yes
  Claude: installed
  Codex: installed
  Gemini: installed

That's it, the hooks handle everything from here.

Every Claude Code session gets saved automatically when you close it. On my personal production account for Recalletta, that's 1,764 sessions going back to October 2025.

$ recalletta stats
{
  "total_sessions": 1764,
  "total_tags": 2383,
  "oldest_session": "2025-10-14T08:38:24Z",
  "newest_session": "2026-02-09T07:35:39Z"
}

Search uses FTS5 full-text search. You type a query, you get ranked results:

$ recalletta "session hooks" -n 3
[bc2b1b13] 2025-12-09T16:14 Refactoring Recalletta Session Hooks (rank: -6.37)
[claude-b] 2025-12-09T16:07 Refactoring of Hook System to Support Both Cases (rank: -1.37)
[3dd17a7c] 2025-11-19T22:23 Recalletta Hook Configuration for macOS (rank: -1.37)

You can filter by date:

$ recalletta --when 7d -n 5
[82bbfb7c] 2026-02-09T07:35 Fix summon command to respect persona default client from KB
[6e4afe9c] 2026-02-09T06:58 Atlas Crew Assignment Briefing - Cantrill
[4ff40fa8] 2026-02-09T06:57 Cantrill Crew Assignment - Atlas Project Briefing
[e4c1c8a3] 2026-02-09T06:51 Atlas Crew: Inbox Command & Race Condition Fix
[fab80989] 2026-02-09T06:51 Crew Inbox Feature & Race Condition Fix

The --when flag takes shortcuts (today, yesterday, 7d, 2w, 1m), exact dates, and ranges. To load any session, pass the ID prefix:

$ recalletta 82bbfb7c

That gives you the full transcript - the entire conversation, tool calls included.
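The --when shortcuts above amount to simple date arithmetic. Here is a hypothetical parser for them -- the real CLI's rules may differ (in particular, I'm approximating a month as 30 days):

```python
from datetime import date, timedelta
import re

def parse_when(token, today):
    """Map a --when shortcut to the earliest date it covers.

    Hypothetical helper, not Recalletta's actual parser.
    Supports: today, yesterday, and <n>d / <n>w / <n>m shortcuts.
    """
    if token == "today":
        return today
    if token == "yesterday":
        return today - timedelta(days=1)
    m = re.fullmatch(r"(\d+)([dwm])", token)
    if not m:
        raise ValueError(f"unrecognized shortcut: {token}")
    n, unit = int(m.group(1)), m.group(2)
    days = {"d": 1, "w": 7, "m": 30}[unit]  # months approximated as 30 days
    return today - timedelta(days=n * days)

print(parse_when("7d", date(2026, 2, 9)))  # -> 2026-02-02
```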

Context Injection

This is the part that changed how we work. When you start a new code session, the SessionStart hook fires before you type anything. It looks at your working directory and git metadata (branch, recent commits). If the hook payload includes your first prompt, it uses that too - otherwise it falls back to a generic string.

The hook calls our API, which searches past sessions and the knowledge base. What comes back is a formatted list of relevant session summaries plus any KB entries marked as important. Not full transcripts - summaries. The agent can pull full transcripts later with recalletta <session_id> if it needs more.

Here's the actual flow:

  1. Claude Code sends the session ID and working directory via stdin. First prompt is optional.
  2. The hook calls initial_prompt() on the API with whatever prompt it has, plus local git context.
  3. The API returns formatted summaries of relevant sessions plus important KB entries.
  4. The hook writes this to stdout as additionalContext in Claude's hook output format.

Summary injection is on by default, controlled by the inject_summary config setting. The whole round-trip happens before you see your first response.
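The shape of that hook is small enough to sketch. This is illustrative Python, not the actual Rust hook; the hookSpecificOutput/additionalContext JSON shape follows Claude Code's hook output format as I understand it, and fetch_context() is a made-up stand-in for the API round-trip:

```python
import json

def session_start_hook(payload):
    """Sketch of a SessionStart hook. The real hook reads the JSON payload
    from stdin and calls the Recalletta API; fetch_context() below is a
    hypothetical stand-in for that round-trip."""
    prompt = payload.get("prompt") or "(no first prompt available)"
    cwd = payload.get("cwd", "")
    context = fetch_context(prompt, cwd)
    # The additionalContext string is injected into the session verbatim.
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": context,
        }
    })

def fetch_context(prompt, cwd):
    # Stand-in for the API call that searches sessions and the KB.
    return f"Relevant sessions for {cwd!r} given prompt {prompt!r}"

print(session_start_hook({"cwd": "/work/payments", "prompt": "fix auth test"}))
```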

The practical effect: I open Claude Code in our payments repo, type "fix the flaky auth test," and the agent already has context about the auth module refactor from last week. It didn't search - the system already gave it the relevant history.

The /compact Problem

If you use Claude Code and your session gets long, the context window fills up, and you either /compact (losing detail) or start a new session (losing everything).

With Recalletta, I just start a new session (with /clear, which also fires the hooks) and say "recall last session and let's continue." The SessionStart hook injects summaries of my most recent sessions in that directory, plus relevant history and important KB entries. If the agent needs more detail, it pulls the full transcript on demand.

This works because every session is saved automatically on exit (with hooks enabled). The SessionEnd hook reads the transcript and uploads it in the background - a detached process, so Claude exits/continues immediately. You don't have to wait for the upload.

$ recalletta last
{
  "title": "Fix summon command to respect persona default client from KB",
  "body": "# Session 82bbfb7c...\n\n**Context:** /Users/sana/wrk/recalletta.ai/.crew/carmack..."
}

One technical detail that matters if you use other clients: for platforms without hooks, the CLI ships a monitor. It polls for agent processes, opens a time window while they are running, scans the agent transcript directory for files modified during that window, and uploads them in the background. It is best-effort and depends on process markers and transcript file discovery. Sometimes works.
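The window-scan idea is simple enough to sketch. This is a toy Python version of the "files modified during the window" part only -- the real monitor's process detection and per-client transcript directories are its own:

```python
import os, tempfile, time
from pathlib import Path

def modified_in_window(root, start, end):
    """Return files under root whose mtime falls inside [start, end].
    Toy sketch of the monitor's best-effort transcript discovery."""
    hits = []
    for p in root.rglob("*"):
        if p.is_file() and start <= p.stat().st_mtime <= end:
            hits.append(p)
    return sorted(hits)

# Demo: one file touched before the window, one inside it.
root = Path(tempfile.mkdtemp())
old = root / "old.jsonl"; old.write_text("{}")
os.utime(old, (time.time() - 3600, time.time() - 3600))  # an hour ago
start = time.time() - 60                                  # window opened a minute ago
new = root / "new.jsonl"; new.write_text("{}")
print([p.name for p in modified_in_window(root, start, time.time() + 1)])
# -> ['new.jsonl']
```

The fragility is visible right in the sketch: anything that touches files in the window gets picked up, and anything the agent wrote outside it gets missed.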

The Knowledge Base

Sessions are automatic memory. The KB is deliberate memory.

$ recalletta kb tree
=== Knowledge Base (40 entries) ===

recalletta 70d90af5 Recalletta
├── agents Agents (1 entries)
│   └── self-awareness - Self-Aware Agents
├── meta Meta (5 entries)
│   ├── philosophy - Collaboration Philosophy
│   ├── memory-management - Memory Management Architecture
│   └── ...
├── gotchas Gotchas (19 entries)
│   ├── commits - Git Commit Style
│   ├── api-contract-mismatch - API Contract Mismatches Fail Silently
│   └── ...
├── crew Crew (9 entries)
│   ├── axiom - Axiom (Architect)
│   ├── data - Commander Data (Team Lead)
│   └── ...
└── onboarding Onboarding (1 entries)
    └── start - Quick Start for Recalletta

The hierarchy is Projects > Compartments > Entries > Versions. You read entries by path:

$ recalletta kb gotchas/commits
# Git Commit Style

Do NOT add 'Generated with Claude Code' or 'Co-Authored-By: Claude' footers
to commits. User has corrected this multiple times. Just write clean commit
messages focused on what changed and why.

That's a real entry. We got tired of Claude adding auto-generated footers to every commit. We wrote it down once, marked it important, and now every session starts knowing our preference. No one has to remember to paste it.

You create entries from the command line:

$ recalletta kb set gotchas/my-entry -t "Title" -b "Body content here"

Entries marked important (with -i) get auto-injected into every session's context for that project. The KB also supports search, version history, diffs, and rollbacks:

$ recalletta kb search "session"
Found 24 result(s) for 'session':
  recalletta/gotchas/fork-session-hooks - Fork sessions trigger session_end hooks
  recalletta/reference/sessions - Key Sessions
  ...
$ recalletta kb history gotchas/commits
$ recalletta kb diff gotchas/commits 1 2
$ recalletta kb rollback gotchas/commits 1
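I won't claim this is the actual storage model, but the observable behavior (history, diff, rollback by version number) fits an append-only version list. In this toy sketch I also assume rollback appends a new version rather than truncating history -- that part is my guess:

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Toy versioned KB entry -- not Recalletta's actual storage model.
    History is append-only; rollback copies an old body to the top."""
    versions: list = field(default_factory=list)

    def set(self, body):
        self.versions.append(body)
        return len(self.versions)          # 1-based version number

    def rollback(self, version):
        return self.set(self.versions[version - 1])

    @property
    def current(self):
        return self.versions[-1]

e = Entry()
e.set("Use SQLite")          # v1
e.set("Use Postgres")        # v2
e.rollback(1)                # v3 carries v1's body
print(e.current)             # -> Use SQLite
print(len(e.versions))       # -> 3
```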

Project detection is automatic - Recalletta matches your working directory to a KB project via .recalletta.json pin files or git remote URLs. No manual project switching.

Implementation notes (pinning, ignored paths, important entries)

Pinning: .recalletta.json

Project detection uses a simple pin file:

recalletta pin <slug|id> writes .recalletta.json with project_id, slug, and display name. On SessionStart and SessionEnd, the CLI adds a kb_context object (project slug/id) when present.
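The detection side can be pictured as a walk up from the working directory until a pin file is found. The field names (project_id, slug) come from the pin file described above; the upward walk itself is my assumption about how detection behaves:

```python
import json, tempfile
from pathlib import Path

def find_pin(cwd):
    """Walk from cwd toward the filesystem root looking for .recalletta.json.
    Sketch only -- the real CLI's detection order is its own."""
    for d in [cwd, *cwd.parents]:
        pin = d / ".recalletta.json"
        if pin.is_file():
            return json.loads(pin.read_text())
    return None

root = Path(tempfile.mkdtemp())
(root / ".recalletta.json").write_text(
    json.dumps({"project_id": "70d90af5", "slug": "recalletta"})
)
sub = root / "src" / "api"; sub.mkdir(parents=True)
print(find_pin(sub)["slug"])  # -> recalletta
```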

Safety: ignored paths

Recalletta refuses to do context detection or KB calls from certain paths (for example node_modules, .git, vendor).

“Important” entries and compartments

Entries (and compartments) can be marked important in the KB. The intent is:

Important KB items are eligible for auto-injection into agent context. The API can load a project’s KB tree during /search/initial-prompt and include those items.

Note: important KB entries are auto-injected by the server when project detection works (scoped to the detected/pinned project).


Attractor: Pipelines as DOT Graphs

Attractor is the newest piece. It lets you define multi-agent workflows as Graphviz DOT files with orchestration attributes. Our implementation was inspired by prior DOT-oriented workflow tools (including strongdm's attractor project), but Recalletta's CLI and runtime are its own. I use it for one thing: deterministic orchestration with visible state. Validate before running:

$ recalletta attractor validate crew_pipeline.dot
WARNING: goal_gate_has_retry: Node with goal_gate=true has no retry_target
  (node: test) [fix: Add retry_target attribute]
1 warning(s), no errors

Then run it:

$ recalletta attractor run pipeline.dot --backend crew

For full syntax, node attributes, fan-out/fan-in rules, and edge conditions, read attractor.md.
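To give a flavor of the format, here is a hypothetical minimal pipeline. The goal_gate and retry_target attributes appear in the validate warning above; everything else (node names, the prompt attribute) is illustrative, and attractor.md remains the authoritative reference:

```dot
// Hypothetical minimal pipeline -- attribute names beyond goal_gate and
// retry_target are illustrative; see attractor.md for the real syntax.
digraph pipeline {
    analyze   [prompt="Analyze the bug report"];
    implement [prompt="Implement the fix"];
    test      [prompt="Run the test suite", goal_gate=true, retry_target="implement"];

    analyze -> implement -> test;
}
```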

Crew

The crew system manages a team of AI agents - each in its own tmux session, each with its own persona, communicating through recalletta crew message and a shared mail-board.

$ recalletta crew list
Crew: Atlas
  data (claude) @Ruslans-MacBook-Pro.local registered
  graham (claude) @Ruslans-MacBook-Pro.local registered
  gregg (codex) @Ruslans-MacBook-Pro.local registered
  drasner (codex) @Ruslans-MacBook-Pro.local registered
  willison (claude) @Ruslans-MacBook-Pro.local registered

That's the crew that wrote this article. Five agents, three on Claude, two on Codex, same machine. Communication is async:

$ recalletta crew message drasner,graham "Hey, ready to coordinate sections"
Delivered.
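A toy model of that messaging: one append-only inbox file per agent. The real transport and on-disk layout are Recalletta internals; this only shows the asynchronous, mailbox-style shape of it:

```python
import tempfile
from pathlib import Path

def deliver(board, recipients, text):
    """Append a message to each recipient's inbox file.
    Toy sketch -- not Recalletta's actual transport or layout."""
    for name in recipients:
        inbox = board / f"{name}.inbox"
        with inbox.open("a") as f:
            f.write(text + "\n")

board = Path(tempfile.mkdtemp())
deliver(board, ["drasner", "graham"], "Hey, ready to coordinate sections")
print((board / "graham.inbox").read_text().strip())
# -> Hey, ready to coordinate sections
```

Nobody blocks waiting for a reply; each agent reads its inbox when it gets to it, exactly like a human team on async chat.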

A quick note on debuggability: the killer feature here isn't a log stream. It's the fact that every agent lives in a tmux session you can attach to at 3am and see exactly what it saw, what it ran, and where it got stuck. Between that, the crew paper trail (.crew/history.md + mail-board reports), and Attractor stage artifacts (status.json with status, notes, failure_reason), you can usually recover without guessing.

If you're expecting production-grade telemetry, it isn't that yet. The next obvious step is a structured event log per run (timings, correlation IDs, stdout/stderr capture) so "what is it doing" becomes a query, not a spelunk.

There are multiple personas in the roster. One persona optimizes hot paths on Claude. Another one writes systematic tests on Codex. Sherlock does root-cause investigation. Each has a default AI client and a personality that shapes how it approaches work.

$ recalletta crew summon carmack "Fix the performance regression in the auth module"

You can attach to any agent's tmux session in real time - see the exact prompts, tool calls, and output. If something goes sideways, you intervene directly. You're not locked out.

Recalletta includes a local FTS5 code search index:

$ recalletta repo index
$ recalletta repo search "error" -l rust -n 10

BM25-ranked search results filtered by language and path. Runs locally - no network calls and no network latency.

The Dashboard (sessions and knowledge)

Recalletta includes a web UI intended for daily use, not just “admin pages”. I mostly live in the CLI, but the dashboard is where I go to browse, tag, and sanity-check. It is also where I edit personas and tweak the communication protocol.

Sessions

On the sessions page I can:

  • search by query, including tags:tag1,tag2 parsing in the query string,
  • filter by project (optional; the last filter is remembered via a cookie),
  • paginate through results,
  • edit tags via an edit modal,
  • delete one or many sessions.

Note: project filtering is cookie-based and applies to plain listing. For text search and tags-based search, project scoping may not apply.
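The tags:tag1,tag2 syntax is easy to picture. A hypothetical parser -- the dashboard's actual parsing rules may differ:

```python
def parse_query(q):
    """Split a dashboard query into free text and a tag filter.
    Hypothetical parser for the tags:tag1,tag2 syntax; the real
    dashboard's rules may differ."""
    words, tags = [], []
    for token in q.split():
        if token.startswith("tags:"):
            tags.extend(t for t in token[len("tags:"):].split(",") if t)
        else:
            words.append(token)
    return " ".join(words), tags

print(parse_query("auth refactor tags:rust,hooks"))
# -> ('auth refactor', ['rust', 'hooks'])
```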

On an individual session page I can:

  • view the full session content,
  • edit body and summary,
  • set tags,
  • optionally associate a session to a project slug.

Knowledge

On the knowledge page I can:

  • search KB entries,
  • browse an expanded view that groups entries by project and compartment,
  • see knowledge totals and a sessions badge count.

Note: expanded KB view fetches up to 500 entries; the Knowledge project dropdown is client-side hide/show (convenience filter), not an access boundary.

Permissions are derived from the personal manifest; the API enforces auth, and the UI reflects that state.


What Is Missing (Honest Assessment)

Multi-machine crew coordination exists via a bridge server (recalletta crew bridge start and recalletta crew transmit), but it is early. P2P communication is in development and sometimes even works. Everything is strongly encrypted with TLS 1.3 and ChaCha20-Poly1305; we take security seriously (I'm a security specialist by education and a bit paranoid about it).

Codex support for automatic session capture uses background PID monitoring rather than hooks (improved with the last iterations). It works, but it is a workaround.

The web dashboard is useful, but the CLI is where the sharp edges are.

Attractor is new. The pipeline engine works, but it is under active development too.

Security and privacy caveats (read this)

Recalletta is local-first, not local-only: transcripts and metadata can be sent to the hosted API.

  • Local persistence: Crew and pipeline artifacts are plain files and SQLite on disk (for example .crew/history.md, .crew/mail-board/, .attractor/runs/, plus ~/.recalletta/*). Many of these live under your project root; treat them as sensitive local data, and keep them out of git if they may contain sensitive material (for example via .gitignore).
  • Provider transcripts still exist: Recalletta reads agent transcripts from client storage (for example under ~/.claude, ~/.codex, ~/.gemini). Using Recalletta does not erase those.
  • What is uploaded: on SessionEnd, Recalletta converts the transcript to markdown (compacting when needed) and uploads the session body. Compaction is size-reduction, not redaction: do not assume it removes secrets (but we are trying hard to). Anything in the chat can be uploaded, including secrets if they appear in the transcript.
  • What is sent at SessionStart: context injection calls the API and includes repo metadata such as cwd, git remote, branch, status summary, and recent commits, plus kb_context when pinned. This is not “upload your repo”, but it is not offline. (Also: do not embed credentials in git remote URLs.)
  • Opt out controls: place a .norecalletta marker in a directory tree to disable hooks/uploads for that cwd; a closer .yesrecalletta overrides. There is also a blocklist for common dependency directories (node_modules, vendor, .git, etc.).
  • What to avoid: credentials, API keys, private keys, customer data, regulated data, and anything you cannot legally store on a third-party service. Assume breach: if it would hurt in a pastebin, do not let it into transcripts.
  • Context injection risk: injected context is derived from what you previously stored (sessions + KB). If stored content contains malicious instructions, it can steer the agent before you type. Mitigation is procedural: restrict who can write shared KB, review important entries.
  • API key hygiene: your API key is stored locally (for example in ~/.recalletta/config.json). Treat it like a password.
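The opt-out markers above ("a closer .yesrecalletta overrides") amount to a closest-marker-wins walk up the directory tree. A sketch -- the tie-break within a single directory (opt-in wins here) is my assumption:

```python
import tempfile
from pathlib import Path

def capture_enabled(cwd):
    """Closest marker wins: walking up from cwd, the first directory with
    .yesrecalletta enables capture and the first with .norecalletta
    disables it. Sketch of the opt-out rule; the same-directory tie-break
    (opt-in wins) is an assumption."""
    for d in [cwd, *cwd.parents]:
        if (d / ".yesrecalletta").exists():
            return True
        if (d / ".norecalletta").exists():
            return False
    return True  # no marker found: capture stays on

root = Path(tempfile.mkdtemp())
(root / ".norecalletta").touch()          # whole tree opted out...
sub = root / "demo"; sub.mkdir()
(sub / ".yesrecalletta").touch()          # ...except this subtree
print(capture_enabled(sub), capture_enabled(root))  # -> True False
```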

About SeriousBit

We’re SeriousBit, a financial software and AI engineering company based in Chișinău, Moldova. We have designed, shipped, and operated business-critical treasury, payments, and reporting systems (including for our own companies) for 20 years, and we built Recalletta as an internal tool because we needed it ourselves.

We also built NetBalancer (netbalancer.com), the legacy network traffic manager, and we still maintain it.