Architecture Manifesto // April 2026

The AI-Native Blueprint

Claude Code revealed what production AI software actually looks like — 512K lines of TypeScript built around a core loop that fits in 50 lines. This isn't a tutorial. It's a blueprint for building the next generation of software.

New Era Software Architecture
5 Deep-dive Sections
10 Playbook Steps
1 Person Can Build a 10-Person Product
Evolvable by Design
Key Takeaway
Software has shifted from "write functions that do things" to "write tools an AI picks from." The AI is a commodity — the harness around it (tools, permissions, skills, memory) is the product. Claude Code's 512K lines proved this at scale.
Paradigm Shift

Three Things That Changed

The old playbook is obsolete. Here's what replaced it.

Before 2025
You Write Functions That Do Things
Your codebase IS the logic. Every feature = more code. Complexity grows linearly. A team of 10 builds a product for 10K users.
After 2025
You Write Tools the AI Picks From
Your codebase is a toolkit. The AI selects and sequences tools. New capability = new tool folder. A team of 1 builds a product for 100K users.

The Insight

What Claude Code Proved

512K lines taught us the future of software isn't the model — it's the harness.

Insight 01
"Your Core Loop Should Fit in 50 Lines"
The core is trivially simple. The 512K lines are production engineering: permissions, caching, error handling. Everything else is harness.
Insight 02
The Harness Is the Product
Swap the model and it still works. The value is in the tool system, permission cascade, context management, and UX polish.
Insight 03
Evolvability Over Perfection
44 feature flags, deferred tool loading, plugin skills, self-consolidating memory. Designed to grow without rewrites.
Key Takeaway
Your project structure maps directly to Claude Code's 5 subsystems. At the center is a simple agent loop (input → context → LLM → tools → respond). All complexity goes into the harness around it: tools, permissions, memory, skills, and agents.
Day One Structure

The Project Skeleton

Every directory maps to a Claude Code subsystem. This is what your AI-native project looks like on day one.

AI-Native Project Structure
Skill-based • Plugin-driven • Evolvable
my-project/
  core/ — The heartbeat (Query Engine)
    loop.ts — input → context → LLM → tools → respond
    context.ts — Context assembly & compression
    router.ts — Tool dispatch (parallel reads, serial writes)

  tools/ — Atomic operations (~40 built-in)
    registry.ts — Registration & deferred loading
    read-file.ts  search.ts  execute.ts  web-fetch.ts

  skills/ — Composed workflows (15+ skills)
    loader.ts — Skill discovery & trigger matching
    code-review.skill/ — manifest + prompt + handler
    deploy.skill/   onboard.skill/

  permissions/ — 3-tier safety cascade
    cascade.ts  rules.ts  hooks.ts  sandbox.ts

  memory/ — Persistent state (MEMORY.md system)
    store.ts  index.ts  consolidate.ts

  agents/ — Parallel work (sub-agent system)
    coordinator.ts  worker.ts  types.ts

  interface/ — Human-AI boundary
    cli.tsx  web.tsx  api.ts

  config/ — Project constitution
    CLAUDE.md  settings.json  .env

The Heartbeat

The Core Agent Loop

Five steps. This is the entire pattern. Everything else is depth.

The Fundamental Loop
input → context → llm → tools → respond or loop
async function agentLoop(input, history = []) {

  // 1. Assemble context
  const context = await assembleContext({
    systemPrompt: loadSystemPrompt(),
    memory: await memory.retrieve(input),
    tools: toolRegistry.getActive().sort((a, b) => a.name.localeCompare(b.name)), // stable order = stable cache key
    messages: compress(history)
  });

  // 2. Call the LLM (streaming)
  const response = await llm.stream(context);

  // 3. No tool calls? Done.
  if (!response.toolCalls?.length) {
    await memory.save(input, response);
    return response.text;
  }

  // 4. Execute tools — reads parallel, writes serial
  const results = await router.execute(response.toolCalls);

  // 5. Feed back, loop continues
  return agentLoop(results, [...history, input, response]);
}

Making It Evolvable

Hook Points & Context Management

The loop becomes powerful when you add interception points and manage token costs.
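One way to sketch those interception points is a small hook registry. This is a minimal, synchronous sketch; the event names (`preToolUse`, `postToolUse`) and the registry shape are illustrative assumptions, not a real API:

```typescript
// A minimal sketch of hook points around each tool call.
// Event names (preToolUse/postToolUse) are illustrative, not a real API.
type HookCtx = { tool: string; input: unknown };
type Hook = (ctx: HookCtx) => void;

class HookRegistry {
  private hooks = new Map<string, Hook[]>();

  on(event: "preToolUse" | "postToolUse", hook: Hook): void {
    const list = this.hooks.get(event) ?? [];
    list.push(hook);
    this.hooks.set(event, list);
  }

  fire(event: string, ctx: HookCtx): void {
    // A hook that throws aborts the tool call: deterministic, unlike prompt guidance.
    for (const hook of this.hooks.get(event) ?? []) hook(ctx);
  }
}

// Usage: a guard that runs on every single call, no matter what the model "intends".
const hooks = new HookRegistry();
hooks.on("preToolUse", (ctx) => {
  if (ctx.tool === "write-file" && String(ctx.input).startsWith("/etc")) {
    throw new Error("write outside sandbox denied");
  }
});
```

The point of the sketch: hooks give you certainty at fixed points in the loop, which is where safety rules and logging belong.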


Design Rationale

Why This Structure Works

Principle 01
Tools ≠ Skills
Tools are atomic (read, run, search). Skills are composed workflows (prompt + tools + handler). Conflating them is the most common architecture mistake.
Principle 02
Permissions at Day One
Security is a directory in your project. Three-tier cascade on every tool call. Can't retrofit it later.
Principle 03
Memory Is First-Class
Typed categories, index file, auto-consolidation. Without memory, every conversation starts from zero. That's a demo, not a product.
Principle 04
Feature = Folder
Your roadmap is ls skills/. Each skill is self-contained. The core never changes when you ship.

Key Takeaway
Skills are self-contained capability packages (manifest + prompt + handler) that let you ship features without touching the core. The interface is the approval surface — where humans decide to trust, redirect, or override the AI. Together, they make your product evolvable and usable.
Skill Anatomy

What Is a Skill?

A skill is a self-contained capability: a manifest that declares what it does, a prompt that guides the AI, and a handler that executes.

Component 01
manifest.yaml
Declares name, triggers, permissions, and input schema. The loader reads this to decide when to activate.
name: code-review
triggers:
  - pattern: "/review"
  - pattern: "review (this|the) (PR|code)"
permissions: [Read, Grep, Glob]
Component 02
prompt.md
The template injected into the system prompt when active. Uses variables for dynamic context. Shapes the AI's behavior.
# Code Review Skill
You are reviewing {{project_name}}.
- Focus: correctness, security, performance
- Flag: any OWASP Top 10 vulnerabilities
- Output: file, severity, issue, fix
Component 03
handler.ts
Runs before and after the AI. Pre-handler gathers context (git diff). Post-handler formats output, saves to memory.
export default {
  async pre(ctx) {
    ctx.variables.files = await tools.bash("git diff --name-only");
  },
  async post(result) {
    await memory.save("project", result.summary);
  }
}
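The loader's trigger matching can be sketched roughly like this, treating manifest patterns as case-insensitive regexes. The matching policy (first match wins) is an assumption for illustration:

```typescript
// Hypothetical loader sketch: pick the first skill whose trigger matches the input.
interface SkillManifest {
  name: string;
  triggers: { pattern: string }[]; // regex strings from manifest.yaml
}

function matchSkill(input: string, manifests: SkillManifest[]): SkillManifest | null {
  for (const manifest of manifests) {
    if (manifest.triggers.some((t) => new RegExp(t.pattern, "i").test(input))) {
      return manifest;
    }
  }
  return null; // no skill active: the loop runs with base tools only
}

const manifests: SkillManifest[] = [
  {
    name: "code-review",
    triggers: [{ pattern: "/review" }, { pattern: "review (this|the) (PR|code)" }],
  },
];
```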


Human-AI Boundary

The Interface

Not a chat box. It's the approval surface — the boundary where humans decide to trust, redirect, or override.

01
Conversational
User talks, AI acts
User states intent in natural language. AI plans, executes, reports. Primary mode.
02
Ambient
AI watches, suggests
AI observes work and proactively suggests. Not intrusive — suggestions in sidebar or status line.
03
Headless
Pure API, no human
Runs on triggers — cron, webhooks, CI/CD. Full autonomy within sandbox.
Key Takeaway
AI products have runtime costs that scale with usage. A cache miss costs 10x a cache hit. The difference between naive and optimized: $69K/month at 1K users on Sonnet. Cache stability isn't optimization — it's survival.
Reality Check

Token Economics 101

10x
Cache miss costs 10x a cache hit
90% • Discount on cached reads
85%+ • Target cache hit rate
The Math That Kills Startups
1,000 daily users • 20 interactions • 50K tokens avg
Without caching: 1B tokens/day at $3/MTok = $90,000/month

With 85% cache hits: 15% full price + 85% at $0.30/MTok = $21,150/month

Difference: $69K/month. At Opus pricing ($15/MTok), naive = $450K/month.
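The same math as a function, so you can plug in your own numbers. The 10% cached-read price is the 90% discount from above; the function shape is illustrative:

```typescript
// Prices are $ per million tokens (MTok); cached reads cost 10% of full price.
function monthlyCost(
  tokensPerDay: number,
  pricePerMTok: number,
  cacheHitRate: number,
  daysPerMonth = 30
): number {
  const blended =
    (1 - cacheHitRate) * pricePerMTok + cacheHitRate * pricePerMTok * 0.1;
  return (tokensPerDay / 1e6) * blended * daysPerMonth;
}

// 1,000 users × 20 interactions × 50K tokens = 1B tokens/day
const naive = monthlyCost(1e9, 3, 0);     // 90000  ($90K/month)
const cached = monthlyCost(1e9, 3, 0.85); // 21150  ($21,150/month)
```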

Cache Architecture

5 Strategies for 85%+ Cache Hits

Sort Tools Alphabetically Before Every Call
Tool order affects cache keys. Different order = cache bust. Sort deterministically by name.
Split System Prompt: Static Above, Dynamic Below
Static parts (schemas, rules) are cacheable across users. Dynamic content (user context) only invalidates its region.
Use Sticky Latches for Mode Toggles
Once activated, latch it — no deactivation for the session. Flipping modes thrashes cache.
Defer Expensive Tool Schemas
Don't include all tools in every prompt. Load on-demand when needed.
Memoize Session Context
Git status, project config, date — don't change mid-conversation. Compute once, cache for session.
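Strategies 1 and 2 together can be sketched as a context builder that emits a byte-identical static prefix. The function shape and field names here are assumptions:

```typescript
// Sketch: deterministic tool order plus a static/dynamic prompt split.
interface Tool { name: string; description: string; }

function buildPromptBlocks(
  staticRules: string,
  tools: Tool[],
  dynamicContext: string
): { staticBlock: string; dynamicBlock: string } {
  // Sort alphabetically so the cacheable prefix is identical on every call.
  const sorted = [...tools].sort((a, b) => a.name.localeCompare(b.name));
  const staticBlock =
    staticRules + "\n" + sorted.map((t) => `${t.name}: ${t.description}`).join("\n");
  // Dynamic content goes last: changing it invalidates only this region.
  return { staticBlock, dynamicBlock: dynamicContext };
}
```

The property that matters: two calls with the same tools, registered in any order, must produce the same static block, or every call is a cache miss.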

Key Takeaway
10 steps from zero to production. Order matters — each builds on the last. Start with 5-7 tools and a simple loop, add permissions, ship your first skill, then optimize for cache stability. Follow the 7 rules religiously.
Each step maps to patterns from the Build Like This tab of our Claude Code dissection.
Implementation Sequence

10 Steps to Production

Don't skip ahead. Each step builds on the previous one.

Define Your Tool Surface
Start with 5-7 tools. Each gets a JSON schema, permission level, and readOnly flag. Claude Code started with core tools and grew to 40+.
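A tool definition from step 1 might look like this. The `ToolDef` shape is an illustrative assumption; the three ingredients (JSON schema, permission level, readOnly flag) come from the step above:

```typescript
// Illustrative tool definition: JSON schema, permission level, readOnly flag.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema describing the arguments the LLM must supply
  permission: "allow" | "ask" | "deny";
  readOnly: boolean;   // readOnly tools are safe to run in parallel
}

const readFile: ToolDef = {
  name: "read-file",
  description: "Read a file from the workspace",
  inputSchema: {
    type: "object",
    properties: { path: { type: "string" } },
    required: ["path"],
  },
  permission: "allow",
  readOnly: true,
};
```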
Write Your CLAUDE.md
The project "constitution." Include: what it does, arch decisions, conventions, what to avoid. Write it on day one.
Build the Core Loop
The 5-step loop: context → LLM → parse → tools → loop. Keep it under 100 lines. All complexity goes into subsystems.
Add the Permission Cascade
Three tiers: validateInput() → checkPermissions() → allow|ask|deny. Default to "ask." Fail closed.
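A minimal cascade sketch. The rule shapes are assumptions; the property that matters is the default, where anything unmatched falls through to "ask" (fail closed):

```typescript
type Verdict = "allow" | "ask" | "deny";
interface Rule { toolPattern: RegExp; verdict: Verdict; }

function checkPermissions(tool: string, rules: Rule[]): Verdict {
  for (const rule of rules) {
    if (rule.toolPattern.test(tool)) return rule.verdict;
  }
  return "ask"; // unknown tool: human approval, never a silent allow
}

const rules: Rule[] = [
  { toolPattern: /^read-/, verdict: "allow" },  // read-only tools auto-approved
  { toolPattern: /^execute$/, verdict: "deny" }, // shell execution blocked outright
];
```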
Build Your First Skill
One high-value workflow: manifest + prompt + handler. Proves the skill architecture before you build the second.
Add Persistent Memory
Typed categories, index file, save/retrieve in loop. Without memory, every conversation starts from zero.
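A memory store with typed categories can start this small. Retrieval here is naive keyword overlap, a stand-in for whatever ranking or embedding the real store would use:

```typescript
// Sketch: typed categories plus a save/retrieve pair for the loop to call.
type Category = "project" | "user" | "session";

class MemoryStore {
  private entries: { category: Category; text: string }[] = [];

  save(category: Category, text: string): void {
    this.entries.push({ category, text });
  }

  // Naive retrieval: return entries sharing any word with the query.
  retrieve(query: string): string[] {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return this.entries
      .filter((e) => words.some((w) => e.text.toLowerCase().includes(w)))
      .map((e) => e.text);
  }
}
```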
Add the Interface Layer
Start CLI (fastest to iterate). Add web and API later. The core loop is interface-agnostic.
Add Compression Pipeline
At minimum: auto-compaction + result truncation. Add the MAX_CONSECUTIVE_FAILURES = 3 circuit breaker from the start.
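The circuit breaker is a few lines: trip after N consecutive failures so a confused loop burns three calls, not three thousand. The class shape is illustrative:

```typescript
const MAX_CONSECUTIVE_FAILURES = 3;

class CircuitBreaker {
  private failures = 0;

  // Record each tool outcome; any success resets the count.
  record(ok: boolean): void {
    this.failures = ok ? 0 : this.failures + 1;
  }

  // When tripped, the loop stops retrying and reports to the human.
  get tripped(): boolean {
    return this.failures >= MAX_CONSECUTIVE_FAILURES;
  }
}
```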
Add Sub-Agents
Start with two types: "Explore" (read-only) and "Plan" (read + analysis). Isolated context, limited tools.
Add Cache Stability
Sort tools alphabetically, split static/dynamic prompt, sticky latches, defer tools, circuit breakers. This makes your business viable.

Guiding Principles

The Seven Rules

Non-negotiable principles for every decision.

Rule 01
Simple Core, Complex Harness
If your core loop is complex, you've failed. All complexity lives in subsystems.
Rule 02
Fail Closed, Always
Unknown tool? Ask. Unknown permission? Deny. Unknown state? Stop.
Rule 03
Evolve by Addition
Feature = folder. Core never changes. If you modify the loop to add a feature, your architecture is wrong.
Rule 04
Every Cache Miss Is a Bug
Track invalidation vectors. Sort deterministically. Latch modes. 1% more misses = thousands of dollars per month.
Rule 05
Parallel Reads, Serial Writes
Reads concurrently. Writes one at a time. No locks — just discipline.
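Rule 05 as code: reads fan out with Promise.all, writes run strictly one after another. The call shape is illustrative:

```typescript
interface ToolCall {
  name: string;
  readOnly: boolean;
  run: () => Promise<string>;
}

async function execute(calls: ToolCall[]): Promise<string[]> {
  const reads = calls.filter((c) => c.readOnly);
  const writes = calls.filter((c) => !c.readOnly);

  // Reads: all at once. They can't conflict.
  const readResults = await Promise.all(reads.map((c) => c.run()));

  // Writes: sequential. No locks, just discipline.
  const writeResults: string[] = [];
  for (const call of writes) writeResults.push(await call.run());

  return [...readResults, ...writeResults];
}
```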
Rule 06
Hooks for Certainty, Prompts for Guidance
Must happen every time? Make it a hook. Deterministic beats probabilistic.
Rule 07
Not Everything Needs an LLM
Frustration detection? Regex. Tool sorting? Array.sort(). Save LLM calls for reasoning.

The Big Idea

Building Software Has Changed

The AI is a commodity. The tool system, permission cascade, context management, skill architecture, memory system, and economic engineering — these are the competitive moats.

A single developer with this architecture can build what used to require a team of ten. Not because the AI writes the code — but because it operates within a system designed for evolvability.
Start with the loop. Add tools. Add skills. Ship.