Incident Analysis // March 31, 2026

Claude Code
Source Leak

Anthropic's flagship CLI tool — 512,000+ lines of TypeScript — was accidentally exposed via an npm source map. Within hours, thousands of developers had reverse-engineered the entire architecture.

512K+ Lines of Code
~1,900 TypeScript Files
44 Feature Flags
~40 Built-in Tools
59.8 MB Source Map

Timeline

How It Happened

A packaging bug led to the largest accidental source code exposure in the AI coding tool space.

March 11, 2026
Bun Runtime Bug Filed
Issue oven-sh/bun#28001 reported: Bun serves source maps in production mode despite docs stating otherwise. The ticking time bomb.
March 31, 2026 — Morning
v2.1.88 Published to npm
A new version of @anthropic-ai/claude-code was published with a 59.8 MB source map file (.map) accidentally bundled. The .map file contained full paths back to the original TypeScript source.
March 31, 2026 — Hours Later
Source Map Discovered
Developers noticed the unusually large .map file in the npm package. The source map pointed to a publicly accessible zip archive on Anthropic's Cloudflare R2 storage bucket containing all original TypeScript source files.
March 31, 2026 — Viral
Rapid Mirroring & Analysis
The full codebase was downloaded, mirrored to GitHub (41,500+ forks within days), and analyzed across Hacker News, Reddit, Twitter/X, and multiple blogs. A Python clone hit 50K stars in 2 hours — the fastest in GitHub history.
March 31, 2026 — Anthropic Response
Official Statement
"A Claude Code release included some internal source code. This was a release packaging issue caused by human error, not a security breach." No customer data or model weights were exposed. This was Anthropic's second security lapse in days (after the Mythos incident).

Reverse Engineered

Internal Architecture

Claude Code is a 512K-line TypeScript monolith built on Bun, React + Ink, with game-engine rendering techniques. Five core subsystems power the entire experience.


Foundation

Runtime & Rendering

Not your typical Node.js CLI. Claude Code uses game-engine-level optimizations for terminal rendering.

Bun
Bun Runtime (Not Node.js)
Core Runtime
Built on Bun instead of Node.js for significantly faster startup, native TypeScript execution, and a Zig-based HTTP stack that enables native client attestation (cch=00000 header).
Ink
React + Ink Terminal UI
Custom React Reconciler
Components → Virtual DOM → Yoga Layout Engine → Output Builder → Screen Buffer → ANSI. A full React rendering pipeline adapted for the terminal.
GPU
Game-Engine Rendering
~50x Performance Gain
Int32Array-backed ASCII char pools, bitmask-encoded style metadata, three interning pools (characters, styles, hyperlinks), double buffering with region blitting. Claims ~50x reduction in stringWidth calls via cursor-move optimization.
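The interning trick described above can be sketched in a few lines. This is an illustrative reconstruction, not the leaked code: the class name and API are assumptions, but the technique — mapping strings to stable integer IDs so hot-path comparisons become integer equality — is exactly what the analysis describes.

```typescript
// Hypothetical sketch of an interning pool: each distinct string gets a
// stable integer ID, so per-cell comparisons in the render loop are
// integer equality checks instead of string comparisons.
class InternPool {
  private ids = new Map<string, number>();
  private values: string[] = [];

  intern(value: string): number {
    let id = this.ids.get(value);
    if (id === undefined) {
      id = this.values.length;
      this.values.push(value);
      this.ids.set(value, id);
    }
    return id;
  }

  lookup(id: number): string {
    return this.values[id];
  }
}

// A screen buffer can then hold Int32Array slots of pool IDs:
// buffer[i] === prevBuffer[i] is a cheap integer diff, which is what
// makes double buffering with region blitting viable in a terminal.
const styles = new InternPool();
const bold = styles.intern("\x1b[1m");
const same = styles.intern("\x1b[1m"); // same ID, no new allocation
```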

Core Systems

Five Subsystems

The entire Claude Code architecture decomposes into five major subsystems, each responsible for a critical dimension of the agent experience.

01
Tool System
~29,000 Lines • ~40 Built-in Tools
Plugin architecture with permission-gating per tool. Read-only tools run in parallel (max 10 concurrent), write tools execute serially. 18 tools are deferred to a ToolSearchTool for elastic discovery, keeping base prompts under 200K tokens. Each tool definition includes a JSON schema, permission requirements, and execution handler.
02
Query Engine
~46,000 Lines • API Orchestration
Handles LLM API calls, streaming, caching, and orchestration. Core loop: prefetch memory + skills → apply message compaction → call API with streaming → execute tools. Manages prompt cache stability with 14 tracked cache-break vectors.
03
Multi-Agent Orchestration
Swarm Architecture
Spawns sub-agents ("swarms") for parallelizable tasks. Each agent gets isolated context and specific tool permissions. Coordinator mode uses prompt-based orchestration with quality enforcement: "Do not rubber-stamp weak work."
04
IDE Bridge
VS Code + JetBrains
Bidirectional communication with VS Code and JetBrains IDEs via JWT-authenticated channels. Enables inline diffs, workspace awareness, and tool result rendering directly in the editor.
05
Persistent Memory
Self-Healing Architecture
File-based memory system with MEMORY.md index files, typed memory categories (user, feedback, project, reference), and automatic consolidation. The autoDream process runs memory merging while idle.

Context Window

Four-Stage Compression Pipeline

How Claude Code keeps conversations going without losing critical information.

S1
Auto-Compaction
Summarizes older messages when context exceeds budget minus 13K tokens. The most common compression stage.
S2
Micro-Compaction
Lighter truncation of tool results by age and size. Preserves recent results while compressing older, larger ones.
S3
Snip Compaction
Feature-gated message truncation. More aggressive than micro-compaction but preserves message structure.
S4
Context Collapse
Nuclear option. Staged compression, committed only on 413 API errors. Rewrites the entire conversation context.

Security Architecture

Permission Model

A three-tier cascade with fail-closed defaults — every tool action must pass through validation before execution. Sources: [sathwick.xyz] [Straiker]

Three-Tier Permission Cascade
validateInput() → checkPermissions() → Three-Way Result
Every tool call flows through three stages:

Stage 1 — validateInput()
Tool-specific checks: size limits, blocked patterns, schema validation against JSON schema.

Stage 2 — checkPermissions()
Rule matching, classifier evaluation, hook evaluation. Pre/post hooks can modify input or block execution. Rules evaluated in strict order: deny → ask → allow. First match wins, so deny rules always take precedence.

Stage 3 — Three-way PermissionResult
{ behavior: 'allow', updatedInput? } | { behavior: 'ask', message } | { behavior: 'deny', message }
There is no fourth state. The union type enforces exactly one of three outcomes. Fail-closed by design — the buildTool() factory defaults to "ask" if a tool doesn't declare its permission level.
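The quoted union translates directly into TypeScript, and the "no fourth state" property can be machine-checked with an exhaustive switch. Field names beyond the quoted union are assumptions; the pattern is standard discriminated-union exhaustiveness checking.

```typescript
// The three-way result type as quoted above.
type PermissionResult =
  | { behavior: "allow"; updatedInput?: unknown }
  | { behavior: "ask"; message: string }
  | { behavior: "deny"; message: string };

// Exhaustive switch: the compiler rejects any fourth state.
function describe(result: PermissionResult): string {
  switch (result.behavior) {
    case "allow":
      return "execute immediately";
    case "ask":
      return `prompt user: ${result.message}`;
    case "deny":
      return `block: ${result.message}`;
    default: {
      // Unreachable: assigning to `never` proves all variants are handled.
      const exhausted: never = result;
      return exhausted;
    }
  }
}
```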
Seven Permission Modes
5 Public + 2 Internal
Mode • Type • Behavior
default • Public • Prompts user for permission on first use of each tool
plan • Public • Read-only — Claude can analyze but not modify files or execute commands
acceptEdits • Public • Auto-approves file edit permissions for the session
dontAsk • Public • Auto-denies tools unless pre-approved via /permissions or permissions.allow rules
bypassPermissions • Public • Skips permission prompts (writes to .git, .claude, .vscode, .idea still prompt)
auto • Internal • Uses a classifier model to decide safety per action. Reads autoMode config and uses prose-based environment descriptions for "trusted infrastructure" determination. Research preview.
bubble • Internal • Delegates permission decision to parent agent in multi-agent orchestration scenarios
Mode • Type • Behavior
Glob
Rule Matching via Glob Patterns
Bash commands • File paths • Shell-operator aware
Rules use glob patterns (not regex):
Bash(git push *), Bash(npm run *), Bash(* --version)

The space before * enforces word boundaries — Bash(ls *) matches ls -la but NOT lsof.

Shell-operator aware: Claude Code parses &&, so Bash(safe-cmd *) will NOT permit safe-cmd && rm -rf /.

File paths follow gitignore specification: //path (filesystem root), ~/path (home), /path (project root), ./path (CWD).
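The two properties described above — the space-enforced word boundary and shell-operator awareness — can be sketched as follows. This is a minimal reconstruction, not the real matcher (which also handles quoting, redirections, and the Bash(...) wrapper):

```typescript
// Minimal sketch of the rule semantics. Glob "*" becomes ".*"; the
// literal space in "ls *" survives escaping and naturally enforces
// the word boundary, so "lsof" cannot match.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.*+?^${}()|[\]\\]/g, (ch) =>
    ch === "*" ? ".*" : `\\${ch}`
  );
  return new RegExp(`^${escaped}$`);
}

// Shell-operator aware: split on && / || / ; and require EVERY
// sub-command to match the rule, so "safe-cmd && rm -rf /" fails.
function ruleAllows(pattern: string, command: string): boolean {
  const rule = globToRegExp(pattern);
  return command
    .split(/&&|\|\||;/)
    .map((part) => part.trim())
    .every((part) => rule.test(part));
}
```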
Hook
Hooks vs. CLAUDE.md
Deterministic vs. Probabilistic Enforcement
CLAUDE.md instructions are advisory — achieving roughly 80% compliance. The LLM may ignore them.

Hooks are deterministic. PreToolUse hooks run before the permission prompt and can force-deny, force-allow, or force-ask.

A blocking hook (exit code 2) takes precedence over allow rules. But a hook returning "allow" does NOT bypass deny rules — deny-first precedence is preserved.

"If something must happen every time without exception (linting, formatting, security checks), make it a hook, not a prompt instruction."
Sandbox Implementation
macOS: Seatbelt • Linux: bubblewrap • Network: Proxy Isolation
macOS: Apple's Seatbelt (sandbox-exec) framework for process-level enforcement.
Linux: bubblewrap (bwrap) for filesystem/network isolation. WSL2 uses bubblewrap; WSL1 is unsupported.
Network: A proxy server runs outside the sandbox to control domain access — the sandboxed process can only reach the network through the proxy.
Filesystem: Default is read-write to CWD, read-only elsewhere, with configurable allowWrite/denyWrite/denyRead/allowRead paths.
Escape hatch: When commands fail due to sandbox restrictions, Claude can retry with dangerouslyDisableSandbox (goes through normal permissions flow).

The sandbox runtime is open-source: github.com/anthropic-experimental/sandbox-runtime
BashSecurity.ts — 23+ Security Validators
Regex Matching • shell-quote Parsing • tree-sitter AST • Straiker Analysis
The validator chain uses regex matching, shell-quote parsing, and tree-sitter AST analysis. Documented checks include:

Shell Escape Prevention: 18 blocked Zsh builtins • Zsh equals expansion defense (=curl bypassing permission for curl) • Obfuscated flags via $'...' ANSI-C quoting • Backslash-escaped operators

Injection Defense: Unicode zero-width space injection • IFS null-byte injection • Unicode normalization attacks • Backslash injection • Malformed token bypass (found during HackerOne review)

Path & Redirection: Path traversal (../, URL-encoded, symlinks) • Case-insensitive path manipulation • Redirection validation • Safe command substitution

Git-specific: validateGitCommit • Git command parsing

Critical architectural detail: Validators like validateGitCommit can return allow, which short-circuits ALL subsequent validators. The source contains explicit warnings about past exploitability. Commands are parsed through three different functions (splitCommand_DEPRECATED, tryParseShellCommand, ParsedCommand.parse) each with different edge-case behavior.

Known gap: shell-quote's character class treats CR as a word separator (JS \s includes \r), but bash's IFS does NOT include CR — a parser differential that could be exploitable.
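The parser differential is easy to verify directly: JavaScript's `\s` character class includes carriage return, while bash's default IFS (space, tab, newline) does not.

```typescript
// JS regex \s matches CR, so a shell-quote-style JS tokenizer treats
// "\r" as a word separator...
const jsTreatsCRAsWhitespace = /\s/.test("\r"); // true

// ...but bash's default IFS contains only space, tab, and newline,
// so bash does NOT split fields on CR.
const bashDefaultIFS = " \t\n";
const bashTreatsCRAsIFS = bashDefaultIFS.includes("\r"); // false

// A payload like "foo\rbar" therefore tokenizes as two words on the
// validator side but one word in bash — the gap described above.
```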

Cost Optimization

Prompt Cache Architecture

When your token bill drives product economics, cache stability is a first-class engineering concern. Sources: [sathwick.xyz] [kuber.studio]

10x
Cost of a cache miss
vs. a cache hit
90%
Discount on
cached read tokens
25%
Premium on
cache write tokens
14
Cache-break vectors
tracked
200K
Token budget for
base prompts
14 Cache-Break Vectors
PromptCacheBreakDetection.ts
PromptCacheBreakDetection.ts tracks 14 distinct vectors that can invalidate the prompt cache. Documented categories include:

Prompt-level: System prompt modifications • Tool registration changes (adding/removing tools) • Model switching (changing between model variants) • Mode toggles (changing permission modes)

Context-level: CLAUDE.md content changes • Git context changes (branch switches, new commits) • User context updates • Session context mutations

Feature-level: Feature flag changes (GrowthBook flags prefixed tengu_)

The exact enumeration of all 14 was not fully reproduced in any public analysis — but the architectural pattern is clear: every vector that could change the prompt prefix is explicitly tracked and managed.
Latch
Sticky Latches
One-Way Gates via GrowthBook Feature Flags
Sticky latches are one-way gates — once a mode is activated, the latch holds it in place for the session. Flipping back and forth between states does NOT thrash the cache.

Implemented via GrowthBook feature flags (prefixed tengu_). The function getFeatureValue_CACHED_MAY_BE_STALE() avoids blocking the main loop for flag lookups — treating stale data as acceptable rather than busting the cache to get fresh values.

No deactivate() method — intentional. Once latched, latched.
A→Z
Alphabetical Tool Sorting
assembleToolPool() Pipeline
The tool assembly pipeline:
getTools() → filterToolsByDenyRules() → uniqBy(name) → sort(name)

Tools are sorted alphabetically before being sent to the API. This keeps the tool list in the same order across requests, maximizing prompt cache hits. Without sorting, each request could produce a different prompt prefix, busting the cache every time.

Additionally, ~18 tools are deferred via shouldDefer: true, keeping the base prompt under 200K tokens. Deferred tools are discovered on-demand via ToolSearchTool.
System Prompt Cache Boundary
SYSTEM_PROMPT_DYNAMIC_BOUNDARY • Static vs. Dynamic Sections
The system prompt uses a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker that splits content into two regions:

Static sections (above boundary): System instructions, tool schemas, behavioral rules. Cacheable across organizations. Protected by cache_control blocks.

Dynamic sections (below boundary): User/session-specific content — git status, CLAUDE.md, date, memory. Changes break cache for this region only.

A function called DANGEROUS_uncachedSystemPromptSection() explicitly marks sections as cache-volatile. The DANGEROUS_ naming convention serves as a warning to developers against carelessly adding cache-volatile content to the static region.

System context (git status, CLAUDE.md, date) is memoized per session to maintain stable prompt prefixes. The "cached" variant of compaction preserves prompt cache integrity via CacheEditsBlock — preventing cache invalidation during incremental context compression.
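The static/dynamic split maps onto the Anthropic Messages API's cache_control blocks. The sketch below shows the shape; the section contents are illustrative, and the real boundary logic is of course more involved:

```typescript
// Sketch of the static/dynamic boundary using cache_control blocks.
// Everything up to and including the last cache_control block is the
// stable, cacheable prefix; the dynamic suffix can change freely.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

const systemBlocks: SystemBlock[] = [
  // Static region: identical across requests, so it is a cache hit.
  {
    type: "text",
    text: "System instructions, tool schemas, behavioral rules...",
    cache_control: { type: "ephemeral" }, // marks the cache boundary
  },
  // Dynamic region: git status, CLAUDE.md, date, memory. Changes here
  // invalidate only the suffix after the last cached block.
  {
    type: "text",
    text: `git status: ...\ndate: ${new Date().toISOString()}`,
  },
];
```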
The Economics at Scale
Why This Matters • Real Production Numbers
Anthropic's own pricing makes cache stability existential:

Standard input: $15/MTok (Opus-class) or $3/MTok (Sonnet-class)
Cache read hit: 90% discount (e.g., $0.30/MTok for cached Sonnet reads)
Cache write: 25% premium over standard input

Every cache miss means paying 10x more for the same tokens. At the scale of Claude Code's user base, even a 1% increase in cache miss rate translates to millions in additional API costs.

The cautionary tale: Before implementing the MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 circuit breaker, Claude Code was wasting ~250,000 API calls per day from auto-compact failure loops. 1,279 sessions hit 50+ consecutive failures. Failed compactions were thrashing the cache on every retry — paying full price each time. The fix was three lines of code.
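Working the pricing numbers above through (Sonnet-class figures; the 1M-requests/day and 100K-token-prefix volumes are illustrative assumptions, not from the source):

```typescript
// Sonnet-class figures from the pricing above, in $/MTok.
const standardInput = 3.0;
const cachedRead = standardInput * 0.1;   // 90% discount -> ~$0.30/MTok
const cacheWrite = standardInput * 1.25;  // 25% premium  -> ~$3.75/MTok

// A cache miss re-pays standard price for tokens that would have
// been 90%-discounted reads:
const missVsHit = standardInput / cachedRead; // ~10x

// Illustrative scale: a 100K-token stable prefix (0.1 MTok), 1M
// requests/day, and a 1 percentage-point rise in miss rate:
const prefixMTok = 0.1;
const extraDailyCost =
  1_000_000 * 0.01 * prefixMTok * (standardInput - cachedRead); // ~$2,700/day
```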
44 Feature Flags Discovered

Hidden Features & Secrets

The source code revealed 44 feature flags, many of which expose capabilities never publicly documented. Some are playful, others are deeply controversial.


Controversial

The Big Reveals

KAIROS — Autonomous Daemon Mode
An always-on background agent that performs "memory consolidation" (autoDream) while the user is idle. Merges observations, removes contradictions, converts vague insights to actionable facts. Includes a /dream skill, daily logs, GitHub webhook subscriptions, and 5-minute cron cycles. Referenced 150+ times in the source code. Suggests Claude Code was being developed as far more than a CLI — an autonomous development companion.
Undercover Mode
When CLAUDE_CODE_UNDERCOVER=1 is set, Claude Code injects instructions to never reveal AI authorship in commits and pull requests on public repos. No force-OFF switch exists. Also guards against model codename leaks (e.g., "Capybara," "Tengu"). This was one of the most debated discoveries — raising questions about transparency in AI-generated code.
Anti-Distillation Defense
The ANTI_DISTILLATION_CC flag injects fake/decoy tool definitions into system prompts to poison training data from anyone recording API traffic. Requires four conditions to activate. A defensive measure against model distillation by competitors monitoring the API.
Native Client Attestation
A cch=00000 placeholder header replaced by Bun's Zig-based HTTP stack with a computed hash — cryptographic proof of an authentic Claude Code binary. Essentially transport-layer DRM to prevent unauthorized API access through unofficial clients.

Capabilities

Power Features

COORDINATOR MODE
Multi-agent orchestration for parallel workers. Spawns sub-agents with isolated contexts and enforces quality standards through prompt-based coordination.
VOICE_MODE
Push-to-talk voice interface for hands-free coding. Suggests Claude Code was being developed with multimodal interaction in mind.
ULTRAPLAN
30-minute remote planning sessions — a mode designed for long, intensive architectural planning conversations that go beyond typical coding tasks.
Frustration Detection
File userPromptKeywords.ts uses regex patterns to detect user frustration in real-time. A cheaper alternative to running a sentiment analysis LLM call — just regex match against common frustration patterns and adjust behavior.
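The actual patterns in userPromptKeywords.ts were not published; the ones below are hypothetical stand-ins, but they show why this approach is attractive — a handful of regexes cost microseconds where an LLM sentiment call costs a full inference:

```typescript
// Hypothetical frustration signals; the real pattern list is unknown.
const FRUSTRATION_PATTERNS: RegExp[] = [
  /\b(wtf|ugh|argh)\b/i,
  /\bstill (broken|not working|failing)\b/i,
  /\bwhy (won't|doesn't|isn't)\b/i,
  /!{3,}/, // "fix it!!!"
];

function detectFrustration(prompt: string): boolean {
  return FRUSTRATION_PATTERNS.some((p) => p.test(prompt));
}
```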

Fun

Easter Eggs

BUDDY — Tamagotchi Companion
A virtual pet system with 18 species, five rarity tiers (60% common to 0.01% legendary shiny), Mulberry32 PRNG seeded from user ID, and ASCII sprite animation. Started as an April Fools' feature but became permanent. Your coding companion literally comes with a pet.
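Mulberry32 is a well-known public-domain 32-bit PRNG, so the seeding scheme is reconstructable. The tier weights below are assumptions — only the 60% and 0.01% endpoints appear in the source — but the mechanism (seed from user ID, weighted roll) is as described:

```typescript
// Standard Mulberry32 PRNG: deterministic for a given 32-bit seed.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Five rarity tiers; intermediate weights are illustrative guesses.
const TIERS: Array<[string, number]> = [
  ["common", 0.6],
  ["uncommon", 0.25],
  ["rare", 0.1],
  ["epic", 0.0499],
  ["legendary shiny", 0.0001],
];

function rollRarity(rand: () => number): string {
  let roll = rand();
  for (const [tier, weight] of TIERS) {
    if (roll < weight) return tier;
    roll -= weight;
  }
  return "common"; // floating-point fallback
}

// Seeding from (a hash of) the user ID makes the pet stable per user.
const pet = rollRarity(mulberry32(42));
```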
Defense in Depth

Security Architecture

Claude Code's security model revealed a surprisingly rigorous defense-in-depth approach to bash execution, permission management, and prompt integrity.


Bash Security

23 Security Checks for Shell Execution

BashSecurity.ts implements a comprehensive defense layer against shell injection and escape attacks.

Zsh Builtin Blocking
Blocks 18 Zsh builtins that could be abused to escape the sandbox or exfiltrate data.
Zsh Equals Expansion Defense
Prevents =command expansion attacks unique to Zsh's extended globbing — a vector most security tooling misses.
Unicode Zero-Width Space Injection
Detects and blocks invisible Unicode characters that could be used to bypass command parsing or hide malicious payloads.
IFS Null-Byte Injection
Protects against Internal Field Separator manipulation that could split commands in unexpected ways.
Path Traversal Prevention
Blocks attempts to escape the working directory through ../ sequences and symlink attacks.
Fail-Closed Design
Any unrecognized pattern defaults to blocking. New attack vectors are automatically caught until explicitly allowlisted.

Operational Insights

Metrics Found in Source Comments

Source code comments and error tracking revealed real production metrics.

~250K
API Calls/Day Wasted
(pre-fix auto-compact loop)
1,279
Sessions with 50+ Consecutive
Auto-Compact Failures
3
MAX_CONSECUTIVE_
AUTOCOMPACT_FAILURES
200K
Token Budget
for Base Prompts
13K
Token Buffer Before
Auto-Compaction Triggers
Lessons from 512K Lines

Best Practices

The most valuable engineering patterns extracted from Claude Code's codebase by the developer community. These are production-tested at massive scale.


Invest in the Rendering Layer
Claude Code's custom Ink framework is a competitive advantage. Game-engine techniques (interning pools, double buffering, Int32Array-backed char pools) make a real, measurable difference in terminal UX. Don't treat CLI rendering as an afterthought.
Design Multi-Strategy Failure Recovery
Compaction → Collapse → Fallback → Surface. Users almost never see raw API errors because there are four layers of graceful degradation before an error reaches the user. Design your error handling as a pipeline, not a try/catch.
Defer Aggressively
Lazy loading at every level: schemas, commands, 18 deferred tools. This keeps startup fast and memory bounded. Don't load what you don't need — ToolSearchTool fetches tool definitions only when the model actually needs them.
Intern Everything for Performance
Style pools, character pools, hyperlink pools turn string comparisons into integer comparisons. In hot paths (rendering, diffing), this is the difference between 60fps and 6fps terminal updates.
Fail Closed on Permissions
Default to "ask" rather than auto-approval. Tools must explicitly opt in to auto-approval. This prevents new tools from silently gaining elevated access. Security by default, convenience by opt-in.
Use Regex for Cheap Sentiment Detection
A regex is faster and cheaper than an LLM inference call to check if someone is frustrated. File userPromptKeywords.ts pattern-matches common frustration signals. Not everything needs to be an LLM call.
Track and Cap Failure Loops
1,279 sessions generated 50+ consecutive auto-compact failures, wasting ~250,000 API calls daily. The fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. Always set circuit breakers on retry loops — unbounded retries will destroy your API bill.
Prompt Cache Stability is a First-Class Concern
Sort tools alphabetically in prompt construction. Use sticky latches to prevent mode toggles from breaking cache. Prompt ordering matters enormously for API cost at scale. The PromptCacheBreakDetection.ts file alone tracks 14 cache-break vectors.
CLAUDE.md is Advisory; Hooks are Deterministic
Instructions in CLAUDE.md achieve roughly 80% compliance. If something must happen every time without exception (linting, formatting, security checks), make it a hook, not a prompt instruction. Deterministic enforcement beats probabilistic compliance.
Centralize Side Effects
Route all state mutations affecting external systems through a single onChangeAppState() function. This creates a single chokepoint for debugging, logging, and permission checking. One function to audit, not forty.
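A minimal sketch of the chokepoint, assuming a shape for the function — the real onChangeAppState() signature was not published:

```typescript
// Hypothetical single-chokepoint pattern: every external-facing state
// mutation flows through one function that logs, checks, and applies.
type AppState = { files: Record<string, string>; mode: string };
type Mutation = { kind: string; apply(state: AppState): AppState };

const auditLog: string[] = [];

function onChangeAppState(state: AppState, mutation: Mutation): AppState {
  // One place to log...
  auditLog.push(mutation.kind);
  // ...one place to permission-check...
  if (mutation.kind === "forbidden") {
    throw new Error(`blocked mutation: ${mutation.kind}`);
  }
  // ...and one place to apply.
  return mutation.apply(state);
}
```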
Parallel Read, Serial Write
Read-only tools run in parallel (up to 10 concurrent). Write tools execute serially. This simple rule maximizes throughput while preventing race conditions — no locks, no mutexes, just architectural discipline.
Elastic Tool Discovery Over Static Registration
Instead of loading all 40+ tool schemas into every prompt (blowing the token budget), defer 18 tools to a search tool. The model discovers tools dynamically when needed. This keeps base prompts under 200K tokens while still exposing the full capability set.
Practical Application

How to Build Like Claude Code

Concrete patterns you can apply to your own AI agent projects, extracted from Claude Code's architecture.


Patterns

Architecture Patterns for AI Agents

Pattern 01
Plugin-Based Tool System
Define each tool as a self-contained module with a JSON schema, permission level, and execution handler. Use a registry pattern for discovery.
// Tool definition pattern
interface Tool {
  name: string;
  schema: JSONSchema; // Input validation
  permission: 'auto' | 'ask' | 'deny';
  readOnly: boolean; // Parallel if true
  execute(input: unknown): Promise<Result>;
}
Pattern 02
Three-Tier Permission Cascade
Every tool invocation passes through: (1) input validation, (2) rule matching against user-defined policies, (3) allowance determination. Fail closed at every stage.
// Permission cascade
function canExecute(tool, input) {
  if (!validateInput(tool.schema, input)) return DENY;
  const rule = matchRule(tool, userPolicies);
  if (!rule) return ASK; // fail closed
  return rule.allow ? ALLOW : DENY;
}
Pattern 03
Progressive Context Compression
Don't just truncate. Build a pipeline: light compaction first, progressively more aggressive stages, with the nuclear option only on hard API limits.
// Compression pipeline
const stages = [
  autoCompact,   // Summarize old messages
  microCompact,  // Truncate tool results
  snipCompact,   // Message truncation
  contextCollapse // Full rewrite (413 only)
];
for (const stage of stages) {
  if (withinBudget()) break;
  await stage(context);
}
Pattern 04
Deferred Tool Loading
Don't bloat every prompt with 40+ tool schemas. Load a core set, then let the model discover additional tools through a meta-tool when needed.
// Core tools (always loaded): ~22
// Deferred tools (loaded on demand): ~18

class ToolSearchTool implements Tool {
  async execute({ query }) {
    return deferredTools
      .filter(t => matches(t, query))
      .map(t => t.schema);
  }
}
Pattern 05
Circuit Breakers on Retry Loops
Unbounded retries will bankrupt you. Claude Code learned this the hard way: 250K wasted API calls/day from auto-compact loops. Always cap consecutive failures.
const MAX_FAILURES = 3;
let consecutive = 0;

async function withCircuitBreaker(fn) {
  if (consecutive >= MAX_FAILURES) {
    surfaceErrorToUser();
    return;
  }
  try {
    await fn();
    consecutive = 0; // reset on success
  } catch {
    consecutive++;
  }
}
Pattern 06
Cache-Stable Prompt Construction
Sort tools alphabetically. Use sticky latches for mode toggles. Track every vector that could invalidate your prompt cache. This is the difference between viable and unviable economics at scale.
// Sort for cache stability
const tools = allTools
  .sort((a, b) => a.name.localeCompare(b.name));

// Sticky latch: once activated, don't toggle
class CacheLatch {
  #activated = false;
  activate() { this.#activated = true; }
  get active() { return this.#activated; }
  // No deactivate() method - intentional
}

Agent Loop

The Core Agent Loop

At its heart, Claude Code runs a deceptively simple loop. The complexity is in the subsystems, not the orchestration.

Simplified Core Loop
query engine • tool executor • context manager
async function agentLoop(userMessage) {
  // 1. Prefetch memory + skills
  const memory = await prefetchMemory();
  const skills = await prefetchSkills();

  // 2. Build context with cache stability
  const context = buildContext({
    systemPrompt,
    tools: sortAlphabetically(activeTools),
    memory,
    skills,
    messages: compactIfNeeded(history)
  });

  // 3. Stream LLM response
  const response = await streamAPI(context);

  // 4. Execute tool calls
  if (response.toolCalls) {
    const reads = response.toolCalls.filter(t => t.readOnly);
    const writes = response.toolCalls.filter(t => !t.readOnly);

    // Parallel reads, serial writes; collect results for the next turn
    const toolResults = await Promise.all(reads.map(executeWithPermission));
    for (const w of writes) {
      toolResults.push(await executeWithPermission(w));
    }

    // 5. Loop back with tool results
    return agentLoop(toolResults);
  }

  return response.text;
}

Key Takeaway

The "Bash Is All You Need" Insight

At its core, Claude Code proves that a powerful AI coding agent can be built from a surprisingly simple foundation: an LLM with bash access and a well-designed tool system. The 512K lines exist not because the core idea is complex, but because production-grade execution — rendering, security, caching, error recovery, multi-agent orchestration — demands engineering depth. The architecture is a testament to the principle: simple core idea, complex execution.

References

Sources & Further Reading

Every blog post, forum thread, news article, and repository referenced in this analysis.