Deep Dive into Claude's Memory System — Auto Memory, Auto Dream, and Sleep-time Compute
- Claude Code's memory has evolved into three stages — CLAUDE.md (explicit rules) + Auto Memory (automatic learning) + Auto Dream (sleep consolidation) — mimicking the human 'write → sleep → organize → remember' cycle
- Auto Dream is built on the theoretical foundation of the Sleep-time Compute paper (2025) — pre-computing before user queries can reduce test-time computation by approximately 5x
- Recent research (2025–2026) classifies agent memory into Factual, Experiential, and Working memory, managed through a Formation → Evolution → Retrieval lifecycle
Introduction
The ability of AI agents to learn and remember long-term, beyond a single conversation, is a central topic of AI research in 2025–2026. Claude Code is one of the products experimenting most aggressively in this space, and the recently discovered unreleased feature Auto Dream clearly shows this direction.
This article dissects Claude’s memory system along three axes:
- Product level: Claude Code’s memory architecture (CLAUDE.md → Auto Memory → Auto Dream)
- Theoretical level: The “pre-computation during sleep” paradigm from the Sleep-time Compute paper
- Research level: Agent memory taxonomies and latest survey papers from 2025–2026
1. The Complete Picture of Claude Code’s Memory Architecture
1-1. Three-Layer Memory System
Claude Code’s memory is divided into three layers based on who writes it and when it loads.
| Layer | Author | Content | Loading | Scope |
|---|---|---|---|---|
| CLAUDE.md | User | Explicit rules & conventions | Every session start (full) | Project/User/Org |
| Auto Memory | Claude | Learned patterns & preferences | Every session start (200 lines) | Per git repository |
| Auto Dream | Claude (background) | Consolidated memories | When conditions are met | Per git repository |
CLAUDE.md — Declarative Rule Hierarchy
CLAUDE.md is a project-level instruction file. Narrower scope (project) takes precedence over broader scope (organization).
```
Priority (high → low):
1. Managed Policy — /Library/Application Support/ClaudeCode/CLAUDE.md (IT admin)
2. Project — ./CLAUDE.md or ./.claude/CLAUDE.md (team shared)
3. User — ~/.claude/CLAUDE.md (personal global)
```
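As a rough illustration, this precedence resolution can be sketched in Python. The paths follow the hierarchy listed above; the merge behavior (higher-priority files concatenated first) is an assumption for illustration, not Claude Code's actual implementation.

```python
from pathlib import Path

# Candidate CLAUDE.md locations, ordered high -> low priority, matching the
# hierarchy above. The merge strategy is an illustrative assumption.
CANDIDATES = [
    Path("/Library/Application Support/ClaudeCode/CLAUDE.md"),  # managed policy
    Path("CLAUDE.md"),                                          # project
    Path.home() / ".claude" / "CLAUDE.md",                      # user global
]

def load_claude_md(candidates=CANDIDATES) -> str:
    """Concatenate every file that exists, highest priority first,
    so narrower/managed rules appear before broader ones."""
    sections = [p.read_text() for p in candidates if p.is_file()]
    return "\n\n".join(sections)
```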
Requirements for effective CLAUDE.md:
- Rule compliance rate is 92%+ under 200 lines, dropping to 71% beyond 400 lines
- Verifiable instructions (“use 2-space indentation”) work better than vague ones (“format code nicely”)
- Modularize with a `.claude/rules/` directory for conditional loading via glob patterns
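One way such conditional loading could work, purely as a sketch: each rule file declares a glob for the files it applies to (here via a first-line comment, a hypothetical convention), and only rules matching the file being edited are loaded.

```python
import fnmatch
from pathlib import Path

def applicable_rules(rules_dir: Path, edited_file: str) -> list[Path]:
    """Return rule files whose declared glob matches the edited file.
    The first-line "<!-- glob: ... -->" convention is hypothetical."""
    matched = []
    for rule in sorted(rules_dir.glob("*.md")):
        lines = rule.read_text().splitlines()
        first_line = lines[0] if lines else ""
        if "glob:" in first_line:
            pattern = first_line.split("glob:")[1].strip(" ->")
            if fnmatch.fnmatch(edited_file, pattern):
                matched.append(rule)
    return matched
```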
Auto Memory — Notes Claude Writes for Itself
Auto Memory automatically records patterns Claude discovers during sessions, without the user writing anything.
```
~/.claude/projects/<project>/memory/
├── MEMORY.md           # Index (200 lines loaded per session)
├── debugging.md        # Debugging patterns
├── api-conventions.md  # API design decisions
└── ...                 # Claude creates freely
```
How it works:
- First 200 lines of `MEMORY.md` are injected into context at session start
- Detects user corrections, preferences, and recurring patterns during conversation
- Selectively saves based on “Will this be useful in a future conversation?”
- Detailed content goes to topic files; `MEMORY.md` stays as an index only
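The 200-line injection rule is simple to picture. A minimal sketch (the constant matches the limit described above; the function name is mine):

```python
from pathlib import Path

MAX_INDEX_LINES = 200  # only this much of the index enters the context

def load_memory_index(memory_dir: Path) -> str:
    """Return the first 200 lines of MEMORY.md; topic files are read
    on demand later, not at session start."""
    index = memory_dir / "MEMORY.md"
    if not index.is_file():
        return ""
    lines = index.read_text().splitlines()
    return "\n".join(lines[:MAX_INDEX_LINES])
```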
1-2. The Fundamental Problem with Auto Memory
Auto Memory is powerful, but descends into chaos over time.
After 20+ sessions:
- Contradictory entries accumulate (“use pnpm” vs “use npm”)
- Relative dates lose meaning (“bug fixed yesterday” from 3 months ago)
- Transient notes and essential learnings are stored at the same level
- Priority competition within the 200-line limit
This isn’t a simple implementation bug — it’s the structural limitation of a write-only system that never consolidates.
2. Auto Dream — AI That “Organizes Memories While Sleeping”
2-1. How It Was Discovered
Auto Dream was discovered in March 2026 by the Japanese developer Akari, who found it inside Claude Code's /memory command and wrote it up on Zenn. It appears in the UI but cannot be activated; it is gated behind a server-side feature flag.
```
Codename: tengu_onyx_plover
Default settings:
  enabled: false
  minHours: 24     # Minimum 24-hour interval
  minSessions: 5   # Runs after 5 sessions accumulated
```
This dual-gate design is intentional — it doesn’t trigger for light usage, only running periodic consolidation for active development projects.
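The dual gate reduces to a two-condition check. A sketch under the flag's default values (names are illustrative, not Anthropic's):

```python
from datetime import datetime, timedelta

MIN_HOURS = 24     # minHours default
MIN_SESSIONS = 5   # minSessions default

def should_dream(last_run: datetime, sessions_since: int,
                 now: datetime) -> bool:
    """Consolidate only when BOTH thresholds are met: enough elapsed
    time AND enough accumulated sessions."""
    return (now - last_run >= timedelta(hours=MIN_HOURS)
            and sessions_since >= MIN_SESSIONS)
```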
2-2. Auto Dream’s Four-Phase Process
Auto Dream is analogous to human REM sleep. Information collected while awake (sessions) is organized and consolidated while sleeping (background).
Phase 1 — Orientation
Scans the memory directory and assesses the current state — which topic files exist, the current size of MEMORY.md, and how much time has elapsed since the last consolidation.
Phase 2 — Gather Signal
Selectively extracts important information from session transcripts. Instead of reading all files, it performs narrow grep searches:
- User corrections (“No, not that…”)
- Explicit save commands (“Remember this”)
- Recurring topics and patterns
- Important technical decisions
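A sketch of what such a narrow search might look like; the phrases below are stand-ins for whatever patterns Claude Code actually greps for, which are not public.

```python
import re

# Illustrative signal patterns, keyed by the signal type they detect.
SIGNALS = {
    "correction": re.compile(r"\b(no,? not that|actually,? use)\b", re.I),
    "save_command": re.compile(r"\bremember (this|that)\b", re.I),
}

def gather_signals(transcript: str):
    """Scan a session transcript line by line and return (line number,
    signal type, line) for every match."""
    hits = []
    for line_no, line in enumerate(transcript.splitlines(), 1):
        for kind, pattern in SIGNALS.items():
            if pattern.search(line):
                hits.append((line_no, kind, line.strip()))
    return hits
```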
Phase 3 — Consolidation
Merges new information into existing files while:
- Converting relative dates (“yesterday”) to absolute dates (“2026-03-24”)
- Keeping only the latest among contradictory facts
- Removing outdated, no longer valid memories
- Merging duplicate entries into one
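Two of these steps are easy to sketch. The heuristics below are simplified stand-ins for whatever Auto Dream actually does (function names are mine):

```python
import re
from datetime import date, timedelta

def absolutize_dates(text: str, today: date) -> str:
    """Convert 'yesterday'/'today' into ISO dates so entries stay
    meaningful months later."""
    text = re.sub(r"\byesterday\b", (today - timedelta(days=1)).isoformat(), text)
    return re.sub(r"\btoday\b", today.isoformat(), text)

def keep_latest(entries):
    """Given (key, value, written_on) facts, keep only the newest value
    for each key, resolving contradictions in favor of recency."""
    latest = {}
    for key, value, written_on in entries:
        if key not in latest or written_on > latest[key][1]:
            latest[key] = (value, written_on)
    return {k: v for k, (v, _) in latest.items()}
```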
Phase 4 — Prune and Index
Updates the MEMORY.md index to stay under 200 lines.
2-3. Auto Memory + Auto Dream = Complete Memory
```
Auto Memory (While Awake)             Auto Dream (While Sleeping)
┌─────────────────────┐               ┌─────────────────────┐
│ Collect experiences │               │ Extract signals     │
│ Detect patterns     │     ───▶     │ Resolve conflicts   │
│ Write notes         │               │ Consolidate/Organize│
│ Split topic files   │               │ Update index        │
└─────────────────────┘               └─────────────────────┘
          ↑                                     │
          └─────────────────────────────────────┘
              Reflected in next session
```
This cycle directly mimics the human “experience → sleep → memory strengthening” pattern. In neuroscience, the process where the hippocampus replays daytime experiences during sleep and transfers them to the neocortex is called memory consolidation — Auto Dream plays exactly this role.
2-4. Why Hasn’t It Been Released Yet?
While technically ready for immediate release, three business decisions remain:
- Cost: A sub-agent consumes tokens in the background without user request
- Transparency: Trust issues around restructuring memory without user awareness
- Defaults: opt-in vs opt-out — what’s the right default behavior?
3. Theoretical Foundation — Sleep-time Compute
3-1. Paper Overview
The theoretical basis for Auto Dream is the “Sleep-time Compute: Beyond Inference Scaling at Test-time” paper published in April 2025.
Authors: Kevin Lin, Charlie Snell, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, Joseph E. Gonzalez
arXiv: 2504.13171
Core idea: Pre-computing before the user asks questions can drastically reduce the computation needed during actual inference.
Figure 1: Sleep-time compute pre-processes the original context, performing additional computations that may be useful for future queries. (Lin et al., 2025)
3-2. Key Results
The paper demonstrates impressive efficiency improvements on two modified reasoning tasks (Stateful GSM-Symbolic, Stateful AIME).
Figure 3: Test-time compute vs. accuracy tradeoff on Stateful GSM-Symbolic. The shaded area shows where sleep-time compute improves the Pareto frontier.
Key figures:
- ~5x reduction in test-time compute: Reduces inference computation needed for the same accuracy to about 1/5
- Up to 13% accuracy improvement: Scaling sleep-time compute achieves up to 13 percentage points higher accuracy
- 2.5x reduction in average cost per query: Sharing context across multiple queries reduces costs
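The 2.5x figure follows from simple amortization: the one-off sleep-time cost is shared across all queries on the same context. The numbers below are illustrative, not the paper's measurements:

```python
def cost_per_query(c_sleep: float, c_test: float, num_queries: int) -> float:
    """Amortized cost: one-off sleep-time compute divided across the
    queries sharing the context, plus the reduced per-query test-time cost."""
    return c_sleep / num_queries + c_test

# Standard inference: every query pays the full test-time cost, say 10 units.
standard = 10.0
# Sleep-time compute: suppose 20 units are spent once up front and test-time
# cost drops to 2 units (illustrating the paper's ~5x reduction). Over 10
# queries the amortized cost is 20/10 + 2 = 4 units, a 2.5x saving.
shared = cost_per_query(c_sleep=20.0, c_test=2.0, num_queries=10)
assert shared < standard
```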
Figure 9: On Multi-Query GSM-Symbolic, the cost-accuracy Pareto improves as the number of questions per context increases.
3-3. Correlation Between Predictability and Effectiveness
A particularly interesting finding is that the more predictable the user’s question is from context, the greater the effect of sleep-time compute.
Figure 10: As question predictability increases, the performance gap between sleep-time compute and standard inference widens.
The implications for Auto Dream are clear:
- Recurring patterns in development projects (build commands, coding style, architecture decisions) have high predictability
- Therefore, pre-organizing these patterns allows Claude to generate more accurate responses with fewer tokens in the next session
- Auto Dream is essentially sleep-time compute specialized for development context
3-4. Real-World Software Engineering Application
The paper also validated on SWE-Features, a real software engineering task.
Figure 11: On SWE-Features, sleep-time compute shows higher F1 scores than the standard approach at lower test-time budgets.
4. Academic Context — The Current State of Agent Memory Research
Claude’s memory system sits within a broader academic research stream. Let’s understand this context through key survey papers published in 2025–2026.
4-1. Mapping Human Memory → AI Memory
“From Human Memory to AI Memory” (Wu et al., 2025) proposed a framework mapping human memory taxonomies to AI memory systems.
Figure 1: Correspondence between human memory (sensory, working, explicit/implicit) and LLM-based AI system memory. (Wu et al., 2025)
This paper classifies memory using 3 dimensions (object, form, time) and 8 quadrants:
| Dimension | Description | Example |
|---|---|---|
| Object | Whose memory is it? | Individual vs. collective |
| Form | How is it stored? | Text, vector, parameter |
| Time | How long does it persist? | Short-term vs. long-term |
Mapping to Claude Code:
- CLAUDE.md = Explicit long-term memory (user-declared rules)
- Auto Memory = Implicit long-term memory (auto-extracted from experience)
- Context window = Working memory (current session)
- Auto Dream = Memory consolidation (sleep-time organization)
4-2. Memory Taxonomy for the Age of AI Agents
“Memory in the Age of AI Agents” (Hu, Liu et al., 2025/2026) is a large-scale survey with 47 authors, proposing a new taxonomy that goes beyond simple “long-term/short-term memory” classification.
arXiv: 2512.13564 (December 2025, v2: January 2026)
Agent Memory vs. Related Concepts
The paper first distinguishes agent memory from LLM memory, RAG, and context engineering.
Figure 1: Conceptual comparison of Agent Memory with LLM memory, RAG, and context engineering. (Hu et al., 2025)
Forms Dimension — Physical Structure of Memory
The physical forms of agent memory are classified into three types.
Figure 4: Comparison of three forms — token-level, parametric, and latent memory.
Token-level memory is the most intuitive form, storing information as text tokens. It’s further subdivided by topological complexity:
Figure 2: Topological classification of token-level memory — flat (1D), planar (2D graph/tree), hierarchical (3D multi-layer).
| Form | Structure | Example | Claude Code Mapping |
|---|---|---|---|
| Flat (1D) | Linear sequence | Simple text log | MEMORY.md entry listing |
| Planar (2D) | Tree/Graph | Knowledge graph | Cross-references between topic files |
| Hierarchical (3D) | Multi-layer | Pyramid memory | MEMORY.md → topic file hierarchy |
Parametric memory encodes information in model weights themselves. Fine-tuning and LoRA are representative examples. Claude Code does not use this.
Latent memory stores information in hidden states or KV caches. Anthropic’s prompt caching is close to this category.
Functions Dimension — Cognitive Roles of Memory
Figure 6: Functional taxonomy of agent memory — Factual, Experiential, and Working memory.
Factual Memory
The agent’s declarative knowledge base, divided into two subtypes:
- User Factual Memory: Information for maintaining consistency in user interactions (“This project uses pnpm”)
- Environment Factual Memory: Ensuring consistency with external world knowledge (“This API was deprecated in Node.js 20”)
In Claude Code, CLAUDE.md is the primary store for factual memory.
Experiential Memory
Learnings extracted from past experience. Three subtypes:
- Case-based: Records of past episodes (“Last build error was resolved this way”)
- Strategy-based: Learning of reasoning patterns (“In this project, always run type checks first”)
- Skill-based: Procedural capabilities (“Test → Build → Deploy pipeline”)
In Claude Code, Auto Memory handles experiential memory.
Working Memory
Capacity-limited active context that maintains the state of ongoing tasks in the current session.
In Claude Code, the context window itself is working memory, and the 200-line limit of MEMORY.md is reminiscent of the human working memory capacity limit (Miller’s 7±2 law).
Dynamics Dimension — Memory Lifecycle
Figure 8: Agent memory lifecycle — Formation, Evolution, Retrieval.
| Stage | Description | Claude Code Mapping |
|---|---|---|
| Formation | Writing new information to memory | Auto Memory detects and saves patterns during sessions |
| Evolution | Updating, transforming, deleting existing memory | Auto Dream resolves contradictions, consolidates, refines |
| Retrieval | Accessing memory at the needed moment | 200 lines loaded at session start + on-demand topic file reading |
Figure 7: Detailed mechanisms of memory evolution — update, reinforcement, forgetting, restructuring.
4-3. The Write–Manage–Read Loop
“Memory for Autonomous LLM Agents” (2026) formalized agent memory as a Write–Manage–Read loop.
```
┌──────────┐    ┌──────────┐    ┌──────────┐
│  Write   │───▶│  Manage  │───▶│   Read   │
│ (Record) │    │ (Manage) │    │(Retrieve)│
└──────────┘    └──────────┘    └──────────┘
      ↑                               │
      └───────────────────────────────┘
              Feedback Loop
```
Applying this framework to Claude Code:
| Stage | Implementation | Owner |
|---|---|---|
| Write | Recording to memory files during sessions | Auto Memory |
| Manage | Periodic consolidation and refinement | Auto Dream |
| Read | Loading 200 lines at session start + accessing topic files on demand | Claude Code Runtime |
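The loop itself is easy to state as code. A minimal sketch with an in-memory store (class and method names are mine, not Claude Code's):

```python
class MemoryStore:
    """Toy Write–Manage–Read loop over a list-backed store."""

    def __init__(self):
        self.entries: list[str] = []

    def write(self, note: str) -> None:
        """Write — record a note during a session."""
        self.entries.append(note)

    def manage(self) -> None:
        """Manage — periodic consolidation; here, just order-preserving
        deduplication as a stand-in for real refinement."""
        self.entries = list(dict.fromkeys(self.entries))

    def read(self, limit: int = 200) -> list[str]:
        """Read — load up to `limit` entries at session start."""
        return self.entries[:limit]
```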
5. Claude’s Memory in a Broader Context
5-1. Chat Memory vs Claude Code Memory
Claude’s memory system provides different layers depending on user type.
| Layer | Target | Mechanism | Consolidation Cycle |
|---|---|---|---|
| Chat Memory | All Claude users | Auto-extraction from conversations (Memory Synthesis) | ~24 hours |
| CLAUDE.md + Auto Memory | Claude Code developers | Explicit rules + automatic learning | Manual / Auto Dream |
| API Memory Tool | App builders | Programmatic CRUD | Per app logic |
Chat Memory is the simplest form, using extractive summarization that extracts long-term useful information from conversations approximately every 24 hours. Claude Code’s Auto Memory + Auto Dream combination is a far more sophisticated evolution of this.
5-2. Three Layers of Infrastructure Scheduling
Auto Dream’s execution infrastructure is also evolving in three stages:
| Layer | Scope | Persistence | Example |
|---|---|---|---|
| CLI /loop | Active session only | Dies on session end | loop 10m /simplify |
| Desktop Scheduled Tasks | Local machine persistent | Dies on machine shutdown | crontab-based |
| Cloud Scheduled Tasks | Anthropic infrastructure | Always running | Serverless execution |
When Auto Dream moves to Cloud Scheduled Tasks, memory consolidation can proceed even when the user’s computer is off. This aligns with Anthropic’s $100M investment in the Claude Partner Network in March 2026.
5-3. Competitive Landscape
| Product | Memory Form | Consolidation | Developer Integration |
|---|---|---|---|
| Claude Code | File-based (CLAUDE.md + MEMORY.md) | Auto Dream (upcoming) | Per git repository |
| ChatGPT | Server-side key-value | Auto summary | Limited |
| GitHub Copilot | .github/copilot-instructions.md | None | Per repository |
| Cursor | .cursorrules | None | Per project |
Claude Code’s differentiator is the existence of the consolidation phase (Auto Dream). While other tools offer “write-only” memory, Claude Code aims to implement the complete cycle of “write, sleep, organize, and remember.”
6. Practical: Effective Memory Management Strategies
6-1. CLAUDE.md Writing Principles
```markdown
# Good: Specific and verifiable
- Use 2-space indentation
- Run `npm test` before committing
- API handlers go in `src/api/handlers/`

# Bad: Vague and unverifiable
- Write clean code
- Test properly
- Keep files organized
```
6-2. Auto Memory Tips
- Check regularly with `/memory`: periodically review what Claude remembers
- Fix contradictions immediately: say “that’s no longer correct” and Claude will update
- Encourage topic file splitting beyond 200 lines: request “please organize the memory”
- Watch for sensitive information: ensure API keys and passwords aren’t stored
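For the last tip, a scan like the following can be run over memory files before sharing them; the patterns are simple heuristics of my own and will not catch everything.

```python
import re
from pathlib import Path

# Illustrative credential-like patterns; real secret scanners use far
# larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),        # API-key-like tokens
    re.compile(r"(?i)password\s*[:=]\s*\S+"),  # inline passwords
]

def scan_for_secrets(memory_dir: Path):
    """Return (file, line number, line) for every suspicious line in
    the memory directory's markdown files."""
    findings = []
    for path in memory_dir.glob("*.md"):
        for line_no, line in enumerate(path.read_text().splitlines(), 1):
            if any(p.search(line) for p in SECRET_PATTERNS):
                findings.append((path.name, line_no, line.strip()))
    return findings
```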
6-3. Preparing for Auto Dream
When Auto Dream officially launches, you can expect:
- Automatic conflict resolution: Contradictory records for the same setting consolidated to the latest
- Relative date conversion: “yesterday” automatically converted to absolute dates
- Priority-based cleanup: Recurring patterns weighted higher than transient notes
- Manual trigger: a `/dream` command for immediate cleanup after major refactoring
Conclusion — The Era of AI That Dreams
The evolution of Claude’s memory system can be summarized in one line:
“Writing (Auto Memory) alone isn’t enough. You need to sleep and organize (Auto Dream) for real memories to form.”
This isn’t just a feature addition — it’s a signal that LLM agents are beginning to mimic fundamental mechanisms of human cognition. The Sleep-time Compute paper provides the theory, agent memory surveys provide the taxonomy, and Claude Code’s Auto Dream implements it as a product.
As a game developer analogy, it follows the same trajectory as NPC AI evolving from simple state machines (FSM) to behavior trees, and then to utility AI. Just as each step qualitatively changed the complexity of situations agents could handle, AI memory systems are preparing for a qualitative leap along the trajectory of “simple storage → automatic learning → sleep consolidation.”
References
Papers
- Kevin Lin et al., “Sleep-time Compute: Beyond Inference Scaling at Test-time”, arXiv:2504.13171, 2025.
- Yuyang Hu, Shichun Liu et al., “Memory in the Age of AI Agents: A Survey”, arXiv:2512.13564, 2025/2026.
- Yaxiong Wu et al., “From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs”, arXiv:2504.15965, 2025.
- “Memory for Autonomous LLM Agents”, arXiv:2603.07670, 2026.
- “A Survey on the Memory Mechanism of Large Language Model-based Agents”, ACM TOIS, 2025.
Articles and Documentation
- Akari, “The Day We Might Say ‘Sweet Dreams’ to Claude”, Zenn, 2026-03-24.
- Anthropic, “How Claude remembers your project”, Claude Code Docs.
- “Claude Memory Guide: Understanding the 3-Layer Architecture”, ShareUHack, 2026.
- “Claude Code Auto Dream: Memory Consolidation Feature Explained”, ClaudeFast.
