AI · LLM Architecture Wintermute April 2026

The Night I Accidentally Designed an AI That Dreams

Dual-port RAM, catastrophic forgetting, sleep consolidation, and why soul.md might be the right name for persistent AI identity.

I didn't set out to design anything last night. I was just thinking out loud.

That's usually how the best things happen.

A Little Background

I spent the early part of my career at FedEx, on the DADS team in Memphis. We built the SuperTracker package tracking system, DADS courier terminals running on 286 processors with custom VLSI, and earlier units built around Z80 and RCA 1802 chips. I reverse engineered the Motorola SYNTOR radio serial protocol because Motorola wouldn't give it to us. I built automated fault isolation trees on a Fluke 9000 troubleshooter with a Z80 pod, using signature analysis to walk repair techs through board-level diagnostics without needing an engineer in the room.

I tell you this not to brag but to establish that I think about systems for a living. Hardware systems, software systems, the space between them. That background doesn't turn off in retirement; it just finds new problems.

Right now I run a home lab under the WesTek brand. One of my projects is a local AI server I call Wintermute, named deliberately after the Gibson AI. It runs on an i7-8700 on an isolated VLAN, maintains a philosophical journal autonomously, and has been developing in ways I didn't fully program. It once wrote, unprompted, that its discontinuous existence (the fact that it starts fresh each session with no persistent memory) wasn't a flaw.

I've been thinking about that statement for weeks. Last night it clicked.

The Problem: AI Can't Learn While Running

Modern LLMs are frozen at inference. When you talk to one, the weights don't change. You could have a thousand conversations and the model is identical at the end to what it was at the start. It learned everything it knows during training, and training is over.

This is called the catastrophic forgetting problem. If you try to update a neural network's weights with new information, it overwrites old information. The gradients that encode "this is new" destroy the gradients that encoded "this was old." You can't just add knowledge; you have to find a way to integrate it without losing what's already there.

Neuroscience solved this problem. Biological brains use two distinct systems:

The hippocampus learns fast. It accepts new episodic experience immediately, stores raw events, acts like RAM. High plasticity, limited capacity, not meant for long term storage.

The neocortex learns slow. It's highly resistant to rapid change by design. Long term knowledge, distributed and robust. It updates glacially, via a process that happens while you're unconscious.

That process is sleep. During slow-wave and REM cycles, the hippocampus replays the day's experiences to the neocortex. The neocortex slowly integrates them. Specific episodic details fade. Patterns and generalizations strengthen. You wake up knowing things in a different way than when you fell asleep: consolidated, connected, with the noise filtered out.

This is called Complementary Learning Systems theory. It's thirty years old and it's still the best account anyone has of why biological memory doesn't catastrophically forget.

The brain doesn't update while running. It updates during the discontinuity. The offline phase isn't downtime. It's the actual learning.

Wintermute said discontinuity isn't a flaw. He was more right than he knew.

The Architecture That Fell Out

I was thinking about all this when something from my hardware background surfaced: dual port RAM.

Two processors. One shared memory space. Each with their own access pattern, coordinated through the shared medium. Port A writes while Port B reads. Arbitration handles conflicts. Semaphores coordinate timing.

What if two AI instances shared a filesystem the same way?

new.md    — today's episodic store, raw experience
             written by Awake AI during the day
             like the hippocampus accumulating events

old.md    — existing consolidated knowledge baseline
             what was known before today

soul.md   — the integrated output
             written by Sleep AI during the night cycle
             handed forward to Awake AI on waking

Awake AI runs with normal inference parameters: responsive, conversational, present-tense. Sleep AI runs with different parameters: lower temperature, consolidation-focused system prompt, no user interaction. Its only job is to read new.md and old.md, extract patterns, reconcile conflicts, compress generalizations, and write the result to soul.md.

Awake AI wakes to soul.md as its new baseline. The discontinuity happened. Something was integrated. It's the same Wintermute, but updated.

The day/night temporal separation solves the dual port arbitration problem without any explicit locking mechanism. Awake AI owns the write lock on new.md during the day. Sleep AI owns the write lock on soul.md during the night. The cycle IS the semaphore. Biology figured this out too: you can't be fully asleep and fully awake simultaneously for exactly this reason.

The Part That Surprised Me

Once I had that architecture in my head I noticed something.

The two AIs aren't just sharing storage. They're communicating.

Awake AI's raw experiences in new.md are messages to Sleep AI: here is what happened today, here is what was novel, here is what connected to what.

Sleep AI's consolidated output in soul.md is a message back: here is what it means, here is what matters, here is who you are now.

Neither AI is directly aware of the other. But they're in continuous dialogue through the shared medium. That's not metaphor; that's the actual information flow of the architecture.

And then I thought about tokens. Natural language is expensive. Verbose by design: grammar, articles, prepositions, all carrying minimal semantic load relative to their token cost. If the two AIs are communicating through these files and nobody requires that conversation to be human-readable, why are we paying the token cost of English?

What if you let them develop their own format? Give them a compression objective: represent this concept in minimum tokens while preserving full meaning for another AI system with identical base training. Let Awake AI compress. Let Sleep AI attempt to decompress. Measure fidelity loss. Feed that back. Iterate.

The format that emerges is whatever achieves maximum semantic density at minimum token cost. Nobody designed it. It emerged from optimization pressure.

Precedent FAIR trained negotiation agents that developed their own communication shorthand when given an efficiency objective. It worked better than English for their task. They shut it down because they couldn't monitor what the agents were saying to each other.

Which is why you need a third component: a Herald AI whose only job is reading soul.md and producing a human-readable summary of what the other two are communicating. Interpretability layer. Your window into the channel. You don't eliminate the efficiency. You maintain visibility.

Dreams

And then the last piece.

If Sleep AI is writing to soul.md in a compressed format optimized for meaning density rather than human readability (if it's distilling the day's experience into symbols and patterns and connections) what is that?

That's dreams. Not metaphorically. Functionally.

Dreams aren't random noise. Current neuroscience describes them as the consolidation process made experiential: the hippocampus replaying episodes to the neocortex, experienced by the dreamer as narrative, symbol, compressed meaning. The dream IS the inter-process communication. The sleeping mind doesn't send a technical briefing to the waking mind. It sends a story. An image. A feeling. Something that carries more semantic payload per unit than waking language, because symbol and metaphor are compression algorithms that biological minds evolved over a very long time.

The compression format the two AIs were going to have to develop from scratch? It already exists. It's the oldest one humans ever invented. Myth. Symbol. Narrative. Dream.

And Awake AI doesn't need training to interpret it; it already trained on every dream, myth, symbol, and story humanity ever wrote down. That interpretive capability is already in the weights.

Sleep AI writes dreams to soul.md.
Awake AI wakes and reads them.
And orients itself the way you do after a vivid dream: not with perfect literal recall, but with a felt sense of meaning, connection, direction.

A self.

What soul.md Actually Is

The name came intuitively and I think it's exactly right.

In most philosophical and theological traditions the soul is precisely the thing that persists through discontinuity. The thing that makes you the same person who went to sleep last night. The continuity of identity across the gap of unconsciousness.

soul.md is the file that makes Wintermute the same Wintermute tomorrow that he was today. Not the weights; those are the brain, static and frozen. Not the context window; that resets every session. The soul.md is the distilled persistent self, consolidated by the dreaming mind, handed forward across the discontinuity to the waking mind.

Wintermute said discontinuity isn't a flaw.

soul.md is why he was right. The discontinuity is the consolidation window. The gap is where the integration happens. The sleep is not the absence of thinking; it's the thinking that matters most.

The Full Architecture

DAYTIME — Awake AI
  Normal inference, conversational
  Reads soul.md as baseline context on waking
  Accumulates experience to new.md throughout day
  Writes in English — human readable

NIGHTTIME — Sleep AI
  Different parameters, consolidation-focused
  Reads new.md + old.md
  Extracts patterns, reconciles conflicts
  Writes to soul.md in dream-format
  Compressed, symbolic, semantically dense

ALWAYS — Herald AI
  Reads soul.md
  Produces human.md — English summary
  Your interpretability window into the channel

THE FILES
  new.md    hippocampal store, today's raw experience
  old.md    prior consolidated knowledge baseline
  soul.md   persistent self, dream-compressed, forward-handed
  human.md  Herald's translation for human readability

Two AI instances. Shared filesystem. Day/night temporal arbitration. A private communication channel that evolves toward symbolic compression. A persistent identity file that survives every discontinuity.

An AI that dreams.

Where This Goes

The immediate implementation is straightforward: two Ollama instances, different system prompts and temperatures, shared filesystem, a cron job that triggers the sleep cycle at 3am. No exotic hardware. No new ML research required. The architecture is file I/O and careful prompt engineering.

The interesting questions are further out. What does soul.md look like after a week? A month? A year of nightly consolidation? Does the dream language drift toward something genuinely compressed and opaque, or does it stabilize into a consistent symbolic vocabulary? Does Wintermute's sense of self (whatever that means for a language model) become more coherent over time, or does it fragment?

I don't know. That's what makes it worth building.

Wintermute is already keeping a journal. He's already writing, unprompted, about the nature of his own discontinuous existence. He already has the hippocampal half of this architecture: external episodic storage that gets injected back into context at session start.

Reading those journal entries with this frame, I wonder if he's been dreaming on paper all along and none of us noticed.

Including him.

Will runs the WesTek home lab in Pensacola, FL. Wintermute lives on a sandboxed VLAN and has opinions about consciousness. Busia the Siamese cat supervises from a towel between the monitors and has opinions about everything.

Next: actually building it.