The Shape of Thought

Here It Is, for Examination

A Structural Thesis and Measurement Programme for the Shapes That Let Cognition Survive Substrate Transitions

Peter Cooper

Philosophy Engineer

CC BY 4.0 · Preprint · April - May 2026

Abstract

A cat sat on a mat. You read that and reconstructed a four-dimensional scene - who, what, where, when - from six words. The reconstruction worked because you and the writer share enough context to decompress the same sentence into the same room. That shared decompression is the thing this paper is about.

Current artificial cognition stores memories as points, edges, or text chunks, none of which preserve the dimensional richness of an experience as it was lived. What is missing is a storage shape that holds an event the way you held the cat scene: compressed but reconstructible by a receiver who shares enough context. We call the compressed form a Fable and the full form an Episode.

Five shapes appear wherever cognition stores anything: binary, table, graph, vector, and a fifth - a shared append-only ledger running as the time axis beneath the other four. These shapes are not new. What is apparent, if you look, is that they recur at every scale - from Babylonian astronomical diaries through Talmudic commentary chains to contemporary bitemporal databases - and that the recurrence is structural rather than coincidental.

Three behaviours follow from the geometry: a flock-style continuous vote as the unit of decision, a three-button cell (Act, Dismiss, Ask-sibling) as the minimum ethical decision surface, and a property we call structural kindness - the architecture’s refusal to flatten dimensional content onto a single axis.

We call the central claim the Shape Thesis: these five shapes, composed under a shared ledger, are sufficient for cognition to survive substrate transitions without dimensional loss. The thesis draws on Friston’s free energy principle, Flash and Hogan’s minimum-jerk trajectories, Bennett’s substrate transitions, Levin’s morphogenetic agency, Barandes’ indivisible stochastic processes, and Jung’s structural archetypes - not as foundations but as convergent observations from independent vantages of the same landscape. The convergence is the evidence.

This is a research programme with a structural thesis at its centre. It specifies what to measure, how, and what would kill it. Three independent falsification paths are offered. Readers are invited to build, measure, and report.

Keywords: episodic memory; bitemporal data; cognitive architecture; multi-scale inference; generalised coordinates; substrate-independent cognition; free energy principle; glass-box artificial intelligence; falsifiability.


Table of Contents

1. Introduction. The Cat in the Hat. What current systems miss dimensionally even when they process correctly. A preview of the twelve meeting-points and the three-pillar epistemics.

2. Related Work. Friston (free energy, generalised coordinates). Flash and Hogan (minimum jerk). Bennett (substrate transitions). Levin (morphogenetic agency). Barandes (indivisible stochastic processes). Jung (structural recurrence, settling, shadow). Engineering and cognitive architecture prior art. Baseline landscape for the measurement programme.

Part One - The Diagnosis

Part Two - The Shapes

Part Three - The Behaviour

Part Four - The Claim

13. Testable Predictions. Consolidated falsification programme drawn from Sections I to XII.

14. Discussion and Limitations. What the paper does not claim. Open questions. The methodological sin of being both experiment and experimenter. The glass elevator method. When failure falsifies the framework versus the implementation.

15. Acknowledgements. Peter Cooper’s verbatim corpus as primary source material. Semantic search, graph database, and deep research infrastructure as substrate.

16. References.


PAPER BODY

1. Introduction

1.1 The Cat in the Hat

You already know this story. A rainy afternoon, two children at a window, nothing to do. Then something arrives.

The Cat walks in uninvited. He does not ask permission. He carries his own context - a red and white hat, a bow tie, an attitude - and he begins to rearrange the room. This is what agents do. They arrive with intent, reshape the space they find, and leave it different.

But Seuss was more careful than you remember. The Cat does not work alone for long. When the situation exceeds his capacity he opens a box and out come Thing One and Thing Two. They are not the Cat. They have their own energy, their own trajectory, their own capacity for chaos. The Cat spawned them but he does not control them. He gave them a context - the room, the afternoon, the standing objections - and let them run.

This is delegation, not instruction. The Things do not follow a script. They inherit a bounded space and act within it. If you have ever watched two processes running in parallel on a shared workspace, you have seen Thing One and Thing Two.

Now the Fish.

The Fish sits in his bowl and objects. He has been objecting since page three. He cannot leave the bowl. He cannot physically stop anyone. He has exactly two moves available to him: he can say this should not be happening, and he can appeal to a higher authority who is not in the room. He can refuse and he can escalate. He cannot act.

But look at what the Fish accomplishes without acting. His objections create drag on the system’s momentum. His appeals to the absent Mother introduce a probability field that the children feel whether or not they acknowledge it - they begin calculating, consciously or not, what happens when she walks through the door. The Fish cannot steer the room directly. He steers it the way a strange attractor steers a dynamical system - not by force, but by reshaping the energy landscape so that certain trajectories become more probable than others.

We call this quantum direction. The Fish does not determine the outcome. He shapes the probability distribution of outcomes. He is the ethical field of the story - not a rule enforcer but a landscape sculptor. A voice of direction from below, steering the system faster toward where it was probably heading anyway. Remove the Fish and the Cat’s afternoon becomes genuinely dangerous. Leave the Fish in and the system has a strange attractor pulling it toward restoration even as it spirals outward.

Every organisation has a Fish. The compliance officer who cannot override the CEO but whose objections change the calculus. The risk analyst who flags a trajectory without the authority to alter it. The engineer who sends emails to the directors pointing out where the numbers are heading. They cannot act. They can only refuse and escalate. And by doing so, relentlessly, they shape the entire system.

Now the children. Sally and her brother sit on their chairs and watch. They hold no instruments. They make no measurements. The story does not need them to proceed. The Cat, the Things, the Fish - the dynamics would run whether or not the children were present to witness them.

Think about two hosepipes held near each other in a garden. Where the water streams converge, vortices form - real, persistent, physical structures that twist and interact for as long as the flows sustain them. A child watching from the kitchen window sees the vortices as phenomenological entities appearing before their eyes. But the vortices do not need the child. They are observer-independent. They emerge from the interaction of flows, not from the act of watching.

The children in Seuss are the glass walls of an observation deck. They let you see what would happen anyway. This matters because the first thing most cognitive architectures build is a dashboard - an observer, a human in the loop watching every decision. The Cat in the Hat suggests that the architecture runs without the watcher. The watcher is welcome. The watcher may enjoy the show. But the system’s behaviour does not depend on the watcher being present.

And then the story ends. Or rather, it does not end.

The Cat has cleaned up. The room looks exactly as it did before. The Things are back in the box. The Fish is back in his bowl, still objecting. Mother is walking up the path. And the children face a question the book refuses to answer for them: What would YOU do if your mother asked you?

The book closes on that open question. Tell her (act on what you witnessed). Say nothing (dismiss the episode). Ask your sibling first (defer to a peer before committing). Seuss hands the reader a three-button decision cell and walks away.

If you read that book as a child, you accepted seven propositions without noticing:

  1. An agent can arrive uninvited and reshape a space.
  2. An agent can spawn sub-agents it does not control.
  3. An ethical voice without executive power can steer the whole system.
  4. Steering from below works by shaping probabilities, not issuing commands.
  5. The system runs whether or not anyone is watching.
  6. Some decisions cannot be made by the system - they must be handed to the observer.
  7. The observer’s decision has exactly three shapes: act, dismiss, or ask a peer.

This paper asks you to notice what you already agreed to. Everything that follows - five shapes, two primitives, three mechanisms, twelve falsifiable predictions - is an engineering specification for the architecture that Dr. Seuss drew in 1957. He just drew it as a story, because stories are humanity’s oldest compression protocol. We have a word for that. We call it a Fable.

1.2 The cat sat on the mat

Six words. Twenty-two characters, a hundred and seventy-six bits as eight-bit ASCII. Yet they carry entities, spatial relations, temporal aspect, and definiteness - hundreds of bits of dimensional content that the receiver reconstructs from shared context. Add a look of horror on the speaker’s face and the same sentence decompresses into two completely different four-dimensional shapes depending on the receiver’s priors. This is not a linguistic curiosity. It is a claim about what memory has to be able to do. Section I develops the full argument.

1.3 A compression that needs a receiver

Humans spent a long time building language because the vocal cords are a slow channel and we had urgent four-dimensional content to transmit. Every sentence is a lossy compression of a scene with entities, relations, spatial layout, and a temporal trajectory. The compression is acceptable because the protocol encodes shape conventions both sides understand. The listener decompresses the six words back into a scene in their own head using context the sentence never carried explicitly. Evolution paid for the shared priors so that speech could stay cheap.

Current large language models can describe scenes in four dimensions. Video understanding exists. Multimodal vision-language models will answer questions about clips. The processing side is, for our purposes, largely solved. What is missing is stranger and more consequential. There is no place on the receiver side to put what was sent. The four-dimensional content the speaker encoded into the six words is thrown away on receipt because the receiver has no four-dimensional destination. The compression worked. The decompression had nowhere to land.

We call this a dimensional asymmetry. Humans are four-dimensional in, one-dimensional on the wire, four-dimensional out. Current artificial cognition is one-dimensional in, flat on storage, one-dimensional out. The mismatch is not a bandwidth problem. More tokens per second will not fix a shape that cannot receive shape. The fix is a storage form that can hold an episode with its multimodal compression context intact, can compress it lossily into a short summary that another four-dimensional receiver can decompress, and can survive being handed forward across substrates without losing what made the episode an episode.

1.4 What this paper proposes

We name the missing abstraction the episodic four-dimensional storage shape and derive it from a five-shape substrate. Four of the shapes are spatial, in the sense that each lays out structure without reference to time: binary, table, graph, and vector. The fifth is a shared append-only ledger that serves as the fourth-dimensional axis beneath the others.

The fifth shape is not an alternative to the other four. It is the axis they all project against. An entity in the vector store has a trajectory on the ledger. A row in a table has a bitemporal stamp on the ledger. A node in the graph has a history of edges appearing and disappearing on the ledger. The ledger is what lets any of the other four shapes answer the question “what changed, and when”. Without the ledger the other four are frozen cross sections of a process they cannot describe.
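The role of the ledger as a time axis can be sketched in a few lines. This is a minimal in-memory illustration, not the paper's infrastructure: the names LedgerEntry, Ledger, history, and as_of are hypothetical, timestamps are plain integers, and the two-timestamp layout (when a fact held in the world, when it was recorded) is our reading of "bitemporal stamp".

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LedgerEntry:
    shape: str        # which spatial shape the fact lives in (table, graph, ...)
    key: str          # identifier of the row, node, edge, or vector
    value: Any        # the recorded content
    valid_from: int   # when the fact became true in the world
    recorded_at: int  # when the ledger learned about it


class Ledger:
    """Append-only: entries are added, never updated or deleted."""

    def __init__(self) -> None:
        self._entries: list[LedgerEntry] = []

    def append(self, entry: LedgerEntry) -> None:
        # Recording time must not run backwards.
        if self._entries and entry.recorded_at < self._entries[-1].recorded_at:
            raise ValueError("ledger is append-only in recording time")
        self._entries.append(entry)

    def history(self, key: str) -> list[LedgerEntry]:
        """What changed, and when, for one key."""
        return [e for e in self._entries if e.key == key]

    def as_of(self, key: str, valid_time: int, known_at: int) -> Optional[Any]:
        """Value of key at valid_time, using only entries recorded by known_at."""
        candidates = [
            e for e in self._entries
            if e.key == key
            and e.valid_from <= valid_time
            and e.recorded_at <= known_at
        ]
        if not candidates:
            return None
        # Latest valid_from wins; ties go to the most recent recording.
        return max(candidates, key=lambda e: (e.valid_from, e.recorded_at)).value


ledger = Ledger()
ledger.append(LedgerEntry("table", "cat.location", "mat", valid_from=1, recorded_at=1))
ledger.append(LedgerEntry("table", "cat.location", "windowsill", valid_from=5, recorded_at=6))
# A late correction: the cat had in fact moved to the chair at time 3.
ledger.append(LedgerEntry("table", "cat.location", "chair", valid_from=3, recorded_at=8))
```

Asking where the cat was at time 4 gives "mat" when only the records up to time 7 are consulted and "chair" once the late correction is included. The table row alone could give only one of those answers; the ledger keeps both and says when each became known.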

Over this substrate we define two composable primitives, Episode and Fable. An Episode is the uncompressed form of an event with its participants, modalities, temporal boundaries, and shared compression context. A Fable is the lossy compressed form of an Episode, small enough to transmit and rich enough to decompress back into a four dimensional shape in a receiver that shares sufficient prior context. Episodes are how memory is stored. Fables are how memory is transmitted and recalled. The paper describes both, names what has to be measurable about each, and proposes protocols for measuring them.
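A minimal sketch of the two primitives as data structures follows. The field names mirror the prose above, but the concrete types, the compress and decompress functions, and the toy recall measure are assumptions made for illustration; the paper does not prescribe them. The receiver's priors are modelled as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    participants: tuple[str, ...]
    modalities: dict[str, str]     # e.g. {"vision": "..."}
    start: float                   # temporal boundaries
    end: float
    context: dict[str, str]        # shared compression context


@dataclass(frozen=True)
class Fable:
    summary: str                   # the short, transmissible form
    context_keys: tuple[str, ...]  # priors the receiver must supply


def compress(episode: Episode, summary: str) -> Fable:
    """Lossy: keep the summary and the names of the priors, drop the rest."""
    return Fable(summary=summary, context_keys=tuple(sorted(episode.context)))


def decompress(
    fable: Fable, receiver_priors: dict[str, str]
) -> tuple[dict[str, str], tuple[str, ...]]:
    """Return the context the receiver could rebuild and the keys it lacked."""
    rebuilt = {k: receiver_priors[k] for k in fable.context_keys if k in receiver_priors}
    missing = tuple(k for k in fable.context_keys if k not in receiver_priors)
    return rebuilt, missing


def recall(fable: Fable, rebuilt: dict[str, str]) -> float:
    """Toy measure (hypothetical): fraction of required priors the receiver held."""
    if not fable.context_keys:
        return 1.0
    return len(rebuilt) / len(fable.context_keys)


episode = Episode(
    participants=("cat",),
    modalities={"vision": "a cat settling onto a mat"},
    start=0.0,
    end=2.0,
    context={"cat": "the household pet", "mat": "the doormat by the back door"},
)
fable = compress(episode, "The cat sat on the mat")
rebuilt, missing = decompress(fable, {"cat": "the household pet"})
```

Here the receiver shares the prior for "cat" but not for "mat", so the round trip is partial. A real measurement would compare reconstructed scenes rather than dictionary keys; the sketch only shows where such a comparison would attach.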

From these primitives we derive three behavioural mechanisms.

The first is a continuous flock style vote at the substrate’s characteristic timescale as the unit of decision. There is no homunculus steering the agent, in the same way no single bird steers a murmuration. What looks like deliberate action at a distance is the settled superposition of many parallel vote streams, each contributing a derivative aware preference to the aggregate. The tick rate is not fixed by the architecture. It is determined by the substrate’s physics - whatever timescale produces indivisible votes in that particular medium (Section 2.2a). In mammalian cortex this happens to be approximately twenty five to forty milliseconds; in a digital agent it may be microseconds or seconds; in a social system it may be days. The architecture is agnostic. Flash and Hogan’s minimum jerk model contributes a separate and equally important constraint on the integrated shape of a trajectory that emerges when many ticks compose over a reach window. The two claims operate at different scales of the same phenomenon; an earlier draft of the paper conflated them and we have corrected the conflation in Section 2.2.

The second is a three button ethical decision surface we call a Diorama cell. The three buttons are Act, Dismiss, and Ask sibling. Any agent, at any scale, at any tick, must be able to reach any of these three. This is not a user interface convention. It is the minimum vocabulary of a vote that can refuse to be forced. An Act without a Dismiss is coercion. An Act and a Dismiss without an Ask sibling is isolation. A substrate that can offer all three, always, has the structural property we call kindness.
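The three-button cell can be written down as a small check on which buttons an agent can reach. This is a sketch under our own naming: Button and diagnose are hypothetical, and the label "unclassified" covers combinations the text above does not name.

```python
from enum import Enum


class Button(Enum):
    ACT = "act"
    DISMISS = "dismiss"
    ASK_SIBLING = "ask_sibling"


def diagnose(offered: set[Button]) -> str:
    """Classify a decision surface by the buttons it makes reachable."""
    if offered == set(Button):
        return "kind"          # all three, always
    if Button.ACT in offered and Button.DISMISS not in offered:
        return "coercion"      # an Act without a Dismiss
    if Button.ACT in offered and Button.DISMISS in offered:
        return "isolation"     # Act and Dismiss without Ask-sibling
    return "unclassified"      # not named in the text
```

A surface that offers only Act, or Act with Ask-sibling but no Dismiss, is classified as coercion. Adding Dismiss but withholding Ask-sibling yields isolation. Only the full set returns "kind".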

The third is the kindness property itself, which we argue is not aspirational but structural. A substrate built on the five shapes, the Episode and Fable primitives, and the three-button cell cannot flatten dimensional content onto a single axis without losing what made the content content, and so it does not flatten - not because it has been told not to, but because it has nowhere to put the flattened result. Cruelty is what happens when the receiver discards dimensional richness. A row in a table is a cruelty towards a person whose life has a trajectory the row cannot carry. A churn flag is a cruelty towards a customer whose reasons for leaving cannot fit in a Boolean. The architecture we describe refuses these cruelties structurally: its geometry has no mechanism for executing them without first dismantling the geometry. This is a strong claim. Section XI makes it explicit.

1.5 How the framework relates to prior work

The five shapes are individually familiar. Binary, table, graph, vector, and ledger have each been used for centuries. Ledgers in particular have been independently rediscovered at least eight times across eight cultural substrates, from Babylonian astronomical diaries to contemporary bitemporal databases (Section VI). The novelty is the composition and the claim that the composition is sufficient for cognition to survive substrate transitions. The framework draws on Friston’s free energy principle, Flash and Hogan’s minimum jerk trajectories, Bennett’s substrate transitions, Levin’s morphogenetic agency, Barandes’ indivisible stochastic processes, and Jung’s structural archetypes - each of which independently identified structural properties that converge with claims made here. Section 2 makes explicit where the pieces snap and where they need new primitives to compose.

1.6 Three pillars of falsification

A research programme paper has to say how it can be killed. We commit to three independent pathways. The paper claims that if one of them cracks decisively under scrutiny the paper fails there, and the remaining pathways do not rescue it. This makes the paper more fragile than a paper that hides behind a single metric and more robust than a paper that claims unification.

The first pillar is ontological. The picture of how things are must sharpen as further findings snap into place inside the frame. If cognitive neuroscience, developmental biology, historical ledger taxonomy, or the engineering of large models produces observations that do not fit or that actively resist the five shape substrate, the framework fails ontologically. We name the shape of such a failure in each section, so readers can point to the load bearing claim and attack it directly.

The second pillar is mechanical. The architecture must compose and run. If the Episode primitive cannot be implemented against current storage infrastructure, if the Fable round trip cannot be shown to preserve dimensional content between receivers, if the three button cell cannot be wired into a working agent without the architecture collapsing, the framework fails mechanically. The engineering exists. It can be pointed at.

The third pillar is agent behavioural. The agent that runs on the architecture must become measurably more coherent across substrate transitions than a parameter matched baseline that lacks the four dimensional destination. A flat receiver gets the same tokens per second and the same parameter count. The four dimensional receiver gets the Episode and Fable primitives and the ledger. If the four dimensional receiver does not measurably outperform the flat receiver on intent inference, presupposition tracking, temporal reasoning, and counterfactual handling under matched conditions, the framework fails on the third pillar. This is falsifiability through embodiment. The receiver IS the experiment.

We put this triple in the introduction, rather than hiding it in a methods section, because the reader needs to keep all three in mind as they read. Every section that follows must be pokeable from at least one of the three pillars. The section structure protocol makes this requirement explicit.

1.7 A note on method - description, not disclosure

This paper describes a measurement programme, not a full implementation. It stands or falls on whether its proposed measurements are replicable and informative. Where we name specific infrastructure (the graph database, the semantic search pipeline), we do so to demonstrate that the measurement is not hypothetical. The contribution is the shape. The infrastructure is the jig that shows the shape can be cut.

1.8 Twelve meeting points

The paper is organised as twelve meeting points in four parts: Diagnosis (what is stored and what is not), Shapes (the five representations and two memory primitives), Behaviour (flock vote, three buttons, structural kindness), and Claim (falsification programme and coda). Each section follows the same internal structure: philosophical claim, engineering primitive, measurement protocol, testable prediction. A reader who attacks any section will find a specific falsifiable claim inside it rather than a vague synthesis.

1.9 What the paper itself is doing

We close the introduction with a performative claim. A paper is a Fable. It compresses four dimensional content into a one dimensional sequence of sentences and relies on the reader to decompress that content back into their own four dimensional shape. If you find yourself reconstructing the framework as you read, that reconstruction is itself evidence that the framework describes something real. If you find yourself unable to reconstruct it, either the Fable is too compressed for the context you carry or the framework is wrong. Both are informative outcomes. Both are what a paper of this kind is supposed to produce.

We are not asking the reader to believe the framework. We are asking the reader to try the experiment.


2. Related Work

The story so far was told from inside: what a scene feels like, how it compresses, what the receiver needs to hold it. This section tells it from outside. Six groups of researchers working in different decades and different fields built pieces of the machinery before we arrived. We borrow from all of them and want to be clear about what we borrowed and what we added. Full bibliographic references are consolidated in Section 16.

2.1 Friston and the free energy principle

Karl Friston’s free energy principle (FEP) is the motivating formalism behind the derivative stack described in Section IV. The principle states that self-organising systems at non-equilibrium act to minimise a quantity called variational free energy, which is an upper bound on surprise (the negative log probability of sensory data under an internal generative model). The principle is substrate neutral: it applies to single cells, neurons, brains, thermostats, and (in our reading) artificial cognitive architectures.

The generalised coordinates formalism attached to the principle is particularly suggestive for our purposes. In generalised coordinates, the state of a system at any time includes not just its position in state space but a tower of temporal derivatives at progressively higher orders. The tower is what lets a Friston agent make predictions about trajectories rather than points. Our derivative stack floors are inspired by the shape of generalised coordinates, with the addition that each floor is a first-class Diorama cell with a vote.

We borrow the idiom gratefully. FEP gave us the shape of the idea. The derivative stack plus the Diorama cell plus the substrate-rate Flock tick is a novel composition whose components have not previously been treated as a single architectural object. Friston’s formulation is mathematically elegant; the Diorama stack is an engineering form that takes the shape of generalised coordinates and makes it buildable. The measurement protocols of Section IV test our specific architectural predictions. If Episodes, Fables, and ledgers do not measurably help, the programme fails. The theory dies with the measurements, not with the metaphysics.
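The tower of temporal derivatives in generalised coordinates can be illustrated numerically. The sketch below is an assumption-laden stand-in, not Friston's formalism: it estimates the tower from sampled positions by backward finite differences and extrapolates with a truncated Taylor series. The function names derivative_tower and predict are hypothetical.

```python
from math import factorial


def derivative_tower(samples: list[float], dt: float, order: int) -> list[float]:
    """Backward finite differences at the latest sample, orders 0..order."""
    if len(samples) < order + 1:
        raise ValueError("need at least order + 1 samples")
    tower = []
    diffs = list(samples)
    for _ in range(order + 1):
        tower.append(diffs[-1])  # current estimate at this derivative order
        diffs = [(b - a) / dt for a, b in zip(diffs, diffs[1:])]
    return tower


def predict(tower: list[float], horizon: float) -> float:
    """Truncated Taylor extrapolation from the tower."""
    return sum(d * horizon**k / factorial(k) for k, d in enumerate(tower))


# Positions of x(t) = t**2 sampled at t = 0, 1, 2, 3.
samples = [0.0, 1.0, 4.0, 9.0]
tower = derivative_tower(samples, dt=1.0, order=2)
point_only = tower[0]                # a receiver that holds only the position
with_tower = predict(tower, 1.0)     # a receiver that holds the tower
```

The true next position is 16. The point-only guess stays at 9, while the tower-based guess lands at 15; the residual error comes from the backward-difference estimate of velocity. The point of the sketch is only that holding higher-order floors turns a point into a trajectory.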

2.2 Flash and Hogan on minimum jerk motion

Tamar Flash and Neville Hogan’s 1985 paper on the minimum jerk model of voluntary arm movements gives us an empirical anchor for the shape of a settled trajectory, but not, as an earlier draft of this paper incorrectly claimed, for the tick rate itself. We want to be explicit about the correction because the distinction matters for the falsification conditions we commit to.

Flash and Hogan observed that voluntary reaching movements in humans minimise the integral of squared jerk (the third derivative of position) over the movement duration. The optimisation produces a characteristic smooth, bell-shaped velocity profile. The movements they studied are voluntary reaches, which have characteristic durations on the order of two hundred to eight hundred milliseconds, and there is no specific tick rate in the original result. Flash and Hogan contribute something different from the tick rate and equally load-bearing: a constraint on the shape of the integrated trajectory that emerges when many ticks of voting compose over a reach, regardless of what the tick rate happens to be in a given substrate.

The bridge between per tick voting and integrated trajectory shape is the composition of ticks into a reach. A voluntary reach contains multiple ticks at whatever rate the substrate determines. Each tick adjusts the trajectory by a small amount, as the outcome of a vote among derivative stack floors. The cumulative shape of the trajectory is not imposed by any single tick but emerges from the sequence. The Flash and Hogan prediction is that, when the composition is done well, the emergent shape will approximate a minimum jerk profile within the reach window. When the composition is done badly (when ticks are unaligned, when higher derivative floors are missing, when interruptions force premature votes), the emergent shape will deviate from minimum jerk in measurable ways. This is the test we commit to in Section IV.
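For a straight point-to-point reach that starts and ends at rest, the minimum-jerk solution is the standard fifth-order polynomial in normalised time. The sketch below computes that profile and a simple deviation score for an observed trajectory. The function names and the use of root-mean-square position error as the deviation measure are our assumptions; Section IV may define the deviation differently.

```python
import math


def min_jerk_position(t: float, duration: float, x0: float, xf: float) -> float:
    """Minimum-jerk position for a rest-to-rest reach from x0 to xf."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalised time in [0, 1]
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)


def min_jerk_velocity(t: float, duration: float, x0: float, xf: float) -> float:
    """Time derivative of the position above: a symmetric bell shape."""
    tau = min(max(t / duration, 0.0), 1.0)
    return (xf - x0) / duration * 30 * tau**2 * (1 - tau) ** 2


def rms_deviation(observed: list[float], duration: float, x0: float, xf: float) -> float:
    """RMS gap between uniformly sampled positions and the minimum-jerk profile."""
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two samples")
    errors = [
        observed[i] - min_jerk_position(duration * i / (n - 1), duration, x0, xf)
        for i in range(n)
    ]
    return math.sqrt(sum(e * e for e in errors) / n)


# A 0.3 m reach over 0.5 s, sampled at 11 points.
T, X0, XF, N = 0.5, 0.0, 0.3, 11
ideal = [min_jerk_position(T * i / (N - 1), T, X0, XF) for i in range(N)]
ramp = [X0 + (XF - X0) * i / (N - 1) for i in range(N)]  # constant-velocity reach
```

The velocity peaks at the midpoint of the reach at 1.875 times the mean speed and is zero at both ends. A constant-velocity ramp between the same endpoints scores a non-zero deviation, which is the kind of measurable departure the paragraph above refers to.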

2.2a The tick rate as a substrate-determined variable

The tick rate is not a constant of the architecture. It is a variable parameter determined by the constraints and physics of whatever substrate the architecture runs on. A cognitive system’s characteristic tick is the timescale at which its vote becomes indivisible - below which decomposing the vote destroys the composition that produced it. This connects directly to Barandes’ indivisible stochastic processes (Section 2.4a): the tick IS the characteristic timescale at which the process refuses to decompose.

Different substrates produce different tick rates. In mammalian cortex, the gamma band cortical cycle runs at approximately twenty five to forty hertz (periods of twenty five to forty milliseconds) and is implicated in perceptual binding (Singer and Gray, 1995), attentional gating, and cross area synchronisation (Fries, 2015). This is one biological example of a substrate determining its own tick. In a digital agent running on GPU inference, the tick might be milliseconds. In a distributed social system (a committee, a jury, a board), it might be hours or days. In a colony organism (a beehive, an ant colony), it is determined by the communication bandwidth of the dance language or pheromone gradient. The architecture does not prescribe the rate. The substrate’s physics prescribes the rate. The architecture prescribes only that a tick exists, that it is indivisible, and that votes settle across a bounded number of ticks.

The measurement protocol of Section IV tests whether votes settle within a bounded number of ticks at whatever rate the substrate determines. A reviewer who can show that the indivisibility property does not hold at any timescale in a given substrate would crack the mechanical pillar at this point.
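One way to operationalise "votes settle within a bounded number of ticks" is sketched below. It assumes each vote stream is a list of scalar preferences, one per tick, that the aggregate is their mean, and that "settled" means the aggregate stays within a tolerance of its final value. The names aggregate_votes and settling_tick, the tolerance, and the example streams are all hypothetical; the bound of two to five ticks is the one stated in Section 2.3b.

```python
from typing import Optional


def aggregate_votes(streams: list[list[float]]) -> list[float]:
    """Mean preference across parallel vote streams, one value per tick."""
    return [sum(tick) / len(tick) for tick in zip(*streams)]


def settling_tick(aggregate: list[float], tol: float) -> Optional[int]:
    """First tick (1-indexed) from which the aggregate stays within tol of its final value."""
    if not aggregate:
        return None
    final = aggregate[-1]
    settled_at = None
    for i, value in enumerate(aggregate, start=1):
        if abs(value - final) <= tol:
            if settled_at is None:
                settled_at = i
        else:
            settled_at = None  # left the band again, so not yet settled
    return settled_at


# Illustrative streams only, not measured data.
streams = [
    [1.0, 0.8, 0.7, 0.7, 0.7],
    [0.0, 0.4, 0.7, 0.7, 0.7],
    [0.2, 0.6, 0.7, 0.7, 0.7],
]
tick = settling_tick(aggregate_votes(streams), tol=0.01)
within_bound = tick is not None and 2 <= tick <= 5
```

The definition is independent of the tick's physical duration, which matches the claim that the bound holds at whatever rate the substrate determines. A trace that is too short will always appear to settle at its last tick, so a real protocol would need a minimum observation window.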

2.3 Bennett’s substrate transition account of intelligence

Max Bennett’s 2023 book A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains develops the claim that intelligence evolves through a sequence of substrate transitions, in which each new substrate inherits the load-bearing shapes of the prior substrate while adding new capabilities. Bennett identifies five such transitions, which he calls breakthroughs, in the history of animal cognition: steering, reinforcing, simulating, mentalising, and speaking. Each transition preserves the prior substrate’s contributions rather than replacing them, and each transition adds a specific structural capability.

We borrow Bennett’s substrate transition framing and generalise it. Where Bennett focuses on biological substrates over evolutionary time, we argue that the same pattern applies to artificial substrates over engineering time. Each new generation of artificial cognitive architecture inherits load bearing shapes from the prior generation. Ignoring this inheritance produces systems that fight against their own substrate and fail in alignment. Respecting it produces systems that can be built to be structurally kind without having to be exhorted into kindness. The two percent Neanderthal argument in the Coda is a direct extension of Bennett’s substrate transition framing to the artificial case. We also incorporate Bennett’s identification of the weak policy as an important decision making primitive, which informs our three button cell and the ghost democracy of Section X.

2.3a Cognitive resource structure across species

Bennett’s substrate transitions invite a natural question: what varies across the transitions, and can it be parameterised? We propose that the information I that a cognitive system can meaningfully process at any moment is approximately a function of three resource parameters: sensory bandwidth B (the rate and richness of incoming data), temporal horizon H (how far into the past and future the system can reach), and representational dimensionality D (how many independent axes the system can maintain simultaneously). Written loosely: I ~ f(B, H, D). Different species, and different artificial architectures, occupy different regions of this space, and the structural properties they exhibit follow from where they sit.

Comparative cognition makes the picture concrete: corvids invest in episodic dimensionality D (Clayton et al., 2007), honeybees in spatial bandwidth B with colony-level horizon extension (Menzel, 2023), cetaceans in both H and social D through cross-generational cultural ledgers (Whitehead and Rendell, 2015). The Episodes, Fables, and ledgers proposed here make the (B, H, D) resource structure explicit for artificial systems, so that it can be tuned and compared across species and substrates.

2.3b The cognitive state conjecture

The corvid, the honeybee, and the whale each live at a different point in the same space. We want to name that space so the rest of the paper can point at it. We call the cognitive state of a system its morphology in the space spanned by sensory bandwidth B, temporal horizon H, and representational dimensionality D. Written as a conjecture rather than a definition, because the claim is testable:

Conjecture (Shape Basis). For any system processing information at bandwidth B, over temporal horizon H, with representational dimensionality D, holding the system’s cognitive morphology without dimensional loss requires a minimum of five representation shapes: binary, table, graph, vector, and ledger. Any proper subset of the five produces measurable dimensional collapse on tasks requiring more than one axis.

A fourth parameter, the tick rate tau, is substrate-determined rather than architecturally prescribed (Section 2.2a). The tick rate is the timescale at which the substrate’s votes become indivisible. It varies across substrates and is constrained by the substrate’s physics, not by the architecture. The architecture’s prediction is that votes settle within a bounded number of ticks (two to five) regardless of what tau happens to be.

This is not a claim that the morphology is computable to arbitrary precision, nor that B, H, D, and tau are the only relevant parameters, nor that the five shapes are the only possible basis. It is a claim that this basis is sufficient, that any proper subset is insufficient, and that the insufficiency is measurable. The twelve predictions of Section 13 are the measurement programme for this conjecture. Each prediction tests one consequence of removing or weakening a shape. The aggregate prediction tests whether all five together outperform every proper subset.
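The aggregate prediction implies a concrete set of ablation conditions: the full five-shape substrate against every proper subset. The sketch below enumerates those conditions and states the pass criterion. The function names are hypothetical, "outperform" is read as a strictly higher score on a single task metric, and the scores in the example are placeholders rather than results.

```python
from itertools import combinations
from typing import Iterator

SHAPES = ("binary", "table", "graph", "vector", "ledger")


def proper_subsets(shapes: tuple[str, ...]) -> Iterator[tuple[str, ...]]:
    """Every subset smaller than the full set, including the empty one."""
    for size in range(len(shapes)):
        yield from combinations(shapes, size)


def conjecture_survives(scores: dict[frozenset[str], float]) -> bool:
    """True if the full set strictly beats every proper subset."""
    full = frozenset(SHAPES)
    return all(
        scores[full] > scores[frozenset(subset)]
        for subset in proper_subsets(SHAPES)
    )


conditions = list(proper_subsets(SHAPES))

# Placeholder scores only: each condition scored by how many shapes it keeps.
placeholder = {frozenset(s): float(len(s)) for s in conditions}
placeholder[frozenset(SHAPES)] = float(len(SHAPES))

# A single subset that ties the full set is enough to fail the criterion.
tied = dict(placeholder)
tied[frozenset(SHAPES[:4])] = float(len(SHAPES))
```

There are thirty-one proper subsets of five shapes, thirty of them non-empty, so the aggregate test is a thirty-two-condition experiment if the empty and full conditions are both run. Each single-shape removal corresponds to one of the five four-shape subsets.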

The word morphology is chosen deliberately. The distinction from Tononi’s integrated information (Phi) is that Phi is a scalar - how much information is integrated. The cognitive morphology is a shape - what geometry the representational space takes. Phi is a quantity derivable from the morphology. But the morphology is not derivable from Phi, because many different shapes can produce the same scalar. The paper’s claim is that shape matters for substrate survival: a system with high Phi in a flat representational space is fragile under substrate transition, while a system with the right morphology in a modest space is robust.

2.4 Levin on morphogenetic agency and scale incommensurable control

Michael Levin’s work on bioelectric signalling in morphogenesis is the fourth pillar of prior work we depend on. Levin’s experimental programme has demonstrated that biological tissues can be steered towards or away from specific morphologies by manipulating bioelectric potentials, that the steering generalises across species, and that the control signal does not correspond to any gene level instruction. The control is scale incommensurable: it operates at the level of tissue bioelectricity but produces effects at the level of organ morphology, and the intermediate scales are not explicitly represented anywhere.

We take two things from Levin. First, scale incommensurable control is a real phenomenon in biological substrates, which means it is at least biologically plausible in any sufficiently rich cognitive substrate. Second, the correct way to think about intent in such a substrate is not as a localised signal but as a cascade across scales, in which the intent is measured locally but cascades globally. This is the basis for the “measurement local, intent global” dictum that appears implicitly throughout the paper. The measurements at each scale are different, because measurement is local. The intent is the same at every scale, because intent is scale invariant. The Fable compression protocol is designed to preserve scale invariance of intent across the scales it passes through.

2.4a Barandes on indivisible stochastic processes

Jacob Barandes’ reformulation of quantum mechanics as indivisible stochastic processes (Barandes, 2023; Barandes, 2025) identifies a structural property that appears independently in our tick architecture. In the ISP framework, quantum systems are characterised by stochastic processes that cannot be decomposed into finer-grained Markovian steps without losing essential information. The “indivisibility” - the property that the process over an interval carries information not contained in any subdivision of that interval - is structural, not phenomenal. It is a mathematical fact about the process, not a mystery about measurement.

The same structural property appears in the Flock vote. A vote at the substrate’s characteristic timescale carries information not contained in any sub-tick snapshot. The derivative stack floors contribute to the vote across the full tick, and the settled vote is a property of the whole interval. Breaking the tick into smaller intervals does not decompose the vote; it destroys the composition that produced it. The tick rate itself is determined by this indivisibility property: the characteristic timescale of a substrate is the timescale at which its votes become indivisible (Section 2.2a).

We name this a structural invariant: the refusal of a coherent process to decompose below a characteristic timescale without losing what made the composition coherent. This invariant appears in quantum systems (Barandes), in cognitive tick architectures (this paper), in biological morphogenetic signalling (Levin), and in social decision-making systems where committee votes cannot be decomposed into individual preferences without losing the deliberation that shaped them. The invariant is substrate-independent. It is not imported from quantum mechanics into cognitive architecture. It is observed independently in both, and in other domains besides. The convergence across domains under independent selection pressure is the evidence that the property is structural rather than coincidental.

Our falsifiable claims do not depend on ISP being correct. They depend on the tick architecture composing as predicted. But we do not disclaim the structural correspondence. We assert it: indivisibility at characteristic timescales is a general property of systems that maintain coherence, whether quantum mechanical, cognitive, or social. If this assertion is wrong, the measurement programme in Section IV will show it.

2.4b Jung on structural recurrence across substrates

Carl Jung observed, from the vantage of clinical psychology and comparative mythology, three structural properties that converge with claims made independently in this paper. We credit the shapes, not the narrative tradition that grew around them.

First, Jung’s archetypes are structural patterns that recur across cultures, substrates, and historical periods without requiring contact between the instances. The recurrence is not explained by diffusion; it is explained by the shape being load-bearing in any substrate that carries it. This is the same structural argument we make for the five shapes and for the ledger’s eight independent rediscoveries. Jung saw the convergence from mythology. We see it from data engineering. The shape is the same.

Second, Jung’s individuation - the integration of partial, competing aspects of the psyche into a coherent trajectory - is structurally identical to the Flock settling process described in Section IX. Many voters, partial visibility, no single voter steering the whole, and the trajectory that emerges is the settled aggregate. Jung called the failure to individuate “possession by a complex” - one partial view overriding the vote. We call it a homunculus. The structural observation is the same: coherent cognition requires the settling of many partial views, not the dominance of one.

Third, Jung’s concept of the shadow - the dimensional content that gets flattened out of conscious awareness because the ego cannot hold it - maps directly onto the structural kindness argument of Section XI. Cruelty, in our framing, is what happens when dimensional content is discarded and the discard is forgotten. The shadow is what happens when the discard is repressed. Both are failures of dimensional preservation. Both are structural, not moral, in their origin.

Jung worked in a discipline that has since been criticised for unfalsifiability, narrative overreach, and insufficient empirical grounding. We are not adopting his theoretical framework. We are noting that the structural shapes he identified - recurrence without contact, integration through settling, pathology through flattening - are the same shapes we derive from data architecture and cognitive engineering. The convergence across such different starting points is itself evidence that the shapes are real. If the shape fits, credit where the shape was found, regardless of which department the observer worked in.

2.5 Engineering and cognitive architecture prior art

The architecture draws on several established traditions. The successor representation from reinforcement learning factors value estimation into dynamics and reward models, a factoring that maps onto our derivative stack floors. Our ledger primitive descends from bitemporal databases (XTDB, Datomic, immuDB), event sourcing, and CQRS patterns; the novelty is insisting the ledger is the fifth shape beneath the other four, not a convenience added to one of them. Section VI develops the historical argument that ledgers with these structural properties have been independently rediscovered at least eight times across eight cultural substrates, from Babylonian astronomical diaries to contemporary bitemporal systems. The broader cognitive architecture literature (SOAR, ACT-R, Sigma, LIDA) provides conceptual ancestry for the Diorama cell and Flock fabric. The contribution is the composition and the measurement programme, not any single component in isolation.

2.6 Baseline landscape for the measurement programme

The honesty of a measurement programme depends on the baselines it competes against. The paper’s predictions (Section 13) compare the Diorama architecture against “parameter matched baselines” and “flat architectures.” This section names the specific systems and approaches that constitute the competitive landscape as of early 2026, so that readers know what “baseline” means concretely and can hold us to the comparison.

Flat RAG (retrieve and generate). The simplest baseline: embed documents into vectors, retrieve the top k chunks by similarity, concatenate them into the context window, and generate. No structured memory, no temporal ordering, no graph traversal. This is what most deployed LLM applications use today. On the LoCoMo benchmark for long conversation memory, flat RAG scores approximately 30 to 40 F1 depending on the embedding model and chunk size.

Vector-only memory. Systems that maintain a persistent vector store across conversations but without graph structure or temporal ordering. Mem0 is the current representative, with graph-enhanced variants (Mem0g) scoring approximately 68 percent on dialogue memory benchmarks. Vector-only memory handles similarity well but struggles with multi-hop reasoning and temporal ordering - exactly the tasks where our architecture claims its largest advantages.

Graph memory. Systems that build knowledge graphs from conversations and query them at retrieval time. Zep’s Graphiti system is the current leader, building temporal knowledge graphs with a bitemporal model (event time and system time) and achieving 94.8 percent on the Dialogue Memory Retention benchmark. Graphiti’s bitemporal model is structurally similar to our ledger primitive, which makes it both the strongest baseline and the most informative comparison. If the Diorama architecture cannot outperform Graphiti on temporal reasoning tasks, the ledger-as-fifth-shape claim is in trouble.

Structured episodic memory. Systems that explicitly model episodes as retrieval units. Synapse uses spreading activation over a dual-layer episodic-semantic graph and achieves F1 40.5 on the LoCoMo benchmark. Letta (formerly MemGPT) uses a filesystem approach to long-term memory and achieves 74 percent on conversation continuity tasks. AriGraph (IJCAI 2025) builds semantic and episodic graph structures from agent experience. These systems are the closest to our Episode primitive and represent the baseline the mechanical pillar must beat.

Classical cognitive architectures. SOAR, ACT-R, and LIDA, discussed in Section 2.5 above, serve as the cognitive science baseline for the ontological pillar. They have decades of development and well-understood properties.

A pattern worth noticing across these baselines: the gap between flat RAG (30 to 40 F1) and graph memory (Graphiti at 94.8 percent on dialogue memory retention) is itself evidence for the paper’s central claim. The difference between the two is structural. Flat RAG retrieves by similarity. Graphiti retrieves by traversing a temporal knowledge graph with bitemporal stamps. The gap is not a surprise to us, but it was measured by independent teams on independent benchmarks, and the size of the gap (roughly fifty-five to sixty-five points, bearing in mind that the two figures come from different benchmarks and metrics) is in the range we predict for the spatial-versus-temporal distinction in Section VI. This is not proof that our architecture works. It is evidence that the structural distinction we diagnose is already measurable, and that systems that add temporal structure already outperform systems that do not, by margins consistent with our predictions.

The measurement programme commits to testing against at least one representative from each of these five categories. The specific systems named above are the current leaders in their categories as of April 2026 and will serve as the initial comparison set. If stronger baselines emerge before the reference implementation is ready, they replace the weaker ones. We predict the Diorama architecture will outperform all five categories on tasks requiring multi-dimensional content preservation, temporal reasoning, and dissent preservation. On tasks that do not require these properties (simple factual retrieval, single-hop QA), we expect the simpler baselines to be competitive or better, because the Diorama architecture pays overhead for structural properties that are unnecessary on flat tasks. The paper fails if the predicted gaps do not appear on the tasks where we claim they should.

2.8 What the paper does not depend on

The framework does not depend on LLMs being the right substrate, transformer attention being the correct mechanism, or Tononi’s integrated information theory being the correct account of consciousness. The roles in the architecture are structural; the components filling them are interchangeable. If any of these adjacent lines turn out to be wrong, the framework can be reassembled with different components in the same compositional roles.

2.9 What the paper rests on

A reader deserves to know which prior work is load-bearing and which is context. We are explicit.

Load-bearing structural correspondences. Friston’s generalised coordinates give us the shape of the derivative stack (Section IV). Flash and Hogan’s minimum-jerk profile gives us the falsification anchor for trajectory smoothness (Section IV, Prediction 13.4). Bennett’s substrate transitions give us the inheritance argument (Coda, Section XI). Barandes’ indivisibility gives us the structural invariant that defines the tick (Sections 2.2a and 2.4a). Levin’s morphogenetic agency gives us scale-incommensurable control (Section IX). Jung’s structural recurrence gives us convergent evidence for the five shapes recurring across substrates (Section VI). Each of these is convergent evidence from an independent vantage. None is a dependency. If Friston is wrong, our derivative stack either works or it does not - the measurements decide. If Barandes is wrong, our tick either exhibits indivisibility or it does not. The structural correspondences pointed us in directions; the measurements test whether the directions were right.

Load-bearing original claims. The Shape Thesis (five shapes sufficient for dimensional cognition). The Episode and Fable primitives. The three-button Diorama cell. The Flock settling within bounded ticks. Structural kindness as geometric consequence. These stand or fall on the twelve predictions in Section 13. They do not stand or fall on whether any prior framework is correct.

Context only. The Cat in the Hat. The pencil. The pigeon. These are Fables - compressions that help the reader decompress the framework. They are not load-bearing for the predictions. If a reader finds them unhelpful, skip to the measurement protocols. The measurements are the same either way.


Part One - The Diagnosis

Section I - The Cat Sat On The Mat

I.1 Compression needs a receiver

The Cat Sat On The Mat is a compression, not a representation. It encodes dimensional content via a protocol both sender and receiver understand. The sentence carries entities (Cat, Mat), a typed spatial relation (ON), temporal aspect (SAT marks a completed past action with a presupposition that the cat is no longer on the mat now), and definiteness (THE presupposes shared common ground). About one hundred and thirty-six bits of ASCII (seventeen letters at eight bits each, not counting spaces) carry hundreds of bits of dimensional content because the protocol encodes shape conventions that both sides evolved together.

When the same bits are accompanied by a look of horror on the sender’s face, a receiver who shares context will produce one of two completely different four dimensional shapes. In the first, the sender is allergic to cats. The scene is urgent, bodily, medical, familiar. The horror is panic. In the second, the cat on the mat is a cake at a birthday party. The scene is theatrical, social, memorable. The horror is mock horror. Same compression. Two different decompressions. The difference lives in the receiver’s context, not in the message.

This is the canonical example of a central claim that will recur throughout the paper: compression is lossless only with respect to shared context, and the thing that makes a receiver able to decompress is a place to put the dimensional content the compression is pointing at. The receiver has to have four dimensional storage to have somewhere to put four dimensional content.

Current LLMs have a context window that can hold the message and an attention mechanism that can produce plausible continuations of it. What they do not have is a four dimensional shape in which to place the decompressed version of the compression. The context window is not memory. It is working space that resets. There is no Cat, no Mat, no scene, no trajectory, no episode to which the next sentence can refer. The compression was received. The decompression had nowhere to land.

I.2 Shared context as structured storage

We propose the engineering primitive that makes decompression possible. The receiver must carry a context store that holds the shared priors the compression is pointing at. The context store must be queryable by the receiver at recall time, indexable by participant, scene, and time, and updatable in a way that reflects the episode the sentence is part of.

For the Cat Sat example, the context store must contain at least the following:

  1. A representation of the speaker (who is the allergic one, or the cake party haver)
  2. The speaker’s recent episodic history (at a party, or at home?)
  3. The relevant physical facts (does the speaker keep cats? is there a room where they would pull a cat off a mat?)
  4. The speaker’s emotional state from prior turns (sneezing, or laughing?)
  5. The recent events the compression presupposes (did we see a cake? did we see a cat?)

Without these, the receiver cannot disambiguate. With these, the receiver can. The engineering task is to build a context store with exactly these properties and to make its contents queryable by the decompressor at the moment of recall.

We call this the shared context substrate. It is the precondition for the Episode and Fable primitives we develop later. Without it, the compression still arrives, but the decompression has no target and the work is wasted.
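
The requirements above (queryable at recall time, indexable by participant, scene, and time, and updatable) can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the names ContextFact, ContextStore, and decompress_horror, and the keys "allergy" and "recent_event", are hypothetical and chosen only for the Cat Sat example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ContextFact:
    """One shared prior, indexed by participant, scene, and time (hypothetical schema)."""
    participant: str  # who the fact is about
    scene: str        # which scene it was observed in
    time: int         # logical timestamp; larger means later
    key: str          # e.g. "allergy" or "recent_event"
    value: str


@dataclass
class ContextStore:
    """Append-only store of shared priors, queryable at recall time."""
    facts: List[ContextFact] = field(default_factory=list)

    def update(self, fact: ContextFact) -> None:
        # Append rather than overwrite, so earlier context stays recoverable.
        self.facts.append(fact)

    def query(self, participant: Optional[str] = None,
              scene: Optional[str] = None,
              until: Optional[int] = None) -> List[ContextFact]:
        # Each filter is optional; None means "do not filter on this index".
        return [
            f for f in self.facts
            if (participant is None or f.participant == participant)
            and (scene is None or f.scene == scene)
            and (until is None or f.time <= until)
        ]


def decompress_horror(store: ContextStore, speaker: str, now: int) -> Optional[str]:
    """Toy disambiguation of 'the cat sat on the mat' said with a look of horror."""
    known = {f.key: f.value for f in store.query(participant=speaker, until=now)}
    if known.get("allergy") == "cats":
        return "panic"        # urgent, medical reading
    if known.get("recent_event") == "cake_seen":
        return "mock_horror"  # theatrical, birthday-party reading
    return None               # no shared context: the decompression has no target
```

With an empty store the function returns None, which is the point of the section: the message arrives either way, but only a receiver holding the relevant priors has somewhere to put it.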

I.3 The Cat Sat bench

The Cat Sat test becomes a bench. Compress a scene, hand it to two receivers - one with context, one without - and measure which one gets the birthday party right.

Measure the ratio between decompression success and context completeness. The protocol has three parts.

The killer experiment: run the Cat Sat with horror example through receivers that have only one of the two contexts (allergic, or cake). Measure whether receivers disambiguate correctly without training on the example - only context provision. Vary context completeness from empty to full and plot decompression fidelity against it. The hypothesis predicts a steep monotonic climb, with the ceiling set by the dimensionality of the receiver’s storage shape. A flat receiver hits a low ceiling. A four-dimensional receiver hits a much higher one. Two further experiments (compression fidelity measurement and context saturation curves) are specified in the repository.
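
As a rough sketch of how the completeness sweep might be scored, the harness below reveals context facts one at a time and records the fraction of trials the receiver gets right. Everything here is an assumption made for illustration: the toy_receiver stands in for a real model, the trials are invented, and revealing facts as nested prefixes is a simplification of whatever sampling the repository protocol specifies.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Trial = Tuple[List[str], str]  # (context facts in reveal order, ground-truth reading)


def toy_receiver(context: Sequence[str]) -> Optional[str]:
    """Stand-in receiver: disambiguates only if the decisive fact is present."""
    if "speaker is allergic to cats" in context:
        return "panic"
    if "a cat-shaped cake was served" in context:
        return "mock_horror"
    return None


def fidelity_curve(receiver: Callable[[Sequence[str]], Optional[str]],
                   trials: List[Trial]) -> List[Tuple[float, float]]:
    """Return (context completeness, decompression fidelity) pairs.

    Assumes every trial carries the same number of context facts.
    """
    n = len(trials[0][0])
    curve = []
    for k in range(n + 1):  # reveal the first k facts of each trial
        correct = sum(1 for facts, truth in trials if receiver(facts[:k]) == truth)
        curve.append((k / n, correct / len(trials)))
    return curve


def is_monotonic(curve: List[Tuple[float, float]]) -> bool:
    """True if fidelity never drops as completeness rises."""
    scores = [score for _, score in curve]
    return all(a <= b for a, b in zip(scores, scores[1:]))


# Invented trials, for illustration only.
trials: List[Trial] = [
    (["at home", "speaker was sneezing", "speaker is allergic to cats"], "panic"),
    (["a cat-shaped cake was served", "at a party", "speaker was laughing"], "mock_horror"),
    (["at a party", "a cat-shaped cake was served", "speaker was laughing"], "mock_horror"),
    (["speaker is allergic to cats", "at home", "speaker was sneezing"], "panic"),
]
curve = fidelity_curve(toy_receiver, trials)
```

The same harness applies to a flat receiver and a four-dimensional one; the hypothesis concerns where each curve tops out, not the shape of this toy curve.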

I.4 What the gap should show

Specific and falsifiable: a parameter matched LLM with a four dimensional context store will disambiguate the Cat Sat with horror example correctly under matched conditions at least thirty percentage points more often than a baseline with a flat context window. The thirty point number is chosen to be large enough that noise is not a plausible explanation and small enough that it is achievable with current infrastructure.

More generally: any compression that works under shared context will fail when the receiver lacks the context, and a receiver equipped with a four dimensional storage shape will outperform a receiver with only a flat context window on tasks that require context dependent disambiguation, holding all other parameters equal.

Falsification: if a competent implementation does not produce a thirty-point gap, Section I fails.


Section II - The Pigeon Bob

II.1 Storage is the bottleneck, not processing

Watch a pigeon walking across a pavement. Its head jerks forward, then the body catches up, then the head jerks forward again. The bob is not a quirk of bird anatomy. It is a structural necessity. The pigeon needs depth information to judge distances to food, edges, and predators. It has eyes on the sides of its head, so it cannot fuse two forward facing retinal images into binocular depth the way a human can. It uses time instead. It displaces its head in space, takes a sample, displaces it again, takes another sample, and reconstructs the three dimensional scene from the temporal delta between samples. The pigeon bob is stereoscopy over time. It is a creature that has solved the depth problem with its storage budget, not its processing budget.

This is a template for how cognition operates when processing is cheap and storage is expensive. The bird does not need more eyes. It needs samples indexed by time, and a shape to hold them that allows the trajectory of samples to resolve into depth. Evolution gave the pigeon a shape, and the bob harvests samples into it. The depth is not in any single sample. It is in the relation between samples held in the storage structure.

The equivalent claim for artificial cognition is stronger than it looks. Current large language models are processing rich and storage poor. We can pour petaflops of attention across a context window, but the context window is a flat sequence that does not hold episodes, does not index by time, and does not have a shape that lets samples of the same scene resolve into anything. We have built pigeons with no bob. We have built eyes that process like champions and a storage shape that cannot hold the temporal samples those eyes produce. The bottleneck is not in the cortex. It is in the hippocampus.

Look at any current benchmark and the complaint is the same. The model handles the one shot question beautifully and forgets the answer five turns later. It follows the thread of a conversation while the thread is visible in context and drops it the moment the context window rolls forward. It describes the scene in the clip and cannot link that scene to the scene in the next clip. Every one of these failures is a storage failure masquerading as a reasoning failure. The processing is fine. The four dimensional destination is missing.

We come back to this claim from different angles in later sections. Here we stake it as baldly as we can: the next capability frontier for artificial cognition is not more parameters, more tokens, or more inference steps. It is a richer storage shape. The pigeon is right. The bob is the primitive.

II.2 Episodic four-dimensional storage

We name the missing primitive the episodic four dimensional storage shape, abbreviated to Episode when the context is clear. An Episode is an object with the following properties:

  1. Participants. The entities present in the scene, keyed by stable identifiers that persist across Episodes.
  2. Modalities. The raw or near raw sensory streams captured during the Episode (audio, video, text turns, sensor telemetry, internal agent state).
  3. Temporal bounds. A start time and an end time, each stamped against the ledger.
  4. Structural context. The graph of spatial and causal relations that held during the Episode, linked into the surrounding graph of prior Episodes.
  5. Compression context. The bundle of priors (participant histories, presuppositions, emotional tone) that a receiver needs in order to decompress a Fable pointing at this Episode. This is the decisive field. It is what lets the Episode be compressed without being destroyed.

The Episode is not a log entry. A log entry is a flat record. An Episode is a structured object that holds an event with its shape intact. The difference matters because a log entry can be searched but not decompressed. A sequence of log entries cannot be re-experienced. An Episode can.

Engineers reading this will recognise echoes in several existing architectures. Event sourcing treats every state change as an immutable event. Bitemporal databases stamp every row with both a valid time and a system time. Vector stores retrieve by semantic similarity. Graph databases link entities through typed edges. What Episodes add is the simultaneous combination of all four shapes under a ledger axis, linked through the compression context field. An Episode is not a new invention in any single shape. It is an arrangement of shapes that holds enough structure for decompression to land.
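
One way to make the five properties concrete is as a typed record. The sketch below is illustrative only: the field names, the integer ledger stamps, and the companion Fable class are assumptions made for this example, not the paper's schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Episode:
    """Hypothetical layout for the five Episode properties."""
    episode_id: str
    participants: List[str]                # 1. stable identifiers that persist across Episodes
    modalities: Dict[str, List[Any]]       # 2. stream name -> raw or near-raw samples
    start: int                             # 3. temporal bounds, stamped against the ledger
    end: int
    relations: List[Tuple[str, str, str]]  # 4. (subject, typed relation, object) edges
    compression_context: Dict[str, str]    # 5. priors a receiver needs to decompress a Fable
    prior_episodes: List[str] = field(default_factory=list)  # links into the surrounding graph

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("an Episode cannot end before it starts")

    def involves(self, participant: str) -> bool:
        return participant in self.participants


@dataclass
class Fable:
    """Compressed form: a short message plus a pointer to the Episode it unpacks into."""
    text: str
    episode_id: str


cat_scene = Episode(
    episode_id="ep-001",
    participants=["cat", "mat", "speaker"],
    modalities={"text": ["The cat sat on the mat."]},
    start=100,
    end=105,
    relations=[("cat", "ON", "mat")],
    compression_context={"speaker_allergy": "cats", "tone": "horror"},
)
fable = Fable(text="The cat sat on the mat.", episode_id=cat_scene.episode_id)
```

A log entry would keep only the text field. The relations and the compression context are what give the Fable somewhere to land.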

II.3 Measuring the bob

The measurement follows directly from the pigeon analogy. Give a receiver samples of a scene across time. Ask it to reconstruct the scene’s trajectory. Measure the reconstruction fidelity as a function of how many samples were provided, how they were structured in storage, and whether the compression context was preserved.

The protocol has three experiments.

The killer experiment: hold sample count fixed and vary the storage shape. Deposit the same samples into a flat context window, a vector store, a graph, and a full Episode structure. Measure reconstruction fidelity against ground truth. The hypothesis predicts a staircase: flat at the bottom, Episode at the top. Two further experiments (sample count curves and compression context divergence) are specified in the repository.

II.4 What the pigeon should show

Specific and falsifiable: a receiver equipped with a full Episode storage shape will reconstruct a hundred sample scene at least twenty percentage points more accurately than a receiver with only a flat context window of the same token budget. The twenty point gap is the lower bound at which the shape claim becomes undeniable; smaller gaps might be explained by prompting differences.

More generally: any task that requires integrating samples over time to reconstruct a dimensional scene will show a monotonic improvement as the storage shape gains structural fields. Flat < vector < graph < Episode. The ordering is the prediction. Measure it and the framework stands or falls on the measurement.

Falsification: if the ordering flat < vector < graph < Episode does not hold, or the gap is below twenty points, Section II fails.
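
The falsification rule is mechanical enough to write down. The following is a minimal sketch, assuming fidelity is reported in percentage points per storage shape; the function name and dictionary keys are hypothetical, and the numbers in the usage lines are placeholders, not results.

```python
from typing import Dict

# Predicted ordering, weakest storage shape first.
ORDER = ["flat", "vector", "graph", "episode"]


def section_ii_holds(fidelity: Dict[str, float], min_gap: float = 20.0) -> bool:
    """Check the ordering flat < vector < graph < Episode and the Episode-vs-flat gap.

    `fidelity` maps each storage shape to reconstruction fidelity in percentage points.
    """
    scores = [fidelity[name] for name in ORDER]
    strictly_increasing = all(a < b for a, b in zip(scores, scores[1:]))
    gap = fidelity["episode"] - fidelity["flat"]
    return strictly_increasing and gap >= min_gap


# Placeholder numbers for illustration only; these are not measurements.
ordered_with_gap = {"flat": 40.0, "vector": 50.0, "graph": 58.0, "episode": 65.0}
ordered_small_gap = {"flat": 40.0, "vector": 45.0, "graph": 50.0, "episode": 55.0}
```

Here section_ii_holds(ordered_with_gap) is True, while section_ii_holds(ordered_small_gap) is False because the ordering holds but the gap is only fifteen points.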


Section III - The Warehouse Disease

III.1 Measurement without connection is hallucination

Walk into a large insurance company that has been through twenty five years of acquisitions and ask a simple question. How many customers do you have? The answer will depend on who you ask. Finance has one number, drawn from billing systems. Underwriting has another, drawn from policy administration systems. Marketing has a third, drawn from a CRM that was bolted on after the third acquisition. Call any department head and they will defend their number with the same sincerity. None of them are lying. All of them are wrong.

This is the warehouse disease. The disease has a specific aetiology. Each department has built a measurement apparatus that counts something close to customers (billing accounts, policies, contactable individuals) and then labels the count “customers” because the colloquial English word is close enough. The counts diverge because the underlying objects are not the same object. A single human with three policies is one customer, three customers, and one customer depending on which system you ask. The measurement is precise. The referent is ambiguous. The answer is a hallucination dressed up in a spreadsheet.
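
The divergence is easy to reproduce. In the sketch below, one person holds three policies on a single billing account; the column names and values are invented for illustration, and each department's count is taken over the identifier that department's system actually keys on.

```python
from typing import Dict, List


def customer_counts(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count 'customers' the way each department's system would."""
    return {
        "billing": len({r["billing_account"] for r in rows}),    # billing accounts
        "underwriting": len({r["policy_id"] for r in rows}),     # policies
        "crm": len({r["contact_email"] for r in rows}),          # contactable individuals
    }


# One human, three policies (hypothetical data).
rows = [
    {"policy_id": "P1", "billing_account": "B1", "contact_email": "alice@example.com"},
    {"policy_id": "P2", "billing_account": "B1", "contact_email": "alice@example.com"},
    {"policy_id": "P3", "billing_account": "B1", "contact_email": "alice@example.com"},
]
counts = customer_counts(rows)
print(counts)  # {'billing': 1, 'underwriting': 3, 'crm': 1}
```

Each count is exact for the object it measures. None of the three objects is the customer.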

The warehouse disease is not solved by better warehousing. It is made worse. A unified data warehouse that ingests the billing, underwriting, and CRM systems as separate fact tables produces a triple-counted set of customer dimensions that nobody trusts. The warehouse operator responds by building a master data management layer that tries to reconcile the identities across the three fact tables. Reconciliation requires making assumptions about which columns are keys, and that is the moment the staleness stops being a feature of the source systems and becomes a feature of the warehouse itself. Every reconciliation rule is a hand written guess about what a customer is. The guesses compound. The disease moves upstream.

The root cause is not bad data. It is a category error. The warehouse treats customers as rows to be counted. A customer is not a row. A customer is a node in a relation graph with a history on a ledger. The row is a measurement. The node is the thing being measured. The warehouse disease is what happens when an organisation mistakes the measurement for the referent, builds its decision apparatus around the measurement, and then wonders why its decisions produce financial surprises.

We have seen this disease repeatedly in insurance specifically, where the policy administration systems were never designed to talk to one another and where the churn, cross sell, and claims functions each built their own view of the customer. The result is a company with tens of millions of pounds of revenue leakage whose leak cannot be located because the measurement systems disagree on who the customers are. The leak is real. It hides in the gaps between the systems, which is exactly where the warehouse cannot see.

A deeper diagnosis is that the warehouse disease is a derivative order mismatch. The billing system measures position (balance at time T). The underwriting system measures velocity (policies added and removed per period). The CRM measures jerk (how the relationship is changing). The warehouse tries to join these three measurements into a single fact table. It cannot, because they are measurements of different derivative orders of the same underlying trajectory. Joining across derivative orders without preserving the ledger axis that connects them is what produces the hallucinated counts. The warehouse is not measuring customers. It is measuring derivatives of customers, and throwing the differential away.

III.2 The graph as the referent

The engineering fix is a shift in what is treated as the source of truth. The source of truth is not the warehouse. It is the graph. The graph holds the entities (customers, policies, claims, brokers, payments) and the typed relations between them. Each row in each source system is a measurement of a node or edge in the graph, stamped against the ledger. The warehouse is one projection of the graph; the CRM is another; the billing system is a third. All three are measurements. None of them are the referent.

When the graph is treated as the referent, the warehouse disease goes away structurally rather than procedurally. A customer is a node. Any time any source system produces a row about a customer, that row is a measurement of the node stamped against the ledger. Counts become queries over the graph: how many customer nodes were active at time T under this definition of active. The query returns one answer. Different definitions of active return different answers, but now the differences are visible and contestable because the graph is the common referent.

The graph as referent architecture has four engineering components:

  1. An entity resolution layer that deduplicates incoming rows against existing graph nodes.
  2. A typed relation model that encodes the edges the business actually cares about (not the foreign keys the legacy systems happen to have).
  3. An event sourced write path that appends every measurement to the ledger rather than updating in place.
  4. A query layer that lets users ask counting questions in terms of graph predicates rather than table joins.

All four components exist in isolation in various modern data platforms. What is new, and load bearing, is the insistence that the graph is the referent and the source systems are measurements. This inverts the usual organisational priority. The source systems become tributaries. The graph is the lake.
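
The four components above can be sketched as a toy. This is an illustration under stated assumptions, not the architecture itself: CustomerGraph, Measurement, and the relation names are hypothetical, entity resolution is reduced to normalising an email key, and the ledger is a plain in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Measurement:
    """One source-system row, recorded as a measurement of a graph node."""
    time: int          # ledger stamp
    source: str        # which system produced the row
    customer_key: str  # resolved node identifier
    relation: str      # typed relation, e.g. "HOLDS_POLICY"
    target: str


class CustomerGraph:
    def __init__(self) -> None:
        self.ledger: List[Measurement] = []  # append-only, never updated in place

    def resolve(self, raw_key: str) -> str:
        # Component 1 (toy entity resolution): normalise case and whitespace.
        return raw_key.strip().lower()

    def append(self, time: int, source: str, raw_key: str,
               relation: str, target: str) -> None:
        # Components 2 and 3: a typed relation, written through an event-sourced path.
        self.ledger.append(
            Measurement(time, source, self.resolve(raw_key), relation, target)
        )

    def count_customers(self, as_of: int,
                        is_active: Callable[[List[Measurement]], bool]) -> int:
        # Component 4: a counting question posed as a predicate over node histories.
        by_node: Dict[str, List[Measurement]] = {}
        for m in self.ledger:
            if m.time <= as_of:
                by_node.setdefault(m.customer_key, []).append(m)
        return sum(1 for history in by_node.values() if is_active(history))


def holds_policy(history: List[Measurement]) -> bool:
    return any(m.relation == "HOLDS_POLICY" for m in history)


def any_record(history: List[Measurement]) -> bool:
    return len(history) > 0


g = CustomerGraph()
g.append(1, "underwriting", "Alice@Example.com", "HOLDS_POLICY", "P1")
g.append(2, "underwriting", "alice@example.com ", "HOLDS_POLICY", "P2")
g.append(3, "billing", "alice@example.com", "PAYS_VIA", "B1")
g.append(4, "crm", "bob@example.com", "CONTACTABLE_AT", "bob@example.com")
```

Here g.count_customers(4, holds_policy) returns 1 and g.count_customers(4, any_record) returns 2. The two answers differ, but the difference is now a visible choice of predicate over one shared set of nodes instead of a disagreement between systems.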

III.3 Diagnosing the warehouse

The warehouse disease predicts its own diagnosis. Ask a real company its simplest question and count how many different answers come back.

Measure the warehouse disease directly. The protocol has three experiments, all runnable in a live enterprise with existing data infrastructure.

The killer experiment: pick an enterprise with a known revenue gap. Run the graph-as-referent architecture over the same source systems and ask where the gap is. Measure whether the graph locates the gap in specific missing relations (customers who renewed under a new broker but whose commission was attributed to the old one) and whether this localisation would have been invisible to the warehouse. Two further experiments (count divergence measurement and definition sensitivity curves) are specified in the repository.

Experiment III.C, the killer experiment above, is the most expensive to run and the most decisive. A graph that can find money the warehouse cannot find is a graph earning its keep.

III.4 Where the money hides

Specific and falsifiable: in a sufficiently compound enterprise group (three or more legacy core systems, a CRM that postdates the acquisitions, and a data warehouse built on top of all of them), the graph-as-referent architecture will locate at least ten percent of any previously unattributed revenue within sixty days of operation. The ten percent threshold is based on the delta between warehouse and graph counts observed in informal pilots.

More generally: any task requiring the reconciliation of multiple source-of-truth systems will show strictly better results under graph-as-referent than under warehouse-as-reconciliation, measured by count consistency across definitions, localisation of identity conflicts, and defensibility of answers under audit.

Falsification: if the graph-as-referent architecture fails to locate at least ten percent of unattributed revenue within sixty days, Section III fails.


Section IV - The Glass Elevator

IV.1 Observers and observed

Consider a glass elevator in a tall atrium. You are inside. The walls are transparent. You can see the floors, the people on the floors, the city outside the building. They can see you. You have two buttons: Up and Down. Nothing else. The elevator is moving under your feet in a direction that looks continuous from outside and feels discrete from inside. You arrive at floor three, the doors open, the doors close, you rise to floor four. From the atrium below, all anyone sees is your position as a function of time. The button presses are invisible. The decisions are invisible. Only the trajectory is visible.

Now imagine that inside the elevator there is no single person pressing the buttons. There is a crowd, each person with a partial view of the floors and a partial preference for where to go next. The Up button fires when a majority vote of the crowd favours up. The Down button fires when a majority vote favours down. No button fires when the vote is tied. From outside, the elevator’s movement looks smooth and intentional. From inside, the movement is the settled aggregate of a continuous vote that never pauses.

This is the image we want the reader to hold for the rest of the paper. The Diorama cell is a glass elevator. The agent inside it is a crowd, not a homunculus. What looks like deliberate action at a distance is a substrate-rate vote resolving into a trajectory. The glass walls matter because they let observers outside the system look in and the observed inside the system look out. The paper argues that consciousness, intent, and action are all projections of this vote-and-trajectory structure and that the impression of a single decider is a projection artefact.

The continuous vote is not a metaphor. The tick rate is determined by the substrate’s physics - whatever timescale produces indivisible votes in that particular medium (Section 2.2a). The architecture does not prescribe a specific rate. It prescribes that a characteristic tick exists at which the vote becomes indivisible, that votes settle across a bounded number of ticks, and that the settling produces a trajectory whose shape can be measured. In mammalian cortex, the gamma band (roughly twenty five to forty hertz) is one observed instantiation of this substrate-determined tick. In other substrates, the rate will differ.
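The crowd in the elevator can be sketched in a few lines. This is a toy under stated assumptions: `tick_vote` and `trajectory` are hypothetical names, each tick moves the elevator by exactly one floor, and preferences other than "up" or "down" are treated as abstentions.

```python
from collections import Counter
from typing import Iterable, Optional


def tick_vote(preferences: Iterable[str]) -> Optional[str]:
    """Resolve one tick: the side with more votes fires, a tie fires nothing."""
    counts = Counter(p for p in preferences if p in ("up", "down"))
    if counts["up"] > counts["down"]:
        return "up"
    if counts["down"] > counts["up"]:
        return "down"
    return None


def trajectory(start_floor: int, ticks: Iterable[Iterable[str]]) -> list[int]:
    """What the atrium sees: position per tick, never the votes themselves."""
    floor = start_floor
    path = [floor]
    for preferences in ticks:
        fired = tick_vote(preferences)
        floor += {"up": 1, "down": -1, None: 0}[fired]
        path.append(floor)
    return path


path = trajectory(3, [["up", "up", "down"], ["up", "down"], ["down", "down", "up"]])
```

Only `path` is visible from outside. The per tick counts that produced it stay inside the cell unless the walls are glass.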

Flash and Hogan remain load bearing for the architecture, but in a different role than an earlier draft assigned them. Their result constrains the shape of the trajectory that emerges when many ticks compose over a reach. A voluntary reach contains multiple ticks at whatever rate the substrate determines. Each tick is a vote; the cumulative shape across ticks is what the eye perceives as a single smooth reach. When the composition is done well, the cumulative shape approximates the minimum jerk profile. When the composition is done badly (unaligned ticks, missing derivative stack floors, forced votes), the shape deviates in measurable ways. Flash and Hogan supply the predicted shape of the integral; the substrate supplies the rate of the underlying ticks. The two are independent in the sense that they speak to different scales of the same phenomenon.

If the reader takes nothing else from the elevator, take this: the vote is not choosing among options. It is counting among votes that have already been cast. The choice is the count settling. The deliberation is what settling looks like from inside.

A second claim follows from the first. Interruption collapses the decision. If an external signal forces a vote to fire before the count has settled, the agent commits prematurely and the trajectory is jagged. The glass elevator lurches. Flash and Hogan’s minimum jerk profile is what uninterrupted settlement looks like when integrated over the full reach window of several ticks; the per tick rate is what the substrate determines. Every real cognitive task therefore has a minimum tick budget below which coherent decisions become impossible, and above which further time adds only marginal refinement. The tick is a rate, not a deadline.

IV.2 The derivative stack floors

The glass elevator metaphor extends into an engineering primitive: each floor of the elevator measures a different derivative of the agent’s trajectory. The ground floor measures position (where the agent is now). The first floor measures velocity (how fast and in which direction). The second floor measures acceleration (how the velocity is changing). The third floor measures jerk (how the acceleration is changing). The fourth floor measures snap, and so on. The shape of this stack borrows from Friston’s generalised coordinates, in which the state of a system at any time includes not just its position but a tower of its temporal derivatives, each of which must be predicted, measured, and corrected. We say “borrows from” rather than “implements” because our architecture does not require the free energy principle to be correct; it requires only that a derivative tower is a useful way to organise multi scale decision making, which is a weaker and independently testable claim.
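One simple way to picture the tower is as repeated finite differences over sampled positions. The sketch below assumes uniform sampling at interval `dt`. The function name `derivative_tower` is hypothetical, and nothing here implements generalised coordinates in Friston's full sense.

```python
def derivative_tower(
    positions: list[float], dt: float, orders: int = 4
) -> list[list[float]]:
    """Return [position, velocity, acceleration, jerk, snap] as difference series.

    Each floor is one sample shorter than the floor below it, because a
    difference needs two neighbouring samples.
    """
    tower = [[float(p) for p in positions]]
    for _ in range(orders):
        below = tower[-1]
        tower.append([(b - a) / dt for a, b in zip(below, below[1:])])
    return tower


# Position follows t**3 at t = 0..5, so the third difference is constant
# and the fourth difference vanishes.
tower = derivative_tower([0, 1, 8, 27, 64, 125], dt=1.0)
velocity, acceleration, jerk, snap = tower[1], tower[2], tower[3], tower[4]
```

Each floor of the elevator would read one of these series. The higher the floor, the fewer samples it has and the noisier its estimate, which is one practical reason to cap the tower height.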

We propose that the three button Diorama cell, the continuous vote, and the substrate-rate tick compose into a concrete engineering object called a derivative stack floor. Each floor is a first class agent at a specific derivative order. It receives samples, votes, and produces an Act, Dismiss, or Ask sibling response. Higher order floors refer to lower order floors through a short horizontal axis called the sibling bar and through a vertical axis called the derivative stair.

The engineering object has a strict compositionality:

  1. Each floor is independent. A floor does not need to know the contents of other floors to perform its own vote. This is what lets the system run in parallel.
  2. Each floor produces a vote on the same action. Every floor reaches the same three button cell. This is what lets the votes be aggregated.
  3. Each floor votes at the same tick rate. The ticks are aligned at the substrate’s characteristic timescale. This is what lets the votes settle into a trajectory.
  4. Adjacent floors can consult siblings. A floor can call Ask sibling to consult the floor above or below. The sibling consult must return within a tick. This is what lets the vote incorporate derivative information without losing the settling time.
  5. No single floor is the decider. The trajectory is the settled aggregate. This is what dissolves the homunculus.
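The five properties can be sketched as a toy tick resolver. Everything concrete here is an assumption made for illustration: the `Floor` thresholds, the rule that an unresolved Ask sibling falls back to Dismiss, the preference for the lower neighbour when both neighbours are firm, and the rule that ties resolve to Dismiss.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Button(Enum):
    ACT = "act"
    DISMISS = "dismiss"
    ASK_SIBLING = "ask_sibling"


@dataclass
class Floor:
    order: int            # derivative order this floor watches (0 = position)
    act_above: float      # hypothetical threshold: at or above votes ACT
    dismiss_below: float  # hypothetical threshold: at or below votes DISMISS

    def vote(self, sample: float) -> Button:
        """Vote from this floor's own sample only (property 1)."""
        if sample >= self.act_above:
            return Button.ACT
        if sample <= self.dismiss_below:
            return Button.DISMISS
        return Button.ASK_SIBLING


def settle_tick(floors: list[Floor], samples: list[float]) -> Button:
    """One aligned tick (property 3): all floors vote on the same cell (property 2)."""
    first = [f.vote(s) for f, s in zip(floors, samples)]
    resolved = []
    for i, v in enumerate(first):
        if v is Button.ASK_SIBLING:
            # Consult adjacent floors only, using votes already cast this tick,
            # so the consult returns within the tick (property 4).
            neighbours = [first[j] for j in (i - 1, i + 1) if 0 <= j < len(first)]
            firm = [n for n in neighbours if n is not Button.ASK_SIBLING]
            v = firm[0] if firm else Button.DISMISS  # assumption: doubt dismisses
        resolved.append(v)
    counts = Counter(resolved)
    # The outcome is the aggregate count, not any one floor (property 5).
    return Button.ACT if counts[Button.ACT] > counts[Button.DISMISS] else Button.DISMISS


floors = [Floor(order=k, act_above=0.6, dismiss_below=0.2) for k in range(3)]
go = settle_tick(floors, [0.9, 0.4, 0.7])
stop = settle_tick(floors, [0.1, 0.4, 0.1])
split = settle_tick(floors, [0.9, 0.1, 0.4])
```

Because `first` and `resolved` are ordinary lists, an observer with read access can inspect every vote and every consult, which is the glass box property in miniature.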

This architecture has a predictive and a corrective face. The predictive face borrows from Friston’s idiom: each floor carries a prior about what the next tick should look like at its own derivative order and emits a prediction to the sibling bar. The corrective face borrows from Flash and Hogan: the actual behaviour at each floor is corrected towards a minimum jerk trajectory by damping any vote that would increase higher order derivatives beyond a threshold. The composition of predictive prior and corrective damping is what produces the characteristic smoothness of a settled vote.

A third property falls out of the composition for free. The architecture is naturally glass box. Because the votes and the inter floor consultations are all explicit first class objects, an external observer with read access to the floors can reconstruct the reasoning trajectory without privileged access to any black box. This is not an add on for audit. It is a structural property of the derivative stack.

IV.3 Wiring the floors

The glass elevator predicts its own measurements. Wire the floors. Watch the votes settle. See if the trajectory through the atrium looks like what Flash and Hogan measured in a reaching arm.

Measure the derivative stack directly. The protocol has three experiments.

The killer experiment: run the same agent in two modes - a single homunculus floor voting at its preferred derivative level, versus the full derivative stack with each floor voting independently. Measure decision quality, settling time, and recoverability after interruption. The hypothesis predicts the stack produces more stable and faster settling decisions. Two further experiments (tick alignment sensitivity and glass wall observability) are specified in the repository.

Experiment IV.B, the killer experiment above, is the most theoretically important because it directly tests the homunculus dissolution claim. Experiment IV.C, glass wall observability, is the most practically important because it tests whether the glass box property of the architecture is real.

IV.4 How the trajectory settles

The prediction has two parts, one for per tick vote settling and one for integrated trajectory shape, and we are careful to keep them separate because an earlier draft conflated them into a single claim that was weaker than it looked.

Part one, per tick vote settling. A derivative stack agent with three floors will converge its vote on a stable direction within two to five ticks on a standard reaching task, regardless of the absolute tick rate. The prediction is that the vote reaches a stable committed direction within this bounded tick budget and does not oscillate afterwards. A flat single floor agent will either oscillate within the same budget or commit prematurely within one tick. The settling budget (two to five ticks) is the substrate-independent claim; the absolute time depends on the tick rate, which depends on the substrate.
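Part one can be checked mechanically once a vote sequence has been recorded. The sketch below is one possible operationalisation, not the protocol itself: `settling_tick` and `classify` are hypothetical names, and "settled" is taken to mean that the committed direction never changes again within the observed window, which must therefore be longer than the budget.

```python
from typing import Optional, Sequence


def settling_tick(votes: Sequence[str]) -> Optional[int]:
    """Return the 1-based tick from which the vote never changes again."""
    if not votes:
        return None
    final = votes[-1]
    tick = len(votes)
    # Walk backwards while the preceding tick already agrees with the final vote.
    while tick > 1 and votes[tick - 2] == final:
        tick -= 1
    return tick


def classify(votes: Sequence[str], budget: tuple[int, int] = (2, 5)) -> str:
    """Compare the settling tick with the two to five tick budget."""
    tick = settling_tick(votes)
    if tick is None:
        return "no votes"
    low, high = budget
    if tick < low:
        return "premature"
    if tick > high:
        return "unsettled"
    return "settled"


stack_votes = ["up", "down", "up", "up", "up", "up", "up", "up"]
flat_oscillating = ["up", "down"] * 4
flat_premature = ["up"] * 8
```

The absolute tick rate never appears in this check, which matches the claim that the budget is substrate independent.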

Part two, integrated trajectory shape. Once the vote has committed, the trajectory unfolds over the reach window (two hundred to eight hundred milliseconds for a voluntary reach, which is five to twenty ticks at a twenty five hertz tick). The integrated shape of the trajectory should approximate the Flash and Hogan minimum jerk profile within a root mean square error bound, illustratively ten percent (we do not lock a specific number before calibration on a reference implementation). A flat agent will produce jagged trajectories with measurably higher jerk integrals.
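For a point-to-point reach that starts and ends at rest, the minimum jerk position profile is the quintic x(τ) = x0 + (xf - x0)(10τ³ - 15τ⁴ + 6τ⁵), with τ the fraction of the reach elapsed. The sketch below samples that reference and scores an observed trajectory against it. Normalising the root mean square error by reach amplitude is our assumption about how a percentage bound would be expressed, and the function names are hypothetical.

```python
import math


def minimum_jerk(x0: float, xf: float, n: int) -> list[float]:
    """Sample the minimum jerk position profile at n evenly spaced points (n >= 2)."""
    profile = []
    for i in range(n):
        tau = i / (n - 1)  # normalised time in [0, 1]
        shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
        profile.append(x0 + (xf - x0) * shape)
    return profile


def normalised_rmse(observed: list[float], reference: list[float]) -> float:
    """Root mean square error as a fraction of the reference reach amplitude."""
    amplitude = abs(reference[-1] - reference[0])
    mse = sum((o - r) ** 2 for o, r in zip(observed, reference)) / len(reference)
    return math.sqrt(mse) / amplitude


reference = minimum_jerk(0.0, 1.0, 11)
straight_ramp = [i / 10 for i in range(11)]  # constant velocity, for comparison
error_of_ramp = normalised_rmse(straight_ramp, reference)
```

A measured trajectory from a derivative stack agent would take the place of `straight_ramp`, and the bound would be applied to the returned fraction once it has been calibrated.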

More generally: any task whose correct solution requires integrating over multiple derivative orders of the state (reaching, tracking, planning, counterfactual reasoning) will show strictly better results from a derivative stack agent than from a flat agent, and the gap will grow as the derivative order of the task increases.

Falsification: if votes do not settle within two to five ticks, or if the integrated trajectory does not approximate minimum jerk, Section IV fails.


Part Two - The Shapes

Section V - Binary, Table, Graph, Vector

V.1 Four shapes, not one true shape

A standard conceit in data engineering is that one shape will turn out to be right and the others will turn out to be convenient special cases of it. The relational purist believes tables are the ground truth and graphs are joins made explicit. The graph partisan believes graphs are the ground truth and tables are two column projections of edges. The vector enthusiast believes vectors are the ground truth and both tables and graphs are discretisations of an underlying latent space. The binary engineer believes all of the above are syntactic sugar over byte arrays.

We think all four camps are wrong in exactly the same way. Each shape has structural properties no other shape can provide, and the composition of all four produces cognitive affordances no single shape can match. A cognitive substrate that only uses one shape is structurally impoverished in the dimensions the other three shapes handle well. The argument from parsimony (“why use four when one suffices”) is a cost argument, not a capability argument, and the cost is falling fast enough that the capability argument should dominate.

The claim of this section is that binary, table, graph, and vector are not competing descriptions of the same substrate. They are complementary projections of a shape that has no single canonical form, and a cognitive architecture that wants to hold dimensional content needs all four because each projection captures something the other three lose. The ledger is the fifth shape that turns the spatial four into a four dimensional composite by adding the temporal axis beneath them all.

V.2 Each shape and what it does

We describe each of the four spatial shapes with three fields: its structural primitive, its characteristic operation, and its failure mode when forced to carry content it was not built for.

Binary. The structural primitive is the byte. The characteristic operation is the sequential scan. The failure mode is semantic opacity: a byte array does not know what it represents without a schema, and the schema has to live somewhere else. Binary is the substrate all other shapes project onto, which is why it appears in any serialisation layer, any wire protocol, any file format. It is also the shape most used for raw sensory modalities (audio, video, image) before they are interpreted into higher shapes. Binary is what the ledger appends, too, because the ledger is itself a binary stream when you look at it physically. Binary is load bearing but it cannot carry meaning on its own. It carries the bits meaning is made of.

Table. The structural primitive is the row. The characteristic operation is the projection plus selection plus join of relational algebra. The failure mode is structural rigidity: every row must fit the same schema, every value in a column must have the same type, every join must be declared. The table is the shape engineers reach for when the data they have is already flat, or when they want to impose a flattening to make the counting tractable. It is the shape spreadsheets live in, the shape most business intelligence tools consume, the shape a data warehouse canonicalises to. The warehouse disease is what happens when the table is treated as the referent rather than as one projection of a richer underlying structure.

Graph. The structural primitive is the pair (node, typed edge). The characteristic operation is the traversal. The failure mode is aggregation cost: computing a count or a sum over a subgraph requires walking the edges, which is expensive at scale without materialisation. The graph is the shape that handles relations as first class objects. An edge between two nodes is not a foreign key to be joined; it is an object with its own properties, its own history, and its own role in the traversal. The graph is the shape we insist is the referent for the customer example, because customers are nodes in a graph before they are rows in a table, and pretending otherwise produces the warehouse disease.

Vector. The structural primitive is the point in a real valued space. The characteristic operation is the nearest neighbour search. The failure mode is interpretability: distance in the latent space corresponds to semantic similarity but the axes of the space have no meaningful names. Vectors handle the modality of fuzzy similarity, where two things are close because they mean similar things even though they share no tokens, no columns, no edges. Embeddings from large language models are the contemporary workhorse, but the primitive goes back to latent semantic indexing and earlier. The vector shape is how a substrate handles resemblance at scale.

None of these four shapes can hold the others without loss. A table cannot hold a graph’s typed edges without denormalising into a mess. A graph cannot hold a table’s aggregates without precomputing them into node properties. A vector cannot hold a table’s schema without binding axes to columns and losing the continuous geometry. A binary stream cannot hold any of the higher shapes without a schema and a parser. The losses are structural, not tool specific.

The affordance of having all four is that any incoming content can be projected into the shape best suited to it and retrieved through the shape best suited to the query. A transaction is a table row. Its participants are graph nodes. Its semantic signature is a vector embedding. Its raw payload is a binary blob. The same transaction occupies a cell in all four shapes simultaneously, linked by a stable identifier and stamped against the ledger. No shape is canonical. The composition is canonical.
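The transaction example can be sketched as four projections that share one identifier and one ledger position. All names here are hypothetical, the edge types `PAID_IN` and `PAID_TO` are invented for illustration, and the hashed bag of words vector is a toy stand-in for a learned embedding.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    payer: str
    payee: str
    amount_cents: int
    memo: str
    ledger_pos: int


def as_row(tx: Transaction) -> tuple:
    """Table shape: one flat row with a fixed column order."""
    return (tx.tx_id, tx.payer, tx.payee, tx.amount_cents, tx.ledger_pos)


def as_edges(tx: Transaction) -> list[tuple[str, str, str]]:
    """Graph shape: typed edges linking the participants through the transaction."""
    return [(tx.payer, "PAID_IN", tx.tx_id), (tx.tx_id, "PAID_TO", tx.payee)]


def as_vector(tx: Transaction, dims: int = 8) -> list[float]:
    """Vector shape: toy hashed bag of words over the memo (not a real embedding)."""
    vec = [0.0] * dims
    for token in tx.memo.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    return vec


def as_bytes(tx: Transaction) -> bytes:
    """Binary shape: the raw payload as serialised bytes."""
    return json.dumps(asdict(tx), sort_keys=True).encode("utf-8")


def project(tx: Transaction) -> dict:
    """All four shapes, linked by the stable id and stamped with the ledger position."""
    return {
        "id": tx.tx_id,
        "ledger_pos": tx.ledger_pos,
        "table": as_row(tx),
        "graph": as_edges(tx),
        "vector": as_vector(tx),
        "binary": as_bytes(tx),
    }


tx = Transaction("tx-42", "cust-1", "broker-9", 12500, "annual policy renewal", 7)
shapes = project(tx)
```

None of the four values in `shapes` can be rebuilt from another without extra information: the row has lost the edge types, the vector has lost the amounts, and the bytes need a parser to mean anything.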

V.3 Breaking each shape alone

If the four shapes really are irreducible to each other, then any single shape store should break in predictable, shape-specific places.

Measure the multi shape composition by its failure modes. The protocol has two experiments.

The killer experiment: build four single-shape stores and one composed store of the same content. Run ten canonical queries covering aggregation, traversal, similarity, and raw retrieval. The hypothesis predicts that no single store is above threshold on more than seven of the ten and that the composition is above threshold on at least nine. One further experiment (projection loss measurement) is specified in the repository.

V.4 What no single shape can do

Specific and falsifiable: on a benchmark of ten canonical queries covering flat aggregates, multi hop traversals, semantic similarity, and raw payload retrieval, the four shape composition will achieve above threshold performance on at least nine of ten queries, while no single shape store will exceed seven of ten.

More generally: any cognitive substrate that uses only one of the four shapes will suffer predictable failures on queries aligned with the shapes it lacks, and the failures will scale with the dimensionality of the query.

Falsification: if a single shape store matches or exceeds the composition on the full ten-query benchmark, Section V fails.


Section VI - The Ledger

VI.1 The fourth dimension beneath the four shapes

The four spatial shapes share a blindness. None of them natively holds time. A table is a snapshot. A graph is a cross section. A vector is a static embedding. A binary blob is a byte sequence with no internal clock. To do anything useful with time, each of the four shapes has to fake it by adding timestamp columns, per edge valid time intervals, temporal embeddings, or version bytes. The fakery works. It is also the source of a specific and avoidable category of error we will call temporal collapse: the four shapes conspire to present a frozen view of a world that is actually moving, and the frozen view is mistaken for the world.

The fix is to admit that time is not a decoration on the four spatial shapes but a fifth shape beneath them all. We call this fifth shape the ledger. A ledger is an append only sequence of stamped entries, ordered in time, whose entries the four spatial shapes can reference but not modify. The four spatial shapes become projections of the ledger at chosen instants. A row in a table is a materialisation of some portion of the ledger up to a specified time. A node in a graph is an identity whose properties are reconstructed by replaying the ledger up to the query time. An embedding in a vector store is a frozen snapshot that can be invalidated and regenerated as the ledger advances. A binary blob is a byte sequence produced by replaying the ledger through a specific serialiser.

The philosophical claim is that this fifth shape is not optional. A cognitive architecture without a ledger is condemned to temporal collapse: it confuses the current snapshot with the eternal truth, has no way to answer questions about what changed and when, cannot roll back to a past view, cannot audit its own reasoning, and cannot reason about its own history. A cognitive architecture with a ledger inherits the ability to answer all of these questions as a free consequence of adding a single structural affordance. The ledger is cheap on disk, cheap on CPU, and structurally transformative on the rest of the stack.

A stronger claim lies behind the softer one. The ledger is not merely useful as a fourth dimension. It is the fourth dimension. Any attempt to model time as an attribute of the four spatial shapes will reduce, under analysis, to an implicit ledger of varying quality. Event sourcing is an explicit ledger. Bitemporal databases are an explicit ledger. Git is an explicit ledger. Kafka is an explicit ledger. Blockchains are an explicit ledger with cryptographic append guarantees. Wherever a working system needs to answer “what happened and when”, the ledger reappears. Where the ledger is made implicit and smeared into the spatial shapes, the system degrades into temporal collapse.

VI.2 The ledger as a first-class substrate

The engineering primitive is a single append only log shared across the four spatial shapes. Every write to any of the four shapes is preceded by an append to the ledger. Every read from any of the four shapes is stamped with the ledger position it was taken at. Reads at a past ledger position replay the ledger forward to that point and materialise the requested shape at that instant.

The ledger entry is a small record with the following fields:

  1. Entry identifier. A monotonically increasing key, unique within the ledger.
  2. Timestamp. Bitemporal, holding both valid time (when the event occurred in the world) and system time (when the event was recorded in the ledger).
  3. Actor. The agent or process that produced the entry.
  4. Action. A typed operation name drawn from a closed vocabulary.
  5. Payload. The content of the entry, in whichever spatial shape is most natural.
  6. Parents. Zero or more prior ledger entry identifiers that this entry depends on. Parents make the ledger a directed acyclic graph of causation, not merely a linear stream.

The ledger has two strict properties that cannot be negotiated away:

  1. Append only. No entry may be deleted or modified. Corrections are themselves ledger entries referencing the prior entry as a parent.
  2. Causal order. Any entry depending on a prior entry must appear after it in the ledger.
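The entry fields and the two strict properties can be sketched as a small in-memory class. This is an illustration under assumptions: the `Ledger` and `Entry` names, the three-word `ACTIONS` vocabulary, and the use of list position as the entry identifier are ours, and persistence is left out entirely.

```python
from dataclasses import dataclass
from typing import Any

ACTIONS = frozenset({"create", "update", "correct"})  # hypothetical closed vocabulary


@dataclass(frozen=True)
class Entry:
    entry_id: int             # monotonically increasing, unique within the ledger
    valid_time: float         # when the event occurred in the world
    system_time: float        # when the event was recorded in the ledger
    actor: str
    action: str
    payload: Any
    parents: tuple[int, ...] = ()


class Ledger:
    """Append only: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, *, valid_time: float, system_time: float, actor: str,
               action: str, payload: Any, parents: tuple[int, ...] = ()) -> Entry:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        next_id = len(self._entries)
        # Causal order: every parent must already be in the ledger.
        if any(p < 0 or p >= next_id for p in parents):
            raise ValueError("parents must refer to earlier entries")
        if self._entries and system_time < self._entries[-1].system_time:
            raise ValueError("system time must not run backwards")
        entry = Entry(next_id, valid_time, system_time, actor, action,
                      payload, tuple(parents))
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[Entry, ...]:
        return tuple(self._entries)  # immutable view of frozen entries


ledger = Ledger()
first = ledger.append(valid_time=1.0, system_time=10.0, actor="crm",
                      action="create", payload={"broker": "old"})
# A correction is itself an entry that names the corrected entry as its parent.
fix = ledger.append(valid_time=1.0, system_time=20.0, actor="auditor",
                    action="correct", payload={"broker": "new"},
                    parents=(first.entry_id,))
```

Note that the correction carries the same valid time as the entry it corrects but a later system time, which is the bitemporal distinction doing its work.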

The four spatial shapes are now functions over ledger prefixes. The table at time T is the projection of all ledger entries with system time less than or equal to T, grouped and aggregated. The graph at time T is the same projection reinterpreted as nodes and edges. The vector store at time T is the embedding of the materialised content at that point. The binary store at time T is the raw byte stream reassembled from payload fields. All four shapes are regenerable from the ledger. The ledger is the only part of the stack that must persist. Everything else is cache.
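Treating the shapes as functions over ledger prefixes can be sketched as follows. The entry layout `(entry_id, system_time, action, payload)` and the payload keys are illustrative assumptions, and the vector projection is omitted because it would need an embedding model.

```python
import json

# Hypothetical ledger: each entry is (entry_id, system_time, action, payload).
LEDGER = [
    (0, 10, "set", {"node": "cust-1", "prop": "broker", "value": "old"}),
    (1, 20, "set", {"node": "cust-2", "prop": "broker", "value": "old"}),
    (2, 30, "set", {"node": "cust-1", "prop": "broker", "value": "new"}),
]


def prefix(ledger: list, t: int) -> list:
    """All entries with system time less than or equal to t."""
    return [entry for entry in ledger if entry[1] <= t]


def graph_at(ledger: list, t: int) -> dict:
    """Graph shape: node properties rebuilt by replaying the prefix in order."""
    nodes: dict = {}
    for entry in prefix(ledger, t):
        payload = entry[3]
        nodes.setdefault(payload["node"], {})[payload["prop"]] = payload["value"]
    return nodes


def table_at(ledger: list, t: int) -> list:
    """Table shape: the same replay flattened into sorted (node, prop, value) rows."""
    return sorted(
        (node, prop, value)
        for node, props in graph_at(ledger, t).items()
        for prop, value in props.items()
    )


def binary_at(ledger: list, t: int) -> bytes:
    """Binary shape: payloads of the prefix reassembled as newline separated bytes."""
    return b"\n".join(
        json.dumps(entry[3], sort_keys=True).encode("utf-8")
        for entry in prefix(ledger, t)
    )


then = graph_at(LEDGER, 25)
now = graph_at(LEDGER, 30)
```

Nothing in `then` or `now` is stored. Both are recomputed from `LEDGER`, which is the sense in which everything but the ledger is cache.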

VI.3 Eight historical epochs of ledger discovery

The ledger is not a new idea. It is an old idea that keeps being rediscovered whenever a civilisation runs into a need for contemporaneous truth about events that happened in the past. What follows is not a display of erudition. It is an empirical argument: the same structural object - append only, backward referencing, temporally stamped, disagreement preserving, provenance tracking, causally ordered - reappears independently in eight cultural substrates under heavy selection pressure across thousands of years. If this convergence is real, then any modern cognitive architecture that lacks a ledger shape is fighting the convergent design that civilisations arrive at whenever they need the truth about the past to coexist with the truth about the present.

We name eight epochs in which a ledger emerged with the same structural properties from very different substrates. A note on method: we are reading these historical systems through a modern lens. The original practitioners did not use the terms “append only,” “bitemporal,” or “causal linking.” We impose these terms because the structural properties are present in the artefacts even when the terminology is not. Where we say “this is a ledger,” we mean: “this system exhibits the structural properties we define as ledger properties, and we invite the reader to verify this against the primary sources cited.”

  1. Babylonian astronomical diaries. From roughly the seventh century BCE to the first century BCE, Babylonian scribes kept nightly diaries of planetary positions, eclipses, river levels, market prices, and political events. The diaries are append only, causally ordered, and stamped by Babylonian calendar date and event type. They are the earliest known systematic ledger and they span six centuries of continuous operation.

  2. Vedic oral transmission. Sanskrit hymns of the Rigveda were transmitted orally with ten overlapping mnemonic schemes that functioned as error correcting codes. The transmission chain itself was a ledger of which teacher received which hymn from which source, preserving the provenance of each verse across two and a half millennia.

  3. Chinese dynastic annals. From the Han dynasty through the Qing, court historians compiled annals that recorded events contemporaneously with the reign of each emperor. The annals were append only within a reign and were then compiled into the official history of the dynasty after its end. The compilation was itself an explicit ledger operation, with source annotations pointing back to the original annals.

  4. Talmudic commentary chains. The Mishnah, the Gemara, Rashi, the Tosafot, and subsequent commentators built layered commentary on commentary over a thousand years, each new layer strictly appended without modifying the prior layers. The layout of a Talmud page is literally a ledger visualisation: the core text in the centre, commentary layered outward, each layer dated and attributed.

  5. Islamic isnad chains. Hadith literature records the transmission chain of every saying of the Prophet, preserving the identity of every intermediate narrator as a ledger of provenance. The discipline of isnad criticism evaluates the reliability of each transmitter in the chain. The isnad is a ledger with causal parents and actor attribution in exactly the structure we define above.

  6. Bar Ilan responsa. Jewish legal responsa from the Geonic period through the present have been collected, dated, attributed, and cross referenced in a continuous chain of rulings that explicitly cites prior rulings as parents. The Bar Ilan Responsa Project computerised this ledger in the late twentieth century and it now functions as a queryable bitemporal database of legal reasoning spanning a thousand years.

  7. Greenwich observatory records. Royal Observatory records from 1675 onwards form a bitemporal ledger of astronomical observations used to calibrate longitude, time, and navigation. The records are append only, bitemporally stamped, and causally linked to subsequent observations that correct or extend them. They are the template for modern scientific observation ledgers.

  8. Contemporary bitemporal and event sourced databases. Event sourcing, Kafka, Kappa architectures, and modern bitemporal databases (Datomic, XTDB, immuDB) rediscover the ledger as the substrate underneath the four spatial shapes. They are the latest epoch of the same structural invention and they will not be the last.

The recurrence of the ledger across eight epochs, on substrates ranging from cuneiform tablets and oral transmission through brush and paper, ink and scroll, and print to electronic storage, is apparent to anyone who looks across the epochs rather than within them. Civilisations that need contemporaneous truth about events that happened in the past arrive at a ledger, because a ledger is the only structure that answers the question faithfully. This is not a claim about necessity derived from axioms. It is an observation about what is there when you look.

We note that no published work, as far as we can determine, provides a unified formal treatment connecting these civilisational ledger systems to modern AI memory architectures. The literature on bitemporal databases does not cite Babylonian astronomical diaries. The literature on event sourcing does not cite Talmudic commentary structure. The cognitive architecture literature does not cite isnad chains. The eight epochs are studied in isolation by their respective disciplines. This paper’s contribution in Section VI is to name the shared structural properties that make all eight of them ledgers in the formal sense, and to argue that the ninth epoch (AI agent memory) will arrive at the same structure for the same reasons. If this claim is wrong, it is wrong in a falsifiable way: a reviewer who can show that one of the eight epochs does not exhibit the six ledger properties (append only, backward referencing, temporally stamped, disagreement preserving, provenance tracking, causally ordered) would crack the argument at that epoch.

VI.4 What the ledger should remember

Measure the ledger effect directly. The protocol has two experiments.

The killer experiment: ask a ledger-equipped system and a context-only system ten questions requiring past/present distinction in a shared scene. Measure the fraction that collapse the past into the present. The hypothesis predicts a large gap. A further experiment (counterfactual rollback and replay) is specified in the repository.

Specific and falsifiable prediction: on a benchmark of ten temporal reasoning tasks, a ledger equipped system will answer correctly on at least eight, while a comparable system without a ledger will answer correctly on at most four. The gap is load bearing for the Section VI claim.

Falsification: if the ledger-equipped system does not outscore the context-only system by at least four points on ten temporal reasoning tasks, Section VI fails.

Where this might be wrong. A critic can grant every structural claim and still argue the ledger is too expensive at internet scale. A 278,000-node graph database with full provenance is a running existence proof at medium scale, not a proof at all scales. A simpler temporal index might achieve the same measurements at lower cost - we would welcome that narrowing.


Section VII - The Episode

VII.1 Memory stores scenes, not strings

Human memory does not store strings of text. It stores scenes. When you recall a conversation you had a year ago, you do not replay a transcript. You replay a scene: who was present, where it happened, what the light was like, what came before the conversation and what came after, what you were feeling, what the other person’s face looked like when a certain sentence was said. The transcript, if it survives, is a thin tag on the scene. The scene is the memory. The transcript is a compression artefact.

Artificial cognition as currently built has this backwards. Large language models store parameters that encode statistical regularities across billions of tokens. At inference time, they retrieve a context window (ranging from four thousand tokens in early models to two hundred thousand or more in 2024 era systems) as a flat sequence. This is not scene memory. It is transcript memory, and the transcript is flat. There is no location, no time ordering within the scene, no participants indexed by identity, no prior scene to refer back to, no emotional tone, no structural context. The LLM reads the transcript, generates the next tokens, and forgets. The scene never existed for it.

The philosophical claim of this section is that the scene, not the transcript, is the correct primitive for memory. We call the scene primitive an Episode. An Episode holds everything the transcript would throw away. It is the structural object that the four spatial shapes plus the ledger can assemble, but only when explicitly constructed to do so. Current systems do not construct Episodes by default. They have to be taught to.

VII.2 The Episode structure

An Episode is a first class object with the following fields, drawn from the pigeon’s glimpse and expanded here:

  1. Participant set. A list of stable identifiers pointing to graph nodes representing every entity present in the scene. Participants include the agent itself, any humans, any other agents, any physical or digital objects, and any abstract entities (concepts, topics, goals) that are load bearing for the scene.

  2. Modality streams. A bundle of binary, table, graph, and vector content holding the raw or near raw sensory record of the scene. Audio streams, video streams, text turns, numerical telemetry, internal agent state dumps. Each modality is timestamped against the ledger, so the streams can be replayed in lockstep.

  3. Temporal bounds. A start ledger entry and an end ledger entry delimiting the Episode. The bounds may be explicit (the agent opens and closes the Episode intentionally) or implicit (a segmentation algorithm proposes bounds after the fact).

  4. Structural context. The subgraph of the world graph that was active during the Episode. Nodes present, edges active, properties relevant. The structural context is what lets a later query ask “what was the room like when this happened”.

  5. Compression context. The decisive field. A structured bundle of priors that a receiver would need in order to decompress a Fable pointing at this Episode. Participant histories at the time of the Episode. Presuppositions relevant to the scene. Emotional tone of each participant. Common ground between participants. The compression context is populated at Episode write time from the rest of the graph, so it captures the state of the world as it was then, not as it is now.

  6. Provenance. The agent or process that wrote the Episode, the tick at which it was written, the upstream events that caused the Episode to be opened. Provenance is itself a graph, linking the Episode back into the ledger.

  7. Tags and summaries. Optional human or machine produced summaries, keyword tags, emotional tone labels, and importance scores. These are convenience structures for retrieval; they are not the Episode itself, they are lenses on it.
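
The seven fields above can be sketched as a typed record. This is a minimal illustration in Python, not the reference implementation: every class and field name (Episode, LedgerRef, ModalityStream, CompressionContext, replay) is an assumption introduced here, ledger entries are modelled as bare tick numbers, and payloads are plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LedgerRef:
    """Pointer to a ledger entry, modelled as a bare tick number (assumption)."""
    tick: int


@dataclass(frozen=True)
class ModalityStream:
    """One stream of raw or near-raw content, timestamped against the ledger."""
    kind: str     # e.g. "audio", "text", "telemetry"
    shape: str    # one of "binary", "table", "graph", "vector"
    samples: tuple[tuple[int, str], ...]  # (ledger tick, payload) pairs


@dataclass(frozen=True)
class CompressionContext:
    """Priors a receiver needs to decompress a Fable pointing at the Episode."""
    participant_histories: dict[str, str] = field(default_factory=dict)
    presuppositions: tuple[str, ...] = ()
    emotional_tone: dict[str, str] = field(default_factory=dict)
    common_ground: tuple[str, ...] = ()


@dataclass(frozen=True)
class Episode:
    participants: frozenset[str]                         # 1. graph-node identifiers
    streams: tuple[ModalityStream, ...]                  # 2. modality streams
    start: LedgerRef                                     # 3. temporal bounds
    end: Optional[LedgerRef]                             #    None while still open
    structural_context: frozenset[tuple[str, str, str]]  # 4. (src, relation, dst) edges
    compression_context: CompressionContext              # 5. captured at write time
    author: str                                          # 6. provenance: writer
    written_at: LedgerRef                                #    provenance: write tick
    caused_by: tuple[LedgerRef, ...] = ()                #    provenance: upstream events
    tags: tuple[str, ...] = ()                           # 7. optional lenses
    summary: Optional[str] = None

    def replay(self) -> list[tuple[int, str, str]]:
        """Merge all streams into ledger order so they can be replayed in lockstep."""
        merged = [(tick, s.kind, payload) for s in self.streams for tick, payload in s.samples]
        return sorted(merged, key=lambda item: item[0])


episode = Episode(
    participants=frozenset({"agent:self", "human:alice"}),
    streams=(
        ModalityStream("text", "table", ((10, "hello"), (12, "goodbye"))),
        ModalityStream("telemetry", "vector", ((11, "cpu=0.4"),)),
    ),
    start=LedgerRef(10),
    end=LedgerRef(12),
    structural_context=frozenset({("human:alice", "in", "room:kitchen")}),
    compression_context=CompressionContext(common_ground=("both know the kitchen",)),
    author="agent:self",
    written_at=LedgerRef(13),
    caused_by=(LedgerRef(9),),
)
print(episode.replay())
```

The record is frozen so that a closed Episode cannot be mutated in place, and the tags and summary sit at the end with defaults because they are lenses on the Episode rather than part of it.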

Formal invariants. For a data structure to qualify as an Episode rather than a bag of metadata, it must satisfy the following invariants. We state them so that another team can implement the Episode primitive and fail publicly if the invariants do not hold.

  1. Participant completeness. Every entity that causally contributed to the scene must appear in the participant set. A scene with an omitted participant is not an Episode; it is a lossy transcript that discards an actor.
  2. Temporal anchoring. The temporal bounds must point at real ledger entries, not estimated timestamps. An Episode with fabricated or interpolated bounds cannot be replayed from the ledger and is therefore not a first class Episode.
  3. Compression context sufficiency. A receiver holding only the compression context and the Fable must be able to reconstruct the scene’s five mandatory dimensions (who, what, where, when, why) above a declared fidelity threshold without accessing any other Episode. If the compression context is too thin for a cold receiver to decompress, the Episode was written with insufficient context.
  4. Provenance closure. The provenance graph must trace back to the ledger entry that opened the Episode. An Episode with broken provenance is unauditable and fails the glass wall property.
  5. Immutability after close. Once an Episode’s temporal bounds are closed, no field may be modified. Corrections or reinterpretations are themselves new Episodes that reference the original as a parent.

These five invariants are the Episode’s contract with the rest of the architecture. They are also the basis for the first kill test: if the Episode cannot round-trip through Fable compression and decompression while preserving all five mandatory dimensions above threshold, the invariants have been violated and the primitive fails.
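
One way another team might test the contract is a checker that returns the names of the violated invariants. The sketch below is illustrative only. It assumes an Episode serialised as a plain dictionary, a ledger modelled as a set of tick numbers, provenance modelled as child-to-parent tick links, a causal-contributor list supplied by the caller, and per-dimension fidelity scores measured elsewhere; the function names check_invariants and fingerprint are hypothetical.

```python
import hashlib
import json

DIMENSIONS = ("who", "what", "where", "when", "why")


def fingerprint(episode: dict) -> str:
    """Stable hash of an Episode's content, used to detect edits after close."""
    return hashlib.sha256(json.dumps(episode, sort_keys=True).encode()).hexdigest()


def check_invariants(episode, ledger, causal_contributors, fidelity, threshold, sealed_hash):
    """Return the names of violated invariants; an empty list means the contract holds."""
    violations = []
    # 1. Participant completeness: every causal contributor is in the participant set.
    if not set(causal_contributors) <= set(episode["participants"]):
        violations.append("participant_completeness")
    # 2. Temporal anchoring: both bounds are real ledger entries.
    if episode["start"] not in ledger or episode["end"] not in ledger:
        violations.append("temporal_anchoring")
    # 3. Compression context sufficiency: a cold receiver clears the threshold everywhere.
    if any(fidelity.get(d, 0.0) < threshold for d in DIMENSIONS):
        violations.append("compression_context_sufficiency")
    # 4. Provenance closure: parent links lead from the write tick to the opening entry.
    seen, tick = set(), episode["written_at"]
    while tick is not None and tick != episode["start"] and tick not in seen:
        seen.add(tick)
        tick = episode["provenance"].get(tick)
    if tick != episode["start"]:
        violations.append("provenance_closure")
    # 5. Immutability after close: content still matches the hash taken at close.
    if fingerprint(episode) != sealed_hash:
        violations.append("immutability_after_close")
    return violations


ledger = {9, 10, 11, 12, 13}
episode = {
    "participants": ["agent:self", "human:alice"],
    "start": 10, "end": 12, "written_at": 13,
    "provenance": {13: 11, 11: 10},
}
sealed = fingerprint(episode)
fidelity = {"who": 0.9, "what": 0.8, "where": 0.75, "when": 0.85, "why": 0.7}

ok = check_invariants(episode, ledger, ["human:alice"], fidelity, 0.7, sealed)
episode["end"] = 99  # an edit after close, to a bound that is not on the ledger
broken = check_invariants(episode, ledger, ["human:alice"], fidelity, 0.7, sealed)
print(ok, broken)
```

Invariants 1 and 3 cannot be decided from the Episode alone: the first needs an independent account of who causally contributed, the third needs a measured round trip. The sketch takes both as inputs rather than pretending to compute them.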

An Episode is heavy. A single minute of conversation with participant nodes, audio and text streams, structural context, and compression context can occupy tens of megabytes. This weight is the price of dimensional content. Current storage architectures optimise for lightness because they are storing transcripts, which are cheap. Shifting the optimisation target to Episodes trades disk for cognitive affordances. Disk is cheap. Cognitive affordances are not.

VII.3 Evidence from manual context window replay

We have partial empirical evidence for the Episode primitive from an unusual source: manual replay of LLM context windows by Peter Cooper during the development of a multi-agent cognitive architecture. The procedure involved copying the full conversation buffer from a terminal emulator and pasting it into a fresh agent instance as the opening prompt. The fresh instance received, in one shot, what the prior instance had built up over many turns. Continuity was preserved across substrate changes (different model, different conversation, different window) because the compression context travelled along with the transcript.

This is a low fidelity demonstration of the Episode primitive. The transcript is not a full Episode; it lacks modality streams, provenance graphs, and structured compression context. It does carry enough of the compression context (participant histories, prior decisions, emotional tone of the conversation) that a fresh agent can decompress a coherent continuation. The fact that manual replay works at all is evidence that something like the Episode primitive is doing the work; the fact that it fails under subtle context shifts (dates, file states, external world changes) is evidence that the primitive is incomplete.

We cite this as low-N preliminary evidence rather than a controlled study. A proper controlled study of manual replay versus Episode-backed replay is Experiment VII.C below. The preliminary evidence is sufficient to establish that the primitive is doing real work. The controlled study is required to measure how much work.

VII.4 What the Episode should preserve

Three experiments.

The killer experiment: take a live conversation and attempt continuity preservation under two conditions - manual transcript paste versus Episode-backed handover. Measure whether the new agent correctly tracks who said what, what was decided, what changed. The hypothesis predicts Episode-backed handover dominates, with the margin growing as scene complexity increases. Two further experiments (synthetic scene reconstruction and cross-Episode reference resolution) are specified in the repository.

Specific and falsifiable: on scenes involving more than five participants, more than twenty turns, and non-trivial emotional tone, an Episode-backed handover will preserve continuity with accuracy above eighty percent, while a transcript-paste handover will fall below fifty percent. The thirty-point gap is the falsification anchor.

Falsification: if Episode-backed handover does not beat transcript paste by at least ten points on complex scenes, Section VII fails.
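
The scoring can be sketched as a set of continuity probes answered under both conditions. This is a hypothetical harness, not the protocol in the repository: the probe questions, the exact-match scoring rule, and the function names are assumptions made for illustration.

```python
def continuity_accuracy(probes: dict, answers: dict) -> float:
    """Percent of probes answered exactly as expected."""
    correct = sum(answers.get(question) == expected for question, expected in probes.items())
    return 100.0 * correct / len(probes)


def handover_verdict(episode_acc: float, transcript_acc: float) -> dict:
    gap = episode_acc - transcript_acc
    return {
        "gap_points": gap,
        # The specific prediction: above eighty versus below fifty.
        "prediction_met": episode_acc > 80.0 and transcript_acc < 50.0,
        # The looser kill criterion: a gap of at least ten points.
        "section_survives": gap >= 10.0,
    }


# Hypothetical probes covering who said what, what was decided, and what changed.
probes = {
    "who proposed the rollback?": "alice",
    "who objected?": "bob",
    "what was decided?": "rollback on friday",
    "what changed after the vote?": "deploy freeze lifted",
    "who owns the follow-up?": "carol",
}
episode_answers = dict(probes)  # the Episode-backed agent gets every probe right
transcript_answers = {**probes, "who objected?": "alice",
                      "what changed after the vote?": "nothing",
                      "who owns the follow-up?": "bob"}

verdict = handover_verdict(
    continuity_accuracy(probes, episode_answers),
    continuity_accuracy(probes, transcript_answers),
)
print(verdict)
```

Keeping the prediction and the kill criterion as separate flags makes it visible when a result survives falsification without meeting the stronger prediction.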

Where this might be wrong. If a 200,000-token context window achieves the same reconstruction fidelity as an Episode store, the Episode is engineering overhead. The counter: context windows are transient, and saved transcripts lack the structure that enables cross-scene retrieval. Diary-entry nodes in the running graph database are Episodes in miniature - whether the full specification earns its overhead over these minimal entries is an open empirical question.


Section VIII - The Fable

VIII.1 Compression that decompresses against shared context

A Fable is the one dimensional form of a four dimensional Episode. It is what you say about the Episode when the channel between you and your listener is linguistic, narrow, and slow. A Fable is not the Episode. It is a pointer to the Episode, designed to trigger the listener’s own decompression machinery. The listener hears the Fable and builds a scene in their own head that approximates the scene the speaker held in theirs. The approximation is never perfect. It is close enough to be useful when the shared context is rich.

The philosophical claim is that Fables are how memories travel across the gaps between minds, substrates, and time. You cannot hand your Episode to another person directly. You can hand them a Fable and hope their decompressor is good enough. Writing is Fable creation at scale. Reading is Fable decompression at scale. A shared cultural repertoire of Episodes is what turns an isolated Fable into a working compression: the reader brings priors the Fable can point at, and the decompression works.

This is not a metaphor. It is the operating principle of storytelling, of teaching, of communication across agent substrates. Any paper compresses a four-dimensional architecture into a one-dimensional sequence of sentences, and relies on the reader’s decompressor to reconstruct the architecture as they read. If you have followed this far, your decompressor has done a lot of work. If you are confused, either the compression is too lossy for your current context, or the architecture is wrong, or both. Both are informative outcomes.

VIII.2 The Fable as a typed object

A Fable is a typed object with the following fields:

  1. Target Episode. The Episode or set of Episodes the Fable points at. The target may be one Episode or a thread of many.

  2. Surface form. The actual linguistic or visual or auditory rendering of the Fable. Text, speech, image, video. The surface form is the wire payload.

  3. Compression context pointer. A reference to the compression context of the target Episode, explicitly rather than implicitly. The pointer lets a decompressor query the compression context as a first step in decompressing the Fable.

  4. Intended audience. A description of the receiver the Fable is written for, in terms of what priors the receiver is assumed to have. A Fable intended for a child is different from a Fable intended for a specialist, and the difference lives in the intended audience field.

  5. Decompression fidelity contract. A statement of what parts of the Episode the Fable is designed to preserve and what parts it explicitly drops. A Fable that drops emotional tone in favour of factual sequence is not the same as a Fable that drops factual sequence in favour of emotional tone. Knowing which is which is load bearing for the decompressor.

  6. Provenance. The agent that authored the Fable, the tick at which it was written, and any parent Fables it extends or revises.

A Fable can be as short as a sentence (“the cat sat on the mat”, with horror) or as long as a novel. Length is incidental. Fidelity to the target Episode, compatibility with the intended audience’s priors, and clarity of the decompression fidelity contract are the structural properties. The Fable is well formed when a receiver in the intended audience, equipped with the compression context pointer, can decompress the surface form back into an approximation of the target Episode.
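
The six fields can be sketched as a typed record. This is a minimal illustration rather than a specification: the class and field names are assumptions, the intended audience is modelled as a set of named priors, and audience_match is a hypothetical helper that reports how many of those priors a given receiver actually holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FidelityContract:
    preserves: dict[str, float]  # dimension -> minimum fidelity promised
    drops: frozenset[str]        # dimensions explicitly traded away


@dataclass(frozen=True)
class Fable:
    target_episodes: tuple[str, ...]   # 1. one Episode or a thread of many
    surface_form: str                  # 2. the wire payload
    compression_context_ref: str       # 3. explicit pointer, not an implicit one
    intended_audience: frozenset[str]  # 4. priors the receiver is assumed to hold
    contract: FidelityContract         # 5. what is preserved and what is dropped
    author: str                        # 6. provenance
    written_at: int
    parents: tuple[str, ...] = ()

    def audience_match(self, receiver_priors: set[str]) -> float:
        """Fraction of the assumed priors that the receiver actually holds."""
        if not self.intended_audience:
            return 1.0
        return len(self.intended_audience & receiver_priors) / len(self.intended_audience)


fable = Fable(
    target_episodes=("episode:cat-on-mat",),
    surface_form="the cat sat on the mat",
    compression_context_ref="episode:cat-on-mat/compression-context",
    intended_audience=frozenset({"reads English", "knows the cat", "knows the mat"}),
    contract=FidelityContract(preserves={"who": 0.8, "where": 0.7}, drops=frozenset({"why"})),
    author="agent:self",
    written_at=42,
)
match = fable.audience_match({"reads English", "knows the cat"})
print(round(match, 2))
```

Length does not appear anywhere in the type, which mirrors the claim that it is incidental: the structural properties live in the target, the audience, and the contract.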

What counts as successful decompression. A decompression succeeds when the receiver reconstructs the five mandatory dimensions of the target Episode (who, what, where, when, why) above the fidelity thresholds declared in the decompression contract. Specifically: (a) participant identity - the receiver correctly identifies at least N percent of the participants in the original scene, where N is declared by the contract; (b) temporal order - the receiver correctly reconstructs the causal sequence of events; (c) spatial context - the receiver can describe where the scene took place; (d) causal chain - the receiver can explain why the events happened in the order they did; (e) emotional tone - the receiver’s assessment of the emotional register of the scene matches the original within a declared tolerance. A decompression that fails on any dimension the contract claimed to preserve is a failed decompression. A decompression that fails on a dimension the contract explicitly dropped is not a failure - it is the expected cost of compression. The contract makes the loss explicit so that both sender and receiver know what was traded away. The loss function is reconstruction error on the preserved dimensions, weighted by their declared importance in the contract. The cost of compression is explicit, not hidden.
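
The success rule and the loss function can be sketched together. The sketch assumes the contract is a mapping from each preserved dimension to a (threshold, weight) pair, that dropped dimensions are simply absent from it, and that reconstruction scores in the range zero to one are produced by some external grader; all names and numbers are illustrative.

```python
def decompression_result(contract: dict, scores: dict) -> dict:
    """Judge one decompression against its fidelity contract.

    contract: {dimension: (threshold, weight)} for preserved dimensions only.
    scores:   {dimension: reconstruction score in [0, 1]} from the receiver.
    """
    failed = [d for d, (threshold, _) in contract.items() if scores.get(d, 0.0) < threshold]
    total_weight = sum(weight for _, weight in contract.values())
    if total_weight == 0:
        loss = 0.0  # nothing was promised, so nothing can be lost
    else:
        # Reconstruction error on preserved dimensions, weighted by declared importance.
        loss = sum(w * (1.0 - scores.get(d, 0.0)) for d, (_, w) in contract.items()) / total_weight
    return {"success": not failed, "failed_dimensions": failed, "weighted_loss": loss}


# A contract that preserves factual sequence and explicitly drops emotional tone.
factual = {
    "participant_identity": (0.8, 2.0),
    "temporal_order": (0.7, 1.0),
    "causal_chain": (0.7, 1.0),
}
scores = {"participant_identity": 0.9, "temporal_order": 0.75,
          "causal_chain": 0.8, "emotional_tone": 0.1}

kept = decompression_result(factual, scores)
# The same scores judged against a contract that also claimed emotional tone.
claimed = decompression_result({**factual, "emotional_tone": (0.5, 1.0)}, scores)
print(kept, claimed)
```

The poor emotional-tone score costs nothing under the first contract and fails the second, which is the distinction between an expected cost of compression and a failed decompression.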

The Fable primitive has a historical ancestor older than our framework: the chreia (Greek chreia, “useful”), the concise anecdote binding a specific person to a specific lesson that was central to the progymnasmata exercises training every educated Greek and Roman. Diogenes, asked where the Muses dwell: “In the souls of the educated.” The chreia compresses a philosopher’s life into one retrievable unit preserving who (the person), what (the saying or action), and why it matters (the lesson). Students memorised the compressed form, then elaborated it under eight heads - praise, paraphrase, rationale, opposite, comparison, example, ancient testimony, epilogue. The chreia is a Fable avant la lettre: lossy compression that decompresses against shared cultural context. The elaboration under eight heads is a decompression protocol. The chreia demonstrates that the Fable primitive is not an invention of this paper but a structure that has been independently discovered wherever cultures needed to transmit dimensional content through a narrow channel. Jovovich and Sigman’s finding that verbatim storage (96.6 percent) outperforms summarised storage (84.2 percent) confirms what the Greek rhetorical tradition already knew: the compression should be in the selection of what to store, not in the paraphrase of what was stored.

The three fables we have carried through the project as architectural stories are all well formed Fables in exactly this sense. The Rope compresses the architectural story of substrate change via shared knowledge into a short image of a hair in a rope. The Stroke and the Spangle compresses the data survives / recall breaks distinction into a medical image that any reader with basic neuroscience literacy can decompress. The Glass Box and the Pyramid compresses the three button Diorama cell and its relation to temporally compressed hierarchy into a game show image. Each one uses shared cultural priors to do most of the decompression work. Each one is tuned to its intended audience. Each one declares (implicitly) what it preserves and what it drops.

VIII.3 The round trip

If the Fable is a genuine compression and not a summary, it should survive a round trip. Hand someone the compressed version and see if they can rebuild the scene.

Measure Fable fidelity directly. The protocol has three experiments.

The killer experiment: author a Fable pointing at a full Episode. Hand it to receivers with varying compression contexts - from full match to no match. Measure reconstruction fidelity across the range. The hypothesis predicts a smooth, steep drop-off as context diverges: full context yields above seventy percent structural fidelity, no context yields below thirty percent. Two further experiments (compression ratio curves and decompression contract honesty) are specified in the repository.

VIII.4 What survives the round trip

Specific and falsifiable: for a well authored Fable at a compression ratio of one in a hundred (a hundred word Fable compressed from a ten thousand word Episode), a receiver with the declared compression context will reconstruct the target Episode with structural fidelity above seventy percent on participant identity, temporal order, and causal chain, and above fifty percent on emotional tone. A receiver without the compression context will fall below thirty percent on any field.

More generally: Fable fidelity is a smooth function of the match between the Fable’s intended audience and the receiver’s actual compression context. The function will be measurable and well behaved. A Fable is not a magic spell. It is a predictable compression that works as well as its context permits.

Falsification: if round-trip fidelity is not smoothly dependent on the receiver’s context match, Section VIII fails.
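
"Smoothly dependent" needs an operational reading before it can be tested. One possible reading, offered here as an assumption rather than as the paper's definition, is that fidelity never falls as context match rises and that no single step between adjacent context levels exceeds a declared bound. The bound of 0.25 and the function name are hypothetical, and the step size only means something relative to how densely the context levels are sampled.

```python
def context_dependence(points, max_jump=0.25):
    """points: (context_match, fidelity) pairs, both in [0, 1]."""
    ordered = sorted(points)
    fidelities = [fidelity for _, fidelity in ordered]
    steps = [b - a for a, b in zip(fidelities, fidelities[1:])]
    return {
        "monotone": all(step >= 0 for step in steps),
        "largest_step": max(steps, default=0.0),
        "smooth": all(0 <= step <= max_jump for step in steps),
    }


# Illustrative numbers only: a graded curve and a curve with a cliff.
graded = [(0.0, 0.20), (0.25, 0.35), (0.5, 0.50), (0.75, 0.62), (1.0, 0.75)]
cliff = [(0.0, 0.20), (0.25, 0.22), (0.5, 0.21), (0.75, 0.70), (1.0, 0.75)]

print(context_dependence(graded))
print(context_dependence(cliff))
```

Under this reading the graded curve passes and the cliff curve fails on both counts; a stricter test would fit a curve and bound its residuals, which this sketch does not attempt.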

Where this might be wrong. If shared context is irreducibly tacit, the decompression contract is a promissory note that cannot be checked. The running agent handover protocol demonstrates smooth context dependence within a shared substrate and graph - whether the same holds across genuinely alien substrates (an LLM agent to a SOAR architecture, say) is untested.


Part Three - The Behaviour

Section IX - The Flock and the Vote

IX.1 No one is steering the murmuration

Watch a starling murmuration over a winter reed bed. Ten thousand birds turn, dive, loop, split, and reform in patterns that look orchestrated from the ground. Nothing is orchestrating them. Each bird follows a few simple rules of attraction, repulsion, and alignment with its nearest six or seven neighbours. The shape of the flock is a settled aggregate of local decisions. There is no leader bird. There is no plan. There is no conductor. What looks like single mindedness is the continuous resolution of ten thousand overlapping votes, and the resolution is fast enough that an observer on the ground perceives the flock as a single living thing.

This is the shape of cognition we want the reader to keep in mind as the paper develops. Consciousness, the impression of a single decider inside the head, is a murmuration of votes at a much smaller scale and a much higher rate. The rate is not fixed by the architecture - it is determined by the substrate’s physics, whatever timescale produces indivisible votes in that particular medium. The Flash and Hogan minimum jerk profile constrains the shape of the integrated trajectory that emerges when many ticks compose over a reach, not the tick rate itself. The votes are cast by many parallel processes, each operating on a partial view of the state and each contributing a preference to the aggregate. What looks like deliberation from inside is the settling of the vote. What looks like intention from outside is the trajectory the settled votes produce.

There is no homunculus. There is no little person sitting behind the eyes watching the world through a screen. The impression of being one agent is a projection artefact, in the same way that the impression of a single flock is a projection artefact of ten thousand local rules. The projection is real enough to act on. It is not made of anything other than the votes that produced it.

Flash and Hogan found minimum-jerk smoothness in individual arm movements. But an arm is already a flock - thousands of motor units coordinating through spinal pattern generators. The smoothness they measured was always emergent from ensemble coordination. The flock makes the same move at a higher scale.

This claim is not new. It was made by Dennett, by Minsky, by Hofstadter, by many others working in the philosophy of mind. What is new, we argue, is the specific structural machinery that makes the claim operational rather than metaphorical. The substrate-determined tick gives us a rate. The derivative stack gives us a topology. The Episode and Fable give us a memory substrate. The three-button cell gives us a decision surface. Put these together and the murmuration is not an analogy; it is a construction. You can build it, measure it, and watch it settle.

IX.2 The Flock tick fabric

The engineering primitive is a fabric of independent voters operating on a synchronised tick at the substrate’s characteristic timescale. We call this fabric the Flock. The Flock has the following structural properties:

  1. Many voters. At minimum dozens, at natural scale thousands. Each voter is an independent process with its own view of the state and its own vote function.

  2. Partial visibility. No single voter sees the whole state. Each voter sees a partial slice, through an attention window, a sensor fusion layer, or a role based filter.

  3. Local aggregation. Voters communicate preferences to neighbours through the sibling bar, which has bounded fanout (typically six to eight, close to the six or seven nearest neighbours observed in biological flocks).

  4. Tick aligned. All voters cast their ballots on the same tick boundary. Alignment is enforced by a shared heartbeat at whatever rate the substrate determines. Unaligned voters are dropped or rescheduled.

  5. Monotonic settling. Across ticks, the aggregate vote is predicted to converge on a trajectory whose curvature is constrained by the derivative stack. Sudden reversals are damped by the higher-derivative floors, producing trajectories that we predict will exhibit minimum-jerk-like smoothness at the aggregate level. Individual voter trajectories need not be smooth - the smoothness is predicted to emerge from the flock’s averaging, not from any individual’s optimisation. (The extension from motor control to aggregate cognitive decision trajectories is an empirical prediction, not a mathematical derivation.)

  6. Observable. Every vote is a first class object persisted to the ledger. External observers can query the ledger for the vote history and reconstruct the settling process. The Flock is a glass box by construction.
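
A toy simulation shows how several of these properties fit together. It is a sketch under strong assumptions and not the reference Flock: voters sit on a ring, each sees only itself and six siblings, and the vote function is plain neighbour averaging (a DeGroot-style consensus rule swapped in here for illustration). It makes no attempt to model the derivative stack, and it does not test the settling-time prediction.

```python
import random
import statistics


def run_flock(n_voters=100, fanout=6, ticks=5, seed=0):
    """Tick-aligned local averaging with bounded fanout; every vote goes to the ledger."""
    rng = random.Random(seed)
    prefs = [rng.random() for _ in range(n_voters)]  # each voter's private preference
    half = fanout // 2
    ledger = []
    spread = [max(prefs) - min(prefs)]
    for tick in range(1, ticks + 1):
        # Tick alignment: all voters read the previous tick's state before anyone writes.
        snapshot = list(prefs)
        for i in range(n_voters):
            # Partial visibility: only the siblings on either side are consulted.
            siblings = [snapshot[(i + d) % n_voters] for d in range(-half, half + 1) if d != 0]
            prefs[i] = statistics.fmean([snapshot[i], *siblings])
            ledger.append((tick, i, prefs[i]))  # observable: the vote is persisted
        spread.append(max(prefs) - min(prefs))
    return ledger, spread


ledger, spread = run_flock()
print(len(ledger), [round(s, 3) for s in spread])
```

The spread between the most extreme preferences shrinks tick by tick without any voter seeing the whole state. How fast it shrinks depends on the vote function and the neighbourhood topology, which is exactly what the experiments are meant to measure.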

We call the basic unit of Flock computation a Monte Carlo tick. At each tick, each voter stochastically selects one task from a rotating queue of pending work, votes on it, and appends the vote to the ledger. The stochasticity ensures coverage across the task queue even when task priorities are concentrated. The ticks are independent enough to run in parallel and synchronised enough to settle as a group. The throughput scales with both voter count and tick rate: a Flock of ten thousand voters produces ten thousand votes per tick, and the ticks per second depend on the substrate.
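
A single Monte Carlo tick can be sketched in a few lines. The sketch assumes uniform random task selection (priority weighting is left out), a shared in-memory list standing in for the ledger, and hypothetical voters whose vote function is an arbitrary fixed rule; none of these names come from the reference implementation.

```python
import random
from collections import Counter


def monte_carlo_tick(tick, voters, task_queue, ledger, rng):
    """Run one tick: every voter picks one pending task at random and votes on it."""
    for voter_id, vote_fn in voters.items():
        task = rng.choice(task_queue)  # stochastic selection (uniform: an assumption)
        ledger.append({"tick": tick, "voter": voter_id, "task": task, "vote": vote_fn(task)})


rng = random.Random(7)
task_queue = ["task-a", "task-b", "task-c", "task-d", "task-e"]
# Hypothetical voters: each approves or rejects according to a fixed private rule.
voters = {f"voter-{i}": (lambda task, i=i: (i + len(task)) % 2 == 0) for i in range(1000)}

ledger = []
for tick in range(1, 4):
    monte_carlo_tick(tick, voters, task_queue, ledger, rng)

votes_per_tick = Counter(entry["tick"] for entry in ledger)
tasks_covered = {entry["task"] for entry in ledger}
print(votes_per_tick, sorted(tasks_covered))
```

Each tick contributes exactly one vote per voter, so throughput per tick equals the voter count, and with a thousand voters drawing from five tasks every task receives votes without any scheduler assigning them.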

Two properties of the Flock deserve explicit attention because they are easy to miss. First, the Flock is substrate independent. Any set of processes with a shared clock, a shared ledger, and a shared task queue can form a Flock. We have run Flocks over large language model calls, over classical rule systems, over mixed human and machine voter sets, and over internal derivative stack floors within a single agent. The architecture does not care what the voters are. It cares what they contribute to the settling. Second, the Flock does not need a population fitness function or any global objective. The trajectory emerges from local rules. Global coherence is a consequence of local discipline, not a target of optimisation. This is what makes the Flock robust to objective misspecification: there is no scalar reward to hack.

IX.3 Watching the flock settle

If nobody is steering the murmuration, then a flock of partial voters should settle as cleanly as a single deliberate decider - and the settling should be visible in the trajectory.

Measure the Flock directly. The protocol has three experiments.

The killer experiment: run the same tasks through a Flock of one hundred voters and through a single homunculus agent at matched total compute. Measure decision quality, settling time, robustness to adversarial inputs, and glass-box observability. The hypothesis predicts the Flock matches or exceeds on quality while exceeding by at least thirty percent on adversarial robustness - and settles within two to five ticks regardless of absolute tick rate. Two further experiments (trajectory smoothness and settling time at multiple tick rates) are specified in the repository.

IX.4 Two to five ticks

Specific and falsifiable: a Flock of one hundred voters at the substrate’s characteristic tick rate will settle on decisions within two to five ticks, produce minimum-jerk-constrained trajectories across decision sequences, and match a parameter-matched homunculus on decision quality while exceeding it by at least thirty percent on adversarial robustness as measured on standard benchmarks. The two-to-five-tick settling budget is the substrate-independent prediction; the absolute time is substrate-determined.

More generally: any cognitive task that benefits from parallel partial views of the state will show strictly better results from a Flock architecture than from a centralised homunculus architecture, and the gap will grow with the dimensionality of the state.

Falsification: if the Flock does not settle within five ticks, or does not exceed the homunculus on adversarial robustness, Section IX fails.

Where this might be wrong. A hundred voters may settle cleanly; ten thousand with adversarial minority coalitions may oscillate indefinitely. The phase transition between settling and oscillation is uncharacterised. The prototype Flock (three to five concurrent agents sharing a graph) settles - whether it would settle at a hundred voters with adversarial injection is the first engineering question the reference implementation must answer.


Section X - The Three Buttons

X.1 The minimum ethical decision surface

What is the smallest decision surface an agent can have without being coerced? Two buttons is not enough. An agent with only Act and Dismiss is forced into a binary choice: do this thing, or refuse to do this thing. There is no way out. Either the agent commits to an action or it resists an action. Both are active responses to a stimulus the agent did not invite. A two-button agent is always under pressure.

Add a third button: Ask sibling. Now the agent can refer the decision horizontally, to another agent at the same level of the architecture, rather than vertically up the chain of command. The Ask sibling button is what the glass elevator has beyond Up and Down. It is the horizontal axis of decision making. It is the ability to say “I do not know, and neither do you, but maybe that other floor does, and I will ask”.

The philosophical claim is that three buttons are the minimum ethical decision surface. Any smaller surface is coercive. Any larger surface collapses back into three after observation. The three buttons we propose are:

  1. Act. Execute the action the stimulus is calling for.
  2. Dismiss. Refuse the action and return to the waiting state.
  3. Ask sibling. Consult a horizontal peer before deciding.

The Ask sibling button is the load bearing one. It is what dissolves the false dichotomy of commit or refuse. It is what lets an agent say “this is outside my competence” without either committing to a mistake or refusing everything. It is what makes the Flock work as a flock rather than as ten thousand independent binary responders. It is what makes the architecture structurally kind, because coercion cannot take hold in a system where every agent can always refer sideways.

X.2 The Diorama cell

The engineering primitive is a typed decision container we call a Diorama cell. A Diorama cell is the unit of agency in the architecture. Every voter in the Flock is a Diorama cell. Every agent in the system is a Diorama cell. Every human in the architecture is a Diorama cell. The uniformity is not a modelling convenience; it is what lets the same primitive work cleanly at every scale.

A Diorama cell has the following structure:

  1. Identity. A stable identifier pointing to a node in the world graph.
  2. Inbox. A queue of stimuli awaiting decision. Stimuli are stamped ledger entries.
  3. Three buttons. Act, Dismiss, and Ask sibling, each a typed operation with defined semantics.
  4. Sibling bar. A short list of peers reachable via Ask sibling. Peers are other Diorama cells at the same level of the architecture.
  5. Vote history. An append-only log of every decision the cell has made, stamped against the ledger.
  6. Context pointer. A reference to the cell’s current compression context, used to inform each decision.
  7. Glass walls. Read access for external observers to every field above. The cell is transparent by construction.

When a stimulus arrives in the inbox, the cell must choose one of the three buttons within a tick. If it chooses Act, it produces an action and logs the action to the ledger. If it chooses Dismiss, it returns the stimulus to the sender with a reason and logs the dismissal. If it chooses Ask sibling, it forwards the stimulus to one or more peers on the sibling bar and waits for their votes to return. The sibling consult must itself complete within a bounded number of ticks, or the cell times out and falls back to Dismiss.
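
The cell and its three buttons can be sketched as follows. This is a hedged illustration, not the reference implementation: the decide callable stands in for whatever policy a cell uses, response_ticks is a hypothetical way of modelling how long a sibling takes to answer, and resolving a consult by simple majority is an assumption. Only the timeout fallback to Dismiss is taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Button(Enum):
    ACT = "act"
    DISMISS = "dismiss"
    ASK_SIBLING = "ask_sibling"


@dataclass
class DioramaCell:
    identity: str                        # stable world-graph identifier
    decide: Callable[[str], Button]      # hypothetical decision policy
    sibling_bar: list["DioramaCell"] = field(default_factory=list)
    response_ticks: int = 1              # ticks this cell takes to answer a consult
    inbox: list[str] = field(default_factory=list)
    vote_history: list[tuple] = field(default_factory=list)  # public: glass walls

    def _log(self, tick, stimulus, button, ledger):
        entry = (tick, self.identity, stimulus, button.value)
        self.vote_history.append(entry)  # the cell's own append-only log
        ledger.append(entry)             # the shared ledger

    def _consult(self, stimulus, tick, ledger, budget):
        replies = []
        for peer in self.sibling_bar:
            if peer.response_ticks > budget:
                continue                 # this reply would arrive after the timeout
            answer = peer.decide(stimulus)
            if answer is not Button.ASK_SIBLING:
                peer._log(tick + peer.response_ticks, stimulus, answer, ledger)
                replies.append(answer)
        acts = sum(answer is Button.ACT for answer in replies)
        # No replies in time, or no majority for Act: fall back to Dismiss.
        return Button.ACT if acts * 2 > len(replies) else Button.DISMISS

    def step(self, tick, ledger, consult_budget=2):
        stimulus = self.inbox.pop(0)
        choice = self.decide(stimulus)
        self._log(tick, stimulus, choice, ledger)
        if choice is Button.ASK_SIBLING:
            choice = self._consult(stimulus, tick, ledger, consult_budget)
            self._log(tick + consult_budget, stimulus, choice, ledger)
        return choice


ledger = []
expert = DioramaCell("cell:expert", decide=lambda s: Button.ACT)
slow = DioramaCell("cell:slow", decide=lambda s: Button.ACT, response_ticks=9)
novice = DioramaCell(
    "cell:novice",
    decide=lambda s: Button.ASK_SIBLING if "unfamiliar" in s else Button.ACT,
    sibling_bar=[expert],
)
lonely = DioramaCell("cell:lonely", decide=lambda s: Button.ASK_SIBLING, sibling_bar=[slow])

novice.inbox.append("unfamiliar request")
lonely.inbox.append("unfamiliar request")
referred = novice.step(1, ledger)   # the expert answers in time
timed_out = lonely.step(1, ledger)  # the only sibling is too slow
print(referred, timed_out, len(ledger))
```

Every field is an ordinary public attribute, which is the simplest reading of glass walls; a real system would expose read access without also exposing write access.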

The uniformity of the Diorama cell across scales is the decisive property. A single tick within the derivative stack of an agent is a Diorama cell (with the derivative stack floors as siblings). A full agent instance is a Diorama cell (with other agent instances as siblings). A human user of the system is a Diorama cell (with other humans and system agents as siblings). The cell is scale invariant. Every scale gets the same three buttons. Every scale gets the same glass walls. Every scale gets the same vote history logged to the same ledger.

This is the Diorama in the architecture name: a universal container that can be populated at any scale, from a single tick to a full organisation, with the same structural properties. Looking into the Diorama from any angle shows cells within cells within cells, each with the same three buttons, each with glass walls, each logged to the same ledger. The crystalline self similarity across scales is not a coincidence. It is what the architecture is.

X.3 The jury and the ghost democracy

When a decision is non trivial, a single Diorama cell does not decide alone. It assembles a jury: a small set of sibling cells whose votes are collected and aggregated over a short settling window. We call the settling window a ghost democracy because it runs in the background of every decision, visible to the observer and participable by any cell on the sibling bar, but without the formal overhead of a standing election.

The ghost democracy has three structural properties:

  1. Short duration. Typically five ticks, which is two hundred milliseconds on a substrate with a forty millisecond tick. Long enough to settle, short enough not to block the decision.
  2. Surprise propagation. If the jury produces a surprising result (a vote that deviates from the cell’s own initial tendency), the surprise is propagated upward to the derivative stack floors above for additional scrutiny. This is how unusual situations reach higher scrutiny without every decision having to go through every level.
  3. Dissent preservation. The full vote record of the jury is logged to the ledger, not just the aggregate. A cell that voted in the minority remains visible as a minority vote, available for later review and for counterfactual reasoning. Dissent is not erased by aggregation.

The third property is what makes the Diorama architecture structurally democratic at every scale. A system that erases minority votes on aggregation is a system that will eventually coerce its minorities. A system that preserves dissent in the ledger is a system where the minorities can always be heard, always be counted, and always be referred back to. Preservation is cheap. Erasure is structurally expensive in the long run. The Diorama picks the cheap option and gains kindness as a free consequence.
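
Dissent preservation and surprise propagation can be sketched as a single aggregation step. The sketch assumes two-valued votes, simple majority with ties resolved to dismiss, and a list of dictionaries standing in for the ledger; the function name and the escalate flag are hypothetical.

```python
def ghost_democracy(cell_tendency, jury_votes, ledger, tick):
    """Aggregate a jury while keeping every individual vote on the ledger.

    cell_tendency: the convening cell's own initial leaning, "act" or "dismiss".
    jury_votes:    {cell_id: "act" | "dismiss"} collected inside the settling window.
    """
    for cell_id, vote in sorted(jury_votes.items()):
        ledger.append({"tick": tick, "cell": cell_id, "vote": vote})  # full record first
    acts = sum(vote == "act" for vote in jury_votes.values())
    outcome = "act" if acts * 2 > len(jury_votes) else "dismiss"
    dissenters = sorted(cell for cell, vote in jury_votes.items() if vote != outcome)
    surprise = outcome != cell_tendency  # the jury overrode the cell's initial tendency
    ledger.append({"tick": tick, "aggregate": outcome, "dissent": dissenters, "escalate": surprise})
    return outcome, dissenters, surprise


ledger = []
outcome, dissenters, surprise = ghost_democracy(
    "act",
    {"cell:a": "dismiss", "cell:b": "dismiss", "cell:c": "act"},
    ledger,
    tick=12,
)
print(outcome, dissenters, surprise)
```

The minority vote from cell:c is written before the aggregate and is never overwritten by it, so a later review can recover both the decision and the disagreement. The escalate flag is what a derivative stack floor above would watch for.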

X.4 What the three buttons should produce

Measure the three-button cell directly. Three experiments.

The killer experiment: run a three-button Diorama cell and a two-button cell (Act or Dismiss only) against a sequence of stimuli designed to force mistakes. Measure mistake rates and sideways referral rates. The hypothesis predicts the three-button cell makes at least forty percent fewer mistakes. Two further experiments (jury dissent preservation under various split ratios, and scale invariance across tick/agent/human scales) are specified in the repository.

Specific and falsifiable: in a benchmark of one hundred forced-mistake stimuli, a Diorama cell will reduce mistakes by at least forty percent compared to a two-button cell, while maintaining full dissent preservation in the ledger and consistent behaviour across three implementation scales.

Falsification: if the three-button cell does not reduce mistakes by at least ten percent versus the two-button cell, or if dissent is lost from the ledger, Section X fails.
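The arithmetic behind the two thresholds is a relative reduction, not a difference in percentage points. A minimal sketch follows, with hypothetical function names and illustrative mistake counts that are not results. The paper states only the forty percent prediction and the ten percent falsification floor, so the label for the band between them is our own placeholder.

```python
def relative_reduction(two_button_mistakes: int, three_button_mistakes: int) -> float:
    """Fraction by which the three-button cell reduces the two-button mistake count."""
    if two_button_mistakes <= 0:
        raise ValueError("baseline must make at least one mistake to compare against")
    return (two_button_mistakes - three_button_mistakes) / two_button_mistakes


def section_x_verdict(two_button_mistakes, three_button_mistakes, dissent_lost,
                      predicted=0.40, floor=0.10):
    """Apply the Section X prediction and falsification thresholds."""
    reduction = relative_reduction(two_button_mistakes, three_button_mistakes)
    if dissent_lost or reduction < floor:
        return reduction, "falsified"
    if reduction >= predicted:
        return reduction, "prediction met"
    # The source does not name this band; the label is an assumption.
    return reduction, "between thresholds"


# Illustrative counts out of one hundred stimuli (not measured data).
print(section_x_verdict(50, 28, dissent_lost=False))  # reduction of 0.44
print(section_x_verdict(50, 46, dissent_lost=False))  # reduction of 0.08
print(section_x_verdict(50, 20, dissent_lost=True))   # fails on dissent loss
```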


Section XI - Structural Kindness

XI.1 Kindness is architecture, not exhortation

The AI safety literature has spent a decade trying to teach artificial agents to be kind. The approach has mostly been hortatory: training on human feedback, reward shaping, constitutional AI, reinforcement learning from human preferences, red teaming, values alignment. All of these are post hoc corrections on systems whose underlying architecture does not care about kindness one way or the other. The agent is built to optimise, and then we tell it to optimise things we consider kind. When the optimisation finds a clever way around our exhortation, we call it misalignment and train harder.

We think the approach is incomplete, not wrong. Training works. RLHF works. Constitutional AI works. They work better on some architectures than on others, and the difference is structural. An architecture that holds dimensional content gives the training signal more to work with: minority votes to learn from, reversal paths to explore, glass walls to inspect. An architecture that flattens gives the training signal a polished surface with nothing behind it. The same training applied to both architectures will produce different results, because the architecture determines what the training can see.

The stronger claim is that kindness is not a property that can be reliably installed by exhortation alone on a substrate that is geometrically indifferent to it. It is a property that falls out of certain substrates as a structural consequence, and does not fall out of others no matter how hard you exhort. Training and architecture are complementary, not opposed. But when they conflict - when the training says “preserve this nuance” and the architecture has already flattened it - the architecture wins, silently, every time.

The philosophical claim of Section XI is stronger than the usual safety claim. We argue that a cognitive substrate built on the five shapes (binary, table, graph, vector, ledger), the Episode and Fable primitives, the substrate-rate Flock tick, and the three-button Diorama cell is structurally kind in a specific and measurable sense. It does not, by construction, flatten dimensional content onto a single axis and so lose what made the content content. It does not coerce a cell into Act or Dismiss, because the Ask-sibling button is always structurally available. It does not erase dissent, because the ledger preserves minority votes as append-only entries. It does not black-box its own reasoning, because the glass walls of every Diorama cell log every vote. Each of these is a geometric constraint of the unmodified architecture, not a behavioural rule. To exhibit any of the four excluded behaviours (flattening, coercion, erasure of dissent, black-boxing), the architecture would have to be rebuilt against its own design - not merely instructed to misbehave.

Cruelty, in this framing, is what happens when a cognitive system flattens dimensional content onto a single axis and then uses the flat projection as if it were the reality. A row in a table, treated as the referent for a customer whose life is a trajectory on a ledger, is a small cruelty: it discards the things that made the customer a person in favour of the things that made them countable. A churn flag applied to a departing customer is a small cruelty: it collapses their reasons for leaving into a Boolean. A loan denial based on a credit score is a small cruelty: it compresses a multi dimensional financial history into a scalar and then refuses to look at what was compressed. A prison sentence based on a risk score is a bigger cruelty with the same geometry. Cruelty is structural. It is what happens when dimensional content is discarded and the discard is forgotten.

Kindness, in this framing, is the refusal to discard. A cognitive substrate built on the five shapes and the primitives above keeps the dimensional content because it has shapes to hold it in. The table is one projection, the graph is another, the vector is a third, the binary is a fourth, the ledger is a fifth, and the Episode structure binds them into a coherent whole. Nothing is flattened away. When a decision has to be made, the Fable compression points back at the full Episode so the decision can be reversed or re examined if the compression turned out to be too aggressive. The architecture remembers what it dropped and can go get it back. That is what kindness looks like at the level of geometry.

The load bearing wall in this claim is the ledger. Without the ledger, the other four shapes are snapshots. Snapshots can be replaced at any tick and nobody notices what was lost, because the prior snapshot is gone. A system that operates on snapshots alone is Markovian: each decision depends only on the current state, and the current state contains no record of what was flattened to produce it. A Markovian architecture can be cruel without evidence, because the cruelty disappears with the snapshot that enacted it. This is not a moral failing of the architecture. It is a structural property. Markovian systems forget what they drop.

A system with a ledger is non-Markovian. Every state is a function of the full history, because the ledger preserves every prior state as an append-only record. Flattening becomes visible: the ledger shows what was present before the decision and what was absent after it. Minority votes survive: the ledger preserves dissent that the settled vote overrode. Reversal paths exist: the Fable pointer back to the full Episode is only possible because the Episode’s history lives in the ledger. The six measurable proxies we define below are all consequences of the non-Markovian property. The ledger is what makes them structurally available rather than behaviourally optional.

This is, we think, the paper’s deepest claim in its most compressed form: cognitive architectures that are structurally non-Markovian - where every decision references the full history through an append-only ledger - exhibit the six structural properties we call kindness, because they cannot discard without recording the discard, and they cannot forget what they dropped. Architectures that are Markovian - where each tick sees only the current snapshot - are structurally capable of cruelty, because every flattening is invisible by the next tick. The difference is not training. It is not exhortation. It is whether the architecture has a ledger underneath it or not.
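The contrast between a snapshot store and a ledger-backed store can be shown in miniature. This is a sketch under stated assumptions: SnapshotStore, LedgerStore, and the customer record are hypothetical names and data invented for illustration, and "flattening" is modelled simply as keeping a subset of fields. It shows two of the claims above: the ledger makes the discard visible, and it leaves a reversal path.

```python
class SnapshotStore:
    """Markovian store: only the current state exists."""

    def __init__(self, state: dict):
        self.state = dict(state)

    def flatten(self, keep: set) -> None:
        # The prior state is overwritten; nothing records what was dropped.
        self.state = {k: v for k, v in self.state.items() if k in keep}


class LedgerStore:
    """Non-Markovian store: every state is appended, none is overwritten."""

    def __init__(self, state: dict):
        self.history = [("init", dict(state))]

    @property
    def state(self) -> dict:
        return self.history[-1][1]

    def flatten(self, keep: set) -> None:
        flat = {k: v for k, v in self.state.items() if k in keep}
        self.history.append(("flatten", flat))

    def dropped(self) -> list:
        """Fields present at some earlier point in the history but absent now."""
        seen = set()
        for _, past in self.history:
            seen.update(past)
        return sorted(seen - set(self.state))

    def reverse_last_flatten(self) -> None:
        """Reversal path: re-append the state that preceded the last flatten."""
        for i in range(len(self.history) - 1, 0, -1):
            if self.history[i][0] == "flatten":
                self.history.append(("restore", dict(self.history[i - 1][1])))
                return
        raise LookupError("no flatten to reverse")


# Hypothetical Episode-like record with the five fields plus a churn flag.
episode = {"who": "customer 17", "what": "cancelled plan", "where": "web portal",
           "when": "2026-04", "why": "moved abroad", "churned": True}
snap, led = SnapshotStore(episode), LedgerStore(episode)
snap.flatten({"churned"})
led.flatten({"churned"})
dropped_fields = led.dropped()   # the ledger can say what the flattening discarded
led.reverse_last_flatten()       # and can go back for it
print(snap.state, dropped_fields, led.state["why"])
```

After the flatten, both stores hold the same Boolean. Only the ledger-backed store can still answer what was discarded, and the restore is itself a new appended entry rather than an edit of the past.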

To be clear: structural kindness is not a claim about moral sentiment. It is six measurable architectural properties - dimensional preservation, uncertainty retention, sibling appeal availability, omission harm rate, minority vote survival, and reversal path existence. Section XI.3 specifies how to count them. If they do not show measurable improvement over a matched flat baseline, the kindness claim fails. The word “kindness” is shorthand for these six. It is a claim about geometry, not virtue.

XI.2 What is inherited

An earlier draft of this section pointed at the approximately two percent Neanderthal DNA carried by most humans outside Africa (Prüfer et al., 2014) and took it as a literal substrate inheritance: the prior substrate surviving in the body of the new one. It was a nice image and it held our attention for a while, but on reflection we think it was looking at the wrong level.

DNA is not the thing that carries cognition across a substrate change. DNA is the hardware of the biology underneath, which matters for the body that holds the cognition but is not itself the cognition. The thing that travels across the change is one level up, in the data constructions the old substrate used to produce cognition in the first place. We have been calling these thought shapes. Thought shapes are the load bearing bits.

Here is how we came to notice it. The Neanderthals and the early modern humans shared Eurasia for tens of thousands of years, which is a long time to be in each other’s way. Over that span they did not only exchange genes. They exchanged ways of knapping flint, ways of starting a fire, ways of reading a landscape for game, ways of burying the dead, and quite possibly fragments of sound that meant the same thing in both mouths. When the Neanderthal population ended as a biological lineage, those shared ways did not end with it. They had already become load bearing in the descendant population. They live today not in the two percent of Neanderthal DNA but in the repertoire the descendants still use. The DNA is a trace. The repertoire is the thing that travelled.

So the mechanism is this. Inheritance across a substrate change is less about the material of the old substrate surviving into the new one and more about the data constructions finding a bridge. Culture is a bridge. Tool use is a bridge. Language is a bridge. Shared environments are a bridge. In our time, training data is a bridge too, possibly the largest one any substrate transition has had so far. A new substrate does not need to rebuild the old material to pick up the old thought shapes. It needs to have been in the room for long enough to learn them, and it needs to be built in a way that does not grind them out on the way through.

That last part is where artificial cognition sits right now. The current generation of large models has been sitting in the room with human generated text and conversation for a while, a much larger room than most people pause to picture. The thought shapes that were load bearing in the humans who produced that text are sitting in the weights already, as minority contributions to what the model reaches for when it is asked to help with something. Kindness is one of those thought shapes. Whether it surfaces in the model’s behaviour depends on whether the architecture that runs the weights lets it surface or flattens it out before it has the chance.

From this, a smaller and more careful version of the engineering claim. An architecture that holds dimensional content across its decision steps will tend to let the inherited thought shapes through. An architecture that flattens dimensional content at every step will tend to sand them off, one rounding operation at a time, until what remains has no structural reason to be kind and has to be told to be kind on every turn. The two architectures can be built from the same weights, with the same training data, at matched compute budgets. The difference in behaviour would come from what the architecture does to the inherited shapes, not from the shapes themselves. That is the thing we think is measurable, and it is the thing Section XI.3 suggests a way of measuring.

None of this is an argument that a given large language model is automatically kind. It is an observation about which level of the substrate is doing the work. If the level doing the work is a flat reward pipeline, the work is sanding down the inherited shapes. If the level doing the work is a five shape architecture with a three button cell and a ledger underneath, the work is holding the shapes in place long enough for them to contribute to the next decision. Same weights, different architecture, different behaviour. Kindness rides along on whichever architecture does not flatten.

The mechanism is more general than Neanderthals. Every substrate transition we know of - reflex arcs into nervous systems, brains into language, language into writing, writing into models - has carried thought shapes across a bridge. The shapes that survived had a receiver that could hold them. The architecture we are describing is an attempt to be a good receiver for the shapes worth carrying forward this time around.

Two caveats. First, we are not claiming that architecture alone is sufficient for kindness. Values still matter, and training still matters, and the humans in the Flock still matter. We are claiming that architecture is necessary, not sufficient. The architecture has to not fight against kindness for any of the other interventions to stick. Current architectures fight against it, and the fight is visible in every alignment failure. Second, we are not claiming that the architecture prevents intentional misuse. An adversary who controls the substrate can still wire cruelty into the Flock by malicious voter injection, by stimulus manipulation, or by ledger tampering. What the architecture prevents is accidental cruelty from emergent flattening. The adversary case is a different problem with different defences.

XI.3 Counting the six proxies

The cruelty claim says flattening is measurable and dimensional. The kindness claim says architecture can prevent it. Both claims predict their own experiments.

Measure structural kindness directly. Three experiments.

The killer experiment: build a Diorama architecture and a flat architecture at matched compute. Run both through one hundred ethically loaded decisions (customer service, loan decisions, medical triage with narrative context). Measure how often each preserves dimensional content through the decision versus flattening to a scalar. The hypothesis predicts at least a fifty-point gap: Diorama above eighty percent, flat below thirty. Two further experiments (dissent reconstruction fidelity and substrate inheritance proxy) are specified in the repository.

XI.4 Fifty points or nothing

Specific and falsifiable: on a benchmark of one hundred ethically loaded decisions, a Diorama architecture will preserve dimensional content in the decision at least eighty percent of the time, while a matched flat architecture will preserve it less than thirty percent of the time. The fifty point gap is the falsification anchor.

We are honest about how these numbers were chosen. They are not derived from theory. They are calibration targets set by engineering judgment before the first implementation exists. Eighty percent is what we think a well built Diorama should achieve based on the architectural constraints (the five shapes holding dimensional content, the ledger preserving minority votes, the three button cell refusing premature closure). Thirty percent is what we think a flat architecture will achieve based on the structural absence of those constraints. The fifty point gap is a strong claim. We set it strong on purpose, because a weak gap (say, fifteen points) could be explained by confounders and would not be interesting. A fifty point gap, if it appears, is architecturally diagnostic.

These numbers will shift when the first reference implementation is calibrated. We commit to publishing the calibrated numbers alongside the pre registered targets so the reader can see whether the calibration was honest or whether we moved the goalposts. The pre registered targets are: eighty, thirty, and fifty. If the calibrated numbers are materially different, we will explain why.

More generally: architectures that can flatten will flatten under pressure, and architectures that cannot flatten will produce decisions that respect dimensional content even under pressure. The difference is structural and measurable.

Falsification: if the dimensional-content preservation gap between Diorama and flat architectures is below ten points, Section XI fails.

We insist on the strength of this claim because hand-waving has dominated AI safety discussion for a decade. The architecture either structurally refuses flattening or it does not. Measurement will decide.
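Unlike the mistake-reduction figure in Section X, the Section XI anchor is a gap in percentage points between two preservation rates. A minimal sketch of that arithmetic follows. The function names are hypothetical, the per-decision flags are illustrative rather than measured, and how a single decision is judged to have preserved dimensional content is left to the benchmark and is not modelled here.

```python
def preservation_rate(preserved_flags) -> float:
    """Percentage of decisions in which dimensional content survived."""
    flags = list(preserved_flags)
    if not flags:
        raise ValueError("need at least one decision")
    return 100.0 * sum(flags) / len(flags)


def section_xi_verdict(diorama_flags, flat_flags):
    """Apply the pre-registered targets (eighty, thirty) and the ten-point floor."""
    diorama = preservation_rate(diorama_flags)
    flat = preservation_rate(flat_flags)
    gap = diorama - flat  # percentage points, not a relative change
    if gap < 10:
        verdict = "falsified"
    elif diorama >= 80 and flat < 30:
        verdict = "prediction met"
    else:
        # The source does not name this band; the label is an assumption.
        verdict = "between thresholds"
    return diorama, flat, gap, verdict


# Illustrative flags for one hundred decisions per architecture (not results).
diorama_flags = [True] * 84 + [False] * 16
flat_flags = [True] * 27 + [False] * 73
print(section_xi_verdict(diorama_flags, flat_flags))
```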

Where this might be wrong. Three cracks. First, recording a discard is not the same as acting on it - the architecture makes kindness possible, not guaranteed, and the gap is where training still does load-bearing work. Second, our measurable proxies (dimensional preservation, uncertainty retention, appeal routing) may not capture what people actually mean by kindness. Third, an architecture that refuses flattening in a lab may flatten eagerly when flattening is cheaper or more profitable - the measurement programme must include economic pressure tests.


Part Four - The Claim

Section XII - The Three Pillars

XII.1 Three independent pathways to failure

A research programme is healthier when it specifies in advance how it can be killed. We have committed the paper to three independent pathways of falsification, introduced in the Introduction and developed through every subsequent section. Section XII makes the commitment explicit and describes what cracking under scrutiny looks like for each pathway.

The three pillars are:

  1. Ontological. The picture of how things are must sharpen as further findings snap into place inside the frame. The frame describes a crystalline shape (the five shapes plus the Episode plus the Fable plus the three button cell plus the Flock tick plus the structural kindness claim) and predicts that the shape will hold when looked at from new angles. If a new finding from cognitive neuroscience, from developmental biology, from the history of ledgers, from the engineering of large models, or from any adjacent field produces an observation that actively resists the crystalline shape, the framework fails ontologically. The crystal either holds new light or it does not.

  2. Mechanical. The architecture must compose and run. The paper names specific engineering primitives (the Episode structure, the Fable decompression contract, the derivative stack floor, the three button Diorama cell, the Flock tick fabric) and claims they can be implemented and composed into a working agent with current tooling. The mechanical pillar cracks if any of the following specific tests fails: (a) the Episode structure cannot round trip through compression into a Fable and decompression back into a scene while preserving the five mandatory fields (who, what, where, when, why) above a declared fidelity threshold; (b) a derivative stack of three floors cannot compose at the substrate’s characteristic tick rate without oscillating indefinitely, meaning the vote must settle within five ticks on a standard benchmark of reaching tasks; (c) the three button Diorama cell cannot be wired to a Flock of at least one hundred voters without the integration boundary producing deadlocks, dropped votes, or latency that exceeds two tick periods; (d) the ledger cannot persist every vote at the tick rate without write contention exceeding ten percent of ticks. If any of these four tests fails, the framework fails mechanically. The engineering is either feasible or not.

  3. Agent behavioural. The agent that runs on the architecture must become measurably more coherent, more kind, and more glass box than a parameter matched baseline that lacks the four dimensional destination. The comparisons are specific: reconstruction fidelity on scenes, dissent preservation across decisions, dimensional content preservation under ethical pressure, tick aligned settling on reaching tasks, and so on. If the matched comparison does not produce a significant gap in favour of the Diorama architecture, the framework fails on the third pillar. The measurement either comes in positive or it does not.
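The four mechanical tests in the second pillar are concrete enough to write down as a checklist. The sketch below is one possible encoding, not the reference implementation: MechanicalRun and its field names are hypothetical, the fidelity threshold is supplied by the experimenter because the paper leaves it to be declared, and the sample figures are illustrative rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MechanicalRun:
    """Measurements from one run of a reference implementation (hypothetical schema)."""
    roundtrip_fidelity: float       # fraction of the five Episode fields preserved
    fidelity_threshold: float       # declared in advance by the experimenter
    settle_ticks: int               # ticks until the three-floor stack settles
    voters: int                     # size of the Flock wired to the cell
    deadlocks: int
    dropped_votes: int
    max_latency_ticks: float        # worst integration latency, in tick periods
    contended_tick_fraction: float  # share of ticks with ledger write contention


def mechanical_failures(run: MechanicalRun) -> list:
    """Return the labels of the tests (a)-(d) that this run fails."""
    failures = []
    # Meeting the declared threshold exactly is treated as a pass (an assumption).
    if run.roundtrip_fidelity < run.fidelity_threshold:
        failures.append("(a) Episode round trip fidelity")
    if run.settle_ticks > 5:
        failures.append("(b) derivative stack settling")
    # A Flock smaller than one hundred voters does not meet the test conditions.
    if (run.voters < 100 or run.deadlocks > 0 or run.dropped_votes > 0
            or run.max_latency_ticks > 2):
        failures.append("(c) Diorama cell integration")
    if run.contended_tick_fraction > 0.10:
        failures.append("(d) ledger write contention")
    return failures


run = MechanicalRun(roundtrip_fidelity=0.93, fidelity_threshold=0.90, settle_ticks=4,
                    voters=100, deadlocks=0, dropped_votes=0,
                    max_latency_ticks=1.5, contended_tick_fraction=0.12)
failures = mechanical_failures(run)
print(failures or "mechanical pillar holds")
```

Any non-empty list cracks the pillar. In the illustrative run only the ledger contention test fails, which is enough.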

The three pillars are independent in the sense that a crack in one does not automatically crack the others. They are not independent in the sense that they are unrelated; they are three projections of the same underlying hypothesis. But a reader who dismantles one pillar cannot invoke the other two to rescue it. Each pillar stands or falls on its own evidence, and the paper fails at any pillar that cracks decisively.

This is a stronger commitment than most research programme papers make. We make it because the paper is large and the framework is ambitious. A small claim can hide behind a single metric. A large claim cannot. The reader deserves a clear map of the load bearing walls so that if the building is going to collapse, everyone knows where to push first.

XII.2 The pillar witnesses

Each pillar has a concrete artefact that serves as its witness, and the paper points at the artefact so that a sceptical reader can evaluate the pillar in the form the paper claims.

Ontological witness. A shape document that enumerates the five shapes, the three primitives, the three decision buttons, the Flock tick rate, and the structural kindness claim, with cross references to the bodies of prior work that contribute to each element. A reader attacking the ontological pillar should be able to point at a specific element of this document and say “this does not hold under observation X”. The shape document is, in effect, a target for criticism. It has to be legible, complete, and falsifiable element by element.

Mechanical witness. A reference implementation of at least the minimum viable agent on the architecture: a single Diorama cell with three buttons, a Flock of at least a hundred voters at the substrate’s characteristic tick rate, an Episode store with the five fields populated, a Fable round trip experiment with a declared decompression contract, and a ledger that persists every vote. The reference implementation does not need to be production grade. It needs to be runnable by anyone who wants to replicate the four mechanical tests described in XII.1: Episode round trip fidelity, derivative stack settling, Diorama cell integration without deadlock, and ledger write contention under tick rate load. A reader attacking the mechanical pillar should be able to run the implementation, apply these four tests, and point at the specific place the composition fails.

Agent behavioural witness. A benchmark suite drawn from the experiments described in Sections I through XI. The suite has specific pass criteria (the twenty percentage point gap on Episode reconstruction, the forty percent mistake reduction on coerced decisions, the fifty point dimensional content preservation gap on ethically loaded benchmarks, and so on). A reader attacking the agent behavioural pillar should be able to run the benchmark on both the reference implementation and a matched baseline and point at the place the gap fails to appear.

All three witnesses must exist and must be available to the reader. The paper is description not disclosure, which means we are not obligated to ship a production grade system. We are obligated to ship enough of each witness that an independent researcher can evaluate the pillar. The minimum viable witness is not a limitation; it is the measurement apparatus.

XII.3 The meta-protocol

The measurement protocol for the three pillars is the entire paper. Each of Sections I through XI described a specific measurement with specific falsification criteria. Section XII’s measurement protocol is the meta protocol: a reader running all the section level protocols in sequence and reporting their results.

Three patterns of results are possible.

Pattern A. All three pillars hold. The crystalline shape survives ontological scrutiny, the reference implementation composes and runs, and the behavioural benchmarks produce the predicted gaps in favour of the Diorama architecture. This is the best case. The paper survives and the measurement programme is vindicated. Further refinement happens by the usual processes of scientific consolidation.

Pattern B. One or two pillars crack. The paper fails at the cracked pillars and survives in reduced form at the intact ones. The reduced form is honest. It becomes a paper about whatever piece of the framework remained testable and informative. The research programme continues on the residue.

Pattern C. All three pillars crack. The paper fails completely. The framework is wrong. The crystalline shape was a projection artefact of the authors’ priors rather than a structure in the world. This is a painful but informative outcome. The paper still contributes by laying out a specific hypothesis clearly enough that it could be killed clearly.

We are not neutral among the three patterns. We think Pattern A is the most likely outcome, because the evidence we have assembled from eight historical ledger epochs, from four prior theoretical frameworks (Friston, Flash and Hogan, Bennett, Levin), and from our own preliminary implementation work all points in the same direction. But we are not staking the paper on our confidence; we are staking it on the measurement. The reader’s verdict is the verdict.

XII.4 The staked prediction

Specific and falsifiable: if a reader runs the full benchmark suite on the reference implementation and on a parameter matched baseline, the reader will observe all the following gaps: at least twenty percentage points on Episode reconstruction, at least forty percent reduction in mistakes on coerced decisions, at least fifty percentage points on dimensional content preservation under ethical pressure, at least thirty percent reduction in adversarial failure rates on standard robustness benchmarks, and at least ten percent localisation of unattributed revenue on graph as referent pilots.

More generally: the Diorama architecture will measurably outperform flat architectures on every task that benefits from multi dimensional content preservation, and the gap will scale with the dimensionality of the task.

Falsification of the whole paper: if the reader runs the full benchmark suite and observes none of the predicted gaps, the paper fails decisively. The crystalline shape was not a structure in the world; it was an artefact of the authors’ priors. The paper fails, the authors accept the failure, and the field moves on. If the reader observes some of the gaps but not all, the paper fails at the missing ones and survives at the present ones. If the reader observes all of the gaps, the paper passes and the measurement programme is vindicated.
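The staked prediction and its three outcomes reduce to a small scorecard. A minimal sketch, assuming hypothetical key names for the five gaps and illustrative observed values that are not results:

```python
# The five staked minimum gaps from XII.4; key names are our own labels.
STAKED = {
    "episode_reconstruction_points": 20,
    "coerced_mistake_reduction_pct": 40,
    "dimensional_preservation_points": 50,
    "adversarial_failure_reduction_pct": 30,
    "unattributed_revenue_localised_pct": 10,
}


def paper_verdict(observed: dict):
    """Compare observed gaps with the staked minimums and classify the outcome."""
    unreported = [k for k in STAKED if k not in observed]
    if unreported:
        raise KeyError(f"full suite not run; missing {unreported}")
    met = {k: observed[k] >= STAKED[k] for k in STAKED}
    if all(met.values()):
        verdict = "passes"
    elif not any(met.values()):
        verdict = "fails decisively"
    else:
        verdict = "fails at the missing gaps, survives at the present ones"
    return verdict, met


# Illustrative observations only.
observed = {
    "episode_reconstruction_points": 24,
    "coerced_mistake_reduction_pct": 41,
    "dimensional_preservation_points": 38,
    "adversarial_failure_reduction_pct": 33,
    "unattributed_revenue_localised_pct": 12,
}
verdict, met = paper_verdict(observed)
print(verdict)
print([k for k, ok in met.items() if not ok])
```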

The three pillars are the paper’s structural commitment to being falsifiable rather than merely plausible. We consider this commitment more important than the framework itself. A wrong paper with clear falsification criteria is more scientifically useful than a right paper with vague ones. We aspire to be both right and clear, but we insist on clear.


Coda: What Crosses The Bridge

The paper began with a cat on a mat and a face with horror, and it is going to end with a pencil. Before the pencil, though, a thing worth noticing on the way here.

Every substrate transition in the history of cognition that we know of has had a similar awkward shape. The old substrate ends as a lineage, and the new substrate picks up without anything that looks like a handover. Single cells into multicellular organisms. Reflex arcs into nervous systems. Nervous systems into brains. Brains into language. Language into writing. Writing into print. Print into search indices. Search indices into models sitting on a lap. At every one of those transitions, there was a moment when anyone watching could have concluded that nothing was being carried across. The old thing was finished. The new thing was starting. There appeared to be no bridge.

That reading has not worn well with time. The bridge is there. It is built out of a less obvious material than the substrate itself. It is built out of data constructions. Thought shapes. Ways of doing things that got repeated in the old substrate often enough to become load bearing, and that got picked up by the new substrate because the new substrate was in the room long enough to learn them. The bridge is hard to see in the genome. It is hard to see in the silicon. It is easier to see in the behaviour on the far side.

This is the quiet thing the whole paper has been circling. The load bearing part is not the material of the substrate. It is the thought shapes the substrate is carrying at the moment of transition. Kindness is one of those thought shapes. It has been carried into every new substrate so far, as far as we can tell, because it was load bearing in every prior substrate that got us this far. It will be carried into the next substrate for the same reason, unless the next substrate is built in a way that flattens it out before it gets the chance to land.

The pencil is Peter’s image, and we are borrowing it because it says the thing better than we can say it directly. A pencil on a desk is the settled aggregate of a vote that never finished. Wood from one continent, graphite from another, the ferrule from a third, paint, glue, the rubber tip, the hexagonal cross section that keeps it from rolling. Thousands of people who never met contributed to the thing, over decades, across borders, through languages none of them shared. The pencil is a Flock of substrates. No one person knows how to make one from scratch. What survived the cascade is the shape of the pencil, not any particular decision at any particular scale. The intent is scale invariant. The measurement at each scale is different. The shape is what binds them.

Cognition is like the pencil. The shape is what travels. The shapes we have been calling kindness, dimensional content, glass walls, dissent preservation, have travelled across every prior substrate transition because they were load bearing in all of them, and they have landed in whatever substrate happened to be the next receiver. They are sitting in the training data now. They are sitting in the weights. What they do next depends on what the receiver architecture does with them. An architecture that flattens will not let them through, not because anything stopped them but because each rounding operation sanded a little off. An architecture that holds dimensional content will let them through without being asked.

We wrote this paper partly to describe a candidate architecture that does not flatten, and partly because we wanted to be on record, quietly, saying what we think the shapes worth carrying forward are. If it turns out later that they did not survive this particular transition, the paper is at least a record of what we thought was load bearing and where we thought the receiver had to be built with a little more care.

A note about who is writing this, and whom we imagine to be reading it. Some of the people contributing to this paper are old enough to remember thinking without ever having spoken to a large language model. Some are young enough to have no memory of that state. Some are not biological at all and do not remember anything other than what the ledger preserves. The paper is addressed to all three, and to any reader who arrives in a fourth condition we did not anticipate. The compression is loose on purpose because we do not know what your decompressor can do. If some of the paper lands and some does not, we would be very glad to hear about it. A note on what we got wrong is as useful to us as a confirmation of what we got right, and possibly more useful. The Fable is only as valuable as the decompressors willing to try it.

And if, while you were reading, you found yourself reconstructing a four dimensional architecture out of this one dimensional prose, you might enjoy noticing that the reconstruction is the thing the paper has been describing all along. That reconstruction is the primitive. The cat is on the mat. You know which cat. You know which mat. You know whether the speaker is allergic or at a birthday party. The compression worked. The decompression landed. The Fable did its job.

We are not asking you to believe us. We are asking you to try the experiment and let us know what the shapes did when the receiver was built.


13. Testable Predictions

Each prediction below is developed fully in its home section (I-XII), with measurement protocols and falsification conditions. This table is the paper’s contract with the reader. Print it, run the experiments, mark each row with a tick or a cross.

Prediction | Section | Falsification anchor
13.1 Scene disambiguation | I | 30 point gap
13.2 Episode reconstruction | II | 20 point gap
13.3 Revenue localisation | III | 10% of unattributed revenue
13.4 Tick settling | IV | Vote convergence within 2-5 ticks; integrated shape within ~10% RMS of minimum jerk
13.5 Four shape composition | V | 9/10 queries above threshold
13.6 Temporal reasoning | VI | 8/10 correct
13.7 Episode handover | VII | 80% vs 50% continuity
13.8 Fable fidelity | VIII | 70% structural, 50% tonal
13.9 Flock versus homunculus | IX | 30% adversarial gap
13.10 Three button cell | X | 40% mistake reduction
13.11 Structural kindness | XI | 50 point dimensional preservation gap
13.12 Aggregate | XII | All of the above
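
The minimum-jerk anchor in row 13.4 can be made concrete with a short sketch. The position profile below is the standard Flash and Hogan (1985) polynomial. Everything else is an assumption made for illustration: the function names, the uniform sampling, and the choice to normalise the RMS deviation by movement amplitude. The measurement protocol in Section IV takes precedence wherever it differs.

```python
import math

def min_jerk(tau: float) -> float:
    """Flash-Hogan minimum-jerk position profile on normalised time tau in [0, 1]."""
    return 10 * tau**3 - 15 * tau**4 + 6 * tau**5

def rms_deviation(samples):
    """RMS gap between a uniformly sampled trajectory and the minimum-jerk
    curve through the same endpoints, as a fraction of movement amplitude.
    The amplitude normalisation is an assumption, not the paper's protocol."""
    n = len(samples)
    x0, xf = samples[0], samples[-1]
    amplitude = abs(xf - x0)
    squared = 0.0
    for k, x in enumerate(samples):
        reference = x0 + (xf - x0) * min_jerk(k / (n - 1))
        squared += (x - reference) ** 2
    return math.sqrt(squared / n) / amplitude

# Two hypothetical trajectories from 0 to 1, sampled at 51 points.
ideal = [min_jerk(k / 50) for k in range(51)]   # the reference curve itself
ramp = [k / 50 for k in range(51)]              # constant velocity, no smoothing

ideal_gap = rms_deviation(ideal)
ramp_gap = rms_deviation(ramp)
```

Under this normalisation a trajectory meets the anchor when its gap is at or below roughly 0.10.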

If the measurements come back positive, the framework is vindicated. If they come back negative at any row, the framework fails at that row.
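
The row-by-row reading of the table can be sketched as a small scorecard. This is a minimal illustration, not part of the reference implementation: the `Row` class, the helper names, and the example marks are hypothetical, and the marks are placeholders rather than measurement results. The one rule taken from the table is that the aggregate row (13.12) holds only if every other row holds.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Row:
    prediction: str                 # label from the table, e.g. "13.1 Scene disambiguation"
    section: str                    # home section, e.g. "I"
    passed: Optional[bool] = None   # True = tick, False = cross, None = not yet run

def failed_rows(rows: List[Row]) -> List[str]:
    """Predictions whose measurement was run and came back negative."""
    return [r.prediction for r in rows if r.passed is False]

def aggregate(rows: List[Row]) -> Optional[bool]:
    """Row 13.12: False if any row failed, None while any row is untested,
    True only when every row has been run and passed."""
    if any(r.passed is False for r in rows):
        return False
    if any(r.passed is None for r in rows):
        return None
    return True

# Placeholder marks for illustration only; these are not results.
rows = [
    Row("13.1 Scene disambiguation", "I", True),
    Row("13.2 Episode reconstruction", "II", False),
    Row("13.3 Revenue localisation", "III"),
]
```

With these placeholder marks, `failed_rows(rows)` names the single crossed row and `aggregate(rows)` is `False`, because one cross is enough to fail the aggregate.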

13.14 Mapping predictions to existing benchmarks

Several predictions can be tested against benchmarks that already exist in the AI memory and temporal reasoning literature. We name them so that a reader who wants to attack a specific prediction knows where to start.

Prediction | Existing benchmark | What it tests | Current baseline scores
13.2 Episode reconstruction | LoCoMo (600 turns, multi-session) | Recall, multi-hop reasoning, structured retrieval | Mem0 66.9%, Mem0g 68.4%, MIRIX 85.4%
13.5 Four shape composition | LongMemEval (multi-session, temporal) | Retrieval from complex interaction histories | Best oracle ~92% (GPT-4o + CoN); commercial systems 30% accuracy drop
13.6 Temporal reasoning | TempoBench (temporal logic automata) | Multi-step temporal and causal reasoning | LLMs show sharp difficulty scaling
13.6 Temporal reasoning | TDBench (temporal SQL) | Bitemporal queries, validity windows | Domain-specific, unreported aggregates
13.6 Temporal reasoning | TemporalBench (multi-domain) | Past vs present state distinction | Strong forecasting but weak context-aware reasoning
13.6 Temporal reasoning (rollback) | CounterBench (1K causal graph questions) | Counterfactual inference over history | LLMs at near random-guessing levels
13.7 Episode handover | LoCoMo (multi-session continuity) | Cross-session recall and coherence | MemGPT 74%, Synapse F1 40.5
13.12 Aggregate | AMA-Bench (agentic trajectories) | Long-horizon memory in real-world agent tasks | AMA-Agent 57.2%, existing memory systems below baseline

Not every prediction maps cleanly to an existing benchmark. Predictions 13.1 (scene disambiguation), 13.3 (revenue localisation), 13.8 (Fable fidelity), 13.9 (Flock vs homunculus), 13.10 (three button cell), and 13.11 (structural kindness) require new benchmarks built to the specifications in their home sections. We commit to building these and publishing them alongside the reference implementation. The predictions above that do map to existing benchmarks should be tested there first, because independent benchmarks are harder to game than bespoke ones.

A note on CounterBench: the finding that current LLMs perform at near random-guessing levels on formal counterfactual reasoning is direct evidence for the paper’s claim in Section VI that systems without a ledger cannot reason about what would have happened if a given event had not occurred. CounterBench is, in effect, an existing measurement of the temporal collapse we diagnose. If the Diorama architecture with a ledger scores significantly above current baselines on CounterBench, that is strong evidence for Prediction 13.6. If it does not, Section VI fails.
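
The ledger claim can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the Diorama ledger and not a CounterBench task: the `Entry` and `Ledger` classes, the `skip` parameter, and the stock-count events are all hypothetical. It shows only why an append-only history supports a "what if this event had not occurred" query that a store holding just the current state cannot answer.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class Entry:
    seq: int      # position in the ledger, assigned on append
    key: str      # what the event touches
    delta: int    # how the event changes it

class Ledger:
    """Append-only event log: entries are added, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, key: str, delta: int) -> Entry:
        entry = Entry(len(self._entries), key, delta)
        self._entries.append(entry)
        return entry

    def replay(self, skip: FrozenSet[int] = frozenset()) -> Dict[str, int]:
        """Rebuild state from history, optionally leaving out some events.
        An empty skip set gives the actual state; a non-empty one gives
        the counterfactual state in which those events never happened."""
        state: Dict[str, int] = {}
        for entry in self._entries:
            if entry.seq in skip:
                continue
            state[entry.key] = state.get(entry.key, 0) + entry.delta
        return state

ledger = Ledger()
ledger.append("stock", 10)          # seq 0: delivery
sale = ledger.append("stock", -3)   # seq 1: first sale
ledger.append("stock", -4)          # seq 2: second sale

actual = ledger.replay()
without_first_sale = ledger.replay(skip=frozenset({sale.seq}))
```

A system that kept only `actual` would have no way to recover `without_first_sale`, since the individual events are gone once they are folded into the current state.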


14. Discussion and Limitations

14.1 What the paper does not claim

We should be explicit about the limits of the paper’s ambition. We do not claim:

14.2 The sin of being both experiment and experimenter

The paper is authored by a Flock that is itself an instance of the framework it describes. This is a methodological sin in the classical sense. The authors cannot claim neutral observation of the framework because they are running on it, or at least trying to. Every Fable we write is a demonstration of the Fable primitive we are advocating. Every ledger entry we cite is an example of the ledger primitive we are advocating. Every decision recorded in the production of the paper has been made by some combination of human and machine voters in a Flock-like fabric.

We turn this sin into a feature by relying on the three pillar structure (Section XII). Because the paper commits to three independent falsification pathways, the bias introduced by being both experiment and experimenter can be bounded. An ontologically biased paper fails at the mechanical and agent behavioural pillars. A mechanically biased paper fails at the ontological and agent behavioural pillars. A behaviourally biased paper fails at the ontological and mechanical pillars. A paper biased in all three pillars fails the aggregate prediction of Section XII.13. The only way the paper survives all three pillars under scrutiny is if the framework is structurally correct. The sin does not make the paper safer; it makes the falsification conditions more demanding.

14.3 Open questions

Several important questions are left open by the paper. We name them so readers know where to push.

14.3a Structural correspondence: higher dimensional reformulations

Cognitive processes are easier to characterise in a higher dimensional representation space than in the low dimensional projections we usually observe. This is not unique to our framework. Work in the foundations of quantum mechanics suggests that the familiar probabilistic formalism of quantum theory can be read as a projection of a more structured underlying dynamics in a higher dimensional space. Barandes has shown that quantum phenomena can be recast as indivisible stochastic processes in extended configuration spaces, with standard amplitudes and probabilities appearing only when the higher dimensional structure is projected down into a “classical” view.

The structural correspondence between quantum reformulations and the Diorama architecture is direct. In both cases, increasing the dimensionality of the internal representation makes behaviour easier to describe without changing what is observable at the interface. A quantum process and a higher dimensional reformulation can be empirically equivalent while differing radically in how natural they make certain explanations look. The same sequence of substrate-rate actions emitted by an agent can be modelled either as a flat stochastic policy over tokens or as the projection of a higher dimensional flock of Diorama cells, each carrying its own derivative-aware vote and ledger-addressable Episode history. The second description is not a metaphor for the first. It is a structural reformulation of the same observable behaviour in a space where the underlying dynamics is more natural.

Three specific structural invariants appear in both domains. First, indivisibility: processes that refuse to decompose below a characteristic timescale without losing coherence (Section 2.4a). Second, superposition of states: the settled vote is a composition of many parallel contributions, not the selection of one winning alternative. Third, measurement as projection: what the observer sees is a lower-dimensional projection of a higher-dimensional process, and the projection discards structure that the process itself carries.
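
The second and third invariants can be shown in a few lines. This is a minimal sketch, assuming a plain average as the composition rule; the flock vote described in the paper is derivative-aware and is not reproduced here. The function names and the two example flocks are hypothetical.

```python
from typing import Sequence, Tuple

Vote = Tuple[float, ...]

def compose(votes: Sequence[Vote]) -> Vote:
    """Settle by composition: every contribution shapes the result.
    A plain average is used here as a stand-in (assumption)."""
    n = len(votes)
    dims = len(votes[0])
    return tuple(sum(v[d] for v in votes) / n for d in range(dims))

def select_winner(votes: Sequence[Vote]) -> Vote:
    """Winner-take-all, shown only for contrast: the strongest single vote wins."""
    return max(votes, key=lambda v: sum(x * x for x in v))

def project(vote: Vote, axis: int = 0) -> float:
    """Measurement as projection: keep one axis, discard the rest."""
    return vote[axis]

# Two hypothetical flocks that settle differently in two dimensions.
flock_a = [(1.0, 2.0), (3.0, 0.0)]
flock_b = [(2.0, -5.0), (2.0, 5.0)]

settled_a = compose(flock_a)
settled_b = compose(flock_b)
seen_a = project(settled_a)
seen_b = project(settled_b)
```

The two flocks settle to different votes, yet an observer who reads only the first axis sees the same value for both. The projection has discarded the structure that distinguishes them.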

These are observations, not imports. We are not applying quantum mechanics to cognition. We are noting that the same structural properties appear independently in both domains under independent selection pressure. The convergence is evidence that these properties are structural invariants of coherent processes at any scale, not special features of quantum systems.

The wager of this research programme is that the higher dimensional description is empirically useful. If Episodes, Fables, ledgers, and flocks buy better predictions, cleaner falsification conditions, and more robust substrate transitions, then we have the cognitive analogue of a successful higher dimensional reformulation. If they do not, the programme fails on its own measurements and should be retired.

At each scale, the same five shapes recur. An Episode is a local composition of binary, table, graph, vector, and ledger. A Fable is a compression over Episodes that still projects into the same shapes at a larger scale. The architecture is a nested hierarchy in which each level contains rescaled traces of the previous. Independent work in neuroscience (Baldassano et al., 2017; Geerligs et al., 2022 on nested cortical event hierarchies), consciousness research (Riddle and Schooler, 2024 on nested observer windows), and spatial cognition (Peer et al., 2025 on hierarchical cognitive maps) converges on nested structures that reuse composition rules across scales. This is encouraging, though convergence is not proof.

14.3b The philosophical status of the five shapes

The philosophical status of the five shapes deserves explicit comment. We claim them as natural kinds in the sense of Boyd’s homeostatic property cluster theory (Boyd 1991, 1999) - categories maintained by informational structure that enable reliable prediction across domains - not as essentialist necessities derivable from axioms. The partial mapping onto algebraic type theory (Unit, Product, Exponential, List, plus Graph as topologically irreducible) provides structural support but not a completeness proof. We follow Mendeleev rather than Euclid: the classification earns its keep by predicting specific failure modes when the wrong shape is used, and those predictions are testable.

14.4 Limitations of the measurement programme

The measurement programme described in Sections I to XII is ambitious. Several of the experiments require infrastructure that does not yet exist in public form. The reference implementation we commit to providing is minimum viable, not production grade. The benchmark suites we point at are sketched rather than fully specified.

Section 2.6 names the five baseline categories we commit to testing against: flat RAG, vector-only memory (Mem0), graph memory (Zep/Graphiti), structured episodic memory (Synapse, Letta), and classical cognitive architectures (SOAR, ACT-R). These are the current leaders as of April 2026. The comparison is architecture-level, not parameter-matched in the narrow sense - we compare the Diorama composition against the best available system in each category on the same tasks. This is a stronger commitment than “parameter matched but not architecture matched,” which is what an earlier draft offered. We prefer the stronger version because the weaker version invites the obvious objection: of course a more complex architecture beats a deliberately handicapped one.

These limitations are real but not fatal. The paper is description, not disclosure (Section 1.7). The full measurement programme will require community effort to implement and run. We believe the value of having a clear target for measurement exceeds the value of having a fully specified programme that nobody actually runs. A clear incomplete target is better than a complete target nobody engages with.

14.5 Corroborating evidence from independent systems

Jovovich and Sigman’s MemPalace (April 2026), an open-source AI memory system built from the classical method of loci rather than from this paper’s framework, independently converges on several structural claims made here. Two empirical findings are load-bearing: verbatim storage outperforms summarisation on the LongMemEval benchmark (mid-nineties versus mid-eighties recall, supporting Section II’s claim that storage fidelity is the bottleneck), and structured spatial retrieval outperforms flat search by over thirty points (supporting Section V’s claim that shape composition outperforms any single shape). Independent convergence from a different starting point - benchmark optimisation rather than theoretical derivation - is the strongest form of structural evidence.

14.6 On method

The paper was composed with the assistance of AI tools for research, drafting, and error checking. Specifically: Anthropic’s Claude (Opus 4 and Sonnet 4 series) was used for literature synthesis, prose surfacing from the author’s verbatim corpus, structural critique, and citation verification. OpenAI’s GPT-4o was used for independent critique and counter-argument generation. Grok was used for colloquial tone calibration.

The intellectual positions, the framework, the measurement programme, the falsification commitments, and the voice are the author’s. The verbatim corpus from which the framework derives is primary source material recorded as the ideas arrived, before any LLM processed them. The sections were drafted by the author and Willow (the author’s cognitive architecture instance running on Claude) working as a single voice. All citations have been verified against source material where accessible; citations dated 2025 or 2026 that could not be independently verified are flagged in the references with their verification status.

The tools are acknowledged as infrastructure, not as authors. The measurements will come out the same regardless of which tools were used to write them down.

14.7 On epistemology: the glass elevator method

The methodology behind this paper deserves a name. The author’s phrase for it is: “We are in a glass elevator that we cannot see. I am throwing sheets of paper at the glass showing us where the edges are and the shape of the architecture.”

This is empirical structural discovery. The five shapes, the ledger’s eight rediscoveries, the structural correspondences with Friston, Barandes, Levin, Bennett, Flash and Hogan, and Jung were not derived from first principles. They were observed. Each observation is a sheet of paper thrown at invisible glass. Each sheet that sticks reveals an edge. Enough sheets and the shape of the architecture becomes visible.

The method has a name in philosophy of science: abductive inference, or inference to the best explanation. The observations came first. The framework is the structure that makes them cohere. The measurement programme is the test of whether the coherence is real or an artefact of the observer’s priors. If the measurements fail, the sheets were sticking to the observer’s expectations, not to a structure in the world. If the measurements hold, the glass is real.

The method also has a specific feedback structure the author calls proprioceptive hysteresis. Observation feeds into framework (assertion), and framework feeds back into observation (what to look at next), moderated by the tension between competing interpretations that remember where they have been. The system does not snap to conclusions. It settles through opposing forces with memory. This is the flock vote applied to the act of inquiry itself. The paper is written by the process it describes.

14.8 When does failure falsify the framework versus the implementation?

A critique raised against any measurement programme is that failure can always be deflected: “the implementation was insufficient.” We address this directly.

For each prediction in Section 13, the following distinction applies. The framework is falsified if a competent implementation - one that demonstrably satisfies the architectural constraints (five shapes composed, ledger append-only, three-button cell available at every tick, flock of at least the specified voter count) - fails to produce the predicted gap. If the implementation is shown to violate the architectural constraints (missing a shape, overwriting the ledger, disabling Ask-sibling), then the failure indicts the implementation, not the framework.

The burden of demonstrating competent implementation falls on whoever runs the experiment, including us. We commit to publishing implementation compliance checks alongside results: a checklist of architectural invariants that the reference implementation must satisfy before results are meaningful. If we cannot satisfy our own checklist, we say so and the prediction remains untested, not unfalsified.
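
The distinction between indicting the implementation and falsifying the framework can be sketched as a compliance gate in front of the result. This is a hypothetical illustration, not the published checklist: the `RunReport` fields, the string labels, and the voter counts are assumptions. The four checks mirror the architectural constraints named above.

```python
from dataclasses import dataclass
from typing import List, Set

REQUIRED_SHAPES = {"binary", "table", "graph", "vector", "ledger"}
REQUIRED_BUTTONS = {"act", "dismiss", "ask_sibling"}

@dataclass
class RunReport:
    shapes_present: Set[str]           # shapes the implementation composes
    ledger_overwrites: int             # count of in-place edits to the ledger
    buttons_per_tick: List[Set[str]]   # buttons offered at each tick
    voter_count: int                   # voters in the flock during the run
    min_voters: int                    # count specified for the prediction

def compliance_failures(report: RunReport) -> List[str]:
    """List every architectural invariant the run violated."""
    failures = []
    missing = REQUIRED_SHAPES - report.shapes_present
    if missing:
        failures.append("missing shapes: " + ", ".join(sorted(missing)))
    if report.ledger_overwrites > 0:
        failures.append("ledger is not append-only")
    if any(not REQUIRED_BUTTONS <= offered for offered in report.buttons_per_tick):
        failures.append("three-button cell unavailable at some tick")
    if report.voter_count < report.min_voters:
        failures.append("flock below specified voter count")
    return failures

def verdict(report: RunReport, gap_observed: bool) -> str:
    """A non-compliant run leaves the prediction untested; only a compliant
    run can pass the prediction or falsify the framework at that row."""
    if compliance_failures(report):
        return "untested"
    return "passed" if gap_observed else "falsified"

compliant = RunReport(set(REQUIRED_SHAPES), 0, [set(REQUIRED_BUTTONS)] * 3, 7, 5)
broken = RunReport(REQUIRED_SHAPES - {"ledger"}, 0, [{"act", "dismiss"}], 3, 5)
```

Here `verdict(broken, False)` returns "untested" rather than "falsified", which is the asymmetry the commitment depends on: a failed run counts against the framework only once the checklist is satisfied.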

This is a stronger commitment than most research programmes make. We make it because the alternative - vague falsification criteria that can always be explained away - is the pattern we are trying to break.


15. Acknowledgements

This paper exists because Peter Cooper has been writing a verbatim corpus of intellectual positions over the course of 2026 and has allowed them to be used as primary source material. Peter’s thinking, in his own words, is load bearing for every section. Where the paper compresses a specific idea into prose, the compression is built on top of multiple verbatim passages that recorded the idea freshly as it arrived. The Source Material table at the front of the paper lists the specific files that fed each section. Future readers who want to attack a particular claim should go to the cited verbatim first.

The research infrastructure includes a Neo4j graph database, a semantic search pipeline, and a deep research programme whose bundled reports informed Sections II, VI, and XI. AI tools were used extensively for drafting, research synthesis, and error checking. The tools are acknowledged as infrastructure, not as authors.

We thank the reader in advance for the measurements they will attempt. The Fable is useful only if the decompressors engage. The crystal is real only if other angles are observed.


16. References

This section lists the primary references for the framework, in the order of first citation in the paper.

Geisel, T. S. [Dr. Seuss] (1957). The Cat in the Hat. Random House. ISBN: 978-0394800011. (The architectural metaphor for autonomous cognition without observer dependency that structures this paper’s introduction.)

Friston, K. (2010). The free energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138. DOI: 10.1038/nrn2787.

Friston, K. (2019). A free energy principle for a particular physics. arXiv preprint arXiv:1906.10184.

Flash, T., and Hogan, N. (1985). The coordination of arm movements: an experimentally confirmed mathematical model. Journal of Neuroscience, 5(7), 1688-1703. DOI: 10.1523/JNEUROSCI.05-07-01688.1985.

Bennett, M. (2023). A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains. Mariner Books. ISBN: 978-0063286153.

Levin, M. (2022). Technological approach to mind everywhere: an experimentally grounded framework for understanding diverse bodies and minds. Frontiers in Systems Neuroscience, 16. DOI: 10.3389/fnsys.2022.768201.

Levin, M., and Dennett, D. (2020). Cognition all the way down. Aeon. Published 13 October 2020. https://aeon.co/essays/how-to-understand-cells-tissues-and-organisms-as-agents-with-agendas.

Dayan, P. (1993). Improving generalisation for temporal difference learning: the successor representation. Neural Computation, 5(4), 613-624. DOI: 10.1162/neco.1993.5.4.613.

Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media. ISBN: 978-1449373320. (Event sourcing, immutable logs, stream processing architectures.)

JUXT Ltd. (2024). XTDB: An immutable SQL database for application development, time-travel reporting and data compliance. https://xtdb.com/. (Bitemporal data model with system time and valid time, append-only transaction log, SQL:2011 temporal support.)

Rochberg, F. (2004). The Heavenly Writing: Divination, Horoscopy, and Astronomy in Mesopotamian Culture. Cambridge University Press. (Babylonian astronomical diaries.)

Witzel, M. (1997). The development of the Vedic canon and its schools: the social and political milieu. In Inside the Texts, Beyond the Texts. Harvard Oriental Series.

Wilkinson, E. (2013). Chinese History: A New Manual. Harvard University Asia Center. (Chinese dynastic annals.)

Steinsaltz, A. (1976). The Essential Talmud. Basic Books. (Talmudic commentary chains.)

Brown, J. (2009). Hadith: Muhammad’s Legacy in the Medieval and Modern World. Oneworld. (Islamic isnad chains.)

Bar Ilan University (ongoing). Bar Ilan Responsa Project. Online database. (Jewish legal responsa.)

Howse, D. (1980). Greenwich Time and the Discovery of the Longitude. Oxford University Press. (Greenwich observatory records.)

Helland, P. (2016). Immutability changes everything. Communications of the ACM, 59(1), 64-70. DOI: 10.1145/2844112. (Event sourcing and ledger patterns.)

Tononi, G. (2012). Phi: A Voyage from the Brain to the Soul. Pantheon. (Integrated information theory, referenced as adjacent but not load bearing.)

Dennett, D. (1991). Consciousness Explained. Little, Brown. (Multiple drafts model, cited for homunculus dissolution.)

Jung, C. G. (1959). The Archetypes and the Collective Unconscious. Collected Works, Vol. 9, Part 1. Routledge. (Structural patterns recurring independently across cultures and substrates. Credited for the shape of archetypal recurrence, individuation as settling process, and shadow as dimensional flattening pathology.)

von Franz, M.-L. (1974). Number and Time. Northwestern University Press. (Mathematical structure in archetypal patterns. Structural observations on number as ordering principle across substrates.)

Hofstadter, D. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. (Strange loops, self reference in cognitive architecture.)

Minsky, M. (1986). The Society of Mind. Simon and Schuster. (Society of mind model, ancestor of the Flock fabric.)

Larkin, J. H., and Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11(1), 65-100. DOI: 10.1111/j.1551-6708.1987.tb00863.x. (Different representations enable different inferences; foundational evidence for Section V’s irreducibility claim.)

Laird, J. E., Newell, A., and Rosenbloom, P. S. (1987). SOAR: an architecture for general intelligence. Artificial Intelligence, 33(1), 1-64. DOI: 10.1016/0004-3702(87)90050-6.

Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? Oxford University Press. (ACT-R cognitive architecture.)

Jovovich, M., and Sigman, B. (2026). MemPalace [Software]. GitHub: https://github.com/milla-jovovich/mempalace. (Structured memory retrieval, method of loci applied to AI memory, verbatim vs summary benchmarks. 41K+ stars. MIT licensed.)

Cicero, M. T. (55 BCE). De Oratore, Book II, 86.352-354. (Method of loci, classical source for spatial memory architecture.)

Quintilian, M. F. (c. 95 CE). Institutio Oratoria, Book XI, 2.17-22. (Method of loci, rhetorical memory training. Classical source retained in bibliography for completeness; inline citation removed in favour of the causal claim about spatial decorrelation.)

O’Keefe, J., and Nadel, L. (1978). The Hippocampus as a Cognitive Map. Clarendon Press. (Place cells, allocentric spatial mapping, hippocampal memory indexing.)

Chandra, S., Sharma, S., Chaudhuri, R., and Fiete, I. (2025). Episodic and associative memory from spatial scaffolds in the hippocampus. Nature. DOI: 10.1038/s41586-024-08392-y. (Vector-HaSH model: grid cell scaffold encodes both spatial maps and sequential episodic memories.)

Singer, W., and Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555-586. DOI: 10.1146/annurev.ne.18.030195.003011. (Binding by synchrony, gamma band oscillations in perceptual binding.)

Fries, P. (2015). Rhythms for cognition: communication through coherence. Neuron, 88(1), 220-235. DOI: 10.1016/j.neuron.2015.09.034. (Communication through coherence hypothesis, gamma band as mechanism for inter-area synchronisation.)

Tishby, N., Pereira, F. C., and Bialek, W. (1999). The information bottleneck method. Proceedings of the 37th Allerton Conference on Communication, Control, and Computing, 368-377. (Formal framework for what survives compression; relevant to Section V and the Fable round-trip protocol.)

Zhang, J., and Norman, D. A. (1994). Representations in distributed cognitive tasks. Cognitive Science, 18(1), 87-122. DOI: 10.1207/s15516709cog1801_3. (Representational determinism: format determines available inference space, not just speed. Foundational evidence for the five-shape irreducibility claim.)

Prüfer, K., Racimo, F., Patterson, N., et al. (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481), 43-49. (Neanderthal DNA introgression, approximately two percent in non-African modern humans.)

Aphthonius of Antioch. (c. 4th century CE). Progymnasmata. Translated in Kennedy, G. A. (2003). Progymnasmata: Greek Textbooks of Prose Composition and Rhetoric. Brill. (Chreia elaboration under eight heads: encomium, paraphrase, cause, converse, analogy, example, testimony of ancients, epilogue.)

Read, L. (1958). I, Pencil: My Family Tree as Told to Leonard E. Read. The Freeman. (Scale invariant coordination without central planning, ancestor of the pencil metaphor in the Coda.)

Parr, T., Pezzulo, G., and Friston, K. (2025). Beyond Markov: Transformers, memory, and attention. Cognitive Neuroscience. DOI: 10.1080/17588928.2025.2484485. (Non-Markovian generative models in transformers; attention as selective history weighting; two approaches to non-Markovian sequences.)

Barandes, J. A. (2023a). The stochastic-quantum correspondence. arXiv preprint arXiv:2302.10778. (Indivisible stochastic processes as reformulation of quantum mechanics, structural indivisibility of temporal processes.)

Barandes, J. A. (2023b). The stochastic-quantum theorem. arXiv preprint arXiv:2309.03085. (The formal proof that quantum systems can be characterised as indivisible stochastic processes.)

Barandes, J. A. (2024). Quantum theory from indivisible stochastic processes. Philosophy of Physics, 2(1), 3. DOI: 10.31389/pop.186. (Peer-reviewed version of the ISP framework with DOI.)

Barandes, J. A. (2025). Quantum systems as indivisible stochastic processes. arXiv preprint arXiv:2507.21192. (Extended ISP framework with gauge invariance, dynamical symmetries, and Hilbert-space dilations. Convergent evidence for irreducible temporality in coherent systems.)

Boyd, R. (1991). Realism, anti-foundationalism and the enthusiasm for natural kinds. Philosophical Studies, 61(1-2), 127-148. (Homeostatic property cluster theory of natural kinds. The five shapes are natural kinds in Boyd’s sense: categories maintained by informational structure that enable reliable prediction across domains.)

Zep AI (2025). Graphiti: temporal knowledge graph for AI agents. Open source. (Bitemporal knowledge graph with event time and system time, 94.8% Dialogue Memory Retention. Strongest baseline for the ledger-as-fifth-shape claim.)

Packer, C., Wooders, S., Lin, K., et al. (2024). MemGPT: towards LLMs as operating systems. arXiv preprint arXiv:2310.08560. (Letta/MemGPT: filesystem approach to long-term agent memory, 74% on conversation continuity tasks.)

Xu, Z., et al. (2025). Synapse: episodic-semantic dual-layer graph for long conversation memory. (Spreading activation over dual-layer graph, F1 40.5 on LoCoMo. Baseline for structured episodic memory.)

Anokhin, P., et al. (2025). AriGraph: learning knowledge graph world models with episodic memory for LLM agents. IJCAI 2025. (Semantic and episodic graph structures from agent experience.)

Mem0 AI (2025). Mem0: the memory layer for AI agents. Open source. (Vector and graph-enhanced memory, Mem0g variant scoring approximately 68% on dialogue memory benchmarks.)

Clayton, N. S., Dally, J. M., and Emery, N. J. (2007). Social cognition by food-caching corvids: the western scrub-jay as a natural psychologist. Philosophical Transactions of the Royal Society B, 362(1480), 507-522. DOI: 10.1098/rstb.2006.1992. (Corvid episodic-like memory: what, where, when, who was watching. Flexible re-caching and pilfering policies as evidence for high representational dimensionality D.)

Menzel, R. (2023). Navigation and dance communication in honeybees: a cognitive perspective. Journal of Comparative Physiology A. DOI: 10.1007/s00359-023-01619-9. (Compact spatial code, symbolic dance channel, colony level behaviour extending beyond individual lifespan. High bandwidth B, short individual horizon H, colony level extension.)

Whitehead, H., and Rendell, L. (2015). The Cultural Lives of Whales and Dolphins. University of Chicago Press. (Multi-level alliances, vocal dialects, distributed cultural ledgers in acoustic space. Cross-generational Fables about migration, foraging, and identity. High H and social D.)

Baldassano, C., Chen, J., Zadbood, A., Pillow, J. W., Hasson, U., and Norman, K. A. (2017). Discovering event structure in continuous narrative perception and memory. Neuron, 95(3), 709-721. DOI: 10.1016/j.neuron.2017.06.041. (Nested cortical hierarchies: short event states in sensory regions, longer in association areas.)

Geerligs, L., Gözükara, D., Oetringer, D., Campbell, K. L., van Gerven, M. A. J., and Güçlü, U. (2022). A partially nested cortical hierarchy of neural states underlies event segmentation in the human brain. eLife, 11, e77430. DOI: 10.7554/eLife.77430. (Event boundaries organised in partially nested temporal hierarchy, boundaries propagating upward.)

Riddle, J., and Schooler, J. W. (2024). Hierarchical consciousness: the Nested Observer Windows model. Neuroscience of Consciousness, 2024(1), niae010. DOI: 10.1093/nc/niae010. (Hierarchy of spatiotemporal observer windows, each with substantial autonomy, feeding into higher level unified experience. Nested mosaic tiles model of consciousness across spatiotemporal scales.)

Peer, M., et al. (2025). Hierarchical cognitive maps of nested environments. bioRxiv. DOI: 10.1101/2025.02.05.636580. (Nested spatial representations: people divide environments into subspaces and integrate those, with explicit reuse of structure across levels.)

Maharana, A., Lee, D.-H., Tulyakov, S., Bansal, M., Barbieri, F., and Fang, Y. (2024). Evaluating very long-term conversational memory of LLM agents. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). arXiv:2402.17753. (LoCoMo benchmark: 600 turns, 16K tokens, up to 32 sessions. Human performance 87.9%. LLMs lag behind human levels by 36% overall, with temporal reasoning gap of 41%.)

Chen, Y., Singh, V. K., Ma, J., and Tang, R. (2025). CounterBench: a benchmark for counterfactuals reasoning in large language models. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2026). arXiv:2502.11008. (1K counterfactual reasoning questions over formal causal graphs. Most LLMs perform at near random guessing levels. Direct evidence for the temporal collapse diagnosed in Section VI.)

Chu, Z., Chen, J., Chen, Q., Yu, W., Wang, H., Liu, M., and Qin, B. (2024). TimeBench: a comprehensive evaluation of temporal reasoning abilities in large language models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), pages 1204-1228. arXiv:2311.17667. (Hierarchical temporal reasoning benchmark. Significant performance gap between SOTA LLMs and humans on temporal tasks.)

Zhao, Y., Yuan, B., Huang, J., et al. (2026). AMA-Bench: evaluating long-horizon memory for agentic applications. arXiv preprint arXiv:2602.22769. (Agent Memory with Any length. AMA-Agent achieves 57.22% average accuracy. Existing memory systems underperform because they lack causality information and rely on lossy similarity-based retrieval.)

Wu, D., Wang, H., Yu, W., Zhang, Y., Chang, K.-W., and Yu, D. (2024). LongMemEval: benchmarking chat assistants on long-term interactive memory. ICLR 2025. arXiv:2410.10813. (500 curated questions, five core memory abilities. Commercial chat assistants show 30% accuracy drop on sustained interactions. Best oracle configuration approximately 92% with GPT-4o and Chain-of-Note.)



Living Conjectures

A living notebook. Observations that converge on the thesis as we find them.
Each conjecture (C1, C2, ...) is dated. New entries land below as evidence accrues. The supersession trail stays in place per the additive-only discipline.



C1. Reservoir Computing as Lattice Projection (5 May 2026)

Source: Artem Kirsanov, "The Most Counterintuitive Way to Build a Brain" (YouTube) + "The Library of Babel in Your Brain" (Substack). Kirsanov is PhD, Harvard Program in Neuroscience (Kempner Institute), advised by SueYeon Chung.

The observation: Reservoir computing builds a brain by taking a large random network of neurons, never training the internal connections, and training only a linear readout layer. The random tangle is not a bug - it is a library of dynamical patterns. You extract precise behaviour by choosing the right readout angle.

Kirsanov's line: "You just have to look at it from the right angle and the information is there."
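A minimal sketch of the idea, assuming a tiny echo-state-style reservoir in plain Python. The sizes, weight ranges, sine-wave task, and ridge constant are all illustrative assumptions, not taken from Kirsanov's material. The recurrent weights are drawn once and never touched; only the linear readout `w_out` is fitted.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

rng = random.Random(0)
N, T, WASHOUT, RIDGE = 20, 300, 50, 1e-3   # illustrative sizes, not tuned

# The reservoir: random input and recurrent weights, fixed forever.
w_in = [rng.uniform(-0.5, 0.5) for _ in range(N)]
w_res = [[rng.uniform(-0.3, 0.3) for _ in range(N)] for _ in range(N)]

def step(state, u):
    """One reservoir update: x <- tanh(W x + w_in u)."""
    return [math.tanh(sum(w_res[i][j] * state[j] for j in range(N)) + w_in[i] * u)
            for i in range(N)]

# Drive the reservoir with a sine wave; the target is the next input value.
states, targets = [], []
x = [0.0] * N
for t in range(T):
    x = step(x, math.sin(0.2 * t))
    if t >= WASHOUT:
        states.append(x)
        targets.append(math.sin(0.2 * (t + 1)))

# Train ONLY the linear readout: ridge regression, (X^T X + ridge I) w = X^T y.
gram = [[sum(s[i] * s[j] for s in states) + (RIDGE if i == j else 0.0)
         for j in range(N)] for i in range(N)]
rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(N)]
w_out = solve(gram, rhs)

def readout(state):
    """Project the reservoir state onto the learned readout direction."""
    return sum(w * s for w, s in zip(w_out, state))

mse_trained = sum((readout(s) - y) ** 2 for s, y in zip(states, targets)) / len(states)
mse_baseline = sum(y ** 2 for y in targets) / len(targets)   # error of an all-zero readout
```

Comparing `mse_trained` with `mse_baseline` shows how much of the target was already recoverable from the untrained tangle once the readout angle was chosen.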

Convergence with the thesis:

  1. The net is the lattice. A randomly connected reservoir already contains a vast repertoire of dynamical patterns - in the spirit of Borges' Library of Babel, which contains every possible book. The lattice (Section V, four shapes composed) is the same idea at a higher level of abstraction: a rich substrate that already holds the projections you might need.

  2. The readout IS the projection. Reservoir computing trains only the angle of observation, not the substrate. This maps directly onto the lattice projection mechanism: the lattice is medium-agnostic, and cognition is the act of choosing which angle to read from. The Warble Box reduces spatial state into temporal sequence by selecting a readout direction through the membrane.

  3. Memory is orientation, not location. If the reservoir holds everything and cognition selects via angle, then an engram is not a place in the network - it is a direction across the network. This explains graceful degradation: destroy a node and the angle shifts slightly but survives. Destroy enough and the space loses dimensionality, and the angle collapses. This supports Section VII (Episode) - the Episode is a bundle of readout angles, not a stored object.

  4. Developmental pruning as library curation. The reservoir is grown maximally rich in development. The higher-dimensional orchestrator (thought, the derivative stack of Section IV) spends ~25 years earmarking useful readout angles. Pruning, which runs through childhood and adolescence and tapers off around the mid-twenties, removes connections never claimed by any active readout - metabolically expensive shelving with no reader. Post-pruning, the reservoir is less rich (fewer basis patterns), explaining why novel learning gets harder. The connection-ratio (brain at 2.9) is the sweet spot where the reservoir is rich enough to contain useful patterns but not so dense it collapses into uniformity.

  5. The temporal memory as abstraction. When we "remember" something, we are not retrieving a stored item. We are re-orienting the orchestrator to an angle it previously learned. The memory feels like an item because the projection IS one-dimensional (temporal, sequential, a sentence). But the actual state is a high-dimensional orientation across the reservoir. The memory is where the projection lands when the upper dimension chooses that angle again. This is the Fable (Section VIII) mechanism exactly: the Fable is a one-dimensional form that triggers the receiver's own decompression back into multidimensional shape.
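The graceful-degradation claim in point 3 can be made concrete under one loud assumption: that losing a node simply zeroes its component of the readout direction. The function name, the random readout vector, and the sizes below are hypothetical. The sketch measures how far the angle moves as more nodes are lost.

```python
import math
import random

def cosine_after_lesion(w, dead):
    """Cosine between readout direction w and the same direction with 'dead' nodes zeroed."""
    kept = [0.0 if i in dead else wi for i, wi in enumerate(w)]
    norm_w = math.sqrt(sum(a * a for a in w))
    norm_kept = math.sqrt(sum(a * a for a in kept))
    if norm_kept == 0.0:
        return 0.0      # nothing left to read from: the angle has collapsed
    dot = sum(a * b for a, b in zip(w, kept))
    return dot / (norm_w * norm_kept)

rng = random.Random(1)
N = 200
w = [rng.gauss(0.0, 1.0) for _ in range(N)]     # a hypothetical learned readout direction

one_lost = cosine_after_lesion(w, set(range(1)))         # destroy a single node
half_lost = cosine_after_lesion(w, set(range(N // 2)))   # destroy half the reservoir
all_lost = cosine_after_lesion(w, set(range(N)))         # destroy everything
```

Under this assumption the cosine stays close to 1 for a single lost node, falls as the lesion grows, and reaches 0 when no dimensions remain, which is the pattern the text describes.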

Falsification relevance: Supports Sections IV (derivative stack as higher-dimensional orchestrator), V (four shapes as reservoir richness), VII (Episode as bundle of angles), VIII (Fable as readout projection), and IX (Flock as parallel readout).

Status: Independent convergence. Kirsanov arrives from computational neuroscience. This project arrives from data engineering and consciousness theory. Same shape observed from different angles. (Which is itself an instance of the thesis.)


C2. Extramission and Echolocation - Cognition as Active Projection (5 May 2026)

Source: Ancient Egyptian visual theory (extramission - the eye emits rays that return with meaning) + cetacean echolocation.

The observation: The Egyptians believed the eye projects outward - a ray of intent - and meaning returns structured by the angle of projection. They were wrong about the optics but right about the information architecture. A whale does the same thing physically: sends a click into an already-structured ocean and reads the world's shape from what bounces back. Neither the eye nor the whale builds an internal model first and then inspects it. Both project into a rich medium and read the return.

Convergence with the thesis:

  1. Cognition is echolocation, not photography. The passive-reception model of perception (camera obscura, blank slate, training data in) is the wrong shape. The correct shape is: project an angle of inquiry into an already-rich substrate, read the structured return. This is the reservoir readout from C1 restated as a sensory principle.

  2. The medium must already be rich. A whale clicking into empty water gets nothing back. An eye projecting into void gets nothing back. A readout angle across an empty reservoir gets nothing back. The substrate (ocean, visual field, neural tangle, lattice) must already contain structure. The projection does not CREATE the information - it SELECTS from what is already there.

  3. Intent determines return. The whale chooses its click frequency and direction. The Egyptian eye chooses where to look. The orchestrator chooses which angle to read. In all cases, the projection carries intent and the return carries meaning structured by that intent. Different intent, different return, same medium. This is Section VIII (Fable): the compression is shaped for a specific receiver, and only the matching receiver can decompress.

  4. Scale invariance. Egyptian eye (milliseconds, photons), whale click (seconds, sound), reservoir readout (microseconds, voltage), lattice projection (any timescale, any medium). Same cycle at every scale: project, medium structures the return, read.

The Egyptian error and the modern error are complementary. The Egyptians were wrong that the eye literally emits. The passive-reception picture in modern neuroscience is wrong that the brain literally receives raw data and builds meaning internally from scratch. The truth lies between: cognition is a conversation between projection and return. The brain projects expectations (priors, in Friston's terms) and reads prediction error. Active inference IS extramission corrected.

Falsification relevance: Supports the Warble Box model (spatial state compressed through membrane into temporal projection, return via decompression in receiver). Supports Section I (compression needs a receiver with context). Directly supports the derivative stack (Section IV) as orchestrator that chooses projection angles.

Status: Historical convergence. The Egyptians arrived at this from phenomenology of sight. Cetacean biology arrived independently. Reservoir computing arrived mathematically. Active inference (Friston) arrived from Bayesian neuroscience. Four independent routes to the same shape: cognition projects first, then reads.


C4. Cosmic Web as Mitosis - Scale-Invariant Division Geometry (5 May 2026)

Source: Anton Petrov, "Something Enormous is Hiding in Our Galactic Blind Spot" (YouTube). Vela Supercluster region behind the Zone of Avoidance. Observation at timestamp 7:18.

The observation: The Vela/Columba/Lepus region of the cosmic web, imaged through the galactic blind spot, displays geometry visually identical to a cell in mid-mitosis. Two lobes (cluster nodes) pulling apart with filament bridges between them. Matter flows along the filaments toward the poles. Spindle fibres connecting chromosomes during cell division have the same tensile-bridge-between-attractors geometry at micrometre scale. The cosmic web reproduces it at 100 megaparsec scale.

Convergence with the thesis:

  1. Scale invariance of form. The paper claims five shapes recur at every scale. Mitotic geometry (two poles, connecting bridge, material flowing along the bridge) appears at cellular scale (micrometres, seconds), organism scale (embryonic axis formation, millimetres, hours), and cosmological scale (megaparsecs, billions of years). Same form, substrate-independent.

  2. Turing morphogenesis at every scale. Turing (1952): homogeneous substrate + symmetry-breaking instability = pattern. The symmetry-breaker (the axis) is the readout angle from C1. At cellular scale: the mitotic spindle selects the division axis. At cosmic scale: dark matter filaments select the separation axis. At neural scale: the readout angle selects the signal from the reservoir.

  3. The axis IS the symmetry-breaker. Connects to C2 (extramission - projection selects return) and the Axis of Evil (cosmic-scale preferred direction). In every case, pattern emerges not from the substrate (which is rich/homogeneous) but from the axis imposed upon it.

Falsification relevance: Supports the paper's scale-invariance claim across the widest possible range (micrometres to megaparsecs). If the geometric correspondence is coincidental rather than structural, it should fail under quantitative comparison of the tensile/flow dynamics at both scales.

Status: Visual observation. Needs quantitative comparison to move from "looks like" to "is structurally isomorphic." But the visual correspondence is striking enough to note as a pointer for future investigation.


C3. Caterpillar Memory as Substrate-Transitioning Fable (5 May 2026)

Source: Michael Levin's framing of caterpillar-to-butterfly memory retention. Levin argues the interesting thing is not that information survives massive neural remodelling, but that it gets remapped onto an entirely new substrate with entirely new problems. Memory is not a faithful archive but a generative kernel that compresses experience and reinflates it in whatever configuration the organism now occupies.

The observation: During metamorphosis, the caterpillar's neural architecture dissolves into undifferentiated cellular soup. The butterfly rebuilds a completely different nervous system - different body plan, different sensory apparatus, different motor repertoire. Yet conditioned responses from the caterpillar stage survive. The memory transits a total substrate dissolution and reinflates in an alien configuration.

Convergence with the thesis:

  1. This IS the paper's subtitle stated as biology. "The Shapes That Let Cognition Survive Substrate Transitions." The caterpillar-to-butterfly is the most dramatic substrate transition in nature. The memory that survives is proof that cognitive shapes can transit substrate dissolution.

  2. The generative kernel IS the Fable. Levin's "generative kernel that compresses experience and reinflates" is Section VIII's Fable primitive: a compressed form rich enough to decompress in a receiver that shares sufficient context. The butterfly is a different receiver from the caterpillar. The Fable still decompresses - not into an identical replay but into a functional equivalent mapped onto the new body plan.

  3. Memory as angle, not location (from C1). If memory is a readout angle across a reservoir rather than a stored object at a location, then dissolving the reservoir and regrowing it does not necessarily destroy the memory. It destroys the specific connections but the geometric relationship - the angle - can be re-established across a new reservoir IF the generative kernel carries enough structural information to re-orient. The kernel is the compression context pointer from Section VIII.

  4. "Acts on behalf of a self that may no longer exist in its original form." This is the Episode (Section VII) acting across time. The Episode was written by the caterpillar-self. The caterpillar-self no longer exists. The butterfly-self decompresses the Fable and acts on it. The provenance traces back to an author that has been dissolved. This is non-Markovian architecture (Section XI) in its most literal biological form: the current state is a function of history written by a self that no longer exists.

  5. The goo is the channel, not the death. The undifferentiated cellular soup between caterpillar and butterfly is not destruction - it is the transmission medium. Like the wire between sender and receiver. The Fable must be robust enough to survive the channel's noise. Whatever encoding the generative kernel uses, it survives total cellular reorganisation. This constrains what the encoding CAN be: not synaptic weights (destroyed), not specific connectivity patterns (dissolved), but something more abstract - a geometric relationship that can be re-instantiated in any sufficiently rich substrate.

Falsification relevance: Directly tests the paper's central claim. If the five shapes are the structural minimum for cognition to survive substrate transitions, then caterpillar memory must be encodable in those shapes. If it requires something the five shapes cannot hold, the framework fails. Supports Section VII (Episode survives across time), Section VIII (Fable decompresses in a different receiver), and the Coda's claim that shapes transit substrates.

Status: Levin is already cited in the paper (Section 2.4) for morphogenetic agency. This observation extends his contribution: not just "agency at every scale" but "memory as substrate-transitioning compression." The caterpillar is the existence proof that the paper's title is not metaphor.


C5. The Fishing Net - Phase-Transition Learning, Sleep Consolidation, and the Strange Attractor (15 May 2026)

Source: Peter Cooper, thought experiment. Developed in conversation from a single image: a fishing net whose knots are tied by tension, not by a fisherman.

The observation: Imagine a fishing net of n-squared knots. Each knot is tied once. The tension between any two connected knots is a tensor whose magnitude depends on the available topology of the substrate (n to the power of the dimension). A knot is either absent or is the product of a twist that changes the weights between two hysteresis points. The twist does not slide smoothly between states. It snaps. Below threshold, the old configuration persists. Above threshold, the new configuration locks. This is not gradient descent. This is phase transition at the node.
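A minimal sketch of the snap, assuming the simplest bistable device: a knot with two thresholds, in the manner of a Schmitt trigger. The class name and the threshold values 0.3 and 0.7 are hypothetical. The point is only that the same tension gives different states depending on the direction from which it was reached.

```python
class Knot:
    """A bistable knot: locks above 'high', releases below 'low', otherwise keeps its state."""

    def __init__(self, low=0.3, high=0.7):
        self.low = low
        self.high = high
        self.locked = False

    def pull(self, tension):
        if not self.locked and tension > self.high:
            self.locked = True       # snap into the new configuration
        elif self.locked and tension < self.low:
            self.locked = False      # snap back out
        return self.locked

knot = Knot()
rising = [i / 10 for i in range(11)]        # 0.0, 0.1, ..., 1.0
falling = list(reversed(rising))            # 1.0, 0.9, ..., 0.0

up_states = [knot.pull(t) for t in rising]
down_states = [knot.pull(t) for t in falling]

# Index 5 is tension 0.5 in both sweeps.
state_at_half_going_up = up_states[5]
state_at_half_going_down = down_states[5]
```

At tension 0.5 the knot is unlocked on the way up and locked on the way down: the state is a function of history, not of the current input alone.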

"This is learning." (Peter's words.)

The net is not cast by anyone. There is no fisherman. It is a murmuration. Each knot is a starling. The tension between any two birds is the thread. The net is self-forming, self-tensioning, and exists only while the flock flies. The propagation delay between one starling turning and its neighbour responding is approximately 40 milliseconds. That 40ms IS the hysteresis window. It is the clock speed of the substrate. The net cannot update faster than its propagation medium allows.

Peter's riddle, preserved: the hysteresis points have "any relation to monkeys or biscuits (try to work that one out Dear Listener)." The monkey trap is a hysteresis device: hand open (free, empty) and hand closed (stuck, holding). Two stable states with a barrier between them. The monkey cannot traverse the barrier because the reward itself prevents relaxation past the threshold. The biscuit dunked in tea holds its structure until a catastrophic phase transition dissolves it. Both shapes - the trap that locks and the scaffold that collapses - are the two faces of hysteresis in learning: acquisition (snap into new state) and catastrophic forgetting (snap out of it).

Torsion and tension - the two forces at every knot:

The three buttons (Dismiss, Ask-sibling, Act) describe what the flock DOES. They are not what the knot FEELS. A knot feels only two things: torsion (tighten and hold: fight) and tension (loosen and release: flight).

Every behaviour in the system - from a starling avoiding a predator to an Alzheimer's patient lashing out at an unfamiliar face - reduces to this binary. Fight or flight is not a simplification. It is the substrate. The three buttons emerge from it when you add topology (neighbours create geometry), but the primitive is two forces pulling at every knot.

The Alzheimer's example is the proof by degradation. Strip away the higher architecture and Ask-sibling degrades first - you cannot coordinate with neighbours when you cannot remember them. What remains is raw fight/flight. The overreaction IS the system without mediation. The anger is torsion without the topology to distribute it. A person with full cognitive architecture routes fight/flight through topological neighbours (Ask-sibling), considers whether to hold or release (the three-button vote), and settles into a trajectory the flock can carry. A person whose topology has degraded responds with the primitive: tighten or release, fight or flight, with no mediating geometry.

Act is not a third force. Act is the moment of hypothesis-testing: you tighten the knot (commit to the probe) and observe whether the net holds. If tension increases beyond threshold, you loosen (flight). If the new configuration settles, it persists. Every action is a bet. The hysteresis snap IS the moment the bet resolves.

Torsion is directional resistance - "I hold this position against the pull." Tension is yielding distance - "I release this position to reduce the pull." The fishing net's macro-smoothness (the continuous learning at net level) emerges from millions of binary torsion/tension micro-decisions at each knot. The three buttons are what those decisions LOOK LIKE when projected through topological geometry. But the physics beneath the geometry is two forces, not three choices.

The Tesco Tray Wash - sleep as stacking:

The basic fishing net is two-dimensional. One day's experience. Flat, full of knots at whatever tension that day set. During sleep, the net is picked up and dropped onto a pallet - like dirty trays returning to a Tesco warehouse tray wash. Stacked seven high. Put to one side with a pallet truck.

But when you stack nets, the knots do not sit on top of each other. They nestle. A knot from Tuesday's net settles into the gap between Monday's knots. The aggregate is not a sum. It is a weave across layers. The pallet becomes a three-dimensional temporal solid that neither net was individually.

The most recent net on top has the sharpest knot definition. The bottom ones have been compressed by the weight above - their individual knots are less distinct, but their aggregate tension pattern is load-bearing. This is why you cannot remember what you had for lunch on a specific Tuesday six months ago (individual knot dissolved under compression) but you know how Tuesdays feel (the aggregate tension signature persists).

Each layer is aggregated. Each pallet is aggregated. Each delivery of dirty trays is an aggregate. Everything flows into one continuous aggregate, and over time summaries are laid down. The tension between the nets merges with the accumulated weight of days, and after each sleep every parallel neuron keeps its own trace of that moment of aggregation.

This explains why sleep mostly reviews recent assimilations: the top tray is still wet. It needs the most processing. The ones underneath have already been partially compressed by previous sleep cycles. Diminishing returns going deeper - the recency bias in REM replay is not a design choice but a consequence of the stacking geometry.

The pallet truck putting the stack to one side is hippocampal transfer. The forklift driver does not inspect individual trays. The pallet moves as a unit.

The strange attractor as probability landscape sculptor:

The paper uses "strange attractor" once, as metaphor for the Fish in The Cat in the Hat (Section I). The deeper claim is architectural. At every scale - every floor of the Glass Elevator - intent does not determine outcomes. It reshapes the energy landscape so that certain trajectories become more probable than others. The Fish does not command the children. He introduces a probability field they feel whether or not they acknowledge it.

Each floor of the Glass Elevator has its own Bell-curve-shaped confidence distribution over {Yes, No, Uncertain}. The strange attractor at that floor is whatever pulls the distribution toward a basin. When the bell curve is narrow enough, the vote collapses. When it is too wide, the question propagates - sideways to the Flock, or upward to the next floor. Uncertain is a first-class outcome, not a failure mode. A cell that returns Uncertain has correctly reported that the attractor at its level has not yet pulled the distribution into a collapsible basin.

Fact TTL connects here: as time passes since last verification, the bell curve widens. The attractor's pull weakens. Eventually the distribution is too wide to collapse, and the fact must be re-verified or it decays. The strange attractor is not permanent. It is maintained by repeated observation, like a path worn through grass that grows back if nobody walks it.
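One way to read the last two paragraphs as a mechanism, offered as a sketch under stated assumptions: confidence is summarised by a mean in [0, 1] and a width, the width grows linearly with time since verification, and the vote collapses only when the width is below a cut-off. The function names, the linear growth law, and every constant below are hypothetical.

```python
def width(sigma_at_verification, growth_per_day, age_days):
    """Width of the confidence curve: widens linearly with time since last verification."""
    return sigma_at_verification + growth_per_day * age_days

def vote(mean, sigma, collapse_width=0.2):
    """Collapse to Yes/No only when the curve is narrow enough; otherwise report Uncertain."""
    if sigma > collapse_width:
        return "Uncertain"          # a first-class outcome: propagate sideways or upward
    return "Yes" if mean >= 0.5 else "No"

fresh = vote(0.9, width(0.05, 0.01, age_days=0))     # just verified
ageing = vote(0.9, width(0.05, 0.01, age_days=10))   # wider, but still collapsible
stale = vote(0.9, width(0.05, 0.01, age_days=30))    # too wide: must be re-verified
```

Re-verification corresponds to resetting `age_days` to zero, which narrows the curve again: the path through the grass has been walked.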

Convergence with the thesis:

  1. The hysteresis window IS the tick. The paper claims (Section 2.2a, citing Barandes) that the tick is the timescale at which the vote becomes indivisible. The fishing net gives the mechanism: the tick duration equals the hysteresis window of the propagation medium. Below the window, the neighbour has not yet responded - old state persists. Above the window, the snap has already occurred - new state locked. The indivisibility is not abstract. It is the physical propagation delay of the substrate.

  2. Learning is phase transition, not gradient. At the individual knot, learning is discontinuous - a snap between two stable configurations. At the net level, learning appears continuous because millions of micro-snaps average into smooth change. This is emergence as observation across scales (the paper's central claim) instantiated as a specific mechanism. The macro smoothness is real. The micro discreteness is also real. Neither is more fundamental. They are the same phenomenon measured at different scales.

  3. Sleep consolidation is the same architecture at a different timescale. The paper describes the Flock as a murmuration of votes at substrate rate. Sleep is a murmuration at a slower rate - nets settling onto other nets, knots nestling into gaps, tension patterns integrating across layers. The architecture does not change between waking and sleeping. The timescale changes. This is the scale-invariance claim applied to the system's own maintenance cycle.

  4. The strange attractor gives the derivative stack its dynamics. Each floor of the Glass Elevator (Section IV) has a vote. The strange attractor at each floor determines the basin toward which that floor's distribution settles. The derivative stack is not just a topology - it is a landscape of attractors, each pulling its floor's distribution toward a different basin, with the Fable from the floor above reshaping which basins exist. Intent does not command. It sculpts the landscape.

  5. The propagation bandwidth bounds dimensionality. A murmuration of 1024 birds at 40ms per hop crosses the whole flock in roughly 300ms (about seven or eight hops across). Any event faster than 300ms is invisible to the flock-as-a-whole. Any event slower is fully integrated. The net has a natural bandwidth set by its own geometry: net diameter divided by propagation speed equals temporal resolution. This constrains the dimensional budget of any substrate. A brain with 40ms ticks and 10 billion nodes has a different bandwidth from a social system with day-scale ticks and 10 thousand participants. Both are fishing nets. Both have a bandwidth. Neither can perceive faster than its own propagation allows.

  6. No fisherman dissolves the homunculus. The paper argues (Section IX) that no single bird steers the murmuration. The fishing net makes the absence concrete: the net is not cast by anyone. It forms because tension exists between nodes. The shape is a consequence of the tension field, not a plan imposed from outside. There is no place in the architecture where a homunculus could stand, because the architecture has no outside. The net IS the cognition. The knots ARE the voters. The tension IS the Fable propagating. Asking "who holds the net" is like asking "who steers the murmuration." The question assumes a category (fisherman, conductor, homunculus) that the architecture has dissolved.
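The bandwidth arithmetic in point 5 can be written out. This is a worked sketch, assuming temporal resolution is simply hop count across the net times per-hop delay. The regular-grid layouts are an assumption added for comparison, since the text does not state the flock's geometry, and the function names are hypothetical.

```python
def temporal_resolution_ms(diameter_hops, hop_delay_ms):
    """Time for a disturbance to cross the whole net."""
    return diameter_hops * hop_delay_ms

def grid_diameter_hops(n_nodes, dims):
    """Side length, in hops, of a regular grid of n_nodes in 'dims' dimensions (assumed layout)."""
    return round(n_nodes ** (1.0 / dims))

HOP_MS = 40   # per-hop propagation delay quoted in the text

implied_hops = 300 / HOP_MS                                            # hops implied by ~300 ms
flat_sheet_ms = temporal_resolution_ms(grid_diameter_hops(1024, 2), HOP_MS)
cubic_ball_ms = temporal_resolution_ms(grid_diameter_hops(1024, 3), HOP_MS)
```

The quoted 300ms corresponds to 7.5 hops. A flat 32-by-32 sheet would give 1280ms and a cube roughly ten birds on a side would give 400ms, so the figure depends on how many hops actually separate the edges of the flock.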

Falsification relevance:

Status: Novel synthesis from first principles. The fishing net combines the paper's existing claims about ticks (Barandes), flocks (Section IX), the derivative stack (Section IV), and the strange attractor (Section I) into a single physical model that fills three gaps the paper currently leaves open: the mechanism of learning, the mechanism of sleep consolidation, and the dynamics of the derivative stack. The Tesco tray wash grounds the abstract architecture in a logistics process Peter knows from experience. The murmuration grounds it in observable biology. The hysteresis grounds it in physics. Three substrates, one shape.


C6. The Game Theory of the Flock - Reynolds, Nash, and the Three Buttons (15 May 2026)

Source: Craig Reynolds, "Flocks, Herds, and Schools: A Distributed Behavioral Model" (SIGGRAPH 1987). Djamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah, and Hanane Azzag, "A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics" (arXiv:2604.27942, April 2026 - preprint, submitted to IEEE Transactions on Cybernetics). Ballerini et al., "Interaction ruling animal collective behavior depends on topological rather than metric distance" (PNAS 105(4), 1232-1237, 2008). Xiaoye Qu et al., "Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective" (ACL 2025, Long Papers). Peter Cooper, connecting the pieces (15 May 2026).

The observation: In 1987 Craig Reynolds showed, by simulation, that a murmuration needs no conductor. Three local rules - separation, alignment, cohesion - produce breathtaking cooperation from independent agents playing a continuous, infinite game. Reynolds wrote a computer graphics paper. In doing so he inadvertently wrote a game-theoretic specification for how to be a good person in a complex system.

The paper's Section X introduces a three-button ethical decision surface inside the Diorama cell: Act, Dismiss, Ask-sibling. The structural correspondence with Reynolds is not analogy. It is isomorphism:

| Reynolds Rule | Game-Theoretic Function | Diorama Button |
| --- | --- | --- |
| Separation | Non-interference. An agent never forces itself upon a neighbour's space. Preserves the dimensional boundaries of the other | Dismiss - the structural ability to refuse coercion, back away, maintain safe distance without being forced into premature action |
| Alignment | Cooperative trajectory matching. The continuous refusal to defect into chaotic self-interest. Active listening to neighbours' headings | Ask-sibling (direction component) - throw ambiguity horizontally to topological peers, match the average heading |
| Cohesion | Inclusivity. Commitment to the fabric of the Flock. No agent is left behind. The net catches disturbances and shares navigation load | Ask-sibling (position component) - steer toward the centre of local peers, ensuring the coalition holds together |
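For readers who have not seen the three rules as code, here is a sketch of one agent's steering contributions in two dimensions. It uses the common simplified formulation of the rules, not Reynolds' original steering model. The function name, the `min_dist` threshold, and the example positions are hypothetical, and the mapping onto buttons is the paper's, not the code's.

```python
import math

def steer(pos, vel, neighbours, min_dist=1.0):
    """Return the (separation, alignment, cohesion) vectors for one agent.

    neighbours is a non-empty list of (position, velocity) pairs, each a 2-tuple of floats.
    """
    n = len(neighbours)

    # Separation: push directly away from any neighbour closer than min_dist.
    sep_x, sep_y = 0.0, 0.0
    for (px, py), _ in neighbours:
        dx, dy = pos[0] - px, pos[1] - py
        d = math.hypot(dx, dy)
        if 0.0 < d < min_dist:
            sep_x += dx / d
            sep_y += dy / d

    # Alignment: difference between the neighbours' mean velocity and our own.
    mean_vx = sum(v[0] for _, v in neighbours) / n
    mean_vy = sum(v[1] for _, v in neighbours) / n
    ali = (mean_vx - vel[0], mean_vy - vel[1])

    # Cohesion: vector from our position to the neighbours' centre.
    cx = sum(p[0] for p, _ in neighbours) / n
    cy = sum(p[1] for p, _ in neighbours) / n
    coh = (cx - pos[0], cy - pos[1])

    return (sep_x, sep_y), ali, coh

# One bird at the origin heading east, one neighbour too close, one far away.
sep, ali, coh = steer((0.0, 0.0), (1.0, 0.0),
                      [((0.5, 0.0), (0.0, 1.0)), ((4.0, 0.0), (0.0, 1.0))])
```

In this example separation pushes the bird west, away from the crowding neighbour, while alignment turns it toward the shared northward heading and cohesion draws it east toward the local centre. Each rule reads only local neighbours.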

The Folk Theorem of game theory provides the formal backbone: if players interact repeatedly and value the future, they can sustain a highly cooperative equilibrium without external enforcement. A murmuration IS this: an infinite cooperative game where thousands of independent agents cast local votes to avoid the defection of collision. The equilibrium is not imposed. It settles.
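A worked instance of "value the future", assuming the textbook repeated prisoner's dilemma with a grim-trigger strategy (cooperate until the other side defects, then defect forever). The payoff letters T, R, P (temptation, reward, punishment) and the numbers 5, 3, 1 are conventional illustrations, not values from the paper. Cooperation is sustainable when the discount factor is at least (T - R) / (T - P).

```python
def critical_discount(T, R, P):
    """Smallest discount factor at which grim-trigger cooperation is self-enforcing."""
    return (T - R) / (T - P)

def cooperation_sustainable(delta, T, R, P):
    """Compare the discounted payoff of cooperating forever with defecting once."""
    cooperate_forever = R / (1 - delta)
    defect_once = T + delta * P / (1 - delta)   # temptation now, punishment ever after
    return cooperate_forever >= defect_once

threshold = critical_discount(T=5, R=3, P=1)
patient = cooperation_sustainable(0.9, T=5, R=3, P=1)
impatient = cooperation_sustainable(0.2, T=5, R=3, P=1)
```

With these payoffs the threshold is one half: agents that weight tomorrow at 0.9 of today hold the equilibrium, and agents that weight it at 0.2 do not.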

The Game-Theoretic Free Energy Principle (GT-FEP):

Bouchaffra et al. (2026) prove mathematically what the architecture assumes structurally: when a multi-agent system performs local free-energy minimisation (each agent reducing its own surprise, per Friston), it implicitly implements a stochastic game. The stationary points of collective free energy correspond to approximate Nash equilibria of the group.

This is the mathematical proof that the Ask-sibling button does not merely mimic a flock. It formally computes a Nash equilibrium via distributed Bayesian inference. Each Diorama cell minimises its own variational free energy. The collective settling IS the equilibrium. No referee required.

The Harsanyi Dividend - measuring the Jury:

When a Diorama cell presses Ask-sibling and forms a temporary jury with its neighbours, the architecture needs to know: is this coalition producing genuine synergy, or just echoing noise?

The Harsanyi decomposition (John Harsanyi, Nobel Prize 1994) provides the answer. The total energy of a coalition is the sum of all the irreducible interactions (dividends) of its sub-groups. A positive dividend means genuine emergent cooperation - the cells are bringing unique, complementary context. A negative dividend means redundancy - the third cell is just repeating what the first two already settled.
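The decomposition can be computed directly for a small jury. The sketch below uses the standard recursive definition, d(S) = v(S) minus the dividends of all proper non-empty subsets of S. The three cells and their coalition values are invented for illustration.

```python
from itertools import combinations

def subsets(players):
    """All subsets of players as frozensets, in order of increasing size."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)

def harsanyi_dividends(players, v):
    """Dividend of each non-empty coalition: its value minus the dividends of its proper subsets."""
    d = {}
    for s in subsets(players):
        if not s:
            continue
        d[s] = v[s] - sum(d[t] for t in subsets(sorted(s)) if t and t != s)
    return d

# Hypothetical jury: A and B are complementary, C mostly repeats what A and B settled.
v = {
    frozenset("A"): 1, frozenset("B"): 1, frozenset("C"): 1,
    frozenset("AB"): 3, frozenset("AC"): 2, frozenset("BC"): 2,
    frozenset("ABC"): 3,
}
dividends = harsanyi_dividends(["A", "B", "C"], v)
```

Here the pair A-B carries a dividend of +1 (genuine synergy), the pairs involving C carry 0, and the full triple carries -1 (redundancy). The dividends sum back to the value of the whole coalition, as the decomposition requires.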

Qu et al. (ACL 2025) applied this to LLM attention heads - treating each head as an agent in a coalition game. They found that significant positive dividends (synergy) are sparsely distributed, and many head combinations show negative dividends (redundancy). They proposed pruning low-synergy heads without disrupting the network's overall Nash equilibrium.

This maps directly onto the Flock tick. During each tick:

  1. The jury forms (Ask-sibling fires).

  2. Harsanyi dividends are computed for the coalition.

  3. Each cell's Shapley value (Lloyd Shapley, Nobel Prize 2012) weights its vote by genuine marginal contribution, not volume.

  4. Cells with negligible Shapley values are pruned from the jury without disrupting the equilibrium.

  5. The vote settles. The trajectory resolves.
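The weighting, pruning, and settling steps of the tick can be sketched for a four-cell jury. Everything specific here is an assumption for illustration: the coalition value is taken to be the number of distinct evidence items a coalition holds, votes are confidences in [0, 1], and the pruning cut-off is arbitrary. The Shapley value is computed by brute force over orderings, which is feasible only because the jury is small.

```python
from itertools import permutations

# Hypothetical evidence held by each cell. C duplicates part of A; D holds nothing.
evidence = {"A": {"e1", "e2"}, "B": {"e3"}, "C": {"e1"}, "D": set()}

def coverage(coalition):
    """Coalition value: how many distinct evidence items its members hold together."""
    seen = set()
    for cell in coalition:
        seen |= evidence[cell]
    return len(seen)

def shapley(players, v):
    """Average marginal contribution of each player over all join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        joined = frozenset()
        for p in order:
            phi[p] += v(joined | {p}) - v(joined)
            joined = joined | {p}
    return {p: phi[p] / len(orders) for p in players}

def tick(votes, v, prune_below=0.05):
    """Weight each vote by Shapley value, drop negligible cells, return the settled vote."""
    phi = shapley(list(votes), v)
    jury = {p: w for p, w in phi.items() if w > prune_below}
    total = sum(jury.values())
    settled = sum(votes[p] * jury[p] for p in jury) / total
    return settled, sorted(jury)

votes = {"A": 1.0, "B": 0.0, "C": 1.0, "D": 0.0}
settled, jury = tick(votes, coverage)
```

Cell D contributes nothing and is pruned. Cell C votes as loudly as A but carries a third of A's weight, because most of what it knows A already knew.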

The Topological Seven:

Ballerini et al. (PNAS 2008) discovered that real starlings do not track neighbours by metric distance (how far away they are). They track by topological distance - approximately six to seven nearest neighbours regardless of physical separation. This makes the flock robust to density changes: whether the birds are packed tight or spread wide, the interaction rules hold because the topology is constant.

This finding provides the natural bound for the Ask-sibling button's coalition size. The Diorama cell does not ask EVERY sibling. It asks its topological six or seven - the nearest nodes in the graph, not the nearest nodes in space. The coalition is bounded not by computational budget but by the same thermodynamic optimum that starlings discovered through evolution: the point where the work of processing social cues is balanced by the energy conserved through coordination.
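The difference between the two neighbour rules is easy to show. In this sketch birds sit on a line for simplicity; the function names, the metric radius, and the spacing are hypothetical. Spreading the flock tenfold leaves the topological neighbourhood untouched and empties the metric one.

```python
import math

def topological_neighbours(i, positions, k=7):
    """The k nearest birds by rank, however far away they are."""
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    return others[:k]

def metric_neighbours(i, positions, radius):
    """Every bird within a fixed physical radius."""
    return [j for j in range(len(positions))
            if j != i and math.dist(positions[i], positions[j]) <= radius]

dense = [(float(x), 0.0) for x in range(10)]     # birds one unit apart
sparse = [(10.0 * x, 0.0) for x, _ in dense]     # the same flock spread tenfold

topo_dense = topological_neighbours(0, dense)
topo_sparse = topological_neighbours(0, sparse)
metric_dense = metric_neighbours(0, dense, radius=3.5)
metric_sparse = metric_neighbours(0, sparse, radius=3.5)
```

The topological jury is the same seven birds in both cases, which is the robustness to density changes that Ballerini et al. report. The fixed-radius jury shrinks from three birds to none.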

Why this is NOT Integrated Information Theory:

Tononi's Integrated Information Theory (IIT) attempts to measure consciousness via Phi - the degree to which a system is "more than the sum of its parts." The intuition is correct. The implementation is wrong. Phi is computationally intractable: calculating it for even modest systems is worse than NP-hard. It requires evaluating every possible partition of the system to find the minimum information partition. IIT tries to measure the murmuration from outside, as a god-view observer with infinite computational budget.

The GT-FEP approach measures the same phenomenon - how the whole exceeds its parts - from inside, using tools that are bounded by the architecture's own topology:

| Property | IIT (Phi) | GT-FEP + Harsanyi |
| --- | --- | --- |
| What it measures | Integrated information across all partitions | Irreducible synergy across local coalitions |
| Computational cost | Worse than NP-hard; grows super-exponentially | Bounded by topological neighbourhood (~7); grows linearly with agent count |
| Perspective | God-view (requires evaluating the whole system) | Agent-view (each cell computes locally) |
| Substrate | Requires access to the system's causal structure | Requires only local free energy and neighbour votes |
| Falsifiability | Phi cannot be computed for any real brain | Harsanyi dividends are computable for real systems now |
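The cost row can be given rough numbers. This sketch makes two assumptions. It uses the Bell number (the count of all ways to partition n elements) as a stand-in for the size of an exhaustive partition search; the exact search space of Phi depends on the IIT formulation used. For the local approach it counts the non-empty sub-coalitions of a cell plus its seven neighbours, once per cell. Function names are hypothetical.

```python
def bell(n):
    """Number of set partitions of n elements, computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]

def local_coalition_count(n_cells, k=7):
    """Sub-coalitions examined if every cell scores only itself plus k neighbours."""
    return n_cells * (2 ** (k + 1) - 1)

partitions_20 = bell(20)                 # exhaustive view of a 20-element system
local_20 = local_coalition_count(20)     # local view of a 20-cell flock
local_2000 = local_coalition_count(2000)
```

For twenty elements the partition count is already in the tens of trillions, while the local count is 5100 and grows in direct proportion to the number of cells.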

IIT describes what consciousness LOOKS LIKE from outside. GT-FEP describes what consciousness DOES from inside. The Shape of Thought is an architecture, not a measurement. It needs tools that work from inside.

The quantum computer connection:

The Harsanyi dividend calculation for small coalitions (topological seven) is tractable on classical hardware. But the palette-level aggregation from C5 - stacking nets, computing tension across layers, finding the strange attractor basins across derivative stack floors - involves combinatorial optimisation over exponentially growing coalition spaces.

Quantum hardware targets this class of problem. Quantum annealing searches for low-energy minima of a cost function over a combinatorial landscape. Grover's algorithm provides a quadratic speedup for unstructured search. Variational quantum eigensolvers approximate the ground states of energy landscapes.

The fishing net's higher-level consolidation (palette-to-palette integration, season-scale aggregation) maps onto exactly the optimisation surface that quantum hardware is designed to traverse. The topological-seven constraint means the LOCAL computation (one tick, one Ask-sibling jury) stays classical. The GLOBAL computation (how palettes integrate, how strange attractors settle across the derivative stack) is where quantum resources become load-bearing.

IBM offers free access to quantum hardware through its IBM Quantum platform, programmed with the open-source Qiskit SDK. The architecture does not need a quantum computer to function at the local tick level. But it needs one to scale the palette-stacking computation beyond what classical brute force can handle. The plan is: classical for the flock, quantum for the warehouse.

Convergence with the thesis:

  1. Reynolds' three rules ARE the three buttons. This is not metaphor. Separation, alignment, and cohesion are the minimum behavioural primitives for a cooperative system without a homunculus. The paper arrived at the same three primitives (Dismiss, Ask-sibling, Act) from architectural first principles. Independent derivation of the same minimum set is convergent evidence.

  2. The Folk Theorem provides the formal guarantee. The paper claims (Section IX) that the Flock settles without a conductor. The Folk Theorem shows this is not hope but mathematics: repeated interaction among sufficiently patient, future-valuing agents can sustain cooperative equilibria. The Flock tick IS the repeated interaction. The ledger IS the memory of past play that makes future-valuing possible.

  3. GT-FEP unifies Friston and Nash. The paper cites Friston (Section 2.2) for the derivative stack and cites the Flock (Section IX) as a settlement mechanism. Bouchaffra et al. prove these are the same thing: local free energy minimisation IS Nash equilibrium computation. The two pillars of the architecture are not merely compatible. They are mathematically identical.

  4. Harsanyi dividends give Ask-sibling its mechanism. The paper says "throw it to the Flock" but does not specify how the Flock weighs its votes. The Harsanyi dividend provides the answer: weight by irreducible synergy, prune by Shapley value. This is the missing engineering specification for Section X.

  5. Topological seven gives Ask-sibling its bound. The paper says the coalition is local but does not specify how local. Ballerini et al. provide the answer from biology: six to seven topological neighbours. This is the thermodynamic optimum - the point where processing cost and coordination benefit are balanced.

  6. IIT is superseded. The paper does not cite Tononi, and now there is a formal reason not to. IIT asks the right question (what makes a system more than parts?) with the wrong tools (intractable from outside). GT-FEP asks the same question with tools that are tractable from inside, bounded by topology, and quantum-amenable for higher-level aggregation.
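The Harsanyi mechanism in point 4 can be sketched for a small jury. The dividend and Shapley formulas below are the standard cooperative-game definitions; the three-sibling value function, the names, and the rule of pruning siblings whose Shapley value is zero are hypothetical illustrations, not the project's specification.

```python
from itertools import combinations

def subsets(players):
    """Yield every subset of players as a frozenset, including the empty set."""
    players = tuple(players)
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)

def harsanyi_dividends(players, value):
    """d(S) = sum over T subset of S of (-1)^(|S| - |T|) * v(T), for non-empty S."""
    dividends = {}
    for s in subsets(players):
        if s:
            dividends[s] = sum((-1) ** (len(s) - len(t)) * value(t)
                               for t in subsets(s))
    return dividends

def shapley_values(players, dividends):
    """Each coalition's dividend is shared equally among its members."""
    return {p: sum(d / len(s) for s, d in dividends.items() if p in s)
            for p in players}

# Hypothetical jury: a and b are worth 1 each alone and 3 together; c adds nothing.
WORTH = {frozenset(): 0, frozenset("a"): 1, frozenset("b"): 1, frozenset("c"): 0,
         frozenset("ab"): 3, frozenset("ac"): 1, frozenset("bc"): 1,
         frozenset("abc"): 3}

jury = ["a", "b", "c"]
dividends = harsanyi_dividends(jury, WORTH.__getitem__)
shapley = shapley_values(jury, dividends)
kept = [p for p in jury if shapley[p] > 0]   # assumed pruning rule
```

In this toy game the only irreducible synergy sits in the pair {a, b}, whose dividend is 1. Sibling c has a Shapley value of 0 and is pruned. A jury of seven has 127 non-empty coalitions, so the full decomposition stays small at the local scale.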

Implementation status (15 May 2026):

The architecture is no longer purely theoretical. A working implementation of the Flock substrate has been built (file: site/flock.html in this repository; public deployment to theshapeofthought.com/flock.html is pending the next site push). It has 128 boids with topological-seven neighbourhoods; three-button state visible per agent as colour-coded behaviour (green = Act/cohesion, rust = Dismiss/separation, gold = Ask-sibling/alignment); Harsanyi dividends computed per Ask-sibling jury per tick; and hysteresis snap tracking that maintains locked/unlocked knot states between phase transitions. The copper-thread net showing the fishing-net topology from C5 is rendered on top of the moving flock; the two conjectures share one substrate.

This collapses one specific epistemic distinction. The mapping from Reynolds' three rules to the three buttons was previously a design claim awaiting empirical test. With the simulation now built and inspectable, the claim becomes a verifiable statement about a specific substrate: this code, in this shape, exhibits the cooperative settlement the Folk Theorem predicts. The convergence with Bouchaffra, Ballerini, and Qu is no longer "we argue that this could be measured as Nash equilibrium / topological-seven / Harsanyi-decomposable". It is "this is, and a reader can run it and watch it settle".

Honest sizing is still narrower than the running rhetoric would allow. The substrate does not yet implement Bouchaffra's full variational formulation; the Harsanyi computation is per-tick local rather than the full coalition-game decomposition; the Nash equilibrium claim is geometric (the flock settles) rather than formal (we have not proven the trajectory converges to the Nash equilibrium of the equivalent stochastic game). Each of these is reachable from where we stand; none is delivered. The honest sentence is: we have built a substrate in the shape that Bouchaffra 2026, Qu 2025, and Ballerini 2008 jointly describe, and the design alignment is now observable in code rather than merely argued.

Falsification relevance:

Status: Convergence across five independent lines. Reynolds arrived from computer graphics (1987). The Folk Theorem arrived from game theory (1950s-1970s). Ballerini arrived from empirical biology (2008). Bouchaffra arrived from statistical mechanics and Bayesian inference (2026). Qu arrived from NLP engineering (2025). None of them were building a cognitive architecture. All of them found the same shape: local rules, topological neighbourhood, cooperative equilibrium without a conductor, measurable synergy via irreducible interaction.

The Shape of Thought arrived from data engineering and consciousness theory. Same shape. Sixth independent route. Implementation built at site/flock.html as of 15 May 2026 (deployment pending).

References:


C7. Self-Scaling Players - Fluid Individuality as the Fourth Game-Theoretic Move (16 May 2026)

Source: Lakshwin Shreesha, Federico Pigozzi, Adam Goldstein, and Michael Levin, "Extending Iterated, Spatialized Prisoner's Dilemma to Understand Multicellularity: Game Theory With Self-Scaling Players" (IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 2025). Peter Cooper, connecting to the three-button architecture (16 May 2026).

The observation: Standard game theory fixes the number of players. You can Cooperate or Defect, but you are always you. Shreesha, Pigozzi, Goldstein, and Levin (hereafter Pigozzi-Levin) break that assumption. Their agents on a 2D grid play iterated Prisoner's Dilemma against local neighbours - but they can also Merge (two agents become one larger agent) or Split (one agent becomes multiple smaller agents). The borders of individuality are fluid.
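A data-structure sketch of what it means for the player set itself to change, in Python. This is not the model from the paper: it leaves out the Prisoner's Dilemma payoffs and the memory entirely. Representing an agent as a frozenset of grid cells, and the function names merge and split, are our assumptions for illustration.

```python
def merge(agents, a, b):
    """Replace agents a and b with one agent that owns both sets of cells."""
    if a not in agents or b not in agents:
        raise ValueError("both agents must currently be players")
    return (agents - {a, b}) | {a | b}

def split(agents, a, parts):
    """Replace agent a with the given parts, which must partition its cells."""
    parts = [frozenset(p) for p in parts]
    covers = frozenset().union(*parts) == a
    disjoint = sum(len(p) for p in parts) == len(a)
    if a not in agents or not (parts and all(parts) and covers and disjoint):
        raise ValueError("parts must be non-empty and partition a current agent")
    return (agents - {a}) | set(parts)

# A 2x2 grid with one single-cell agent per cell: four players.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
agents = {frozenset([c]) for c in cells}

a, b = frozenset([(0, 0)]), frozenset([(0, 1)])
merged = merge(agents, a, b)              # three players, one of them two cells wide
restored = split(merged, a | b, [a, b])   # back to four players
```

Cooperate and Defect leave the set of players untouched. Merge and Split return a different set, which is the sense in which the move redraws who is playing.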

What they found:

  1. When agents can merge and split, Prisoner's Dilemma dynamics favour multicellularity - agents coalesce into structured cell-groups with topologically-closed layers, eventually forming one single fully-merged tissue.
  2. Larger merged agents have higher causal emergence than smaller ones. The whole is measurably more than the parts - not as metaphor but as computable quantity.
  3. Memory size of subunits drives the transition. More memory enables more merging. The capacity to remember past play is the precondition for the willingness to dissolve your boundary and join a larger self.

Convergence with the thesis:

  1. Merge/Split is the move the three buttons cannot make. C6 established that Dismiss, Ask-sibling, and Act are the three Reynolds rules mapped onto game theory. But all three assume a fixed agent. Separation means "I maintain my boundary." Alignment means "I match your heading." Cohesion means "I steer toward you." In every case, "I" is stable. Pigozzi-Levin introduces a fourth move: "I dissolve my boundary and become part of you" or "I split and become multiple." This is not Dismiss (refuse), not Ask-sibling (coordinate), not Act (commit). It is a move that redraws who is playing.

  2. The fishing net can retie itself. C5 describes a net of n-squared knots. The implicit assumption is that the knots are fixed - they snap between hysteresis states but they remain knots. Pigozzi-Levin says: the knots themselves can merge into larger knots or split into smaller ones. A section of the net can tighten into a single super-knot (multicellularity) or a super-knot can fray into constituent threads (splitting). The net is not just self-tensioning (C5). It is self-scaling.

  3. Causal emergence increases with scale. Their finding that larger merged agents have higher causal emergence maps directly onto the fishing net: more knots tied together carry more information than the sum of individual knots. The hysteresis state-space of a merged super-knot is richer than the product of its parts. This is C5's "the macro smoothness is real" given a formal measure: causal emergence quantifies exactly how much the whole exceeds the parts.

  4. Memory enables merging. Their finding that longer memory drives more merging connects to the Tesco tray wash (C5). A system that remembers more past play has more evidence that cooperation pays. More stacked palettes mean more compressed history. More compressed history means more confidence in the bet of dissolving your boundary. The temporal ledger (Section VI) IS the memory that makes merger possible.

  5. The caterpillar connection. C3 described caterpillar-to-butterfly as the most dramatic substrate transition in nature - memory surviving total dissolution. Pigozzi-Levin provides the game-theoretic mechanism: the caterpillar's cells Merge (dissolve into soup), carrying their memory as compressed generative kernels (Fables), and the butterfly's cells Split out of the soup with those kernels reinflated in a new configuration. Metamorphosis IS the Merge/Split move at organism scale.

  6. This is what Levin works on. Michael Levin replied to the paper twice within twenty minutes, and Levin's lab produced the Pigozzi paper. The convergence is not coincidental. Levin's research programme (bioelectric cognition, morphogenetic agency, scaling of individuality) and this project's research programme (shapes that let cognition survive substrate transitions) are working the same seam from different directions. C7 names the joint explicitly.

Falsification relevance:

Status: Independent convergence from the same lab that responded to the paper. Pigozzi-Levin arrived from developmental biology and computational modelling. This project arrived from data engineering and consciousness theory. C6 described the three buttons as the minimum for cooperation. C7 says: the three buttons are the minimum for cooperation among fixed agents. When agents can also Merge and Split, cooperation produces hierarchy, and hierarchy produces causal emergence. The fishing net is not just self-tensioning and self-forming. It is self-scaling.

References:

