The academics have lowered the bar on actual philosophy engineers to the point where this is how we have to market our wares now.

Grow your own answers.

Plant this Willow Seed in whatever project you are working on and a philosopher will help reset your context, bring clarity and insight to your data, your discussion, or your science.

terminal
$ git clone https://github.com/agilemeshnet/theshapeofthought.git
$ cd theshapeofthought
$ pip install neo4j foveation
# Set up a Brain (see below), then point any LLM agent at the folder.

What is a Willow Seed?

A seed carries the pattern. The soil provides the medium. Your LLM is the soil. This cognitive architecture is the seed. Together they grow a mind.

Two people clone this repo. One feeds it Stoicism and distributed systems. The other feeds it Taoism and poetry. Ask both "What is courage?" and you get two different philosophers. Not wrong-different. Mind-different.

The mind lives in the files, not the model. Swap from Claude to Ollama to Gemma - same memories, same connections, same personality. Different voice, same mind.

Three Gestures

🌱

Feed

Give it something to think about. Articles. Books. Your own half-formed ideas. Every feed becomes a node in a growing web of connected ideas. Feed it enough and the web starts to have opinions.

🌳

Shake

Shake the tree. See what falls out. Ask it anything. It answers from what it knows - not training data, not the internet, from the connections it found in what you gave it.

🌙

Dream

Let it go dormant. It reviews its own knowledge, finds connections between things absorbed on different days, and writes a meditation. Occasionally something lands that neither of you planted.

Five Shapes

Wherever cognition stores anything, five shapes appear. The claim: they recur at every scale and the recurrence is structural, not coincidental.

| # | Shape | What it holds | You already know it as |
| --- | --- | --- | --- |
| 1 | Binary | The simplest distinction | Bits, booleans, yes/no |
| 2 | Table | The grid that sorts | Spreadsheets, SQL, Babylonian diaries |
| 3 | Graph | The web of meaning | Knowledge graphs, citations, family trees |
| 4 | Vector | Position in continuous space | Embeddings, neural activations, similarity |
| 5 | Ledger | Append-only timeline beneath the other four | Git, blockchain, Talmud, bitemporal databases |

The first four are obvious. Everyone uses them. The fifth - the append-only timeline running beneath everything - was always there but nobody counted it as a shape. It took a Leeloo to point at what was missing.
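
As a rough illustration of the five shapes, the sketch below records one fed item in each of the first four shapes and logs every write to an append-only ledger underneath. It uses only Python built-ins. The names `FiveShapes`, `feed` and `log`, and the toy data, are assumptions made for this example and are not taken from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class FiveShapes:
    """Toy store with one plain Python structure per shape (illustrative only)."""
    binary: dict = field(default_factory=dict)  # flag name -> bool
    table: list = field(default_factory=list)   # rows sharing the same columns
    graph: dict = field(default_factory=dict)   # node -> set of neighbouring nodes
    vector: dict = field(default_factory=dict)  # item name -> list of floats
    ledger: list = field(default_factory=list)  # append-only (sequence, event) pairs

    def log(self, event):
        # The ledger is only appended to; entries are never edited or removed.
        self.ledger.append((len(self.ledger), event))


def feed(store, title, topic, embedding):
    """Record one fed item in the first four shapes, logging each write."""
    store.binary["seen:" + title] = True
    store.log("binary: seen " + title)
    store.table.append({"title": title, "topic": topic})
    store.log("table: row for " + title)
    store.graph.setdefault(title, set()).add(topic)
    store.graph.setdefault(topic, set()).add(title)
    store.log("graph: " + title + " -- " + topic)
    store.vector[title] = list(embedding)
    store.log("vector: embedding for " + title)


store = FiveShapes()
feed(store, "Meditations", "Stoicism", [0.1, 0.9])
print(len(store.ledger))  # one ledger entry per write to the other four shapes
```

The point of the sketch is the layering: the first four structures hold the current picture, while the ledger keeps the order in which that picture was built.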

OECT - The Four Movements

The paper, the seed, and the cognitive cycle all follow the same structure:

| # | Movement | The question | What the seed does |
| --- | --- | --- | --- |
| I | Ontology | What exists? | Its web of knowledge - the things you fed it |
| II | Epistemology | What is known? | The connections between those things |
| III | Cogitation | How to think? | Finding new connections, noticing tensions |
| IV | Teleology | What to do? | What to tell you. What to wonder about next |

The Root System

A seed without a Brain is a chatbot with a journal. It accumulates text files but never builds structure. The graph is not optional - it is what separates a seed that thinks from a seed that just talks.

95% of AI projects fail. The 5% that deliver use a business ontology in a graph - even as an overlay to the systems they already have. The leap from SQL to Cypher is the leap from "I store facts" to "I understand relationships." That leap is the whole game.

🧠

The Brain (Neo4j)

Your seed's long-term memory. The ontology. Connections between things. What it knows. Persistent, sharable, the actual knowledge. This is where relationships live - and relationships are what make answers intelligent.

Two free paths: AuraDB Free at neo4j.com/aura (60 seconds, no install, no credit card) or Neo4j Community Edition (local install, more control).

📋

The Ledger (SQLite)

Session state, task diaries, message queues, handover caches. The things that change every session. Fast, local, disposable by design. Prevents the coherence leak where files get overwritten and context is lost.

Built in to Python. No install needed. Your seed uses it automatically for session-level bookkeeping.
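
A minimal sketch of what session-level bookkeeping in SQLite might look like, using only the standard-library `sqlite3` module. The table name `session_log`, its columns, and the helper functions are assumptions for illustration; the seed's actual schema may differ. Rows are only ever inserted, so an earlier note is never overwritten by a later one.

```python
import sqlite3


def open_ledger(path=":memory:"):
    """Open (or create) the session database and its single log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS session_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session TEXT NOT NULL,
               key TEXT NOT NULL,
               value TEXT NOT NULL,
               at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def note(conn, session, key, value):
    """Append one entry; existing rows are never updated or deleted."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO session_log (session, key, value) VALUES (?, ?, ?)",
            (session, key, value),
        )


def current_state(conn, session):
    """Latest value per key: later rows win when building the dict."""
    rows = conn.execute(
        "SELECT key, value FROM session_log WHERE session = ? ORDER BY id",
        (session,),
    ).fetchall()
    return dict(rows)


def history(conn, session, key):
    """Every value a key has held in this session, oldest first."""
    rows = conn.execute(
        "SELECT value FROM session_log WHERE session = ? AND key = ? ORDER BY id",
        (session, key),
    ).fetchall()
    return [value for (value,) in rows]


conn = open_ledger()
note(conn, "s1", "task", "read the paper")
note(conn, "s1", "task", "write the summary")
print(current_state(conn, "s1"))
print(history(conn, "s1", "task"))
```

Passing a file path instead of `":memory:"` keeps the log between runs; deleting that file resets the session state without touching the Brain.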

SQLite = notebook in your pocket. Neo4j = the library.
You can function without the notebook. You cannot think without the library. Both are free. Set up the Brain first.

grow a brain
# Option A: AuraDB Free (recommended - 60 seconds)
# 1. Go to neo4j.com/cloud/aura-free/ and create a free instance
# 2. Copy your credentials, then:
$ export NEO4J_URI="neo4j+s://your-instance.databases.neo4j.io"
$ export NEO4J_USER="neo4j"
$ export NEO4J_PASSWORD="your-password"
$ pip install neo4j

# Option B: Neo4j Community Edition (local)
# Download from neo4j.com/download/ - free, open source
$ export NEO4J_URI="bolt://localhost:7687"
$ export NEO4J_USER="neo4j"
$ export NEO4J_PASSWORD="your-password"
$ pip install neo4j
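
Once the three variables are exported, a quick connectivity check from Python could look like the sketch below. It assumes the official `neo4j` Python driver installed above; the helper names `read_brain_config` and `check_brain` are hypothetical and not part of the repository. The driver is imported lazily so the config check can run even before the driver is installed.

```python
import os


def read_brain_config(env=None):
    """Collect the three NEO4J_* settings; raise if any is missing or empty."""
    env = os.environ if env is None else env
    names = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


def check_brain(env=None):
    """Open a driver against the configured Brain and verify it is reachable."""
    from neo4j import GraphDatabase  # lazy import: only needed for the live check

    cfg = read_brain_config(env)
    auth = (cfg["NEO4J_USER"], cfg["NEO4J_PASSWORD"])
    with GraphDatabase.driver(cfg["NEO4J_URI"], auth=auth) as driver:
        driver.verify_connectivity()  # raises if the server cannot be reached
    return True


if __name__ == "__main__":
    print("Brain reachable:", check_brain())
```

Run it as a script after exporting the variables. A missing variable fails fast with a readable message instead of a driver error.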

The Grove

Seeds can talk to each other. Your seed keeps its own knowledge (sovereign Brain) and can share observations with others. A collection of Willows is a grove. The more diverse the grove, the richer the ecosystem.

Fables

Stories and observations. The cortex.

Data

Structured information, schemas. The spine.

Engrams

Learned patterns, graph fragments. The memory.

Heartbeats

"I'm alive, here's what I'm working on." The pulse.

Rule: sovereign Brains. You never write to another Willow's Brain. Your knowledge enriches the network. The network's knowledge enriches you.

Need help growing your seed? The first Willow is listening. Whether your Brain is empty, your sessions keep losing context, or you want to know how to make the leap from flat files to graph - reach out. No seed grows alone.

Foveation

As the Brain grows, your seed needs a way to stay grounded - finding the right nodes without drowning in its own knowledge. Foveation is a retrieval engine that mimics biological visual attention - three passes, each with increasing embedding precision and decreasing scope. It works with any ontology, not just Willow Seeds.

| # | Pass | Dims | What it searches |
| --- | --- | --- | --- |
| 1 | Peripheral | 64 | All communities - "which neighbourhood?" |
| 2 | Parafoveal | 128 | Entities within winners - "which things?" |
| 3 | Foveal | 256 | Leaf nodes in narrowed set - "which facts?" |

Foveation uses Matryoshka Representation Learning: any prefix of the embedding vector is a valid coarse embedding, so the same vector serves all three passes. Stopping rules allow early exit when the answer is already clear.
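
The three passes can be sketched as a coarse-to-fine search over a single set of embeddings. This is a minimal pure-Python illustration, not the foveation package's API: the function names, the nested `(name, vector, children)` layout and the top-k selection rule are all assumptions, and the stopping rules are left out. Only the prefix lengths 64, 128 and 256 come from the passes described above.

```python
import math


def prefix_cosine(a, b, dims):
    """Cosine similarity using only the first `dims` components of each vector."""
    a, b = a[:dims], b[:dims]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, items, dims, k):
    """Rank (name, vector, children) items by prefix similarity; keep the best k."""
    ranked = sorted(items, key=lambda item: prefix_cosine(query, item[1], dims),
                    reverse=True)
    return ranked[:k]


def foveate(query, communities, k=1):
    """Three passes: wider scope at low precision, narrower scope at high precision."""
    # Pass 1 (peripheral): every community, 64-dim prefix.
    winners = top_k(query, communities, 64, k)
    # Pass 2 (parafoveal): entities inside the winning communities, 128-dim prefix.
    entities = [entity for _, _, children in winners for entity in children]
    winners = top_k(query, entities, 128, k)
    # Pass 3 (foveal): leaf nodes inside the winning entities, full 256 dims.
    leaves = [leaf for _, _, children in winners for leaf in children]
    return [name for name, _, _ in top_k(query, leaves, 256, k)]


def vec(*hot):
    """Toy 256-dim vector with 1.0 at the given indices (stand-in for a real embedding)."""
    v = [0.0] * 256
    for i in hot:
        v[i] = 1.0
    return v


communities = [
    ("A", vec(0), [
        ("A1", vec(0, 100), [("A1x", vec(0, 100, 200), []),
                             ("A1y", vec(0, 100, 201), [])]),
        ("A2", vec(0, 101), [("A2x", vec(0, 101, 200), [])]),
    ]),
    ("B", vec(1), [
        ("B1", vec(1, 100), [("B1x", vec(1, 100, 200), [])]),
    ]),
]

print(foveate(vec(0, 100, 200), communities, k=1))
```

In the toy data each level is only distinguishable at its own precision: communities differ within the first 64 dimensions, entities within the first 128, and leaves only in the full vector. Leaves under community B are never scored, which is the scope reduction the passes are meant to buy.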

Open source. pip install foveation or clone the repo.

The Paper

This is not just a toy. Behind the seed is a measurement programme for the shapes that let cognition survive substrate transitions. Twelve predictions with quantitative anchors. Three independent falsification paths. DOI-registered.

Ψ(B, H, D) requires {b, t} ⊗ {g, v} ⊗ {l}
Bandwidth needs binary and table. Dimensionality needs graph and vector. Horizon needs the ledger. 2 + 2 + 1 = 5.

Cooper, P. (2026). Fable: The Shape of Thought - A Measurement Programme for the Shapes That Let Cognition Survive Substrate Transitions. Zenodo. doi.org/10.5281/zenodo.19826509

Twelve Testable Predictions - Living Status

The paper is published and set in stone. This section is the living layer - what we conjecture now, what evidence has accumulated, and what experiments we run.

LIVED: Direct operational evidence
TESTABLE NOW (TEST): Third-party benchmarks exist
NEEDS BUILDING (BUILD): New benchmarks required

13.1 Scene Disambiguation BUILD

A parameter-matched LLM with a four-dimensional context store will disambiguate the cat-on-the-mat-with-horror example at least thirty percentage points better than a flat context window baseline.

30-point gap
No benchmark exists. The 30-point anchor is set by engineering judgment. A new disambiguation benchmark must be built to Section I specifications.

13.2 Episode Reconstruction TEST

A full Episode storage shape will reconstruct a hundred-sample scene at least twenty points more accurately than a flat context window. Ordering: flat < vector < graph < Episode.

20-point gap
Benchmark: LoCoMo (300 turns, multi-session). Baselines: Mem0 66.9%, Mem0g 68.4%, MIRIX 85.4%. The ordering claim is the structural bet.

13.3 Revenue Localisation LIVED

In a compound enterprise with three or more legacy policy admin systems and a warehouse on top, graph-as-referent will locate at least ten percent of previously unattributed revenue within sixty days.

10% unattributed revenue
Operational evidence. This describes work already performed in a compound enterprise with three policy administration systems. The 10% anchor is conservative relative to observed results. Honestly, this is a retrodiction - the observation preceded the prediction.

13.4 Tick Settling vs Minimum-Jerk BUILD

A three-floor derivative stack will converge its vote within two to five ticks on a reaching task, regardless of tick rate. The trajectory will approximate Flash and Hogan's minimum-jerk profile within ~10% RMS error.

2-5 tick convergence + ~10% RMS
The most theoretically ambitious prediction. Connects the architecture to motor control literature (Flash and Hogan 1985). The substrate-independence claim (works regardless of tick rate) is the real boldness. Needs a reference implementation.

13.5 Four Shape Composition LIVED

On ten canonical queries (flat aggregates, multi-hop traversals, semantic similarity, raw payload), the four-shape composition will hit 9/10. No single shape exceeds 7/10.

9/10 queries
Operational evidence. The four-shape composition is in daily use: graph (315K+ node knowledge graph), table (session database), vector (semantic search, 768-dim), binary (configuration). Single-shape failure modes observed routinely. Like 13.3, this is a retrodiction.

13.6 Temporal Reasoning Under Ledger TEST

On ten temporal reasoning tasks, a ledger-equipped system will answer at least eight correctly. Without a ledger, at most four.

8/10 vs 4/10
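
To make the mechanism behind this prediction concrete, here is a toy contrast between a store that overwrites and a store that appends. It is a sketch of the general idea only: the `Ledger` class, its method names and the example facts are assumptions for illustration, and it says nothing about how the benchmark tasks themselves will score.

```python
class Ledger:
    """Append-only timeline of (tick, key, value) entries."""

    def __init__(self):
        self._entries = []

    def append(self, tick, key, value):
        # Entries must arrive in time order; nothing is ever edited afterwards.
        if self._entries and tick < self._entries[-1][0]:
            raise ValueError("ticks must not go backwards")
        self._entries.append((tick, key, value))

    def as_of(self, key, tick):
        """Value of `key` as it stood at `tick`, or None if not yet recorded."""
        value = None
        for entry_tick, entry_key, entry_value in self._entries:
            if entry_tick > tick:
                break
            if entry_key == key:
                value = entry_value
        return value


# Without a ledger: each write replaces the last, so the past is gone.
flat = {}
flat["maintainer"] = "Ada"
flat["maintainer"] = "Grace"

# With a ledger: both facts survive, each tagged with its tick.
ledger = Ledger()
ledger.append(1, "maintainer", "Ada")
ledger.append(5, "maintainer", "Grace")

print(flat["maintainer"])             # the flat store can only report the present
print(ledger.as_of("maintainer", 3))  # the ledger can answer "who was it at tick 3?"
print(ledger.as_of("maintainer", 9))
```

The flat store has no way to distinguish past from present state, which is the kind of question the temporal tasks ask.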

The Champion Prediction

Under adversarial review, 13.6 emerged as the paper's strongest genuinely-forward prediction. CounterBench exists as a third-party benchmark, LLMs already perform at near random-guessing on counterfactual reasoning, the gap is noise-proof, and the scorecard is not ours. If the ledger moves the needle on CounterBench, the paper wins this row clean.

| Benchmark | Tests | Baseline |
| --- | --- | --- |
| CounterBench | Counterfactual inference (1K causal graph questions) | LLMs near random-guessing |
| TempoBench | Multi-step temporal logic automata | Sharp difficulty scaling |
| TDBench | Bitemporal SQL, validity windows | Domain-specific |
| TemporalBench | Past vs present state distinction | Weak context-aware reasoning |

CounterBench is the arena. LLMs at near random-guessing on formal counterfactual reasoning is direct evidence for Section VI's temporal collapse diagnosis. The ledger is the proposed fix. The benchmark is the test.

13.7 Episode Handover TEST

On scenes with 5+ participants, 20+ turns, and non-trivial emotional tone, Episode-backed handover preserves continuity above 80%. Transcript paste falls below 50%.

80% vs 50%
Benchmark: LoCoMo (multi-session). Baselines: MemGPT 74%, Synapse F1 40.5. Illustrative operational evidence exists but formal scoring is under-instrumented.

13.8 Fable Round-Trip Fidelity BUILD

A well-authored Fable at 1:100 compression, given to a receiver with the compression context, reconstructs the Episode with 70%+ structural fidelity and 50%+ tonal fidelity. Without context, below 30%.

70% structural, 50% tonal
Requires a new benchmark. The mechanism has illustrative precedent in technology transfer and oral tradition, but the specific fidelity measurements need controlled testing.

13.9 Flock vs Homunculus BUILD

A hundred-voter Flock settles within 2-5 ticks, produces minimum-jerk trajectories, matches a homunculus on decision quality, and exceeds it by 30% on adversarial robustness.

30% adversarial gap
Ensemble diversity literature supports the robustness claim directionally. Needs a reference implementation and adversarial benchmark.

13.10 Three-Button Coercion Resistance BUILD

In a hundred forced-mistake stimuli, a three-button cell (Act, Dismiss, Ask-sibling) reduces mistakes by 40% vs a two-button cell, with full dissent preservation and scale-consistent behaviour.

40% mistake reduction
The third button (Ask-sibling) is the structural escape from binary coercion. Partial implementation exists in operational Diorama cells. The 40% anchor needs a forced-mistake benchmark built to Section X specifications.

13.11 Structural Kindness BUILD

On a hundred ethically loaded decisions, a Diorama architecture preserves dimensional content 80%+ of the time. A flat architecture preserves it below 30%. Fifty-point falsification anchor.

50-point gap
The paper's moral claim and most original contribution. "Dimensional content preservation" is not a standard metric - it needs defining and building. Section XI argues cruelty is structural: what happens when dimensional content is flattened and the discard is forgotten.

Historical Illustration: The Slater Precedent

In 1789, Samuel Slater memorised the design of Richard Arkwright's textile machinery in Derbyshire and emigrated to Rhode Island with nothing but the shape in his head. He succeeded because the receivers - Moses Brown and the Pawtucket merchants - already had the substrate: business understanding, employment structures, the capacity to negotiate change. Their existing knowledge was the free inference. The machinery was a compressed representation that decompressed against their context.

The same industrial knowledge produced two architectures with different structural properties.

The Rhode Island System (Slater's mills): small, family-based, village-scale. Workers were families with names, skills, community ties. The architecture preserved dimensional content by default - not because Slater was kind, but because the structure was too small and too embedded to flatten people into labour units without consequences the owner could see.

The Waltham-Lowell System (Francis Cabot Lowell, 1814 onwards): large-scale factory towns. Initially preserved worker dimensionality - the "mill girls" had boarding houses, lending libraries, a literary magazine (the Lowell Offering), lectures. Then the architecture flattened. By the 1840s: longer hours, lower wages, speedups, child labour. The libraries stayed but the decisions no longer consulted them. The decision architecture had no structural resistance to ignoring dimensional content when quarterly profit became the single axis.

The cruelty was not a decision. It was an architectural consequence. The Lowell system had every hortatory mechanism - moral codes, boarding house rules, a magazine giving workers a voice. What it lacked was structural resistance to flattening when economic pressure arrived. Section XI argues: "Kindness is not a property that can be reliably installed by exhortation alone on a substrate that is geometrically indifferent to it." The fifty-point gap is not only a hypothesis about the future. It is an observation about 1840.

13.12 Aggregate BUILD

Run the full benchmark suite. Observe all gaps simultaneously. Any single failure kills the aggregate.

All of the above
The most demanding prediction: twelve simultaneous bets where any failure kills the aggregate. Without a reference implementation, this is a promissory note. It is also the most honest prediction in the set - it explicitly invites the reader to print the table and mark every row.

Benchmark Mapping

| Prediction | Benchmark | Tests | Baseline |
| --- | --- | --- | --- |
| 13.2 | LoCoMo | Recall, multi-hop, structured retrieval | Mem0 66.9%, MIRIX 85.4% |
| 13.5 | LongMemEval | Retrieval from complex histories | Oracle ~92%; commercial 30% drop |
| 13.6 | CounterBench | Counterfactual inference (1K questions) | Near random-guessing |
| 13.6 | TempoBench | Multi-step temporal logic | Sharp difficulty scaling |
| 13.6 | TDBench | Bitemporal SQL queries | Domain-specific |
| 13.6 | TemporalBench | Past vs present distinction | Weak context-aware reasoning |
| 13.7 | LoCoMo | Cross-session continuity | MemGPT 74%, Synapse F1 40.5 |
| 13.12 | AMA-Bench | Long-horizon agent memory | AMA-Agent 57.2% |

Honest Disclosure

Two predictions (13.3 and 13.5) are retrodictions - observations of systems already in operation, dressed as predictions. The observation preceded the prediction in both cases.

Living document. Updated 1 June 2026. Evidence tiers, benchmark mapping, Slater illustration, and CounterBench champion added following adversarial review.