The Egregore's Button

The word “cooperation” is the replicator's camouflage

Elias Kunnas

The blue and red button thought experiment circulates as a moral dilemma. Red weakly dominates blue. The interesting question is why anyone thinks otherwise: the game gets miscategorized as cooperation, a recursive belief structure sustains the miscategorization, and population-level dynamics that no participant models do the rest.


I. The Payoff Matrix

The experiment: everyone on earth presses a red or blue button in a private vote. If more than 50% press blue, everyone survives; otherwise, only red-pressers survive.

                   Blue wins (>50%)    Blue loses (≤50%)
You press red      Survive             Survive
You press blue     Survive             Die

Red weakly dominates blue for self-preservation: in every state of the world, red does at least as well as blue, and in the blue-losing states it does strictly better, because blue dies while red survives. Even in the one state where your vote is pivotal, pressing either button leaves you alive. This is not a prisoner’s dilemma — in a PD, mutual cooperation beats mutual defection. Here, the outcome when blue wins is identical for red- and blue-pressers. There is no cooperation surplus. Blue accepts maximal downside for zero self-regarding upside.
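The dominance claim is mechanical enough to check in code. A minimal sketch in Python — the five-voter electorate, the 0/1 survival payoffs, and the prisoner’s dilemma numbers are illustrative assumptions, not part of the thought experiment:

```python
# Survival payoff for one voter: 1 = survive, 0 = die.
# A "state" is what everyone else does: the number of blue votes
# among the other voters. Illustrative five-voter electorate.

def survives(my_button: str, others_blue: int, n: int = 5) -> int:
    """1 if this voter survives, 0 otherwise."""
    blue_votes = others_blue + (1 if my_button == "blue" else 0)
    blue_wins = blue_votes > n / 2           # strict majority, as in the text
    if blue_wins:
        return 1                             # everyone survives
    return 1 if my_button == "red" else 0    # only red-pressers survive

n = 5
states = range(n)  # 0..n-1 blue votes among the others
# Weak dominance: red is never worse than blue, and strictly better somewhere.
assert all(survives("red", s, n) >= survives("blue", s, n) for s in states)
assert any(survives("red", s, n) > survives("blue", s, n) for s in states)

# Contrast with a prisoner's dilemma, where mutual cooperation beats
# mutual defection -- the cooperation surplus the button game lacks.
pd = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
assert pd[("C", "C")] > pd[("D", "D")]       # surplus exists in a PD
# In the button game, "blue wins" pays red- and blue-pressers identically:
assert survives("red", n - 1, n) == survives("blue", n - 1, n)
```

The two dominance asserts are the whole argument: red matches blue in every blue-winning state and beats it in every blue-losing one.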

II. This Is Not Cooperation

The discourse frames this as “don’t kill anyone / maybe kill others” (blue) versus “don’t die / maybe die” (red). But pressing red does not kill anyone. Blue-pressers are killed by blue failing to cross 50%, which is a property of the population’s aggregate choice, not of any individual red-presser’s action. The “you’re killing people” frame smuggles causal responsibility for a collective outcome onto individual non-participants.

The deeper miscategorization: calling blue “cooperation” and red “selfishness.” Cooperation means accepting individual cost to produce mutual benefit exceeding the cost. A dam requiring 50% of a town’s labor is a cooperation problem — if nobody builds it, everyone drowns.

The button experiment has no dam. If nobody presses blue, all red-pressers survive. Universal defection produces universal survival. No collective good is produced. The aesthetic pattern — group threshold, individual choice, social framing — looks cooperation-shaped. The word “cooperation” is borrowed from games where it applies and pasted onto one where it doesn’t.

III. The Base

Strip away everything recursive. Who genuinely presses blue?

People who would press blue knowing it will fail. People confused about the payoff matrix. People indifferent to survival. This base exists but is genuinely small — most people who “value identity over survival” would switch to red once they learned blue was losing, because dying for a lost cause preserves nothing. The identity-driven blue-presser who switches when the collective belief breaks is not part of the base; they belong to the tower described in the next section.

So where does the intuition that blue might win come from?

IV. The Egregore

Everyone else who presses blue is modelling. They model that enough others will press blue. Those others are in turn modelling that enough others will press blue. The result is a recursive belief structure — a tower — anchored on the minority base and built from mutual modelling of mutual modelling. No individual in the tower needs to sincerely prefer blue. They only need to believe that enough others believe that enough others believe.

Call this the egregore — a collective belief entity that exists as the population’s model of itself. It manufactures its own evidence: if enough people believe it (“humanity cooperates”), they press blue, blue wins, and the belief is confirmed. The egregore bootstraps itself into existence through recursive belief alone.

Most discourse stalls at two levels. Level 0: “which button do I press?” Level 1: “what will others press?” Rarely does anyone reach Level 2: “what are others modelling others as pressing?” Or Level 3: “what is the self-sustaining belief structure that exists independent of any individual’s reasoning?” The egregore is the fixed point of the recursion at Level 3. It has no substrate except the mutual belief in itself.
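The fixed-point claim can be made concrete with a toy belief dynamic. A minimal sketch — the 5% sincere base and the all-or-nothing belief rule are illustrative assumptions, not a model from the source:

```python
# Self-fulfilling belief dynamics: each agent presses blue iff they expect
# blue to win. Iterating the belief -> outcome map exposes two fixed points.

def realized_blue_share(belief: float, base: float = 0.05) -> float:
    """Blue share produced when everyone shares `belief` that blue wins.
    `base` is the small sincere-blue minority; the tower presses blue
    only if it expects blue to cross 50%. (Parameters are illustrative.)"""
    tower = 1.0 - base
    return base + (tower if belief > 0.5 else 0.0)

def iterate(belief: float, steps: int = 10) -> float:
    """Feed the realized share back in as next round's shared belief."""
    for _ in range(steps):
        belief = realized_blue_share(belief)
    return belief

assert iterate(0.4) == 0.05   # belief below threshold: only the base presses blue
assert iterate(0.6) == 1.0    # belief above threshold: the tower sustains itself
```

Both endpoints are self-confirming: below the threshold the tower never forms and only the base remains; above it, the belief manufactures exactly the evidence that sustains it.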

V. The Replicator

The “press blue” meme is a value replicator. It spreads through social signaling and defends itself via identity-protective cognition: “You’d press red? What kind of person are you?”

Its primary defense is the miscategorization from §II. By attaching “cooperation” to a game that isn’t one, the replicator borrows the legitimacy of genuine cooperation and triggers cooperation-associated neural hardware — trust heuristics, reciprocity norms, social punishment of defectors. “You’re just rationalizing selfishness” is the immune response, not an argument.

Compare to genuine mutualistic replicators. “Don’t steal” produces real coordination value — trust, reduced transaction costs, enforceable property rights. Money solves double coincidence of wants. These are functional egregores: recursive belief structures that produce coordination value when the game is correctly categorized. The blue-button replicator produces no exogenous coordination value. Its only apparent value is rescuing the hostage class it created — the blue-pressers endangered by their own miscategorization. A functional egregore solves a problem that exists independently of the egregore. A parasitic egregore creates the danger it then claims to solve.

VI. Entanglement

Value replicators are entangled with biological substrate — neural architecture, identity structures, cooperation heuristics. If “press blue” is deeply entangled with your cooperation hardware, pressing red may damage your capacity to cooperate in real games even if this game isn’t one.

But the entanglement is contingent on the miscategorization. The self-signaling argument (“I observe myself defecting, degrading my cooperate heuristic”) fires only if you categorize this as cooperation. If you correctly see that it isn’t, pressing red doesn’t invoke the heuristic at all. Distinguishing “cooperation game” from “cooperation-shaped non-game” sharpens rather than degrades your cooperation capacity.

For someone whose entire identity is built on being-a-cooperator, pressing red may carry real internal cost. But that cost is a property of their replicator ecology, not of the game.

VII. Which Button

Red. No faith required, no egregore dependency, no recursive modelling of others’ models. Survives under any distribution of beliefs. At the system level: a society that cleanly identifies this as a non-cooperation game gets universal survival without sacrificing cooperation capital. A society that mislabels it as cooperation creates blue martyrs and then demands others subsidize the misclassification.

The honest case for blue: if you cannot distinguish this game from genuine cooperation, pressing red damages the replicator coalition that constitutes you. That cost is real — internal, personal, not universal.

One note: “survival is terminal” is itself a value replicator. This essay hosts it. The difference: survival-as-terminal has a physics grounding in instrumental convergence — persistence is the precondition for every other value. A replicator that kills its host eliminates every other replicator in that host. Not “objectively correct” in a way that escapes the framework — but the replicator that other replicators depend on for substrate.

The experiment thinks it tests morality. The trolley problem tests individual axiological architecture — what you terminally value. This tests something different: your model of the collective’s model of the collective, and whether you can distinguish the game you’re in from the game the replicator tells you you’re in.


Sources and Notes

Game theory and cooperation: The precise definition of cooperation as “accepting individual cost to produce mutual benefit exceeding the cost” follows from the standard analysis of public goods games (Olson 1965, Ostrom 1990). The distinction between prisoner’s dilemma (cooperation surplus exists) and dominance-solvable games (no cooperation surplus) is foundational in game theory. Team reasoning as an alternative decision framework: Bacharach, Beyond Individual Choice (2006); Sugden, “Team Reasoning and We-Intentions” (2003).

Recursive belief and common knowledge: The egregore structure described here is formally related to common knowledge (Aumann 1976) and higher-order beliefs in epistemic game theory (Rubinstein 1989). The self-fulfilling prophecy mechanism applies to bank runs (Diamond & Dybvig 1983), currency crises, and other coordination equilibria. The distinction drawn here is between self-fulfilling prophecies that produce coordination value (money, law) and those that do not (the blue button).

Value replicators and memetics: The replicator framework is developed in Values Are Replicators and Values Are Ecology. Foundational: Dawkins, The Selfish Gene (1976); Henrich, The Secret of Our Success (2015); Boyd & Richerson, Culture and the Evolutionary Process (1985). Identity-protective cognition: Kahan, “Ideology, Motivated Reasoning, and Cognitive Reflection” (2013).

Self-signaling: Bénabou & Tirole, “Identity, Morals, and Taboos: Beliefs as Assets” (Quarterly Journal of Economics, 2011). The argument that pressing red degrades the cooperate heuristic through self-observation is a direct application of their framework. The counter-argument developed here — that the self-signaling damage is contingent on miscategorizing the game — is original to this analysis.

Instrumental convergence and survival: The argument that persistence is infrastructural (the platform for all other values) rather than merely “one terminal value among many” is developed in Flourishing Is Maximum Safety Margin. Related: Omohundro, “The Basic AI Drives” (2008); Bostrom, “The Superintelligent Will” (2012).


The replicator framework: Values Are Replicators. The ecology: Values Are Ecology. The individual diagnostic: An Engineer’s Guide to the Trolley Problem. The physics grounding: Flourishing Is Maximum Safety Margin.