Reward Substrate

What a society rewards becomes what it can think.

Elias Kunnas

A reward substrate is the reproduction environment — selection gradient, carriers, justification layer, hardening device, failed counterpressure, and persistence past credible correction — that makes a frame pay, makes its carriers persist, and makes its justification feel like common sense. It needs no coordinator. The output is not what people happen to believe but what becomes institutionally rewarded, transmissible, and repeatable to believe.


I. Lead specimen: RCT dominance in development economics

Between roughly 1995 and 2020, development economics converged on randomised controlled trials (RCTs) as the prestige-bearing tool for evaluating anti-poverty interventions. The transition has a label (the credibility revolution, of which it is the development-economics wing), a hub (the Abdul Latif Jameel Poverty Action Lab, J-PAL, founded at MIT in 2003), Nobel recognition (Banerjee, Duflo, and Kremer, 2019), institutional infrastructure (the American Economic Association RCT Registry from 2013, the American Economic Journal: Applied Economics, NBER development-economics venues and affiliated networks), and a funding architecture in which many major funders, including the Gates Foundation, USAID, DFID/FCDO, and the World Bank's evaluation arm, increasingly required “evidence-based” framing in proposals.

The pro-RCT case is strong on its own terms: internal validity, transparent identification, policy tractability, and accumulation through registries and meta-analysis. The field also accumulated, in the same period, sustained internal critique: external-validity problems (Deaton 2010; Pritchett and Sandefur 2013), within-RCT replication concerns, and the displacement of structural and historical inquiry. The critique was made by senior, well-credentialled figures using the field's own standards. It did not shrink the gradient. The response was more RCTs, meta-RCTs, and methodological refinements; the correction stayed inside the RCT carrier channel.

The relevant question is not whether RCTs are good or bad as a tool. The relevant question is how a method becomes the institutional answer to a field's central question — what works in development — while the same field documents that the method does not in fact answer the question well in many contexts, and the documentation does not change the equilibrium. The mechanism that produces this pattern is what this essay names.

II. The six-part test

A phenomenon is a stable reward substrate equilibrium when all six of the following are present:

No. | Component | Question
1 | Selection gradient | What behaviour or frame pays?
2 | Carrier reproduction | Who gets funded, promoted, cited, hired, retained?
3 | Justification layer | What story makes the rewarded behaviour feel rigorous, moral, or professional?
4 | Hardening device | What stabilises the frame across cohorts: metric, credential, journal, dashboard, legal category, routine?
5 | Counterpressure failure | What should have stopped it, and why was that counter too weak?
6 | Persistence past correction | Does the frame keep reproducing after evidence, failure, or contradiction should have weakened it?

Component six is the falsifiability anchor. Without it, a loose reading of the diagnostic would admit nearly any durable institutional pattern: mathematical theorem acceptance, engineering tolerance standards, even, briefly, bell-bottoms. Component six excludes them. Mathematical theorems fail six because the proof system corrects errors before they harden. Tolerance standards fail six because physical failure forces revision. Bell-bottoms fail six and, read strictly, four as well: no device hardened them across cohorts.
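As a sketch of the test's conjunctive logic only, the six components can be encoded as a yes/no record. Everything here is an assumption for illustration: the type and function names are hypothetical, reducing each qualitative question to a boolean is a simplification the diagnostic itself does not make, and for the example cases only the failures stated above are encoded, with components the essay does not discuss for a case assumed present.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SubstrateAssessment:
    """Yes/no scoring of the six components for one case (hypothetical type)."""
    selection_gradient: bool           # 1. What behaviour or frame pays?
    carrier_reproduction: bool         # 2. Who is funded, promoted, hired, retained?
    justification_layer: bool          # 3. What story makes it feel rigorous or proper?
    hardening_device: bool             # 4. What stabilises it across cohorts?
    counterpressure_failure: bool      # 5. Was the counter too weak to stop it?
    persistence_past_correction: bool  # 6. Does it outlive evidence against it?


# Component names in table order, read from the dataclass definition.
COMPONENTS = [f.name for f in fields(SubstrateAssessment)]


def case(*, absent: tuple[str, ...] = ()) -> SubstrateAssessment:
    """Build an assessment with every component present except those named."""
    unknown = set(absent) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return SubstrateAssessment(**{name: name not in absent for name in COMPONENTS})


def missing_components(a: SubstrateAssessment) -> list[str]:
    """Absent components, in the table's order."""
    return [name for name in COMPONENTS if not getattr(a, name)]


def is_reward_substrate(a: SubstrateAssessment) -> bool:
    """Conjunctive test: all six components must be present."""
    return not missing_components(a)


# Only the failures the essay states are encoded; undiscussed components
# are assumed present so that the stated failure is isolated.
examples = {
    "RCT dominance": case(),
    "theorem acceptance": case(absent=("persistence_past_correction",)),
    "tolerance standards": case(absent=("persistence_past_correction",)),
    "bell-bottoms": case(absent=("hardening_device", "persistence_past_correction")),
}

for name, assessment in examples.items():
    print(f"{name}: {is_reward_substrate(assessment)} {missing_components(assessment)}")
```

Returning the list of missing components, rather than only a verdict, fits the diagnostic's purpose of locating where reproduction would have to change; any single absent component is enough to reject a case.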

Applied to the RCT case: the gradient, a top-journal preference for clean causal identification, was visible across applied microeconomics from the early 1990s in the natural-experiment and instrumental-variables work of Angrist, Imbens, Card, and Krueger, before the field-wide convergence on RCTs in development during the late 1990s and 2000s. The carriers reproduced through dedicated PhD pipelines and lab networks once J-PAL and parallel labs existed. The justification (credibility, rigor, what works) is internally coherent and partly true. The hardening devices (registries, journals, the Nobel) accumulated. The counter (external-validity critique by sympathetic insiders) had standing but no carrier infrastructure to compete on hiring or funding. Persistence past correction is documented: the response to external-validity critique was more RCTs, not a return to theory-driven inference. The point of the test is intervention: it identifies where reproduction must be changed if the equilibrium is to shift.

The ex-ante rule. The selection gradient must be identifiable before the convergence is explained, not reconstructed from it. For the RCT case, the gradient was visible in top-journal acceptances of natural-experiment work in the early 1990s; the specialisation into development-economics RCTs followed. If the only evidence for a gradient is the convergence itself, the diagnosis is circular and fails. This is the same discipline as the ex-ante test in Stand Alone Complex.
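The rule can be read as a check on dated evidence. The following is a minimal sketch under stated assumptions: the record type and function name are hypothetical, the onset year 1998 is only a stand-in for "the late 1990s", the two evidence entries paraphrase the pre-RCT sources the essay cites, and the circular entry is invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradientEvidence:
    """One dated item of evidence that a frame paid (hypothetical record type)."""
    description: str
    year: int
    derived_from_convergence: bool  # True if it only restates the convergence


def passes_ex_ante_rule(evidence: list[GradientEvidence], convergence_onset: int) -> bool:
    """True if some evidence predates the convergence and is independent of it.

    Evidence dated after the onset, or inferred from the convergence itself,
    cannot establish the gradient: that would make the diagnosis circular.
    """
    return any(
        item.year < convergence_onset and not item.derived_from_convergence
        for item in evidence
    )


ONSET = 1998  # assumption: a stand-in year for "the late 1990s"

rct_evidence = [
    GradientEvidence("Card and Krueger, American Economic Review 84:4", 1994, False),
    GradientEvidence("Angrist, Imbens and Rubin, JASA 91:434", 1996, False),
]
circular_evidence = [
    GradientEvidence("RCTs dominate, so RCTs must have paid", 2010, True),  # hypothetical
]

print(passes_ex_ante_rule(rct_evidence, ONSET))       # True
print(passes_ex_ante_rule(circular_evidence, ONSET))  # False
```

Requiring an item to be both earlier and independent captures the rule's two halves: the gradient must be visible before the convergence, and the convergence cannot serve as its own evidence.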

III. Frame selection: why this frame won

Frames do not win merely because they are useful to one class of actor. They win when they enter a coalitional hardening cascade: enough decisive strata can host the frame without role-conflict, and the available institutions can turn it into durable routines. A rival may be truer, older, or more elegant, yet lose if it cannot become a funding criterion, legal category, credential, metric, hiring filter, compliance demand, or professional common sense.

The first selection is hostability. The second is hardening. The third is carrier reproduction: once the frame staffs offices, journals, foundations, departments, courts, and media desks, it selects for people who can speak it fluently and against those whose rival frame imposes blame, complexity, or career risk. The substrate is not a list of incentives. It is a reproductive contest among frames, won by the frame that can simultaneously be moralised, administered, defended, and carried.

In this essay's ranking of what decides the contest, hardening-device fit matters most, then coalitional dominance, then early carrier accumulation. Cognitive parsimony, defensibility, and affective recruitment are accelerants, rarely decisive on their own.

IV. Role-compatible cognition

A reward substrate does not require liars. It works better when most participants can tell the truth as their role permits them to see it. The funder sees seriousness. The credentialled expert sees method. The administrator sees defensibility. The marginal participant sees the only ladder left standing. Each stratum produces a locally plausible account of what it is doing within its role constraints, and those accounts add up to a coherent vocabulary across the field.

This helps explain why exposure-only strategies often fail unless they also change reproduction incentives. The participants are not, in the main, hypocritical. Their cognition has adapted to the reward field, and the rewarded behaviour is experienced as coherent, virtuous, and professional. Critique that imputes cynicism reads as an attack on a sincerity that is usually genuine, and bounces off.

V. Relation to adjacent primitives

Stand Alone Complex (SAC) names the behavioural surface: independent actors converging on coordinated-looking behaviour without a coordinator. Reward Substrate names the upstream cause: the gradient that made the convergence individually rational. SAC tells you the convergence is not commanded. Reward Substrate tells you why it pays.

The Egregore's Button names the cognitive equilibrium downstream — the recursive belief structure that becomes self-sustaining once mutual modelling stabilises. An egregore is the cognitive equilibrium; a reward substrate is its energy source.

Mechanism Realism treats outcomes as caused by mechanisms rather than by intentions, values, or rhetoric. In social and institutional reality, mechanism realism specialises into the study of reward-coupled mechanisms that reproduce their own carriers and justifications unless counter-mechanisms interrupt them. This essay is one operational sharpening of that claim, scoped to institutional reality.

VI. Hard-case pointer

The hardest case is universal-franchise democracy itself: the legitimacy logic of one-person-one-vote shapes which frames about human variance can be hosted, defended, and administered. That larger case belongs outside this diagnostic. The 1966–1976 convergence is catalogued in The Tyranny of the Present; a full architectural treatment is deferred to a separate essay.

VII. Elite and non-elite channels

Reward Substrate is usually framed as an elite-incentive theory: funders, promoters, citers, hiring committees, board members. That account is incomplete. Non-elite adoption of substrate-selected frames usually routes through promise rather than payoff: dignity, inclusion, mobility, safety, belonging, moral status. The frame offers an affective or status promise that makes the participant's own failure under it survivable. The reward is identity, dignity, and survivable failure rather than direct material payoff: a parallel gradient operating at the hope/identity layer, distinct from the material gradient operating on elites. Berlant's analysis of cruel optimism is the closest account in the existing literature.

For the RCT case, the elite gradient is direct (publishability, fundability, hireability). The non-elite gradient is faint but present: development professionals at lower career ranks adopt the vocabulary because it organises their work into a coherent professional identity (“evidence-based practitioner”) even when their own access to RCT-generating capacity is limited.

VIII. What the diagnosis is for

The framework is a design checklist for institutions that create rewards. Before building a grant program, credential, metric, dashboard, review process, or legal category, ask the six questions about what you are creating: what does this reward, who reproduces it, what story will make it feel proper, what hardens it across cohorts, what live counterpressure remains, and would the frame persist if its core empirical assumption turned out false? An institution that cannot answer the last question honestly is committing future capacity to a frame it cannot revise.

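The framework also explains why standard repairs fail. Exposure, better arguments, and exhortation aimed at the rewarded behaviour leave reproduction untouched, so the substrate keeps producing the behaviour they target.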

The repairs that work all operate on reproduction: alter what gets funded, retire the hardening category, train and credential a competing carrier class, build a counter-institution, or install institutions that continuously generate binding counterpressure the substrate cannot absorb.

Self-application. This essay is itself hostable by a reward substrate: the heterodox-commentary economy that monetises institutional critique through paid subscriptions, podcast networks, and citation circuits among adjacent writers, with hardening devices such as named primitives and the “diagnostic essay” format itself. Role-compatible cognition will make its carriers, including its author, experience the work as rigorous diagnosis.

Does that invalidate the framework? Only if it cannot identify cases where institutions select for truth, sincerity, and competence under reward pressure. Those cases exist: functional peer review in mature subfields with strong replication norms, well-staffed audit institutions with independence guarantees, calibrated forecasting communities with public track records. The protection is the ex-ante rule: identify the gradient before the convergence. If “Reward Substrate” becomes a thought-terminating move (“that's just substrate, ignore it”), it has failed its own discipline.

IX. Counter-mechanisms

Reward substrates are interrupted when reproduction changes.

The strongest historical track record belongs to mechanisms that touch reproduction directly. Funding-architecture shifts can starve old carriers, as Soviet patronage withdrawal eventually did to Lysenkoism after Khrushchev's fall in 1964. Legal or professional disestablishment retires a hardening category, as the American Psychiatric Association (APA) began doing in 1973 when it removed homosexuality from the DSM, though replacement categories (“sexual orientation disturbance,” “ego-dystonic homosexuality”) persisted into subsequent revisions. Carrier replacement works when a new technical class displaces the old, as the postwar generation of geophysicists, armed with seafloor and palaeomagnetic data, displaced fixist geology between roughly 1962 and 1968 (the history is in Oreskes's Plate Tectonics: An Insider's History).

A second category, with a mixed track record, works only when alternative carriers are pre-positioned: falsification events (the Michelson-Morley result of 1887 became decisive only once relativity, from 1905, could host it), external shocks (1970s stagflation broke Keynesian fine-tuning because monetarist and supply-side carriers existed to consolidate the new regime), and status inversion (the post-WWII collapse of formal scientific racism needed credentialled alternatives to host the new orthodoxy).

A third category, weak on its own, includes counter-identity stacks and exit. Solidarity in Poland worked because Catholic, labour, and national institutional carriers already existed, not because moral conviction was sufficient by itself.

The meta-category is institutions that continuously generate binding counterpressure: independent central banks, audit courts, fiscal scoring offices, adverse-event reporting systems. These work when their outputs actually bind decisions. The Finnish mekanismivirasto concept (literally “mechanism agency”) belongs to this category.

X. Close

What a society rewards becomes what it can think.

Reward gradients select frames. Carriers reproduce them. Justifications make them feel rigorous, moral, and professional. Hardening devices stabilise them across cohorts. Counterpressures fail or are absorbed. Frames persist past the evidence that should have weakened them. The substrate is the reproduction environment, not the participants' sincerity, intelligence, or character. Reform begins where reproduction happens — at funding, carriers, categories, rival institutions, and recurring pressure-generation infrastructure — not at the rewarded behaviour the substrate keeps producing.


Sources and notes

Material relations and dominant ideology. Karl Marx and Friedrich Engels, The German Ideology (1846, published 1932). The base/superstructure formulation — ruling ideas as ideas of the ruling class — is the earliest systematic statement that material reproduction shapes which beliefs become socially dominant. Reward Substrate generalises beyond class-reductive framing: the carriers can be any stratum, not only an owning class, and the diagnostic is operational rather than dialectical.

Power and regimes of truth. Michel Foucault, Discipline and Punish (Pantheon, 1977) and The Archaeology of Knowledge (Pantheon, 1972). Foucault's account of how power produces professional categories, disciplinary knowledge, and the subject positions that inhabit them is the closest prior art for “role-compatible cognition”: the actor's mind adapts to the institutional reward field and experiences the rewarded behaviour as coherent and virtuous.

Institutional isomorphism. Paul DiMaggio and Walter Powell, “The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields,” American Sociological Review 48:2 (1983), 147–160, JSTOR 2095101. Coercive, mimetic, and normative isomorphism as convergence mechanisms across organisations. Reward Substrate is adjacent: it adds the cognitive/justificatory loop, the six-component diagnostic, and the persistence-past-correction criterion.

Institutions as rules of the game. Douglass C. North, Institutions, Institutional Change and Economic Performance (Cambridge University Press, 1990). Path-dependent lock-in; institutions as the structures that make some behaviours pay and others cost. North's framework owns much of the “rules shape rewards, rewards shape durable behaviour” terrain Reward Substrate operationalises at finer grain.

Measure becomes target. Charles Goodhart, “Problems of Monetary Management: The U.K. Experience,” Papers in Monetary Economics, Reserve Bank of Australia, 1975. Donald T. Campbell, “Assessing the Impact of Planned Social Change,” Evaluation and Program Planning 2:1 (1979). Metrics as hardening devices that drift from the territory they were intended to measure. One subcase of component four in the diagnostic.

Field, capital, habitus. Pierre Bourdieu, Distinction (Harvard University Press, 1984). Bourdieu's apparatus — competing fields with distinct capital types and dispositional habitus — is structurally similar to the strata-and-carriers analysis here. Reward Substrate is less theoretically dense and more operationally compact.

Reward systems in science. Robert K. Merton, The Sociology of Science (University of Chicago Press, 1973). Merton's analysis of citation, priority, and recognition as the reward currency of scientific work is the most direct prior art for the RCT specimen in this essay.

Preference falsification. Timur Kuran, Private Truths, Public Lies (Harvard University Press, 1995). When public expression diverges from private belief under social-cost gradients, the gap can stabilise as a reward equilibrium of its own. Companion mechanism: Reward Substrate does not require hidden dissent (participants may sincerely adopt the rewarded frame), but preference falsification often co-occurs in the same substrate.

Cruel optimism. Lauren Berlant, Cruel Optimism (Duke University Press, 2011). Attachment to objects that obstruct the flourishing they promise. The closest prior literature for the non-elite channel in section VII: promise (dignity, belonging, mobility, safety) as a parallel reward gradient.

Cultural evolution and prestige-biased transmission. Joseph Henrich, The WEIRDest People in the World (Farrar, Straus and Giroux, 2020). Cultural variants are selected and reproduced through institutions and learning biases. Reward Substrate is institutionally scoped: it focuses on modern bureaucratic, academic, and media reward channels rather than long-run cultural group selection.

Mesa-optimization (AI alignment parallel). Hubinger et al., “Risks from Learned Optimization in Advanced Machine Learning Systems,” arXiv:1906.01820 (2019). An outer optimisation process produces inner optimisers whose effective goals diverge from the outer process's intent. The institutional version — outer reward gradient producing inner optimisers (actors, departments, carriers) whose effective goals diverge from any stated mission — is structurally analogous.

RCT specimen sources. Abhijit Banerjee, Esther Duflo, Rachel Glennerster, and Cynthia Kinnan, “The Miracle of Microfinance? Evidence from a Randomized Evaluation,” American Economic Journal: Applied Economics 7:1 (2015), DOI 10.1257/app.20130533: a representative RCT-era development paper. Angus Deaton, “Instruments, Randomization, and Learning about Development,” Journal of Economic Literature 48:2 (2010), 424–455, DOI 10.1257/jel.48.2.424: internal critique of RCT methodology. Lant Pritchett and Justin Sandefur, “Context Matters for Size: Why External Validity Claims and Development Practice Do Not Mix,” Center for Global Development Working Paper 336 (2013): external-validity critique.

Pre-RCT credibility-revolution work. Joshua D. Angrist, Guido W. Imbens, and Donald B. Rubin, “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association 91:434 (1996). David Card and Alan B. Krueger, “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania,” American Economic Review 84:4 (1994). Cited in the body to establish the ex-ante selection gradient: top-journal preference for clean causal identification was visible across applied microeconomics in the early 1990s, before the development-economics convergence on RCTs.

Carrier-replacement history for the 1960s geophysics revolution. Naomi Oreskes, ed., Plate Tectonics: An Insider's History of the Modern Theory of the Earth (Westview Press, 2003). Source for the postwar geophysicists / fixist geology displacement cited in section IX.

Verdict on the coining. Reward Substrate is not a substantively new theoretical mechanism. The component pieces appear in Marx (base/superstructure), Foucault (regimes of truth), DiMaggio & Powell (institutional isomorphism), North (institutional lock-in), Goodhart/Campbell (measure-becomes-target), Bourdieu (field/capital/habitus), Merton (reward systems), Kuran (preference falsification), Berlant (cruel optimism), Henrich (cultural evolution), and Hubinger et al. (mesa-optimization). The contribution is the integrated six-component diagnostic, the persistence-past-correction criterion as a falsifiability anchor, and the ex-ante rule against retrospective pattern-ownership. Coined for diagnostic usability and for cross-essay coherence within this corpus, not for academic novelty.


Related: