Causal Scope Laundering

When valid evidence closes the wrong question.

Elias Kunnas

A treatment-effect study identifies what happens to a defined population under a defined intervention against a defined counterfactual over a defined window. Policy discourse routinely quotes such a study as if it had answered a different question: what system of incentives, expectations, and equilibrium effects should govern the underlying domain. The citation is formally accurate. The scope jump is invisible. The failure is not false citation. It is the citation being deployed as warrant for a question the study did not identify, whether or not the study was well identified to begin with. Citation as a public evidential surface is democratic infrastructure; the failure is its use as a stopping rule for a question whose decisive causal components the cited evidence did not vary.


I. Specimen — the body-worn camera RCTs

A police department deploys body cameras. Officers are randomly assigned to wear them or not. After a year, the data come in. The Washington, DC randomized trial — among the largest in the world, run by the Lab @ DC and published in PNAS — found no statistically significant effect on documented use of force, civilian complaints, policing activity, or judicial outcomes. The Rialto, California pilot reported large reductions in use-of-force incidents and citizen complaints.

Ariel and colleagues’ multisite work in the Journal of Experimental Criminology finds that use-of-force effects depend on officer activation discretion: when officers control when cameras record, use of force can increase rather than decrease. Lum and colleagues’ systematic review finds no clear or consistent effects on most measured police/citizen behaviors and notes that whether body-worn cameras strengthen accountability systems remains under-addressed.

Rialto’s reported reductions were widely invoked by departments, media, and governments as rationale for rollout; later null or mixed estimates were likewise invoked as evidence against expansive expectations. The same bounded findings carried different closure claims across the discourse.

The public-discourse summary collapses this to three phrases: “the evidence shows body cameras work,” or “the evidence shows body cameras don’t work,” or “body cameras are evidence-based police accountability.” The same body of work is cited in support of all three. Each citation is formally correct: the studies do report what the speaker says they report.

But the studies estimate the effect of assigning cameras to officers in a given department under that department’s existing activation, review, disclosure, and disciplinary regime over the study period. That is a real causal object. It is not the causal object the policy discourse claims to be settling.

The policy question is about an accountability architecture: when officers are required to record, with what carve-outs and exemptions; who reviews the footage and under what cadence; whether civilians and prosecutors can compel disclosure; whether misconduct revealed by footage is reliably punished; whether the union contract permits review at all; whether officers learn to manage the camera as a shield or a witness over time; whether the department’s promotion incentives reward camera-friendly behavior. A body camera is not a treatment in that sense. It is a sensor inside an accountability system that the trial holds constant.

The treatment-effect literature can be rigorously conducted, internally credible, and accurately quoted, and still be silent on whether the system should be redesigned around cameras. The studies cannot have answered that question. By construction, they vary one input within an existing institutional architecture; they do not vary the architecture.

The political mirror — minimum-wage employment effects

The same shape recurs across the political axis. Card and Krueger’s 1994 New Jersey–Pennsylvania fast-food study, and Jardim et al.’s Seattle minimum-wage phase-in work, estimate short-run employment-margin changes for a defined low-wage workforce in defined geographies during defined windows. The discourse summarizes these as “the evidence shows minimum-wage increases don’t cost jobs” or “the evidence shows they do.”

Whichever direction the employment-margin estimate runs, it does not settle what the policy should be. That question routes through monopsony structure in local labor markets, long-run automation incentives, price pass-through, regional substitution, enforcement equilibrium, family-income incidence, and the political dynamics that determine future floor increases. A short-run employment estimate is a real causal object, identified inside an existing labor-market structure. What policy should be is a question about which labor-market structure to build, and the studies can be rigorously conducted while remaining silent on that. Body cameras and minimum wage differ in political valence and share a methodological shape; the scope-laundering move appears in both. Whether the laundering benefits the reader’s preferred policy varies; the operator does not.


II. Levels of causal aggregation

A study can identify a bounded component estimand. A policy claim depends on a composite of components plus composition assumptions. The discourse routinely treats one as if it were the other.

A component estimand has the shape: for population P, under intervention I, what is the change in measured outcome Y against counterfactual C over window W? Well-identified study designs answer this. Identification holds the surrounding system constant by construction. Without that constancy, the estimate has no causal interpretation.

A composite policy claim is the actor-action-population-timeframe-threshold combination the decision is really about: should this jurisdiction install this accountability regime, given these alternatives, under these constraints, with these tradeoffs? The composite depends on a bundle of component estimands plus assumptions about interaction, sequencing, scale, adaptation, and institutional capacity. The composite cannot be answered by holding the surrounding system constant. It requires varying the system.
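
One way to make the two levels concrete is potential-outcomes notation. The notation here is an illustrative assumption, not a formalism taken from the cited literature: Y_i(d, s) stands for unit i's outcome over window W under intervention status d inside surrounding system s, s_0 is the existing institutional regime, s' is a proposed alternative regime, and P' is the population the policy would actually cover.

```latex
% Component estimand: intervention I against counterfactual C,
% with the surrounding system held fixed at s_0
\tau_{\text{component}}(s_0)
  = \mathbb{E}\left[\, Y_i(I, s_0) - Y_i(C, s_0) \mid i \in P \,\right]

% Composite policy contrast: proposed system s' against the status quo s_0
\Delta_{\text{composite}}
  = \mathbb{E}\left[\, Y_i(I, s') - Y_i(C, s_0) \mid i \in P' \,\right]
```

A trial run entirely inside s_0 contains no variation in s, so it can identify the first quantity but not the second without further assumptions about how outcomes respond to changes in the system.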

These are different levels of causal aggregation, not different types of question. The gap between them is well charted in the methodological literature: Cartwright and Hardie (Evidence-Based Policy, 2012) on the inference from “it worked there” to “it will work here,” with explicit support-factor identification; Deaton and Cartwright on what trial-sample ATEs do and do not identify; Heckman and Vytlacil on the distance between treatment-effect estimands and structural policy evaluation; Pawson and Tilley’s realist evaluation tradition on context-mechanism-outcome configurations; Pritchett and Sandefur on the development-economics literature where rigorous local evidence routinely fails to transport across contexts.

Public citation drops the qualifiers the methodological literature preserves.

The diagnostic below treats the best case: the cited study is well-identified, the qualifiers exist on paper, and the only failure is that they get dropped between paper and citation. Large parts of the behavioral evidence base do not meet that standard: in the Open Science Collaboration 2015 reproducibility project, only 36% of replication studies produced statistically significant results; Camerer and colleagues 2018 replicated 13 of 21 Nature/Science social-science experiments, with replication effect sizes roughly half those of the originals. Specification searches, researcher degrees of freedom, convenience samples, and qualifiers that exist nominally but cannot survive hostile review are routine. Scope laundering operates at the citation-warrant layer regardless of whether the underlying study is rigorous; the floor case, in which the cited study is itself weak, is strictly worse, because the citation now does work for a mechanism-design claim while the underlying estimate cannot bear even its narrow weight.


III. The laundering move

Causal scope laundering runs through five operational steps. Each is invisible alone; the composite converts a valid narrow estimate into authority over a question no one has answered.

  1. Evidence anchor. The speaker cites a real study and reports what it found. (Whether the study itself is well-identified or weakly-identified does not change the rest of the sequence.)
  2. Material scope compression. The qualifiers attached to the original estimate — population, intervention, counterfactual, window, identification strategy — drop out of the sentence. The estimate becomes free-floating. Compression is universal in citation; it becomes material when the omitted qualifier is necessary to prevent the warrant from expanding beyond what the cited evidence identified.
  3. Warrant transfer. The free-floating estimate is offered as an answer to the larger composite policy claim the audience is actually arguing about.
  4. Burden reversal. Anyone who points out that the cited study did not identify the broader composite is framed as anti-evidence, ideological, or unserious. Objecting is now reputationally costly.
  5. Public-institutional closure. The composite policy claim drops out of the live decision process. Closure does not require literal cessation of thought; it requires that the institutional or public surface stop treating the residue as decision-relevant.

Each step is locally deniable. Step 1 is good practice. Step 2 is normal compression. Step 3 is what citation is for. Step 4 enforces evidential discipline. Step 5 is closure under uncertainty. Steps 2 and 3 often fuse in a single public sentence (“the evidence shows X” simultaneously drops scope and answers the larger question). Steps 4 and 5 are conceptually distinct (enforcement vs outcome) but operationally tied. The composite move is what should be diagnosed, not any single step.

The pattern recurs across domains. Charter-school lottery studies estimate effects for applicants admitted to oversubscribed urban schools and are cited as if they settled the architecture of school choice at scale. Project STAR’s Tennessee K–3 RCT estimated within-state class-size effects in a specific era and is cited as if it settled national class-size mandates. Grade-retention studies estimate outcomes for retained students under existing systems and are cited as if they settled whether minimum-competence progression rules should exist. Each is a real, well-identified estimate. Each becomes authority over a question its identification strategy could not have answered.

The logical gap (component cited as warrant for composite) is visible from inside the discourse. The sociological steps — burden reversal, pathologized objection — are claims about how speakers and audiences behave, and would be sharpened by documented institutional chains. That documentation is a separate empirical task; what is shown here is the recurring pattern, not a worked single closure event.


IV. The evidential surface — dual-use infrastructure

The evidential surface is democratic infrastructure before it is a failure mode. Citations are public artifacts. The study exists or it does not. The design can be summarized. The sample can be named. The estimate can be contested by other studies. Evidence review has institutional practices — systematic reviews, replication, meta-analysis, risk-of-bias tools, pre-registration, confidence grading. These are imperfect but they create a shared review surface across factions. A libertarian, a socialist, a police chief, a union lawyer, and a public-choice economist can disagree about body cameras while pointing at the same DC RCT.

The mechanism layer does not have that property by default. Mechanism stories are private goods: analysts, modelers, agency insiders, and policy professionals can produce elaborate causal narratives that ordinary citizens cannot easily audit. Without the evidential surface, debate becomes a competition between insider mechanism stories, and the best rhetorician or highest-status expert wins. “Show me the study” is a democratic improvement over “accept my equilibrium model.”

So the failure is not the evidential surface. The failure is unscoped citation used as a stopping rule for a question whose decisive causal components the cited evidence did not vary. The same surface that makes debate publicly reviewable becomes corrupt when it claims unearned scope.

The corruption survives because it is the cheapest defensible move in a public-discourse environment. Computing the composite policy claim is expensive — it requires specifying the proposed architecture, modeling incentive responses, naming the hidden ledgers, estimating equilibrium effects, and accepting uncertainty about all of them. Doing so visibly assigns the analyst the costs of the proposed system. The computation creates an obligation.

Citing a study, by contrast, costs nothing. The citation routes the burden of objection to the other side. The other side now has to produce their study, or explain why the cited one shouldn’t count, or — most damaging — propose alternative policy without an equivalently authoritative-sounding evidential anchor. None of those responses scales as well as the citation did.

When a citation is used this way, the cited study need not be wrong, well-identified, or even rigorous; it becomes the evidential surface over an uncomputed composite. In the limit, a citation to a non-replicated, single-site, specification-searched estimate carries the same closure warrant in discourse as a citation to a pre-registered multisite RCT — both surfaces are equally cheap to deploy and equally costly to contest. Carol Weiss called this the symbolic use of research: research deployed to retroactively defend an already-formed decision rather than to inform a new one. Pawson and Tilley’s realist-evaluation tradition was built to discipline the equivalent failure in evaluation practice. The evidential-surface failure is the public-discourse version of the same problem.

This is not Goodhart’s Law. Goodhart says that when a measure becomes a target, agents game the measure until it no longer tracks the underlying thing. In causal scope laundering, no one gamed the study. The cited estimate can be honestly conducted, well-identified, accurately quoted, and entirely on-target for its own design. The failure is at the citation-warrant layer: the estimate’s bounded identification is silently used as an unbounded stopping rule. Donald Campbell’s law of social indicators is structurally adjacent but distinct — Campbell’s mechanism requires gaming, and the laundering mechanism does not.


V. The Causal-Scope Gate

The gate fires only when a citation is being used to close a policy dispute. Routine citations as one input among several need no gate. The burden is the claimant’s, not the reader’s. When closure is being claimed, the person doing the closing is responsible for stating four things; if they cannot, the citation has not closed the dispute, regardless of whether any specific objector could have filled the fields in their place.

When citation is not laundering

The gate is narrower than it might sound. A citation is not laundering when any of the following hold:

Ordinary compressed citation in public discourse — “the evidence shows X” — is laundering only when it functions as closure against an action-relevant composite the cited study did not vary. Compression is universal; closure-by-compression is what the gate names.

The gate binds the closure level — and the gate-invoker

A symmetric obstruction risk runs in the opposite direction. The closure speaker can be tempted to use a narrow study as warrant for a broader composite (the laundering move). The objector can be tempted to upscale the live decision into a broader composite that no available evidence could close, then invalidate the citation for failing to answer the enlarged question. Both are the same move at different layers.

The gate therefore binds both sides at the level of the actual decision. Before invoking the gate, the objector must state whether the cited evidence is being used to close a premise, an implementation decision, a procurement decision, or a full institutional architecture — and whether the closure claim being challenged actually operates at the enlarged level the objector is naming. If the cited study fits the narrower decision being made, the gate does not fire even if a broader architecture remains unspecified.
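
The level check described above can be sketched in a few lines of Python. Everything named here is hypothetical: the `DecisionLevel` ordering (premise, then implementation, then procurement, then architecture) is an assumption made for illustration, since the essay lists the levels without ranking them, and the two functions are one possible reading of the rule, not a definitive procedure.

```python
from dataclasses import dataclass
from enum import IntEnum


class DecisionLevel(IntEnum):
    # ASSUMPTION: each level is treated as broader than the one before it.
    PREMISE = 1
    IMPLEMENTATION = 2
    PROCUREMENT = 3
    ARCHITECTURE = 4


@dataclass
class Challenge:
    evidence_fits_up_to: DecisionLevel  # broadest level the cited study fits
    closure_claimed_at: DecisionLevel   # level the closure speaker is closing
    objector_names: DecisionLevel       # level the objector says is at stake


def gate_fires(c: Challenge) -> bool:
    """The gate fires only when the closure actually claimed outruns the evidence."""
    return c.closure_claimed_at > c.evidence_fits_up_to


def objector_is_upscaling(c: Challenge) -> bool:
    """True when the objector names a broader level than the closure being made."""
    return c.objector_names > c.closure_claimed_at


# A procurement decision backed by evidence that fits procurement,
# challenged as if it were a full architecture decision.
example = Challenge(
    evidence_fits_up_to=DecisionLevel.PROCUREMENT,
    closure_claimed_at=DecisionLevel.PROCUREMENT,
    objector_names=DecisionLevel.ARCHITECTURE,
)
print(gate_fires(example))             # False
print(objector_is_upscaling(example))  # True
```

In the example the gate stays silent and the upscaling flag is raised, which matches the rule that a broader unspecified architecture does not invalidate a citation that fits the narrower decision actually being made.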

Gate in use — body cameras

Applied to a citation of the DC RCT closing the claim that body cameras are not effective police accountability architecture:

The gate does not say the DC RCT is wrong. It says the cited estimate cannot have answered the closure claim, because the variables the closure claim depends on are precisely the ones the identification strategy held constant. Labeling each bullet’s input source — study-recoverable, design-inferred, policy-context, mechanism-analysis — keeps the gate transparent about what the citation itself supplies versus what the gate-application requires from outside the cited study.

The Mechanism-Claim Gate — symmetric in burden, different in fields

A common counter-claim is: cameras will be useless because officers will learn to game activation and disclosure timing. That is a mechanism-design hypothesis, not a citation. The same anti-closure principle applies, but the field structure differs — a citation gate maps cited evidence to a closure claim; a mechanism-claim gate bounds an asserted hypothesis. Both are denied unbounded warrant; the symmetry is normative, not structural.

The mechanism-claim gate asks:

The counter-claim survives as a stated hypothesis with bounded warrant. It can inform the policy debate; it does not close it. A mechanism-design objection used as a stopping rule against a closure claim must pass this gate; otherwise it is itself laundering at a different layer.

The gate is not a veto

Three closures need to be distinguished. Evidentiary closure — the claim that this dispute over what the evidence shows is settled. Deliberative closure — the claim that the reasoned case for an action is complete. Procedural closure — a vote, a ruling, a budget deadline, a statutory clock. The gate disciplines the first two. Procedural closure can legitimately occur under uncertainty even when scope statements are not produced; legitimacy in such cases flows from procedure, not from evidence.

A causal-scope objection defeats evidentiary or deliberative closure, not action. The gate says a citation has not settled the question; it does not say the policy must wait until every mechanism residue is computed. Anyone using the gate to delay must state the default action they are choosing, the costs of delay, the evidentiary threshold that would change their mind, and whether their mechanism objection is itself identified, bounded, or speculative. When action proceeds under uncertainty, someone must own the residue: a named owner who tracks whether the unresolved mechanism residues materialize and reopens the decision if they do. See Corrective Closure Ownership for the architecture of that role. Without a residue-owner, the gate-invocation collapses into the obstructionist move the diagnostic names elsewhere.

The 3-line working version

Under time pressure, the gate compresses to three questions anyone can ask the claimant:

  1. What exactly did the cited study estimate?
  2. What exact policy action is the citation being used to close?
  3. Which action-relevant mechanisms or contexts were not varied by the study?

A citation that closes a policy dispute should let the claimant answer all three before the closure stands. A mechanism-claim that closes the same dispute should let its claimant answer the parallel three (direction / falsifier / evidence range) before its closure stands.
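
As a checklist, the three questions for a citation can be sketched as a small Python record. The names `ClosureClaim`, `unanswered_questions`, and `closure_can_stand` are hypothetical, and the check is deliberately weak: it tests only whether the claimant has stated an answer to each question, which the text treats as a precondition for closure, not as proof that the answers are adequate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClosureClaim:
    """Hypothetical record of what a claimant has stated about a citation."""
    study_estimand: Optional[str] = None             # Q1: what the study estimated
    action_being_closed: Optional[str] = None        # Q2: the policy action being closed
    unvaried_mechanisms: Optional[List[str]] = None  # Q3: None means "not stated"


def unanswered_questions(claim: ClosureClaim) -> List[int]:
    """Return the numbers (1 to 3) of the questions the claimant has not answered."""
    missing = []
    if not (claim.study_estimand or "").strip():
        missing.append(1)
    if not (claim.action_being_closed or "").strip():
        missing.append(2)
    # An empty list counts as an answer: the claimant asserts that nothing
    # action-relevant was left unvaried, and that assertion can be contested.
    if claim.unvaried_mechanisms is None:
        missing.append(3)
    return missing


def closure_can_stand(claim: ClosureClaim) -> bool:
    """Necessary condition only: all three questions have been answered."""
    return not unanswered_questions(claim)


claim = ClosureClaim(
    study_estimand="effect of assigning cameras under the existing regime",
    action_being_closed="redesign accountability around cameras",
)
print(unanswered_questions(claim))  # [3]
print(closure_can_stand(claim))     # False
```

The burden sits with whoever fills in the record: a missing field means the closure does not stand, regardless of whether an objector could have supplied the answer.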


VI. Close

A study cannot close a question it did not identify. The evidential surface is democratic infrastructure; the failure is its use as a stopping rule for a question whose decisive causal components the cited evidence did not vary. The Causal-Scope Gate disciplines evidentiary and deliberative closure, not procedural action — a residue stopping rule, a stated decision-under-uncertainty default, and a named owner for unresolved residues keep the discipline from collapsing into the obstructionist move it diagnoses.



Sources and notes

Central interlocutor.

Causal scope, external validity, transportability.

Mechanism vs treatment-effect.

Replication crisis (the floor case).

Adjacent failure modes (distinguished from).

Empirical anchors for §I.

Stack placement. In Stack terms, this is a Layer 1 / 4 / 6 / 7 failure: mechanism, measurement, compilation, and computation collapse into a citation surface.

On the status caveat. The status note appearing immediately after the thesis is load-bearing for this essay’s epistemic posture, not a polite hedge. The diagnostic and the gate are derived from methodological literature plus type-level discourse patterns; they have not been tested against a documented institutional chain in any single policy domain. The proper next step for anyone using the diagnostic is to find a case, run the gate against it, and see what the gate-application reveals about the candidate operator’s empirical density.