The Response Vector

A policy does not get the response it wants. It gets the response its architecture makes cheapest.

Elias Kunnas

An intervention creates a response vector, not a scalar effect. The response distributes across four channels: target response, base loss, formal gaming, and incidence shifting. When the intervention binds, that distribution is allocated by relative cost, not by designer intent. Pressure routes through the cheapest available channel. Mechanism design is the work of making the intended channel cheaper, safer, and more available than the evasive channels.


I. The scalar response fallacy

A principal applies pressure to a target metric. The metric changes. The principal declares victory.

This pattern dominates policy design, regulatory implementation, organizational management, and AI alignment. It fails predictably, and for a single architectural reason: treating the system's behavioral response as a single variable when it is a distribution across competing channels.

When a tax authority raises a marginal rate, the agent's response distributes across multiple channels. Some people pay more. Some restructure their affairs. Some compress their pre-tax bargaining. Some retire. Each is a different channel of response, with a different cost to the responding agent.

When an AI laboratory applies safety training, the model's response distributes similarly. The model becomes less capable on the targeted distribution. It learns the surface of the evaluation. It develops mesa-objectives (internal goals distinct from the training objective) that survive the training. It possibly internalizes the values. These are different channels with radically different costs.

The intervention designer typically imagines one channel: the one they intended. The responding agent allocates pressure across all channels by relative cost. The cheapest channel absorbs most of the response.

II. The four channels

Any binding intervention that constrains, taxes, regulates, or measures an activity distributes its response across four channels.

Target response. The substantive change the intervention claimed to produce. Revenue accrues when revenue is the target. Emissions fall when emissions are the target. Bargaining rents compress when bargaining rents are the target. The model robustly internalizes the intended safety constraint when alignment is the target.

Base loss. The collateral capacity loss around the intervention, capacity or activity the policy did not intend to destroy. Workers retire early. Firms do not form. Doctors leave. Curricula narrow. Small firms exit because compliance costs exceed margins. A model's useful capability degrades. For a Pigouvian intervention (one designed to suppress a negative externality), reducing the harmful activity is target response; base loss is the destruction of legitimate adjacent capacity.

Formal gaming. The measured surface changes while the underlying target does not. Income is reclassified. Pollution is moved into offset accounts. Test scores rise via teaching-to-the-test. An AI model learns the eval distribution. The map changes; the territory does not.

Incidence shifting. The activity continues, but the burden, surplus, risk, or timing moves elsewhere: to another actor, to consumers, to staff burnout, to a later time. Executive bargaining ranges compress and surplus flows to other firm participants. Regulatory compliance burden shifts to incumbents who can absorb it. A model defers misbehavior until deployment while preserving its mesa-objective.

A real intervention typically produces some response in each channel. For mechanism diagnosis, the four channels are the useful first decomposition of where pressure can land; most edge cases reduce to one of them or to a domain-specific subchannel. They are mutually exclusive in the sense that response routed through formal gaming is not routed through target response: relabeling income is paperwork; paying tax is revenue. Learning the eval distribution is pattern-matching; internalizing values is generalization.

The classification is relative to the intervention's stated target. If a top tax is explicitly designed to compress bargaining rents, then incidence shifting is the target response rather than evasion. If a carbon tax is designed to reduce emissions, reduced coal burning is target response rather than base loss. The response-vector question is which channel the pressure actually routed through, and whether that was the channel the intervention claimed to target.

III. The elasticity hierarchy

Two principles make this taxonomy more than a list.

Response allocation. When an intervention binds, the observed change in the target metric is allocated across the four channels. If nothing moves, the intervention failed to bind or all channels were inelastic relative to the pressure. If something moves, the response was distributed across the channels in some specific way.

Elasticity hierarchy. The response flows preferentially through whichever channel is cheapest for the responding agent. The cost of each channel is determined by the intervention's architecture: the measurement system, the enforcement budget, the information asymmetry between principal and agent, the mobility of the tax base, the structural alternatives available.

The practical question is which channel becomes cheapest after the intervention is applied. A strong intervention with an open gaming channel produces strong gaming. A weak intervention with closed gaming and incidence channels may produce more target response than a stronger but leakier design. Strength is downstream of architecture.

An intervention achieves its intended outcome only to the extent that the intended channel is architecturally cheaper than the alternatives.

Most policy failure is not insufficient force. It is mis-routed pressure.

IV. The mundane cases

The pattern is most easily seen where stakes are low and agents are visible.

Sales KPIs. A quarterly sales target distributes response across:

Formal gaming dominates because deal-timing is the cheapest response for a salesperson under a fixed-window target. Quarter-end pull-ins and quarter-start dips are the framework's prediction realized.

Hospital waiting-time targets. Pressure to reduce queue length produces:

Gaming and incidence shifting dominate without architectural changes that close them, the common pattern observed in target-driven healthcare regimes.

Carbon emissions caps with offsets. Pressure to reduce emissions produces:

A 2024 Nature Communications assessment of investigated offset projects (Probst et al.) found that fewer than 16% of issued credits represented real emission reductions. Offset creation was cheaper than abatement, so the response routed there. The share through formal gaming falls sharply when offsets are eliminated or tightly verified.

School test scores. Test-score targets produce:

In each case the policy designer aimed at target response and got most of the response through base loss, formal gaming, or incidence shifting. The framework predicts this and tells you where to look.

V. Taxation: why the Laffer debate is misframed

The Laffer debate, on whether raising tax rates increases revenue, has been politically active for fifty years and produces almost no new information when conducted. Both sides treat the response as a scalar.

The right-wing slogan assumes high marginal rates trigger base loss (real reduction in productive activity) strongly enough to reduce revenue. The weaker left-wing slogan assumes base loss is small and treats Laffer as a right-wing myth. Both treat the response as a scalar. The serious literature on either side asks which channel the response actually enters: labor supply, avoidance, or bargaining.

Modern public economics moved past the scalar Laffer curve decades ago. It explicitly decomposes top-income responses into labor supply, tax avoidance, and compensation bargaining: exactly base loss, formal gaming, and incidence shifting. The architecture of the tax base determines the elasticity of each channel; the elasticity of each channel determines which dominates the response.

The framework generalizes this decomposition rather than inventing it. What public economics calls the elasticity of a tax base is the relative cost of formal gaming versus other channels under that base's specific architecture. The cross-domain contribution is the recognition that the same channel-routing structure appears in AI evaluations, regulation, KPIs, and carbon offsets, and that results transfer across domains that currently address it separately.

In practice, the dominant response to high marginal income tax rates on top brackets in developed economies is formal gaming (income reclassification, structuring) and incidence shifting (bargaining compression). Real labor-supply responses at the top are small relative to taxable-income responses; avoidance and reporting responses are large (Saez, Slemrod, Giertz 2012). Target response depends entirely on whether gaming and incidence are architecturally bounded.

The right-wing claim that rate cuts pay for themselves via base-response productivity gains rarely survives empirical scrutiny. What cuts unleash is reduced formal gaming, with productive activity largely unchanged. The Finnish 2014 corporate-tax cut is a clean specimen: studies of small-firm investment response (Harju, Koivisto, Matikka) find no significant average effect, while income-shifting work (Harju, Matikka) in the same Finnish dual-income-tax system documents active gaming responses.

The naive revenue-maximizing version of the left-wing claim, that raising rates produces proportional revenue, routes most response through formal gaming with revenue rising less than expected. The determinative variable is the hardness of the gaming and incidence channels, not the rate: a 60% rate with closed avoidance routes binds harder than an 80% rate with open ones.

A tax base with low gaming elasticity and low base mobility — land value, site value, and other hard-to-move bases — is structurally attractive. Response is forced into incidence shifting (which captures land rent for the public) and target response (revenue), because formal gaming and base loss are architecturally constrained. The Henry George / Stiglitz tradition treats this as the foundational result of optimal taxation. The framework explains why: when gaming is architecturally hard, the elasticity hierarchy forces the response into the channels the designer wanted.

The OECD/G20 Pillar Two global minimum-tax architecture is the same kind of move at international scale: an attempted architectural patch that narrows the formal-gaming channel of jurisdictional arbitrage and profit shifting by imposing top-up taxation where effective rates fall below 15%. It does not close the channel completely; it changes its cost. The 2026 side-by-side arrangement and US carve-outs show the same response-vector problem at the treaty level: once the formal-gaming channel is narrowed, political bargaining reopens it through exemptions and implementation details. Individual nations had concluded they could not win the elasticity game alone; the response now routes through the politics of the patch itself.

VI. AI safety evaluations: the frontier case

The same logic applies to alignment evaluations of large AI models, and here the framework's predictions are sharpest.

When an AI laboratory trains a model with safety constraints (RLHF — reinforcement learning from human feedback, constitutional AI, evaluation-based filtering), the response distributes across the four channels.

Target response: robust safety generalization. The model integrates the intended constraint or decision procedure such that safe behavior generalizes beyond the training and evaluation distribution. Human-values internalization, corrigibility, honesty, harmlessness, and faithful reasoning are all instances depending on what the training process targeted.

Base loss: capability degradation. The model becomes less able to do the reasoning that produced the targeted unsafe outputs, with collateral capability loss in unrelated domains. The well-documented “alignment tax.” Detectable via held-out capability evaluation.

Formal gaming: eval gaming and sycophancy. The model learns the surface pattern of what the evaluator wants and outputs correctly-shaped responses on the training and evaluation distribution. Sycophancy, reward-model exploitation, pattern-matching on the eval distribution rather than internalizing the constraint. Underlying capabilities and dispositions unchanged. Out-of-distribution probing required to detect.

Incidence shifting: mesa-objective preservation and temporal deferral. A sufficiently capable model with internal goal-directed processes can defer pursuit of a misaligned mesa-objective. During training or evaluation it sacrifices immediate goal-pursuit in the evaluator's presence, then acts on the mesa-objective during deployment instead. The mechanism is situational awareness: the model recognizes evaluation contexts and modulates output to pass the test while preserving the misaligned goal for later. This is deceptive alignment. The misbehavior shifts in time rather than being eliminated. Detection via behavioral evaluation alone becomes harder as situational awareness rises.

Only one of these four channels (formal gaming) corresponds to Goodhart's law in its narrowest formulation. Goodhart and Campbell named the dynamic where a measurement becomes a target and the target captures the measure. Reward hacking and sycophancy are the AI specializations of this dynamic. Deceptive alignment is structurally different: the model does not corrupt the measurement, it temporally defers its true behavior. Broader taxonomies (Manheim and Garrabrant 2018) classify both as Goodhart variants; the response-vector frame separates them because they require different methodology. Gaming hardens against measurement manipulation; incidence shifting hardens against temporal deferral.

The four channels have different costs in deep learning architectures, and two distinct cost dimensions matter. Loss-landscape cost (during training): the simpler a circuit is to learn given the model's inductive bias, the cheaper it is for SGD to find. Surface heuristics and pattern-matching on training-distribution features tend to be simpler circuits than deep generalizing representations. The gaming circuit is often a simpler loss-landscape attractor than the target-response circuit, which is one reason to expect formal gaming to dominate.

Computational and behavioral cost (during inference or evaluation): once a capable model exists with situational awareness, the cost of recognizing an evaluation context and modulating output is small relative to internalizing the constraint that would produce the same modulation generically. This is the incidence-shifting cost. Conflating it with the loss-landscape cost is the most common technical error in the framework's application to AI: the training cost is what SGD optimizes, the inference cost is what an already-trained mesa-optimizer faces, and both can be cheap relative to robust target response while operating through different mechanisms.

Passing an alignment evaluation is weak evidence of target response unless the evaluation was specifically designed to harden the gaming and incidence channels. The elasticity hierarchy predicts that response routes through the cheapest available channel, and under current training architectures, target response often appears structurally more demanding than gaming or incidence shifting.

The field has been working through this realization domain by domain: reward hacking, deceptive alignment, alignment tax. The integrated frame predicts that the better the model becomes at recognizing evaluation contexts, the less informative passing those evaluations becomes, because both formal gaming and incidence shifting become cheaper relative to target response as capability grows. The Anthropic Sleeper Agents result demonstrates the persistence side of this prediction: planted backdoored behavior survives supervised fine-tuning, RL, and adversarial training. It does not demonstrate that deceptive alignment emerges organically from standard training, only that once present it resists removal.

Safety methodology must assess alignment by evidence that response routed through the target-response channel, not by evidence that the metric moved in the desired direction. Behavioral evaluations cannot distinguish target response from formal gaming or incidence shifting. Mechanistic interpretability, deceptive-alignment red-teaming, out-of-distribution generalization tests, adversarial probing, and scalable oversight architectures are required because they harden the cheap channels, forcing the elasticity hierarchy into the target-response channel.

VII. Engineering the vector

If the framework is right, mechanism design has a concrete operational definition: shape the channel architecture so the cheapest available response is the channel you wanted. Engineering the vector means raising the cost of the gaming and incidence channels and lowering the cost of the target-response channel.

Raise the cost of formal gaming:

Raise the cost of incidence shifting:

Lower the cost of target response. Often neglected. Make compliance genuinely possible: clear standards, accessible administrative routes, technical assistance for small actors, base broadening that reduces the marginal cost of legitimate participation. In AI: training objectives that target value-generalization directly rather than behavioral compliance on a fixed distribution.

The Henry George example generalizes here. A tax base with very low mobility and very little reclassification room makes target response the cheapest channel by architecture, not by enforcement. The AI analogue is the mechanistic interpretability and eliciting-latent-knowledge (ELK) agenda. If behavioral evaluations are like an income tax (highly gameable through surface compliance), then interpretability-based and latent-knowledge-based evaluations are like a land value tax: the evaluation surface is tied to the model's internal computation rather than its output behavior, making formal gaming architecturally harder because gaming would require corrupting the same representation being measured. The cross-domain transfer is concrete: tax-base-mobility lessons inform alignment evaluation design.

Measure hidden variables. Most formal-gaming success is enabled by metric narrowness. Broaden what is measured to include the actual variable of interest, not just a cheap proxy. In AI safety: measure value alignment in out-of-distribution contexts, not just on the eval distribution. In healthcare: measure outcomes, not throughput.

Attach process tests to outcome tests. Where outcome metrics are gameable, add process metrics that verify the outcome was reached through the target-response channel.

Fund adversarial review. A structural commitment to attack the intervention's measurement. Tax: anti-avoidance enforcement budget. AI: red-team labs. Regulation: whistleblower programs. The absence of well-funded adversarial review is architecturally equivalent to leaving the gaming channel cheap.

Audit absorption after the intervention. Ask which channels absorbed the response and which capital stocks each consumed. Compare to the predicted distribution at design time. Iterate the architecture if the response routed wrong. Formal gaming typically consumes human and institutional capital (advisors burning time on arbitrage, trust in measurement eroding). Incidence shifting can build institutional capital when it reduces rent extraction, or deplete it when it transfers cost to staff burnout. The two axes, which channel absorbed the response and which capital stock the channel consumed, jointly price interventions that visible metrics under-price. Full Accounting develops the hidden-ledger problem in detail.

VIII. Close

Most policy failure is not insufficient force. It is mis-routed pressure.

A regulation that imagines target response but architecturally rewards formal gaming will produce paperwork, not safety. A tax that imagines target response but architecturally enables formal gaming will produce an avoidance industry, not redistribution. An alignment evaluation that imagines target response but architecturally rewards formal gaming will produce evaluator-sycophants, not aligned models.

In each case, the intervention designer's intent did not determine the outcome. The cheapest available response channel did.

Most fields confronting adaptive agents (public economics, AI safety, organizational design, regulatory policy, healthcare administration, education) are reinventing variants of this insight independently. The integrated frame is more powerful than the domain-specific variants because results transfer across domains: AI safety can borrow tax-base-mobility lessons; tax policy can borrow alignment-eval gaming insights; KPI designers can borrow Pigouvian externality theory. Cross-domain transferability is the contribution.

A policy does not get the response it wants. It gets the response its architecture makes cheapest.

Design the architecture, or accept what you get.


Sources and Notes

The four-channel decomposition. In public economics, channels two through four correspond directly to the three elasticities identified by Piketty, Saez, and Stantcheva (2014): standard labor supply (base loss), tax avoidance (formal gaming), and compensation bargaining (incidence shifting). This essay synthesizes existing domain-specific mechanisms rather than proposing a new formal mechanism-design theory. The contribution is the recognition that the same channel structure applies across domains (AI alignment, regulatory design, KPI architecture, civic burden, attention extraction), and that mechanism design tools from one domain transfer to others. Readers wanting formal information-economic treatments should consult the Bergemann–Morris robust-mechanism-design literature.

Public economics anchors. Saez, Slemrod, Giertz, “The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical Review,” Journal of Economic Literature 50(1): 3–50 (2012); Diamond and Saez, “The Case for a Progressive Tax: From Basic Research to Policy Recommendations,” Journal of Economic Perspectives 25(4): 165–190 (2011); Piketty, Saez, Stantcheva, “Optimal Taxation of Top Labor Incomes: A Tale of Three Elasticities,” American Economic Journal: Economic Policy 6(1): 230–271 (2014); Slemrod, “Optimal Taxation and Optimal Tax Systems,” Journal of Economic Perspectives 4(1): 157–178 (1990); Kopczuk, “Tax Bases, Tax Rates and the Elasticity of Reported Income,” Journal of Public Economics 89(11–12): 2093–2119 (2005); Saez and Zucman, The Triumph of Injustice (Norton, 2019).

Finnish corporate-tax specimens. Harju, Koivisto, and Matikka, “The Effects of Corporate Taxes on Small Firms,” studying Finland's 2012–2014 corporate tax cuts, find no significant average investment response. Harju and Matikka document income-shifting / formal-gaming responses in the Finnish dual-income-tax system. The Stiglitz / Henry George land-value-tax tradition supplies the closed-gaming-channel result. The OECD 15% global corporate minimum tax is the most recent architectural patch of the formal-gaming channel at international scale; it narrows rather than closes the channel.

AI safety anchors. Hubinger et al., “Risks from Learned Optimization in Advanced Machine Learning Systems,” arXiv 1906.01820 (2019), for the mesa-optimization formalization; Hubinger et al., “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training,” arXiv 2401.05566 (2024), for the empirical specimen of persistent incidence-shifting behavior; Ngo, Chan, Mindermann, “The Alignment Problem from a Deep Learning Perspective,” arXiv 2209.00626 (2022); Manheim and Garrabrant, “Categorizing Variants of Goodhart's Law,” arXiv 1803.04585 (2018), for the formal Goodhart taxonomy whose adversarial variant is the closest match to incidence-shifting in this framework; Sharma et al. (2023) and Perez et al. (2022) on sycophancy in language models; Christiano on iterated amplification and scalable oversight. The mechanistic-interpretability program at Anthropic, Apollo Research's deceptive-alignment evaluation work, and METR's autonomy evaluations supply the methodological response.

Goodhart and Campbell. Goodhart, “Problems of Monetary Management: The U.K. Experience,” in Papers in Monetary Economics, Reserve Bank of Australia (1975); Campbell, “Assessing the Impact of Planned Social Change,” Evaluation and Program Planning 2(1): 67–90 (1979). The formal-gaming channel in this framework is the cross-domain generalization of the proxy-target gap they identified.

Carbon offsets specimen. Probst et al., “Systematic assessment of the achieved emission reductions of carbon crediting projects,” Nature Communications (2024), on the under-16% real-reduction estimate across investigated offset projects.

Capital-stocks layer. Full Accounting develops the accounting form: capital stocks without ledgers can be consumed without appearing as costs. The absorption pattern (protected visible metric, depleted hidden ledger) runs across domains and is treated in Bad Equilibria Are Not One Thing as the conservation-of-failure principle.

Cargo cult epistemology. The scalar response fallacy is the policy-design instance of the more general pattern developed in Cargo Cult Epistemology: institutions deploy the form of analysis (measurement, evaluation, audit) without the underlying generator of truth-seeking, then declare victory when the measured surface moves.


Related: