Your KPIs are creating the behaviour you're trying to measure.

The Founder Test: Has this KPI forced a decision in the last 30 days that wouldn't have happened otherwise? If not, it's decoration, not infrastructure.

Section 1: The Institutional Blind Spot

Most CEOs operate on a simple equation: measurement equals management. You set a KPI, the team aligns to it, performance improves. The metric is assumed to be neutral, a window into reality. This is the institutional blind spot.

Metrics aren't windows. They're levers. The moment you publish a KPI, you've stopped measuring organic behaviour and started manufacturing optimised behaviour. The question isn't whether your team will game the metric. The question is how long until they discover the optimal gaming strategy, and whether you'll notice before it becomes systemic.

Wells Fargo discovered this the expensive way. Between 2009 and 2016, employees created approximately 3.5 million unauthorised accounts[1]. The metric was simple: eight products per customer, internally branded as "Eight is Great". The assumption was equally simple: more products per customer equals deeper engagement equals revenue growth.

The metric didn't measure engagement. It created a fabrication engine. Branch staff opened accounts without customer knowledge, transferred funds between accounts to generate activity, and issued credit cards that were never requested. The target wasn't just missed, it was systematically subverted. Wells Fargo ultimately paid approximately $3 billion in fines and settlements[2], but the deeper cost was the collapse of the assumption that had driven the entire consumer banking strategy for 14 years.

The lesson isn't that Wells Fargo hired bad actors. The lesson is that a clear, trackable, incentive-aligned metric produced catastrophic outcomes because the institutional assumption was that metrics measure rather than manufacture.

Activation filter: For every KPI you currently track, name the owner, the action it should trigger, and the date by which inaction becomes negligence. If you can't complete this sentence, the metric is noise.

Section 2: UK Evidence - The Gaming Patterns

The UK has its own precedent library. When the Department for Education published the first school league tables in 1992, the explicit goal was to improve educational outcomes through transparency. The metric was straightforward: percentage of students achieving five or more GCSEs at grades A*-C.

Schools optimised. Within 18 months, measurable distortions appeared[3]: students were entered for easier subjects rather than rigorous ones, borderline students (those likely to achieve grade C with intervention) received disproportionate resource allocation whilst high performers and struggling students were neglected, and schools began counselling low-performing students to leave before taking exams to protect the headline percentage.

The metric didn't improve education. It reoriented institutional behaviour toward the metric itself. Teachers stopped asking "what does this student need?" and started asking "will this student move the percentage?"

Composite case (three UK SMEs, £8-12m revenue, Q2 2018 to Q1 2020, details anonymised):
A professional services firm set a utilisation target of 75% billable hours for consultants. The gaming mechanism was time manipulation: within six months, consultants were logging internal meetings as "client development", stretching 4-hour engagements across 2-day site visits to inflate hours, and declining complex, lower-margin projects that required more internal coordination. Revenue grew 11%, but client satisfaction scores dropped 19 points and two major accounts didn't renew. The CEO only discovered the pattern when a departing consultant admitted in their exit interview that the team had an informal Slack channel dedicated to "making the numbers work".

A manufacturing business introduced a defect rate KPI tied to production line bonuses. The target was <2% defects per batch. The gaming mechanism was classification manipulation: within three months, line supervisors were reclassifying defects as "cosmetic" rather than "functional", holding back problematic batches for rework after the inspection window, and in one case, a supervisor was discovered physically removing defect tags before audits. The CEO only identified the pattern when a major customer threatened to pull a £1.2m contract over quality issues that weren't appearing in internal reports.

A SaaS scale-up tied sales bonuses to contract value, not renewal probability. The gaming mechanism was contract structuring: the sales team optimised by offering aggressive discounts on annual contracts to enterprise clients, knowing that the contracts included complex integrations the product team couldn't reliably deliver within the promised timeline. First-year ARR grew 34%, but renewal rates in year two dropped to 52% as clients churned due to unfulfilled technical commitments. The metric produced short-term growth that became a 24-month revenue cliff.

The pattern across all three: the metric was technically accurate (hours were logged, defects were recorded, contracts were signed), but the behavioural response optimised the number whilst destroying the strategic intent behind it.

Goodhart Tripwires: Five Gaming Patterns to Watch

If you see these signals, your metrics are being gamed:

1. Threshold hugging: Performance clusters tightly around the minimum target (e.g., 80% of the team hits exactly 100-102% of quota). Natural performance distributes normally; gaming concentrates at thresholds (see the sketch after this list).

2. Early greens: Metrics report success in months 1-3, then plateau or decline. Gaming is easiest early when scrutiny is low and baseline expectations unclear.

3. Language tells: Team begins using phrases like "technically we hit it" or "the number looks good but..." Linguistic hedging precedes metric manipulation.

4. Counter-metric collapse: When one metric improves, logically connected metrics deteriorate (sales up, NPS down). Gaming optimises the measured whilst destroying the unmeasured.

5. Informal shadow metrics: Team creates unofficial tracking that contradicts official KPIs. When your sales team tracks "real pipeline" separate from CRM pipeline, they're signalling the official metric is compromised.
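
Two of these patterns (threshold hugging and early greens) are mechanical enough to check in code. A minimal sketch in Python, assuming you can export per-person quota attainment and a monthly metric series; the band width and flag cut-offs are illustrative assumptions, not calibrated values:

```python
def threshold_hugging(attainment: list[float],
                      band: tuple[float, float] = (1.00, 1.02),
                      flag_at: float = 0.5) -> bool:
    """Tripwire 1: flag if a suspicious share of people land just above target.

    attainment: quota attainment per person (1.0 == exactly 100% of target).
    Natural performance spreads out; gaming piles up inside a narrow band.
    """
    in_band = sum(band[0] <= a <= band[1] for a in attainment)
    return in_band / len(attainment) >= flag_at


def early_greens(series: list[float]) -> bool:
    """Tripwire 2: flag metrics that improve in months 1-3, then stall or decline."""
    early_gain = series[2] - series[0]   # months 1-3
    late_gain = series[-1] - series[2]   # everything after
    return early_gain > 0 and late_gain <= 0


# Eight of ten reps landing at exactly 100-102% of quota trips the wire:
print(threshold_hugging([1.01, 1.00, 1.02, 1.01, 1.00,
                         1.01, 1.02, 1.00, 0.74, 1.35]))   # True
# A fast start that plateaus and slips trips the early-greens wire:
print(early_greens([50, 61, 70, 71, 69, 66]))              # True
```

Counter-metric collapse lends itself to the same treatment; a sketch appears alongside the chaperone pairs in Section 4.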

Note #11 (Operational Scar Tissue): In 2019, a £10m professional services firm (circa 85 staff) implemented a utilisation percentage KPI without chaperone metrics. Within 11 months: consultant billable hours increased 9%, client Net Promoter Score dropped 19 points, and gross margin fell approximately £600k due to project overruns and rework not captured in utilisation tracking. The metric optimised time logging whilst destroying client value and profit quality.

Section 3: The Unpopular Truth - Metrics Introduce Fragility

Here's what your board won't want to hear: the more precisely you measure something, the more fragile your operation becomes.

Nassim Taleb's fragility test is simple. A fragile system breaks under stress. An antifragile system improves under stress. Metrics create fragility because they concentrate risk at thresholds.

Consider the standard sales commission structure: no bonus below 80% of target, accelerating commission above 100%. The threshold creates a fragility point. A rep at 78% of target in December has two options: abandon strategic deals that won't close before year-end and chase transactional wins to cross the threshold, or give up entirely because the gap is unbridgeable. The metric doesn't just measure sales performance, it manufactures binary decision-making.
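
The fragility point is easiest to see as a payoff curve. A stylised sketch, assuming a hypothetical plan: the £20,000 on-target bonus, the 80% cliff, and the 1.5x accelerator are invented parameters, not a recommended structure:

```python
def commission(attainment: float, on_target_bonus: float = 20_000) -> float:
    """Stylised plan: nothing below 80% of quota, linear to 100%,
    then a 1.5x accelerator. All parameters are invented."""
    if attainment < 0.80:
        return 0.0                      # the cliff: 79% pays the same as 0%
    if attainment <= 1.00:
        return on_target_bonus * attainment
    return on_target_bonus + on_target_bonus * 1.5 * (attainment - 1.00)

for a in (0.78, 0.80, 1.00, 1.10):
    print(f"{a:.0%} of quota -> £{commission(a):,.0f}")
# 78% -> £0, 80% -> £16,000, 100% -> £20,000, 110% -> £23,000
```

Everything below 80% pays identically to zero, which is precisely the binary decision structure described above: a rep at 78% is economically indifferent between strategic patience and total disengagement.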

The same threshold fragility appears in operational KPIs. A logistics business sets a 98% on-time delivery target. Drivers start refusing loads that might jeopardise the metric, even if the customer would accept a slight delay for a consolidated shipment. The metric optimises for the number, not for customer value or operational efficiency.

Threshold heat-check: For each of your top 5 KPIs, calculate what percentage of team pay or status depends on hitting that single metric. If any metric controls >30% of compensation or promotion decisions, you've created a fragility point where gaming becomes rational.
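
The heat-check itself is one division per metric. A minimal sketch, assuming you can estimate the variable pay tied to each KPI; all figures are hypothetical, and the 30% flag mirrors the rule above:

```python
# Heat-check: what share of variable pay rides on each single metric?
# All figures are hypothetical; the 30% flag mirrors the rule above.
exposure = {
    # metric: (pay tied to this metric, total pay of the people it covers), in £
    "Quota attainment":   (480_000, 1_100_000),
    "Utilisation %":      (150_000,   900_000),
    "On-time delivery %": ( 60_000,   750_000),
}

for metric, (at_risk, total) in exposure.items():
    share = at_risk / total
    flag = "FRAGILITY POINT" if share > 0.30 else "ok"
    print(f"{metric:<20} {share:>4.0%}  {flag}")
# Quota attainment sits at 44% -- gaming that metric is rational.
```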

This is the unpopular truth: KPIs don't reveal what's working. They reveal what your team has learned to game, and where your operation will fracture when the gaming strategy stops working.

Most CEOs respond to this by adding more metrics to create checks and balances. This accelerates fragility because now your team is optimising across multiple thresholds simultaneously, which doesn't reduce gaming, it professionalises it.

Section 4: Five Diagnostic Questions (Not an Implementation Checklist)

You can't eliminate gaming. You can design for Goodhart resistance by making gaming visible rather than hidden. Five diagnostic questions to stress-test your current metrics:

1. Outcome or Activity? Are you measuring the strategic outcome you want, or the activity you assume produces it? "Revenue growth" is an outcome. "Pipeline coverage ratio" is an activity. Activities can be gamed without moving the outcome. If your KPI is an activity metric, what's the direct falsifiable link to the outcome?

2. What's the Gaming Handbook? If you publicly announced the exact formula for this KPI and the incentive structure six months in advance, what would your team do by month three? If you can't articulate the gaming strategy, you haven't stress-tested the metric.

Chaperone Pairs (use to diagnose gaming before it compounds):

  • Sales revenue ↔ Net Revenue Retention: Revenue can be gamed through discounting or contract stuffing; NRR reveals if revenue was real or borrowed from future quarters
  • Operations efficiency ↔ Utilisation percentage: Efficiency can be gamed by declining difficult work; utilisation reveals if efficiency came from selectivity rather than capability
  • Customer acquisition ↔ Customer Acquisition Cost: Acquisition can be gamed through aggressive spend; CAC reveals if growth was bought or earned
  • Product velocity ↔ Technical debt ratio: Velocity can be gamed by shipping fast without testing; debt ratio reveals if speed came from discipline or shortcuts
  • Employee headcount ↔ Revenue per employee: Headcount can be gamed through hiring sprees; RPE reveals if scale was productive or dilutive

If your primary metric moves positively whilst its chaperone deteriorates, you're measuring gaming, not performance.
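
That divergence test can run as a quarterly routine. A sketch using two of the pairs above; the series, and the 'good' direction assumed for each chaperone, are invented for illustration:

```python
# Quarterly chaperone check: flag quarters where the primary improved
# whilst its chaperone deteriorated. Series and the 'good' direction
# (+1 = higher is better, -1 = lower is better) are illustrative.
PAIRS = [
    # (primary, chaperone, primary series, chaperone series, direction)
    ("Sales revenue (£m)",    "NRR (%)",         [2.4, 2.7, 3.1], [108, 101, 93],     +1),
    ("Velocity (pts/sprint)", "Tech debt ratio", [14, 17, 21],    [0.18, 0.19, 0.31], -1),
]

for primary, chap, p, c, direction in PAIRS:
    gamed = [q for q in range(1, len(p))
             if p[q] > p[q - 1] and direction * (c[q] - c[q - 1]) < 0]
    if gamed:
        print(f"{primary} up whilst {chap} worsened in quarters {gamed}: "
              f"measuring gaming, not performance.")
```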

3. Where's the Threshold Concentration? What percentage of team compensation or status depends on hitting this single number? If it's above 30%, you've created a fragility point. When the threshold is hit, optimisation stops. When it's missed, effort collapses. Diversify threshold risk.

4. Who Owns the Counter-Narrative? When this metric reports success, who in the organisation is incentivised to argue it's misleading? If nobody, you've created an echo chamber. Effective KPI design includes a mandatory dissenting view, someone whose role is to explain why the number might be lying.

5. What's the Rotation Cadence? How long has this exact metric been in place? If it's been unchanged for more than 18 months, assume it's being gamed and you haven't noticed yet. Metrics need rotation to prevent institutionalised gaming. Not constant change, but periodic recalibration that forces fresh optimisation strategies.

Section 5: Decision Stack - When to Redesign

If you observe three or more consecutive quarters where KPI performance improves but board confidence in strategic direction declines, interpret this as metric-reality divergence. The numbers are moving up whilst underlying strategic health is deteriorating.

Consider two paths:

Option A: Maintain current KPIs and investigate execution quality. The assumption is that the metrics are correct and the team isn't delivering the expected strategic outcomes. Risk: you spend 6-12 months diagnosing execution problems that don't exist whilst the real issue (metric misalignment) compounds.

Option B: Pause KPI-driven decisions for one quarter and conduct a strategic health audit independent of current metrics. Measure customer retention, employee confidence in strategy, product-market fit signals, and competitive position deterioration. Cost: one quarter of potential KPI-driven decisions deferred. Benefit: if the metrics are misleading, you identify it before it becomes structural.

Decision Clarifier:

• If KPI trend ↑ whilst board confidence ↓ for 3+ quarters: metric-reality divergence detected. Suspend KPI-led decisions and audit strategic health independent of current metrics.
• If KPI trend ↑ and board confidence stable/↑: metrics aligned with reality. Continue KPI-driven decisions.

Decision Stack 1 falsifier: If you implement a KPI redesign in Q2 2025 and the board confidence-performance divergence pattern doesn't reappear by Q4 2026, the original metrics were manufacturing false signals.
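
The clarifier's first row reduces to a mechanical rule: three or more consecutive quarters of KPI up whilst board confidence down. A sketch, assuming board confidence is captured as a simple quarterly survey score; both series are hypothetical:

```python
def divergence_streak(kpi: list[float], confidence: list[float]) -> int:
    """Longest run of consecutive quarters with the KPI rising whilst
    board confidence falls."""
    best = run = 0
    for q in range(1, len(kpi)):
        run = run + 1 if (kpi[q] > kpi[q - 1]
                          and confidence[q] < confidence[q - 1]) else 0
        best = max(best, run)
    return best

kpi        = [100, 106, 113, 121, 128]  # indexed headline KPI
confidence = [7.8, 7.4, 6.9, 6.1, 5.5]  # quarterly board survey, out of 10

if divergence_streak(kpi, confidence) >= 3:
    print("Metric-reality divergence: suspend KPI-led decisions and "
          "audit strategic health independently.")
```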

Section 6: Cultural Implications - The Authority Problem

The hardest part of KPI redesign isn't technical. It's authority. Metrics create organisational certainty. Your FD can present a dashboard, your ops director can cite utilisation rates, your sales leader can point to pipeline coverage. The numbers feel objective.

Changing the metrics admits the old ones were wrong, which means every decision justified by those metrics is now suspect. This is why KPI redesign faces institutional resistance even when the CEO knows the current metrics are failing.

The cultural question is: who has the authority to declare a metric dead? In most £5-20m businesses, the answer is nobody. The CEO can question it, the board can debate it, but the institutional momentum behind an established metric is significant. Teams have built workflows around it, compensation structures depend on it, and credibility has been staked on improvement.

Metric Sunsetting (Governance Protocol)

Effective metric design requires pre-authorising metric death. When you introduce a KPI, simultaneously specify the conditions under which it will be retired:

Renew (extend for 12-18 months): Metric still measures strategic intent; gaming hasn't compromised signal quality; chaperone metrics confirm validity.

Revise (modify formula/threshold): Strategic intent unchanged but measurement approach needs recalibration; gaming patterns identified and formula adjusted to close loopholes.

Retire (sunset immediately): Metric no longer aligned with strategic priorities; gaming has compromised beyond repair; or strategic pivot makes metric irrelevant.

Review cadence: Every 18 months for operational KPIs, every 24 months for strategic KPIs. This removes the authority problem by making metric mortality an expected outcome, not an admission of failure.
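
The protocol is concrete enough to encode as a metric register with pre-authorised mortality. A sketch, assuming the renew/revise/retire conditions above can be reduced to three flags; the field names and the 30-day month approximation are illustrative choices:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Review cadence from the protocol above, in months,
# using a 30-day month approximation for simplicity.
REVIEW_MONTHS = {"operational": 18, "strategic": 24}

@dataclass
class Metric:
    name: str
    kind: str                  # "operational" or "strategic"
    introduced: date
    measures_intent: bool      # still tracks the strategic intent?
    gaming_detected: bool      # chaperones show the signal is compromised?
    fixable: bool              # would a formula change close the loophole?

    def review_due(self, today: date) -> bool:
        horizon = timedelta(days=REVIEW_MONTHS[self.kind] * 30)
        return today - self.introduced >= horizon

    def verdict(self) -> str:
        if not self.measures_intent:
            return "Retire"    # no longer aligned with strategic priorities
        if self.gaming_detected:
            return "Revise" if self.fixable else "Retire"
        return "Renew"         # signal quality intact: extend 12-18 months

m = Metric("Utilisation %", "operational", date(2023, 6, 1),
           measures_intent=True, gaming_detected=True, fixable=True)
if m.review_due(date(2025, 2, 1)):
    print(f"{m.name}: {m.verdict()}")   # Utilisation %: Revise
```

Running the register at every review date makes "who has the authority to declare a metric dead?" a scheduling question rather than a political one.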

Five-minute board ritual (truth-check every quarter): When reviewing each KPI, ask: "Who in this organisation is paid or promoted based on arguing this number is misleading?" If the answer is "nobody", you've created an institutional incentive to accept the metric at face value. Assign an explicit counter-narrative owner for each KPI whose role is to present the case for why the reported success might be masking strategic failure.

Counter-Case: When Metric Consistency Wins

The strongest argument against metric rotation is Amazon. Jeff Bezos anchored the entire organisation to a single metric: customer obsession, measured through long-term customer value rather than quarterly revenue. The metric has remained essentially unchanged since the late 1990s. The consistency has been strategic, not fragile.

The difference is how the intent is structured. Amazon's metric is an outcome (long-term customer value), not an activity (transactions per quarter). It's difficult to game because gaming it requires actually improving customer experience, which is the strategic goal. Threshold concentration risk is low because no single decision or team can move the metric materially in isolation.

Amazon counter-case guardrail: When your anchor metric is an outcome rather than an activity proxy, metric consistency becomes defensible because the only way to game the metric is to do the thing you actually want done. This structure lowers the ROI of shortcuts whilst raising the ROI of genuine strategic execution. The lesson from Amazon isn't that metric consistency always works; it's that outcome-based metrics with low threshold concentration can sustain it.

Tripwire: If a UK SME maintains a founding metric through three strategic pivots and delivers >15% annual growth from January 2025 to December 2027 without material customer churn, metric consistency was strategically aligned. Until that evidence appears, the default assumption for £5-20m UK businesses should be periodic metric recalibration, because the institutional capacity to resist gaming through consistency alone typically doesn't exist at this scale.

One-Screen Founder Action

Open a blank spreadsheet. Five columns (a sketch that generates this audit as a CSV follows the final step):

1. KPI: List your top 5 KPIs (the metrics that drive board discussion and comp decisions)

2. Pay/status %: What percentage of team compensation or status rests on each metric? Flag any >30%.

3. Outcome vs Activity: Label each as "outcome" (measures strategic result) or "activity" (measures proxy behaviour). Flag all activities.

4. Public-formula test: For each KPI, write what your team would do in 90 days if you announced the exact formula and incentive structure today. If you can articulate a gaming strategy, the metric needs redesign.

5. Chaperone assignment: For each primary KPI, identify the counter-metric that would reveal gaming (revenue ↔ NRR, efficiency ↔ utilisation, acquisition ↔ CAC). If you don't have the chaperone, you're flying blind.

Final step: Pick one metric to retire this quarter. Not revise, retire. Choose the metric that's been unchanged longest or where threshold concentration exceeds 30%. Replace it with an outcome-based metric that has an explicit 18-month review date.
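
If you'd rather generate the five columns than type them, here is a sketch that emits the audit as a CSV; the rows are hypothetical placeholders for your own KPIs, and the Flags column implements the >30% concentration and activity-proxy checks above:

```python
import csv
import sys

# Columns match the five-column audit above; "Flags" is added to carry
# the >30% pay-concentration and activity-proxy checks.
COLUMNS = ["KPI", "Pay/status %", "Outcome or Activity",
           "90-day gaming strategy", "Chaperone", "Flags"]

# Hypothetical rows -- replace with your own top 5 KPIs.
rows = [
    ("Pipeline coverage", 35, "Activity", "Stuff CRM with cold deals", "Win rate"),
    ("Utilisation %",     40, "Activity", "Log internal time as billable", "Gross margin"),
    ("ARR growth",        25, "Outcome",  "Discount annual contracts", "Net Revenue Retention"),
]

writer = csv.writer(sys.stdout)
writer.writerow(COLUMNS)
for kpi, pay, kind, gaming, chaperone in rows:
    flags = []
    if pay > 30:
        flags.append("pay >30%: fragility point")
    if kind == "Activity":
        flags.append("activity proxy: needs outcome link")
    writer.writerow([kpi, f"{pay}%", kind, gaming, chaperone, "; ".join(flags)])
```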

The moment you assume your KPIs are neutral windows into reality, they've already started playing you.

---

References

[1] Consumer Financial Protection Bureau. (2016). Consumer Financial Protection Bureau Fines Wells Fargo $100 Million for Widespread Illegal Practice of Secretly Opening Unauthorized Accounts. Retrieved from https://www.consumerfinance.gov/about-us/newsroom/consumer-financial-protection-bureau-fines-wells-fargo-100-million-widespread-illegal-practice-secretly-opening-unauthorized-accounts/

[2] US Department of Justice. (2020). Wells Fargo Agrees to Pay $3 Billion to Resolve Criminal and Civil Investigations Into Sales Practices. Retrieved from https://www.justice.gov/opa/pr/wells-fargo-agrees-pay-3-billion-resolve-criminal-and-civil-investigations-sales-practices

[3] Gillborn, D., & Youdell, D. (2004). Rationing education: Policy, practice, reform and equity. Journal of Education Policy, 19(1). Retrieved from https://www.tandfonline.com/doi/abs/10.1080/0305498042000211529