Your KPIs are creating the behaviour you're trying to measure.
The Founder Test: Has this KPI forced a decision in the last 30 days that wouldn't have happened otherwise? If not, it's decoration, not infrastructure.
Section 1: The Institutional Blind Spot
Most CEOs operate on a simple equation: measurement equals management. You set a KPI, the team aligns to it, performance improves. The metric is assumed to be neutral, a window into reality. This is the institutional blind spot.
Metrics aren't windows. They're levers. The moment you publish a KPI, you've stopped measuring organic behaviour and started manufacturing optimised behaviour. The question isn't whether your team will game the metric. The question is how long until they discover the optimal gaming strategy, and whether you'll notice before it becomes systemic.
Wells Fargo discovered this the expensive way. Between 2011 and 2015, employees created approximately 3.5 million unauthorised accounts[1]. The metric was simple: eight products per customer, internally branded as "Eight is Great". The assumption was equally simple: more products per customer equals deeper engagement equals revenue growth.
The metric didn't measure engagement. It created a fabrication engine. Branch staff opened accounts without customer knowledge, transferred funds between accounts to generate activity, and issued credit cards that were never requested. The target wasn't just missed; it was systematically subverted. Wells Fargo ultimately paid approximately $3 billion in fines and settlements[2], but the deeper cost was the collapse of the assumption that had driven the entire consumer banking strategy for 14 years.
The lesson isn't that Wells Fargo hired bad actors. The lesson is that a clear, trackable, incentive-aligned metric produced catastrophic outcomes because the institutional assumption was that metrics measure rather than manufacture.
Activation filter: For every KPI you currently track, name the owner, the action it should trigger, and the date by which inaction becomes negligence. If you can't name all three, the metric is noise.
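A minimal sketch of how that filter could be run as an internal audit check. The field names and the example KPI are illustrative assumptions, not a prescribed template:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class KPI:
    name: str
    owner: str               # the named person accountable for acting on the metric
    triggered_action: str    # the decision this metric is meant to force
    deadline: Optional[date] # the date by which inaction becomes negligence

def is_noise(kpi: KPI) -> bool:
    # The activation filter: a metric with no owner, no triggering action,
    # or no deadline is decoration, not infrastructure.
    return not (kpi.owner and kpi.triggered_action and kpi.deadline)

# Illustrative usage: a hypothetical KPI with no owner or action fails the filter.
utilisation = KPI("Consultant utilisation", owner="", triggered_action="", deadline=None)
print(is_noise(utilisation))  # True
```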
Section 2: UK Evidence - The Gaming Patterns
The UK has its own precedent library. When the Department for Education published the first school league tables in 1992, the explicit goal was to improve educational outcomes through transparency. The metric was straightforward: percentage of students achieving five or more GCSEs at grades A*-C.
Schools optimised. Within 18 months, measurable distortions appeared[3]: students were entered for easier subjects rather than rigorous ones, borderline students (those likely to achieve grade C with intervention) received disproportionate resource allocation whilst high performers and struggling students were neglected, and schools began counselling low-performing students to leave before taking exams to protect the headline percentage.
The metric didn't improve education. It reoriented institutional behaviour toward the metric itself. Teachers stopped asking "what does this student need?" and started asking "will this student move the percentage?"
Composite cases (three UK SMEs, £8-12m revenue, Q2 2018 to Q1 2020, details anonymised):
A professional services firm set a utilisation target of 75% billable hours for consultants. The gaming mechanism was time manipulation: within six months, consultants were logging internal meetings as "client development", stretching 4-hour engagements across 2-day site visits to inflate hours, and declining complex, lower-margin projects that required more internal coordination. Revenue grew 11%, but client satisfaction scores dropped 19 points and two major accounts didn't renew. The CEO only discovered the pattern when a departing consultant admitted in their exit interview that the team had an informal Slack channel dedicated to "making the numbers work".
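To make the mechanism concrete, here is a small illustrative calculation with assumed weekly figures, not the firm's actual data, showing how padded hours can lift reported utilisation past the 75% target without adding any billable work:

```python
# Illustrative only: assumed weekly figures for one consultant.
available_hours = 40.0
genuine_billable = 26.0   # real client work: 65% utilisation, below the 75% target
padded_hours = 6.0        # internal meetings relogged as "client development",
                          # plus a short engagement stretched across a site visit

true_utilisation = genuine_billable / available_hours
reported_utilisation = (genuine_billable + padded_hours) / available_hours

print(f"True utilisation:     {true_utilisation:.0%}")      # 65%
print(f"Reported utilisation: {reported_utilisation:.0%}")  # 80%: target hit, no new revenue
```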
A manufacturing business introduced a defect rate KPI tied to production line bonuses. The target was <2% defects per batch. The gaming mechanism was classification manipulation: within three months, line supervisors were reclassifying defects as "cosmetic" rather than "functional", holding back problematic batches for rework after the inspection window, and in one case, a supervisor was discovered physically removing defect tags before audits. The CEO only identified the pattern when a major customer threatened to pull a £1.2m contract over quality issues that weren't appearing in internal reports.
A SaaS scale-up tied sales bonuses to contract value, not renewal probability. The gaming mechanism was contract structuring: the sales team optimised by offering aggressive discounts on annual contracts to enterprise clients, knowing that the contracts included complex integrations the product team couldn't reliably deliver within the promised timeline. First-year ARR grew 34%, but renewal rates in year two dropped to 52% as clients churned due to unfulfilled technical commitments. The metric produced short-term growth that became a 24-month revenue cliff.
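The arithmetic behind that cliff, sketched with an assumed starting ARR; only the 34% growth and 52% renewal figures come from the case above, and the calculation deliberately ignores any new business won in year two:

```python
# Simplified cohort view: assumed £5m starting ARR, no year-two new sales.
starting_arr = 5_000_000
year1_arr = starting_arr * 1.34   # aggressive discounting drives 34% first-year growth
year2_arr = year1_arr * 0.52      # only 52% of that revenue renews

print(f"Year 1 ARR: £{year1_arr:,.0f}")  # £6,700,000
print(f"Year 2 ARR: £{year2_arr:,.0f}")  # £3,484,000: below where the company started
```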
The pattern across all three: the metric was technically accurate (hours were logged, defects were recorded, contracts were signed), but the behavioural response optimised the number whilst destroying the strategic intent behind it.