The Source of Failure: We Optimize What We Measure

May 30, 2016

Rather than measure consumption and metrics that incentivize debt, what if we measure well-being and opportunities offered in our communities?

The problems we face cannot be fixed with policy tweaks and minor reforms. Yet policy tweaks and minor reforms are all we can manage when the pie is shrinking and every vested interest is fighting to maintain its share of the pie.

Our failure stems from a much deeper problem: we optimize what we measure. If we measure the wrong things, and focus on measuring process rather than outcome, we end up with precisely what we have now: a set of perverse incentives that encourage self-destructive behaviors and policies.

The process of selecting which data is measured and recorded carries implicit assumptions with far-reaching consequences. If we measure "growth" in terms of GDP but not well-being, we lock in perverse incentives to boost "growth" even at the cost of what really matters, namely well-being.

If we reward management with stock options, management has a perverse incentive to borrow money for stock buy-backs that push the share price higher, even if doing so is detrimental to the long-term health of the company.

Humans naturally optimize what is being measured and identified as important.

If students’ grades are based on attendance, attendance will be high. If doctors are told cholesterol levels are critical and the threshold of increased risk is 200, they will strive to lower their patients’ cholesterol level below 200.

If we accept that growth as measured by gross domestic product (GDP) is the measure of prosperity, politicians will pursue the goal of GDP expansion.

If rising consumption is the key component of GDP, we will be encouraged to go buy a new truck when the economy weakens, whether we need a new truck or not.

If profits are identified as the key driver of managers’ bonuses, managers will endeavor to increase net profits by whatever means are available.

The problem with choosing what to measure is that the selection can generate counterproductive or even destructive incentives.

This follows from humanity’s highly refined skill in assessing risk and return. Natural selection has shaped all creatures over the eons to recognize the potential for a windfall that doesn’t require much work to reap.

When humans were hunter-gatherers (our natural state for hundreds of thousands of years, compared to roughly 10,000 years of agriculture), those on the lookout for a calorie-rich windfall that didn’t require a lot of work ate better (and had more offspring that survived) than those who failed to reap windfalls. In the natural world, such windfalls might be a tree heavy with ripe fruit or a beehive loaded with honey.

Calories were scarce, and work burns a lot of calories, so the ideal scenario for the hunter-gatherer is a windfall that can be harvested with a minimal investment of calories/effort.

In our economy, qualifying for a positive reward without investing much effort is a windfall. As a result, whatever is measured sets up a built-in incentive to game the system (i.e., exploit shortcuts) or cheat to qualify for the reward with the least effort possible.

So if students are graded on attendance, and attendance is measured by the students signing in at the start of class, students can get the reward of a high grade by signing in and then sneaking away.

If students are graded on submitting homework daily, some students will extract homework from other students that can be copied with less effort than actually doing the work. Those seeking a windfall might use bribes or threats or blandishments to get the free homework, as the investment required to pursue these strategies is still smaller than that needed to do the homework.

If the grades are measured by a multiple-choice exam, some students will attempt to steal the answers ahead of the exam.

Compare these relatively easy-to-game thresholds to difficult-to-game tests such as longhand answers to randomly selected questions assigned to each individual at the start of the exam. If the answers must be composed within the test period, it is essentially impossible to learn beforehand which questions students will receive, and therefore impossible to prepare an answer (or pay someone else to answer) in advance.

Once the time and effort needed to game the system exceeds the investment required to learn the material, the incentives shift to learning the material with the least effort possible.
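The decision rule in the paragraph above can be sketched as a simple comparison. This is a minimal illustration, not a model of real behavior: it assumes a participant who only minimizes effort, and the function name, scheme names and effort figures (arbitrary units) are all hypothetical, chosen only to mirror the classroom examples.

```python
def cheapest_strategy(effort_to_game: float, effort_to_learn: float) -> str:
    """Return the route a purely effort-minimizing participant would take.

    Ties go to learning; that tie-break is an arbitrary choice for this sketch.
    """
    return "game" if effort_to_game < effort_to_learn else "learn"


# Hypothetical effort needed to game each measurement scheme (arbitrary units).
schemes = {
    "sign-in attendance": 1,
    "daily homework": 3,
    "multiple-choice exam": 6,
    "randomized longhand exam": 50,
}
EFFORT_TO_LEARN = 20  # hypothetical effort needed to actually learn the material

for name, game_cost in schemes.items():
    print(f"{name}: {cheapest_strategy(game_cost, EFFORT_TO_LEARN)}")
```

With these made-up numbers, only the randomized longhand exam tips the incentive toward learning; the point is the comparison, not the particular values.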

Notice that the system’s cost of measuring data and enforcing compliance is correlated with the effectiveness of the enforcement and the value of the data. The lower the system’s costs, the lower the compliance rate and the value of the data. Any system that makes genuine compliance cheaper, in effort invested, than shortcutting the system will itself be costly to run. The more effort invested in obtaining meaningful data, the higher the value of the data.

In our example, the cheapest measures of student performance (attendance, multiple-choice tests, etc.) do the poorest job of measuring actual student learning. Actually measuring student learning requires significant investment in the process, and a careful analysis of which metrics best reflect real student learning.

There is a growing dissatisfaction in the economics field with the current measures of economic activity: GDP, unemployment, and so on. This dissatisfaction reflects a growing awareness that these legacy metrics do a poor job of capturing what is actually important in fostering sustainable, broad-based prosperity, what many call well-being.

What are we measuring in healthcare?

Healthcare metrics offer a useful analogy. If cholesterol levels are a critical measure of health, then the medical community devotes its resources to lowering cholesterol levels below whatever threshold has been identified as critical. But what if a better overall yardstick of health and risk is the body mass index (BMI), which is calculated from height and weight?

While there are limits to BMI (for example, some super-fit bodybuilders might have an elevated BMI, even though they are not fat), for the vast majority of people BMI is a useful measure of risk for cardiovascular and metabolic syndrome-related diseases such as diabetes.

Unlike measuring cholesterol, BMI does not require drawing blood; anyone with access to a tape measure and a cheap scale can calculate their BMI. Unlike cholesterol and blood pressure, both of which can be lowered with medications, BMI is difficult to shortcut. The only way to lower your BMI is to lose weight, which for the vast majority of people means losing accumulated fat via a disciplined regime of diet and fitness.
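To show how little is needed to compute it, here is a minimal sketch of the standard BMI formula: weight in kilograms divided by the square of height in metres. The function name and the sample figures are illustrative only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Illustrative example: a person weighing 80 kg who is 1.80 m tall.
print(round(bmi(80, 1.80), 1))  # 24.7
```

Because height is essentially fixed in adults, the only input that can move the result is body weight, which is why the number is hard to game.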

The status quo is based on legacy metrics that are misleading or counter-productive. The status quo has been optimized to gather these measurements and assign great meaning to them.

Does it make sense to optimize expanding consumption when resources are finite and the incentives to squander resources on unproductive consumption are so high?

If profits are the only metric that matters, and labor costs are rising for structural reasons, then why would private enterprise hire more workers when robots and software are cheaper and more productive in terms of boosting profits?

If we measure academic achievement by the issuance of a college degree, but the process of earning that degree does not measure real student learning, then what are we measuring with college diplomas? What we’re really measuring is the students’ ability to navigate an academic bureaucracy for four or five years. Since we’re not measuring useful learning, we have no way to hold colleges accountable for their demonstrable failure to teach useful skills.

What are we measuring in the workplace?

The key point here is that systemic success or failure arises from our choices of what to measure and what thresholds we set as meaningful. Whatever we select to measure and deem important, participants will optimize their choices and behaviors to reach the rewards that are incentivized.

If we choose counterproductive metrics, we build perverse incentives into the system, incentives that guide the goals, strategies and behaviors of participants.

Rather than measure consumption and metrics that incentivize debt, what if we measure well-being and opportunities offered in our communities? What if we measured doing more with less rather than consuming more? What if our primary measure of economic well-being was the reduction of inputs (resources, labor, capital, etc.) that resulted in higher output (increased well-being)?

How can we select metrics that productively measure well-being, sustainability and opportunity not just for elites but for every participant? What thresholds can we set that will create incentives for adopting best practices and appropriate technologies?

These questions help us see that to create a sustainable system that alleviates inequality and poverty, we must first choose metrics that create productive incentives for best practices and disincentives for fraud, corruption, waste and inefficiencies.

This essay was drawn from my new book Why Our Status Quo Failed and Is Beyond Reform.

