Goodhart's law isn't as useful as you might think (2023)

nrnrjrjrj

I want to block some time to grok the WBR and XMR charts that Cedric is passionate about (for good reason).

I might be wrong but I feel like WBR treats variation (looking at the measure and saying "it has changed") as a trigger point for investigation rather than a conclusion.

In that case, let's say you do something silly and measure lines of code committed. Let's also say you told everyone that it will factor into performance reviews, and the company is known for stack ranking.

You introduce the LOC measure. All employees watch it like a hawk. While working they add useless blocks of code and so on.

LOC committed goes up and looks significant on XMR.

Option 1: grab champagne, pay exec bonus, congratulate yourself.

Option 2: investigate

Option 2 is better of course. But it is such a mindset shift. Option 2 lets you see whether Goodhart happened or not. It lets you actually learn.
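For anyone who wants to see what "looks significant on XMR" means mechanically, here is a minimal sketch. It assumes the standard XmR construction (center line at the mean, natural process limits at the mean plus or minus 2.66 times the average moving range). The weekly LOC figures and the function names `xmr_limits` and `signals` are made up for illustration and do not come from the essay.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    avg_mr = mean(moving_ranges)
    # 2.66 is the usual XmR scaling constant (3 / 1.128).
    return center, center - 2.66 * avg_mr, center + 2.66 * avg_mr


def signals(baseline, new_points):
    """Return (index, value) for new points outside the baseline limits."""
    _, lower, upper = xmr_limits(baseline)
    return [(i, x) for i, x in enumerate(new_points) if x < lower or x > upper]


# Hypothetical weekly LOC committed, before and after the measure is announced.
baseline = [1000, 1100, 950, 1050, 1020, 980, 1080, 990]
after_announcement = [1040, 1900, 2100]

print(xmr_limits(baseline))
print(signals(baseline, after_announcement))
```

A point outside the limits only says that something changed. It says nothing about why, which is the reason Option 2 is the next step.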

jjmarr

I can confirm this. We've standardized Goodhart's law by creating a 90-day rotation requirement for KPIs. We found that managers would reuse the same performance indicators with minor variations and put them on sticky notes to make them easier to target.

hilux

Wow. That is an extremely cool idea - new to me.

Do you have enough KPIs that you can be sure that these targets also serve as useful metrics for the org as a whole? Do you randomize the assignment every quarter?

As I talk through this ... have you considered keeping some "hidden KPIs"?

jjmarr

I'm riffing on password rotation requirements and the meta-nature of trying to make Goodhart's law a target. I could've been a bit more obviously sarcastic.

Spivak

If your managers are doing that, it's a strong signal that your KPIs are a distraction and your managers are acting rationally within the system they've been placed in.

They need something they can check easily so the team can get back to work. It's hard to find metrics that are both meaningful to the business and track with the work being asked of the team.

bachmeier

Just a side note that this usage isn't really the application Goodhart had in mind. Suppose you're running a central bank and you see a variable that can be used to predict inflation. If you're doing your job as a central banker optimally, you'll prevent inflation whenever that variable moves, and then no matter what happens to the variable, due to central bank policy, inflation is always at the target plus some random quantity and the predictive power disappears.

As "Goodhart's law" is used here, in contrast, the focus is on side effects of a policy. The goal in this situation is not to make the target useless, as it is if you're doing central bank policy correctly.
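The central bank version can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the inflation gap is modeled as the indicator plus an unpredictable shock, and the bank can offset the indicator's effect perfectly. The function names, the noise levels, and the linear model are all invented for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def indicator_inflation_correlation(offset_policy, n=5000, seed=0):
    """Correlation between the indicator and the inflation gap (toy model)."""
    rng = random.Random(seed)
    indicator, inflation_gap = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)        # the variable that predicts inflation
        shock = rng.gauss(0, 0.5)  # noise nobody can forecast
        # Assumption: an optimal bank fully cancels the indicator's effect.
        policy = -x if offset_policy else 0.0
        indicator.append(x)
        inflation_gap.append(x + policy + shock)
    return pearson(indicator, inflation_gap)


print("no policy:  ", indicator_inflation_correlation(offset_policy=False))
print("with policy:", indicator_inflation_correlation(offset_policy=True))
```

Without the policy, the indicator correlates strongly with the inflation gap. With the policy, the correlation collapses toward zero even though the indicator is exactly what the bank is reacting to.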

thenobsta

This doesn't feel well elucidated, but I've been thinking about Goodhart's law in other areas of life -- e.g. owning a home is cool and can enable some cool things. However, when home ownership becomes the goal, it becomes easy to disregard a lot of life-giving things in pursuit of owning a home.

This seems to pop up in a lot of areas, and I find myself asking: is X something I really desire, or is it a natural side effect of some other process?

nrnrjrjrj

If you are smart and think a lot you can do well renting and investing elsewhere.

You can also ask what is life about?

This is hard to do because the conclusion may need to break moulds, leading to family estrangement and losing friends.

I suspect people who end up having a TED talk in them are people who had the ability, through courage or their inherited neural makeup, to go it alone despite dissenting voices. Or they were raised to be encouraged to do so.

lamename

This is all well and good, but unfortunately depends on the people pushing for the metric/system to give a shit about what the metric is supposed to improve. There are still far too many that prefer to slap 1 or 2 careless metrics on an entire team, optimize until they're promoted, then leave the company worse off.

skmurphy

There is a very good essay in the first comment by "Roger" dated Jan-2023, reproduced below. Skip the primary essay and work from this:

"I really appreciated this piece, as designing good metrics is a problem I think about in my day job a lot. My approach to thinking about this is similar in a lot of ways, but my thought process for getting there is different enough that I wanted to throw it out there as food for thought.

One school of thought (https://www.simplilearn.com/tutorials/itil-tutorial/measurem...) I have trained in is that metrics are useful to people in 4 ways:

    1. Direct activities to achieve goals
    2. Intervene in trends that are having negative impacts
    3. Justify that a particular course of action is warranted
    4. Validate that a decision that was made was warranted

My interpretation of Goodhart’s Law has always centered more around duration of metrics for these purposes. The chief warning is that regardless of the metric used, sooner or later it will become useless as a decision aid. I often work with people who think about metrics as a “do it right the first time, so you won’t have to ever worry about it again”. This is the wrong mentality, and Goodhart’s Law is a useful way to reach many folks with this mindset.

The implication is that the goal is not to find the “right” metrics, but to instead find the most useful metrics to support the decisions that are most critical at the moment. After all, once you pick a metric, 1 of 3 things will happen:

    1. The metric will improve until it reaches a point where you are not improving it anymore, at which point it provides no more new information.
    2. The metric doesn’t improve at all, which means you’ve picked something you aren’t capable of influencing and is therefore useless.
    3. The metric gets worse, which means there is feedback that swamps whatever you are doing to improve it.

Thus, if we are using metrics to improve decision making, we’re always going to need to replace metrics with new ones relevant to our goals. If we are going to have to do that anyway, we might as well be regularly assessing our metrics for ones that serve our purposes more effectively. Thus, a regular cadence of reviewing the metrics used, deprecating ones that are no longer useful, and introducing new metrics that are relevant to the decisions now at hand, is crucial for ongoing success.

One other important point to make is that for many people, the purpose of metrics is not to make things better. It is instead to show that they are doing a good job and to persuade others to do what they want. Metrics that show this are useful, and those that don’t are not. In this case, of course, a metric may indeed be useful “forever” if it serves these ends. The implication is that some level of psychological safety is needed for metric use to be more aligned with supporting the mission and less aligned with making people look good."

turtleyacht

Thank you. The next time metrics are mentioned, one can mention an expiration date. That can segue into evolving metrics, feedback control systems, and the crucial element of "psychological safety."

A jaded interpretation of data science is to find evidence to support predetermined decisions, which is unfair to all. Having the capability to always generate new internal tools for Just In Time Reporting (JITR) would be nice, even more so if they are reproducible.

This encourages ad hoc and scrappy starts, which can be iterated on as formulas in source control. Instead of a gold standard of a handful of metrics, we are empowered to draw conclusions from all data in context.

skmurphy

I am not "Roger," but I can recognize someone who has long and practical experience with managing metrics and KPIs and their interaction with process improvement. Instead of an "expiration date" I would encourage you to define a "re-evaluation date" that allows enough time to judge the impact and efficacy of the metrics proposed and make course corrections as needed (each with its own review dates).
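As a rough sketch of what tracking a "re-evaluation date" could look like in code: each metric is recorded alongside the decision it supports and the date it should next be reviewed. The `Metric` class, its field names, and the sample entries are hypothetical, not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Metric:
    name: str
    decision_supported: str  # the decision this metric is meant to inform
    review_on: date          # re-evaluation date, not a hard expiry


def due_for_review(metrics, today):
    """Names of metrics whose re-evaluation date has arrived."""
    return [m.name for m in metrics if m.review_on <= today]


# Hypothetical entries for illustration only.
metrics = [
    Metric("weekly_active_users", "where to invest onboarding effort", date(2024, 3, 1)),
    Metric("support_backlog", "whether to add support staff", date(2024, 6, 1)),
]

print(due_for_review(metrics, date(2024, 4, 1)))
```

At review time, a metric can be kept, adjusted, or retired, and whatever survives gets a new `review_on` date.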

One good book on the positive impact of a metric that everyone on a team or organization understands is "The Great Game of Business" by Jack Stack https://www.amazon.com/Great-Game-Business-Expanded-Updated-... I reviewed it at https://www.skmurphy.com/blog/2010/03/19/the-business-is-eve...

Here is a quote to give you a flavor of his philosophy:

"A business should be run like an aquarium, where everybody can see what's going on--what's going in, what's moving around, what's coming out. That's the only way to make sure people understand what you're doing, and why, and have some input into deciding where you are going. Then, when the unexpected happens, they know how to react and react quickly. "

Jack Stack in "Great Game of Business."

shadowsun7

I should note that this essay kicks off an entire series that eventually culminates in a detailed examination of the Amazon Weekly Business Review (which takes some time to get to because of a) an NDA, and b) it took some time to test it in practice). The Goodhart’s Law essay uses publicly available information about the WBR to explain how to defeat Goodhart’s Law; the WBR is a two-decade-old mechanism for actually accomplishing these high-falutin’ goals.

https://commoncog.com/the-amazon-weekly-business-review/

Over the past year, Roger and I have been talking about the difficulty of spreading these ideas. The WBR works, but as the essay shows, it is an interlocking set of processes that solves for a bunch of socio-technical problems. It is not easy to get companies to adopt such large changes.

As a companion to the essay, here is a sequence of cases about companies putting these ideas to practice:

https://commoncog.com/c/concepts/data-driven/

The common thing in all these essays is that they don’t stop at high-falutin’ (or conceptual) recommendations, but actually dive into real-world application and practice. Yes, it’s nice to have a re-evaluation date. But what does that look like in practice?

bediger4000

Seems like the headline should be:

Is Goodhart's Law as useful as you think?