01 October, 2015

Leadership: Do you get what you inspect or expect?

Prelude


The capstone ROTC class is titled, simply, "Leadership."  Mine was taught in 1998 by CAPT J.A. Fischbeck, a nuclear-trained former skipper of the USS La Jolla (SSN 701) and later the director of the Navy's Arctic Submarine Laboratory.

Our final exam was to write an essay on the topic "Do you get what you inspect, or expect?"  I chose the latter, and wrote an essay on the power of setting clear, high expectations and demanding people be accountable to them.

It was too easy, I argued, for mere inspection to devolve into a checklist mentality.  I had been on board a submarine - I think it was the USS Oklahoma City (SSN 723) - as a midshipman when its computer emergency-scrammed the nuclear reactor quite unexpectedly.  Someone had gotten complacent filling out a routine inspection form, and hadn't noticed an unsafe condition develop*.

It was an... interesting... experience.  Subs are usually trimmed for slight negative buoyancy and controlled with diving planes; without propulsion an already submerged submarine will slide backward into the depths while the operators scramble to bring the emergency electric propulsion online.  The backup to the backup is an emergency blow, which we did not have to perform that day.



Also, I had been inculcated early on by Mark Stehlik's words in the first day of 15-127, echoed by every professor up through the 400-level operating systems class whenever anyone asked about specs for an assignment: "Your program should just do the right thing."

Alas, my answer was not what CAPT Fischbeck expected.  I was called to his office, and queried for half an hour to ensure I was sufficiently enamored of checklists and inspections.  After a bit of reflection, I was awarded an "A".

Inspection is a critical part of management, and often one of the least well understood.  In many cases, what's being measured isn't the actual behavior or outcome you want, but a proxy.  If people can figure out how to deliver the metric for less cost than doing the desired behavior, they will commit arbitrage.  Measuring reality in real time instead of via before/after proxies is both a promise and a peril of the coming IoT.

As a leader, you must understand and anticipate this arbitrage - ideally even design systems which produce quality the first time because they cannot be completed incorrectly (poka yoke).

Because when (not if) your followers commit arbitrage, it's usually not just a moral issue: it's also a failure of design and leadership.

ACT I: Amazon.com pickers deliver what's measured instead of what's desired


My MBA internship during the summer of 2003 was spent at Amazon's automated fulfillment center in Fernley, Nevada - about an hour east of Reno on I-80.  My project was to understand why - despite the best algorithms available - their indirect labor for pickers was about 40% higher than anticipated.

After working with the pickers for a few days, the answer was quite apparent: pickers gamed the system, because their managers were measuring the wrong thing.

The managers had been measuring the pickers on their "pick rate" - that is, the number of things they picked per hour.  Only this isn't the behavior they wanted.  What they really wanted was pickers who would walk to wherever the handheld computer told them to go next, pick up the next item and put it on the conveyor belt.  PhD programmers and line managers assumed equivalence - that the pick rates on the assigned paths through the warehouses would balance out.

What workers had discovered was that this assumption was incorrect.  The random stow area often had pick rates of 2-3x the pick-to-light areas, and almost 10x some of the outlying areas.  Once the computer moved a picker to a "slower" area, that picker tended to be assigned picks in the slow area for a while - which killed his productivity numbers.

Workers learned quickly to log out of the system instead of accepting a penalty-inducing walk across the warehouse to purgatory.  They would wait a minute or two, log back in, report their current position, and be assigned a nearby item.  And because the system optimized into the future, each time a worker "punted" an item, the whole queue would disappear - often several dozen items.  Later, trusted people had to be pulled off the floor and sent to the far corners of the warehouse to chase down red-flagged items that were approaching a mandatory ship time... because things which were punted once tended to keep getting punted.

It took me a few more weeks to understand their computer systems well enough to hack together an awk script which analyzed several weeks of data to prove my theory.  And incidentally to document why the timecard system and the picking assignment system should use the same database.  Of course, a good programmer could have done the job in 3 hours using Perl - but I had spent four years driving ships and a year in business school, so I was more than a little rusty.  Hey - at least with my leadership experience I knew right where to look.
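For illustration, the gaming pattern is easy to spot in a log once you know what to look for.  Here's a minimal Python sketch of the idea - the log schema, field names, and five-minute "punt" window are all my invention for this example, not Amazon's actual data model:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log schema: (worker_id, timestamp, zone, event)
# where event is "pick", "logout", or "login".
PUNT_WINDOW = timedelta(minutes=5)  # logout->login gaps shorter than this look like punts

def analyze(events):
    """Return (per-zone pick rates, suspected punt counts per worker)."""
    picks = defaultdict(int)   # zone -> pick count
    span = {}                  # zone -> (first pick ts, last pick ts)
    punts = defaultdict(int)   # worker -> suspiciously short logout/login cycles
    last_logout = {}           # worker -> timestamp of most recent logout

    for worker, ts, zone, event in sorted(events, key=lambda e: e[1]):
        if event == "pick":
            picks[zone] += 1
            first, _ = span.get(zone, (ts, ts))
            span[zone] = (first, ts)
        elif event == "logout":
            last_logout[worker] = ts
        elif event == "login":
            out = last_logout.pop(worker, None)
            # A quick logout/login cycle is the signature of dodging a bad assignment.
            if out is not None and ts - out < PUNT_WINDOW:
                punts[worker] += 1

    rates = {}
    for zone, count in picks.items():
        first, last = span[zone]
        hours = max((last - first).total_seconds() / 3600, 1e-9)
        rates[zone] = count / hours
    return rates, dict(punts)
```

A worker who logs out and back in two minutes later, right after being routed to a slow zone, shows up immediately - which is roughly what the real analysis surfaced, zone by zone.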

Later, a dedicated team revamped the system to measure workers on how they performed relative to the actual path through the warehouse the computer assigned them.  A sabermetric approach, if you will.

ACT II: Melamine scandal - first in dog food, then a year later in baby food


In 2007, there was a dog food scandal involving melamine.  Because buyers were paying for protein, they used a chemical test to confirm the amount of protein.  Only, this test could be fooled by a chemical called melamine.  So sellers sometimes cut their wares with a little bit of melamine so they would get paid more for it.  This was widely known to anyone who cared to look.

In 2008, there was a baby food scandal involving melamine that eventually spread to chocolate and other dairy foodstuffs.

The first scandal is as much a failure of leadership and management as it is scandalous behavior.  It's not blaming the victim to observe that the system seemed to demand cheating - when American consumers wanted low cost pet food, the companies providing it sourced cheap ingredients from China.

Those execs, for whatever reason, chose to rely on an inspection that could be easily fooled.  And they either didn't know or care about cost and competitive structure that would have tipped them off to questionable bids.  It's a lot like the arguments around doping from cyclists, sponsors, and WADA.  Or Captain Renault in Casablanca - everyone's "shocked."

It's never been clear to me how the second scandal could happen.  It seems straightforward to determine, once the dog food scandal occurred, what other food products could likely be economically and chemically vulnerable to the exact same scam.

Perhaps, even though they both involve melamine, the scandals are quite different - the Wikipedia entry on the baby food scandal doesn't even identify the dog food incidents of the year before as a precursor.  Or perhaps everyone in the baby food industry was simply "shocked" too.

ACT III: Volkswagen


Recently, Volkswagen has been in the news for fielding diesel engines in their cars which trick emissions testers and then pollute at higher levels on the road.  At first it was 400,000 cars in the U.S.; it now looks like 11 million cars may be affected globally.

Transport and Environment, and the International Council on Clean Transportation, say that VW "is just the tip of the iceberg."  The T&E article linked there goes into moderate detail about how other manufacturers falsify environmental and fuel economy tests in the EU, whose regulations - though weak - are stronger than those of most other non-US nations.

Tyler Cowen cites Glaeser's law of microeconomic puzzles: "It's either taxes or fraud."  I'd not heard of that before, but it fits with my experience.  (Note that "fraud" in this context is meant in the structural sense, not the legal one.)

As Prof. Cowen said re VW, "Manipulated data will be one of the big, big stories of the next twenty years, or longer."
There is a “regulation ought to be tougher” framing, but there is also a “we’ve been overestimating the benefits of regulation” framing too.  Don’t let your moral outrage, which leads you to the former lesson, distract you from absorbing some of the latter lesson too. 
How did European governments come to encourage diesel in such a distorting way?  VoxEU explains it started with a lower fuel tax on diesel during the 1973 oil crisis, to protect agriculture and transport.  This increased its attractiveness for automotive use, and by 1990 diesel had 10% of the European auto market.  That 10% was enough to ensure a minimum level of mechanics and fuel pumps across the countries; then EU emissions rules changed to tightly control CO2 at the expense of extra NOx - which, relative to US cars built for more balanced limits, acted like a 20% import duty.  I read other things that suggested the EU subsidized research on TDI and other diesel technology to open that gap further.

Brookings suggests that the VW scandal will set back the Transatlantic Trade and Investment Partnership, the Atlantic equivalent of the Trans Pacific Partnership you've read so much about.  The TTIP may be set back so far that the TPP regulations will set the global precedent, shifting some degree of power away from Europe towards Asia.

Various people have speculated on the human and economic cost of the additional pollution in the United States; whatever the cost it certainly was more than VW gained by not adding additional anti-pollution devices in the US like BMW does.

It is perhaps culturally insensitive to point out that the Chairman of VW's Supervisory Board is named Olaf Lies.  He's also the Minister of Economic Affairs, Labor and Transport for the Federal State of Lower Saxony and Deputy Prime Minister - which to me seems like a conflict of interest.  And unlikely to ensure top performance of VW's famed anti-corruption system.

CODA: Be careful what you decide to inspect... or expect


That which is measured is often delivered... and sometimes distorted.  And setting expectations beyond the capability of economics or physics will only make it worse.

If you've structured a situation where your followers must choose between a path of high morals/low evaluation and low morals/high evaluation, you've given them what's known in psychology as a double-bind.

Too often, "stretch goals" are passed downstream by leaders who don't know (or perhaps don't care about) the difference between hard and impossible.  Or it may be possible, just not with the resources available.  In politics, an example is an unfunded mandate.

In the extreme, a bad leader may prefer plausible deniability: he may construct a heads-I-win-tails-you-lose situation where he refuses to accept no for an answer, then is "shocked" to find out his followers cheated to accomplish the task.

I'm not saying to lead like the king in The Little Prince, the one who only orders his subjects to do things they were going to do anyway.

I am saying that instead of blindly demanding results, demand a known process which leads to that result.  And once you have a process that delivers the result, get it under control (reduce the variance).  And once it's under control, demand improvements to the process (reduce the mean).

Because nine women can't make a baby in a month, regardless of what your spreadsheet says.


Thanks for reading,
Greg



*The actual issue, if I recall correctly, was an inspection form which included readings for two (redundant) pressure gauges.  As long as both gauges were within a certain tolerance of each other, the procedure was to sum the readings, and divide by two to average them.

Because one of the gauges was temporarily inop, the procedure had been modified to just use the second gauge's reading.  The watch was relieved, and somehow that knowledge wasn't transferred to the next section.  The new scribe faithfully wrote down the two gauge readings - let's say 0 and 200 - summed them and divided by two, wrote down the answer, continued on his rounds, and brought the checklist back to the reactor control room.

The EOOW focused on the final number - 100 in our example - and adjusted the rest of the reactor controls in a futile attempt to get that 100 back up towards 200.  As the system got farther and farther out of whack, eventually the computer took over and shut it down before things got too far out of hand.  At least that's how this non-submariner, non-nuclear engineer remembers it...
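The arithmetic trap in that footnote is easy to sketch in code.  A minimal illustration - the values and tolerance are hypothetical, and the real procedure was on paper, not in software:

```python
def average_redundant(readings, tolerance=20):
    """Average redundant gauge readings per the written procedure:
    the average is only meaningful if the gauges agree within tolerance."""
    if max(readings) - min(readings) > tolerance:
        raise ValueError("gauges disagree beyond tolerance - do not average")
    return sum(readings) / len(readings)

# A failed-off gauge reads 0 while the good gauge reads 200.
# A naive average silently reports half the true value:
naive = (0 + 200) / 2  # 100.0 - the misleading number the EOOW chased
```

Calling `average_redundant([0, 200])` raises instead of returning 100 - the tolerance check is the poka-yoke version of the form, one that cannot be filled out incorrectly.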
