Even though I often profess that my primary interests are technical, by this point in my life I have been exposed to a variety of different organisations and management styles: From the self-organizing chaos of the 1996-2002 cracking/hacking groups, through the small engineering-centric startup zynamics, via the various organisations (both governmental and industry) I consulted for at some point, to the large (but nonetheless engineering-centric) culture at Google.
I enjoy thinking about organisations - their structure, how information flows, their strengths and dysfunctions. Part of it may be the influence of my father (who wrote extensively on matrix organisations, but also on organisations that fail); the other part is certainly the recognition that both company culture and organisational culture matter. In any organisation, setting the culture and organisational structure - and keeping it healthy - is paramount, and probably the key element that will allow long-term success. Ignore culture and organisation structure (both explicit and implicit) at your peril.
I had a lot of time to think in the last year, so in the coming months I will write a few posts / essays about company culture and management.
The first post is about organisational processes - why they are important, but also how they can take on a life of their own and strangle flexibility.
A technical anecdote to start with
In early 2004, the first prototype of BinDiff started to work properly - just when Microsoft released MS04-001: A series of amusing little memory corruptions inside the H.323 parsing component of Microsoft ISA server (a now-discontinued firewall product). Using BinDiff on the patch, it was evident that the problems were inside the ASN.1 PER parsing routines in a central library - but instead of fixing the library, the patch fixed the issue inside ISA server.
The patch fixed only one exploit path, but the actual vulnerability was still there. This meant that any other program using the same library remained vulnerable, and the patch had now effectively disclosed the security issue. I started searching for other applications that used this library. The first program I found which was also affected by this vulnerability was Netmeeting - Microsoft had inadvertently given a remote code execution bug in Netmeeting to everybody. It wasn't until MS04-011, at some point in April, that this vulnerability got fixed in the correct place -- the library.
The technical details of the bug are not terribly interesting - what is interesting is what the mistake revealed about flaws in Microsoft’s organisational structure, and how they reacted to the bug report.
What could we deduce/learn/extrapolate from this event?
- Bug reports were likely routed to the product teams - e.g. if a bug is reported in your product, the bug report is routed to you.
- Responsibility for fixing a bug appears to lie with the product teams (see above), and teams are incentivized (either directly or indirectly through feature deadlines etc.) to get bug reports “off their desk” quickly.
- Patching shared central code is harder than patching code you own (for various reasons - perhaps compatibility concerns, other priorities from other teams, or perhaps even a heavyweight process to ask for changes in critical code).
What likely happened is that the ISA team decided that dealing with the issue on their side is enough - either because they did not realize that the same issue will affect others, or because dealing with the other team / the library is a pain, or for some other unknown reason. Microsoft’s bug fixing process incentivized “shallow” fixes, so for attackers, finding the ultimate root cause of a vulnerability could expose other vulnerable programs.
This is a classical example of making a locally convenient decision that adversely affects the larger organisation.
From what I heard, Microsoft learned from this event and made organisational changes to prevent similar mistakes in the future. They introduced a process where all patches are reviewed centrally before they go out to ensure that they don't inadvertently fix a bug in the wrong spot, or disclose a vulnerability elsewhere.
Processes as organisational learning
In what an MBA would call ‘organisational learning’, a process was created out of the experience with a previous failure in order to prevent the mistake from happening again. A process is somewhat similar to organisational scar tissue - the organisation hurt itself, and to prevent such injury in the future, the process is established.
Surprisingly, most organisations establish processes without documenting explicitly what sort of failure and what sort of incident caused the process to be established. This knowledge usually only lives in the heads of individuals that were there, or in the folklore of those that talked to those that were there. After a half a decade or so, nobody remembers the original incident - although the process will be alive and kicking.
A process can prevent an organisation from doing something stupid repeatedly - but all too often, the process takes on a life of its own: People start applying the process blindly, and in the hands of an overly-literally-minded person, the process becomes an obstacle to productivity or efficiency. The person in charge of applying and enforcing the process may themselves not know why it is there - just that it is "the process", and that bad things can happen when one doesn't follow it.
My grandfather used to say (I will paraphrase) : "a job with responsibility is a job where you don’t simply apply the rules, but need to make judgements about how and where to make exceptions". This quote carries an important truth:
People at all places in an organisation need to be ...
- Empowered to make exceptions: After demonstrating sound judgement, people need to feel empowered to make exceptions when the letter of a process gets in the way of the greater good and changing the process would be excessive (for example, in a one-off situation).
- Empowered to challenge processes: The reasoning behind a process must to be accessible to organisation members, and there needs to be a (relatively pain-free) method to propose changing the process. Since powerlessness is one of the main drivers of occupational burnout, this will help keep individuals and the organisational structure healthy.
Some organisations get the “exception” part right - most big organisations only function because people are regularly willing to bend / twist / ignore processes. Very, very few organisations get the “challenge” part right-- making sure that every employee knows and understands that processes are in the service of the company, and that improvements to processes are welcome.
I think that the failure to achieve the challenge-process frequently arises due to "lack of institutional memory". When organisations fail to keep track of why a process was created, all sorts of harmful side-effects arise:
- Nobody can meaningfully judge the spirit of the process - what was it designed to prevent?
- Making an exception to the process is riskier - if you do not know what it was designed to prevent, how can you know that in this particular case that risk does not apply?
- Amending the process becomes riskier. (Same reason as above.)
- Challenging the process cannot happen in a decentralized / bottom-up fashion: It is often the most junior employees who may have the freshest eyes for obstructive processes - but since they do not know the history of why the processes exists, they often can't effectively propose a change since they don’t know the organisation well enough to rule out unwanted side-effects. This directly sabotages decentralised, bottom-up improvements of workflows.
What is a healthy way to deal with processes?
- Realize that they are a form of “organisational memory”: They are often formed as reaction to some unpleasant event - with the intent of preventing this event from repeating. It is also important to realize that unchecked and unchallenged processes can become organisational “scar tissue” - more hindrance than help.
- Keep track of the exact motivation for creating each process -- the “why”. This will involve writing half a page or more, and checking with others involved in the creation of the process that the description is accurate and understandable.
- The motivations behind the process should be accessible to everybody affected by it.
- Everybody should know that company processes are supposed to support, not hinder, getting work done. Everybody should feel empowered to suggest changes in a process - ideally while addressing why these changes will not lead to a repeat of the problem the process was designed to prevent.
- People should be empowered to deviate from the process or ignore it - but frequent or even infrequent-but-recurring exceptions are a red flag that the process needs to be improved. Don't accumulate "legacy process" and "organisational debt" through the mechanism of exception-granting.
- Everybody should be aware that keeping processes functional and lean is crucial to keeping the organisation healthy. Even if a process is unreasonable and obstructive, most people instinctively try to accept it - but the first instinct should ideally be to change it for the better. Constructively challenging a broken process is a service to the organisation, not an attack.
- It may be sensible to treat processes a bit like code - complete with ownership of the relevant process, and version control, and handover of process ownership when people change jobs. Amendments to processes can then be submitted as text, reviewed by the process owner, discussed, and eventually approved - much like a patch or removal of dead code.
Keeping an organisation healthy is hard. The most crucial ingredient to keeping it healthy, though, is that the members of the organisation care to keep it healthy. Therefore it is absolutely critical to encourage fixing the organisation when something is broken - and to not discourage people into "blindly following the process".