One of the concerns often raised when people hear about kanban is that the weakest/slowest link will slow down the whole chain.
For example, if testing is a bottleneck, the whole chain will slow down to its pace.
Similarly, in scrum, a team that plans realistically will commit to a goal that stretches the bottleneck, leaving the other resources some serious slack.
In both approaches this is indeed a valid concern.
The worst outcome is that the bottleneck causes the rest of the links to adjust not only their pace but, worse than that, their “pace memory”. I’ve been thinking about this for a while, and I am also asked about it quite frequently whenever I introduce kanban to managers with some experience…
In an old but wise article about the “Four Roles of Agile Management”, David Anderson refers to it as “The team can forget how to run fast – when there is a bottleneck/drum”. I recently had a Twitter chat with David on the subject. David sees this as an optimization problem for high-maturity organizations. Based on the discussions I’m having, I think the optimization might indeed be a high-maturity tweak, but the mere concern that this will happen is a roadblock to accepting the concept of Kanban Limited WIP or Scrum Whole Team Commitment.
I think we need to have a better answer for this question as part of the kanban “sales pitch”. At least I need it…
So what do we say? Based on the discussion with David and some further thoughts, some of them fueled by recently reading The Principles of Product Development Flow: Second Generation Lean Product Development, I would recommend the following.
Basically, what we want to do is resolve a conflict. On one hand, we don’t want to create inventory and widen the gap between the faster stations and the bottleneck, because we know that leads to slow feedback and lower quality, and that we will have to close the gap at some point. On the other hand, we don’t want to slow down the other stations, because we know it’s lost capacity and can also lead to lower capabilities over time if they “forget how to run fast”.
How can we solve the conflict? By looking at the assumption that the other stations always work on flows that must involve the bottleneck. Can we break this assumption? YES we can…
There might be work types that don’t need to go through the bottleneck. Not all work is created equal. For example, if Server guys are the bottleneck, choose work that is not as Server-heavy. If Testing is the bottleneck, choose work that is not Testing-heavy, or even items that can be tested without the involvement of the testing bottleneck.
Now the purists will say that the priority always needs to be the business priority, and that we are now pulling and prioritizing work based on our capabilities. Yes, we are. Prioritizing purely by business priority, while ignoring what the line can actually process, will lead to a lower business outcome overall. Our aim is throughput of business value. We achieve that through the right mix of business priority and the right exploitation of our resources.
Having said that, if we see that we keep skipping priorities because of our capabilities, it’s time to go to the next step: create a work item / class of service that serves to realign the business needs and the factory/machine capabilities. For example, if testing is the bottleneck, this can be test automation done by developers. If Server is the bottleneck, we can define a backlog of items that reduce the workload on Server (e.g. refactoring and paying down technical debt), or cross-train UI people to gain Server capabilities.
There are more ways to do this, but the bottom line is to always have items in the backlog that the non-bottlenecks can pull and work on at full speed. Preferably, some of them are aimed at helping balance the line, driven by a process of ongoing improvement.
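To make the pull rule concrete, here is a minimal sketch in Python of how a non-bottleneck station might choose its next item. Everything in it is an illustrative assumption of mine, not part of any kanban tool: the `WorkItem` fields, the `pull_next` function, and the sample backlog are all hypothetical. The idea is simply to follow business priority when the bottleneck has room, to otherwise pull the highest-priority item that does not route through the bottleneck, and to count how many priorities were skipped along the way.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    # Hypothetical model of a backlog item (names are illustrative only).
    name: str
    business_priority: int                       # 1 = highest business priority
    stations: set = field(default_factory=set)   # stations the item must pass through


def pull_next(backlog, bottleneck, bottleneck_has_capacity):
    """Return (item, skipped): the item to pull next and how many
    higher-priority items were skipped because they need the bottleneck."""
    ordered = sorted(backlog, key=lambda item: item.business_priority)
    if bottleneck_has_capacity:
        # No conflict: pure business priority wins.
        return (ordered[0] if ordered else None), 0
    for skipped, item in enumerate(ordered):
        if bottleneck not in item.stations:
            return item, skipped
    # Nothing bypasses the bottleneck: a signal that the backlog needs
    # slack-filling or line-balancing items.
    return None, len(ordered)


backlog = [
    WorkItem("checkout redesign", 1, {"ui", "server", "testing"}),
    WorkItem("copy tweaks", 2, {"ui"}),
    WorkItem("automate login tests", 3, {"dev"}),
]

item, skipped = pull_next(backlog, bottleneck="testing", bottleneck_has_capacity=False)
print(item.name, skipped)
```

A `skipped` count that keeps growing over time is one possible trigger for the next step described above: adding items that realign business needs with the line's capabilities.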
Since I started talking about this with Management, I see much more traction for the various ways to limit WIP, whether Scrum Sprint Commitment or the more explicit Kanban WIP Limit. I think the idea of “Too much slack” is currently a truth the mainstream is simply not ready for. Beyond that, I think it’s not fair to ask people/teams to solve this conflict on their own. Help them by discussing the various ways to address the problem, by helping them create backlogs of improvement ideas they should pull in those situations, and by setting the right classes of service / work types that create the alternative routes around the bottlenecks. I think this IS a management role in an agile environment.
PS None of this is really new. The innovation is in setting up the right classes of service and the right risk profiling to effectively manage the line. This includes elements like choosing items with a low cost of delay, so they can be used to “Fill Slack” rather than pulled as part of the normal priorities, and risk profiling, so we re-route or skip the bottleneck only on the items where the risk of doing that is minimal. Add to that measuring local cycle times (e.g. with tools like LeanKitKanban) so each Capability can focus on its own performance as well as the overall cycle time, and you get quite an elaborate system. Sounds advanced? It is. I intend to cover this in Advanced Kanban Workshops we will be starting to run, since we now have quite a community of Kanban Practitioners around Israel. Hopefully we can extend that community to the region soon.
Hey, great article. Nice to know I wasn’t alone in thinking that our productivity may be affected by the LCD.
I like your arguments and I think you’ve put forward some good suggestions on how to mitigate the effects of this.
I’m thinking about moving to a split team structure and thereby bypassing the whole team commitment, since each team will have its own commitment, sort of scrum type C. This will hopefully raise the level of the LCD, since each team will have a slight specialisation, with resources split to match their strengths. Be good to hear your thoughts.
Usually splitting teams is a good idea, but bottom line it depends on the context – and mainly whether the teams can independently deliver.
I would experiment with such teams and see if the bottlenecks disappear, and also consider experimenting with ATDD, stronger definition of done, and the other suggestions mentioned in the post.
Splitting the teams usually helps with self-organization as well as team energies and motivation. This can reduce the effect of LCD, but I wouldn’t count on it solving your problem completely.
In addition, if the teams can stay persistent then it will increase their ability to overcome LCD. If they keep changing, it will be a challenge, unless you are able to achieve high bonding among the whole group.
A bigger team that splits into small dynamic feature teams is a pattern that seems to work when you get to high enough maturity.
Another approach you can try is measuring the local cycle time across the LCD for each work item, and in your retrospective/operational review/whenever a work item finishes, look at its performance compared to your baseline (we typically use a run chart / control chart for this). If it’s significantly faster, consider it a bright spot that you should learn from. If it’s significantly slower, try to understand what happened, and record the cause. With time, typical causes will emerge, and you can think about what to do about them. You will have a drill-down on how specifically LCD affects you, which makes it more actionable…
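As a rough illustration of that baseline comparison, here is a small Python sketch. It assumes an individuals (XmR) control chart, where the limits are the baseline mean plus or minus 2.66 times the average moving range. That is one common way to build such a chart, not necessarily the one your tool uses. The function names, labels, and sample numbers are all made up for the example.

```python
from statistics import mean


def control_limits(baseline):
    """Natural process limits for an individuals (XmR) chart:
    mean +/- 2.66 * average moving range of the baseline cycle times."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline cycle times")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    # A cycle time cannot be negative, so clamp the lower limit at zero.
    return max(0.0, center - spread), center + spread


def classify(cycle_time, baseline):
    """Label a finished item's local cycle time against the baseline."""
    lower, upper = control_limits(baseline)
    if cycle_time < lower:
        return "bright spot"   # significantly faster: learn from it
    if cycle_time > upper:
        return "investigate"   # significantly slower: record the cause
    return "normal"


# Hypothetical local cycle times (in days) for past items at one station.
baseline_days = [4, 5, 6, 5, 4, 6, 5, 5]

for days in (2, 5, 9):
    print(days, classify(days, baseline_days))
```

The causes you record for the "investigate" items are what you would then group and review over time.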
Hope this helps, will be interested to hear how it all comes out.