Last Wednesday I got a full-scale indoctrination into the agile software development methodology called Kanban, loosely based on the Toyota Production System (TPS) mechanism of the same name. Toyota uses kanban as a mechanism to ensure that just the right amount of parts is ordered and delivered just in time (JIT), in order to avoid overproduction and waste on the production line. Kanban Software Development Methodology (KSDM) brings the same lean ideas to a development team.
The Kanban track at QCon, the SD conference in San Francisco, had a lineup of speakers coordinated to give a thorough explanation: David Anderson, Jeff Patton, Henrik Kniberg, Chris Shinkle, and David Laribee. What I found impressed me as an approach that may answer some of the problems I have been having in moving teams to agile development.
My first thought was that it should not be called “Kanban” since there is no kanban card: neither physical nor virtual nor electronic. Instead the focus is on “leveling” (also known in TPS as “heijunka”), which is a way of keeping the amount of work in each category steady. Kanban is the mechanism in TPS for achieving this leveling, but KSDM achieves this leveling without a kanban token itself. I won’t dwell any more on the name, because it is in widespread use, and the ideas attached to the name are so important.
Being new to the topic, I refer you right away to resources with a more thorough treatment of the subject:
- LimitedWIPSociety – a focal point for this philosophy. “WIP” means “Work In Progress”.
- The kanbandev group at Yahoo has discussions on these topics.
- Kanban Applied to Software Development: from Agile to Lean – a good article from Kenji Hiranabe at InfoQ that lays things out in good detail.
- The Kanban Primer: A Cultural Evolution in Software – a good overview article from David Anderson
- Taiichi Ohno Reinterpreted – my post from a few weeks ago revisiting the fundamentals of the Toyota Production System
My goal will be to simply cover my impression of what is different or unique about this approach to agile software development — there is a lot more to it than I can cover here.
The key to any Lean (with a capital L) method is to eliminate waste. In software, work that is partially complete is waste. This partially complete work is known as “Work In Progress” or WIP. A certain amount of WIP is necessary, but the point is to minimize it to the degree possible. You minimize WIP, and as a result the work that is in progress gets accomplished quickly. It only stands to reason: with a fixed number of workers, reducing the number of things being worked on will allow them to spend more time on those fewer things, and get them to a completed state more quickly.
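One way to put numbers on this reasoning is Little's Law from queueing theory, which says that the average time an item spends in a system equals the average number of items in the system divided by the average completion rate. The sketch below is my own illustration, not something from the talks, and the team and its throughput of 5 stories per week are made up. It also assumes that throughput stays the same as WIP shrinks, which is the "fixed number of workers" assumption above.

```python
def average_lead_time(wip: int, throughput_per_week: float) -> float:
    """Little's Law: average time in system = items in system / completion rate."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical team that completes 5 stories per week no matter how many are open.
for wip in (40, 20, 10, 5):
    weeks = average_lead_time(wip, 5.0)
    print(f"WIP={wip:2d} -> average lead time {weeks:.1f} weeks")
```

With the same people and the same output per week, cutting the open work from 40 stories to 5 takes the average story from eight weeks to one.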
Let me emphasize how important this is. Development projects that go for 8 months with no visible results are dangerous on many levels. By taking such a big bite of work at once, you commit the entire department to a direction without knowing whether it is going to work. There is no way to gauge the progress of the team, except in vague status measures which can be manipulated by incompetent players to hide reality. KSDM is about nibbling away at the work. Each small unit is completed and made customer ready BEFORE the next unit is started. In my reading of Taiichi Ohno, this production in small continuous batches, instead of huge batches, is the essential ingredient of Just In Time manufacturing.
The focus is on “flow” of development activities. Focus on flow does two things: first, it obviously helps to continuously get things completed. Second, it helps to expose trouble. Whenever there is a disruption in the flow, when the flow does not work quite right, it is an indication of a problem that needs addressing. If work starts to pile up at one step, this is an indication that you should focus effort on that step and find out what is not working well.
How it Works
This seems in many ways contrary to the punctuated approach emphasized by Scrum and other strict iterative development. This apparent difference is an illusion caused simply by scale. KSDM allows for finer-grained control. Scrum is a fairly coarse-grained approach, e.g. all work gets done in a three-week cycle. If you look at the larger scale, over the year, Scrum is providing a steady flow of features into the product. The three-week iteration is a mechanism to assure that there is not a huge backlog of incomplete work. All the work of the complete development cycle must be started and finished in that one sprint, which can be difficult if you have people who specialize in certain phases of development.
KSDM takes this control to a different level. You break the work process into a series of different activities (phases). You then set a limit on how many job units you will allow in any phase. A simple rule of thumb: you can have a few more work units than you have people doing that job, so that each person has one thing to work on at a time, plus a small cache of completed jobs. The people in a given phase will do their work on a job unit to completion, so that it is ready for the next phase of work.
This is where the somewhat brilliant key idea behind KSDM comes into play. Completing a job does NOT free the person to work on another job until that job is pulled into the following phase and work is started there. If work is piling up at a particular phase, those people are NOT ALLOWED to work ahead. As Taiichi Ohno makes so clear, working ahead is waste. Instead of working ahead, they can look around to see what is wrong.
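To make the pull rule concrete, here is a minimal sketch of a board that enforces it. This is my own illustration, not code from any of the speakers: the class name Board, its methods, and the phase names are all hypothetical, and shipping work out of the last phase is left out. The point to notice is that finish() does not free a slot; only a pull() by the next phase does.

```python
class WipLimitReached(Exception):
    """Raised when a phase is asked to hold more jobs than its limit."""


class Board:
    def __init__(self, limits):
        # limits: phase name -> maximum jobs in that phase, in workflow order.
        self.limits = dict(limits)
        self.phases = list(self.limits)
        # Per phase: job name -> True once that phase's work on it is done.
        self.jobs = {phase: {} for phase in self.phases}

    def _admit(self, phase, job):
        if len(self.jobs[phase]) >= self.limits[phase]:
            raise WipLimitReached(f"{phase} is at its limit of {self.limits[phase]}")
        self.jobs[phase][job] = False

    def start(self, job):
        """Bring a new job into the first phase."""
        self._admit(self.phases[0], job)

    def finish(self, phase, job):
        """Mark the work as done. The job keeps occupying its slot."""
        if job not in self.jobs[phase]:
            raise KeyError(f"{job!r} is not in {phase}")
        self.jobs[phase][job] = True

    def pull(self, phase, job):
        """The downstream phase pulls a finished job from the phase before it."""
        index = self.phases.index(phase)
        if index == 0:
            raise ValueError("use start() for the first phase")
        upstream = self.phases[index - 1]
        if not self.jobs[upstream].get(job, False):
            raise ValueError(f"{job!r} is not finished in {upstream}")
        self._admit(phase, job)        # raises if the downstream phase is full
        del self.jobs[upstream][job]   # only now is the upstream slot freed


board = Board({"coding": 2, "testing": 1})
board.start("A")
board.start("B")
board.finish("coding", "A")
board.finish("coding", "B")
board.pull("testing", "A")   # testing takes A, which frees one coding slot
board.start("C")             # coding now holds B (finished) and C
try:
    board.start("D")         # B is finished but not yet pulled, so no room
except WipLimitReached as error:
    print(error)
```

In this sketch the coders cannot start job D even though B is finished, because B still counts against the coding limit until testing pulls it.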
“I feel a disturbance in the flow…” – Obi Wan Kenobi on software development
To put this in concrete terms, consider a process which involves (1) detailed design, (2) coding, (3) testing, and (4) documenting. At each of these stages you place a limit on the number of jobs, and for the sake of example let’s say that limit is 4. Say that the coders have finished coding their four job units, and are ready to take a new one. But testing is backed up for some reason: the testers are still working on the last four job units, and are not ready to take a new job. The developers are not allowed to pull in a fifth job unit. There is no point in coding up more features when the earlier features are not getting tested or documented. It is also possible that, because the developers are not pulling jobs from design, the design phase fills up with completed tasks. When work backs up in this way, one should go and figure out why testing is stuck. Maybe the real problem is that documentation is blocking test. Whatever it is, the primary job of the entire team is to identify the problem with the flow, and fix it. Do not simply work ahead, accumulating a huge pile of work for “someone else”. Instead, focus on the big goal, which is to get features completed to a customer-ready state as quickly as possible.
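The situation in this example can be written down as a snapshot of the board. The sketch below is a simple heuristic of my own, not part of KSDM as presented: it assumes the place to look first is the phase furthest downstream that is at its limit, and that a phase whose slots are all filled with finished work is the one with free hands. The names PhaseState, find_constraint, and idle_phases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseState:
    name: str
    in_progress: int   # jobs still being worked on in this phase
    done_waiting: int  # jobs finished here but not yet pulled downstream


def find_constraint(phases, limit):
    """Return the most downstream phase that is at its limit, or None."""
    for phase in reversed(phases):
        if phase.in_progress + phase.done_waiting >= limit:
            return phase.name
    return None


def idle_phases(phases, limit):
    """Phases whose slots are all filled with finished work awaiting a pull."""
    return [phase.name for phase in phases if phase.done_waiting >= limit]


# The example from the text: limit of 4, coding and design full of finished work,
# testing still busy with its four jobs, documenting with room to spare.
snapshot = [
    PhaseState("design", in_progress=0, done_waiting=4),
    PhaseState("coding", in_progress=0, done_waiting=4),
    PhaseState("testing", in_progress=4, done_waiting=0),
    PhaseState("documenting", in_progress=2, done_waiting=0),
]

print("look here first:", find_constraint(snapshot, limit=4))
print("free to help:", idle_phases(snapshot, limit=4))
```

If documenting were also at its limit, the same function would point at documenting instead, which matches the case where documentation is what is really blocking test.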
This idea of setting limits on the number of discrete jobs that can exist in any given phase at a time is so simple, and yet so profound, that I want to format this whole paragraph in bold italics. It allows people to specialize in different roles and to focus on their particular work, while the mechanism assures that the primary goal is not being lost. Some teams can take a feature (a story actually, a small separable part of a feature) from design to completely implemented, tested and documented in only four days.
Kanban works in a car factory because the time to complete a particular job is well known. A Corolla comes off the line every 97 seconds. The factory is set up to produce a car’s worth of parts every 97 seconds as well. The parts are highly repeatable, so if it takes more than 97 seconds to build four doors, then you need either multiple steps or parallelism, whichever makes the most sense. Once you have figured out exactly how long it takes to build a door, it can be done again and again. Software is not so repeatable. Every software job is different and unique. How does that work with Kanban? The interesting thing: it does not turn out to matter. You break features into stories that are approximately the same size, but if one story takes three times the effort of another story, no problem. With software the job does not need to be timed to fit into 97-second slots. We can be flexible in the time dimension: one story takes one day, and another takes three days. What is important is that each story, no matter how long it takes, is completed before another is pulled in by the same person. With this understanding, my biggest concern was eliminated.
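A small sketch shows why uneven story sizes do no harm to the pull rule. It assumes a single phase with a fixed number of interchangeable people, each of whom pulls the next story only when the current one is finished. The function name and the story durations are made up for illustration.

```python
import heapq


def completion_days(story_days, workers):
    """Day on which each story finishes when stories are pulled one at a time."""
    # Min-heap of the day on which each worker next becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    finished = []
    for days in story_days:
        start = heapq.heappop(free_at)   # the next person to become free pulls it
        end = start + days
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# Six stories of one or three days each, two people, no fixed time slots.
print(completion_days([1, 3, 1, 1, 3, 1], workers=2))
```

Nobody waits for a 97-second beat here. A three-day story simply keeps its owner busy for three days while the other person works through the short ones.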
Clearly the method requires that features be broken down into fine-grained stories. If you do not do this, you run the risk of “starving the line”. That is, one phase is filled with long-running jobs, and the following phase has completed all its work. One area to explore is what to do when the feature designer claims that the feature cannot be broken down into small stories. There are surely techniques to address this, including job classification and prioritization, minimum marketable features, and other details which I can’t cover in this post.
Finally, it appears to me that teams with a long practice of waterfall development might be able to evolve into this method. There is no big disruptive change. Just make the process visible, and limit the amount of work that is in progress at any given time. This allows Taiichi Ohno’s concept of “autonomation” to take over, and the team can self-organize to become more efficient. This makes a lot of sense.