Category Archives: Lean Product Development Flow

Waste Deep in Being Done – Or…Why it’s Shorter to Take Longer – Guest Post by Benjamin Spector

Estimated Reading Time: 5 minutes

Introduction

I met Benjamin Spector in one of the recent Agile Boston meetings. He told me a story that I liked a lot since it brought to life one of the key concepts I presented – The Transaction Cost vs Cost Of Delay curve (from Principles of Product Development Flow by Reinertsen). I was able to persuade Benjamin to write up this story…

Waste Deep in Being Done – Or…Why it’s Shorter to Take Longer 

“We should have finished a month ago.”  That was the item with the most votes during the team’s last sprint retrospective.  This last sprint completed the final production-ready feature the team had been working on.  It was delivered just in time for the scheduled annual major product release.  Everyone was decompressing from having faced the possibility of delivery failure.  But even as we celebrated our success, there was a sense of disappointment that we had taken as long as we did.

I had been the team’s scrum master for about 6 months, starting with them as this latest project began. The team was small: 4 developers (3 full-time software engineers and 1 QA specialist) plus a product owner and a scrum master…me. When I started working with them, the team was practicing scrum at a beginner level. My initial focus was mainly on getting the team to perform basic agile practices more effectively for estimating and planning, as well as for daily standups, sprint reviews and retrospectives.

Over the course of the project the team was dealing with regression failures that came back to them 1-2 weeks or longer after the original code submissions. The problem was not terribly taxing on the team in the early stage of the project. We’d get 1 or 2 regression failure bug tickets and simply include them in the next sprint backlog. Sometimes we didn’t get to fixing a regression failure until 2 sprints down the road. There didn’t seem to be any harm in kicking it to a future sprint. It was tolerable…or so it seemed.

The team’s practice was to submit production-ready code after running a suite of “smoke test” regression tests. The product itself was a complex CAD tool with over a million lines of code and up to 25,000 separate automated regression tests. Running the full suite of tests was an overnight process, whereas running a smaller subset of selected regression tests focused mainly on the part of the code base the team worked in allowed for quicker turnaround, and was a common practice among all our scrum teams. In general, it was felt that running only the regression “smoke test” suite enabled everyone to deliver more quickly at relatively low risk to product quality. If a couple of regression failures slipped through the net, no one thought it was a big deal. In fact, this practice was explicitly called out as part of my team’s definition of done for user stories.

But the frequency of regression failures began to increase. As we got closer to the release deadline, there were more regression bugs to fix, and the time spent fixing them consumed a greater portion of the team’s capacity. As the scrum master, I noticed this issue, but I wrestled with the question of when would be the best time to raise it with the team. We were within striking distance of our goal, and the team was focused on finishing the project and complying with all the acceptance criteria. Significantly, one of those criteria was delivery with zero regression failures.

About a week before we finished the project, I began reading Jeff Sutherland’s latest book, “Scrum: The Art of Doing Twice the Work in Half the Time.” I came to the chapter called Waste Is a Crime, and a section called Do It Right the First Time, and the words leapt off the page. Sutherland gives a powerful example from Palm Inc.’s research on the time taken to fix bugs that are not found and fixed right away (page 99). The research showed that it took 24 times longer to fix a bug a week or more after code submission than if it had been found and fixed while the developer was actively working on the code, regardless of the size or complexity of the defect. 24 times!

So there we were at the end of the project, with everyone experiencing the elation and relief of a successful delivery mixed with a sense of disappointment that we did not finish as quickly or as cleanly as we had expected.  “We should have finished a month ago.” Why didn’t we?

It was at this moment that I jumped up and said, “Hang on just a second. I’ve got to get something from my desk that will be really interesting for everyone to hear.” I bolted out of the conference room, ran to my desk to grab the Sutherland book, and returned slightly breathless with the page open to the passage describing the Palm Inc. research about bug fixing taking 24 times longer. The gist of the Palm Inc. story was about one and a half pages long, so I asked the team’s permission to read it aloud before we continued with our retrospective discussion. Everyone agreed, with some amusement and curiosity about what I was up to. When I finished reading the passage, I could see the impact in the eyes of every team member. They began looking at each other, recognizing a shared insight. That’s the moment when I knew I had their attention.

I put the question to the team, “How many regression bugs have we fixed since we started this project?” The answer was 35. I had already sized the problem when I began monitoring it closely over the last 3 sprints. I quickly showed them my query in the Jira database, displaying the search criteria and the tally of regression bugs on the conference room overhead projector. Everyone agreed that it was a valid count.

Then I asked, “On average, how long does it take us to fix a regression bug?” We started opening up individual records so we could see our tasked-out estimates for each one. Examples typically ranged from 8 to 16 hours, including tasks for analysis, coding, code review, adjustment of some of the tests themselves to accommodate the new functionality, submission for regression testing and final validation. Some took a little more time. Some took a little less. After a few minutes of review, the team settled on the figure of 12 hours of work per regression bug. So I did the simple arithmetic on the whiteboard: 35 x 12 = 420 hours.

Then I applied the “24 times” analysis: 420 / 24 = 17.5. I said, “If the rule holds true, then if we had fixed the regression bugs at the time they were created, in theory it would have taken only 17.5 hours to fix them, not 420 hours.” Then I doubled the number just to make everyone a little less skeptical. 35 hours seemed more reasonable to everyone. Nevertheless, it was still a jaw-dropping figure when compared with 420 hours. While I stood pen in hand at the whiteboard, everyone on the team sat in stunned silence.

While they were absorbing the impact of this new insight, I took to the whiteboard again and wrote down 420 – 35 = 385 hours. Then I reminded them of our sprint planning assumptions: “Based on our team’s capacity planning assumptions, we plan for 5 hours per day per person of work time dedicated to our project. For the 4 of you that equals 100 hours per week of work capacity.” I completed the simple arithmetic on the whiteboard showing 385 / 100 = 3.85 weeks, underlining and circling the 3.85 weeks. Then, pointing back to the retrospective item with the most votes, I said, “There’s your lost month.”
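For anyone who wants to replay the numbers, here is the whiteboard arithmetic as a minimal Python sketch. The figures are the ones from the retrospective; the 24x factor is the Palm Inc. finding, and the 2x buffer is the skepticism discount described above.

```python
# The retrospective's whiteboard arithmetic as a small script.
# All figures are the ones the team agreed on in the meeting.

regression_bugs = 35              # regression bugs fixed over the project
hours_per_fix = 12                # agreed average effort per delayed fix
delayed_cost = regression_bugs * hours_per_fix        # 420 hours

PALM_FACTOR = 24                  # Palm Inc.: delayed fixes take 24x longer
immediate_cost = delayed_cost / PALM_FACTOR           # 17.5 hours
conservative_cost = immediate_cost * 2                # 35 hours, to ease skepticism

lost_hours = delayed_cost - conservative_cost         # 385 hours

# Capacity: 4 people x 5 focused hours/day x 5 days = 100 hours/week.
weekly_capacity = 4 * 5 * 5
lost_weeks = lost_hours / weekly_capacity             # 3.85 weeks

print(f"Lost to late fixing: {lost_hours:.0f} hours ~ {lost_weeks:.2f} weeks")
```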

When our retrospective ended, we left the meeting with a significant adjustment to our team’s definition of done. We replaced the “smoke test” regression testing requirement with the practice of always running the full regression test suite on the code submitted for a story and resolving all regression failures before considering the story done. This change was made with the enthusiastic and universal agreement of every team member. Everyone recognized that each story would now take longer to finish up front. But they were happy to accommodate the extra time, because now we all knew, without the slightest doubt, that finishing a story properly the first time would always take far less time than going back to really finish it later.

About Benjamin


Benjamin Spector has worked as a software product development professional and project manager for over 20 years. For 4 of his last 9 years at Autodesk, Inc., he has worked as a scrum master for several teams and as a full-time agile coach, introducing and supporting agile practices throughout the organization. Reach out to him on LinkedIn.


Don Reinertsen’s Cost of Delay Intuition Exercise – a facilitator’s guide

Estimated Reading Time: 3 minutes

Introduction

Don Reinertsen frequently talks about how surprised teams are when they first try to quantify the cost of delay on one of their projects/programs. He runs an exercise to surface the wide range of intuitive cost-of-delay estimates within a team, and to show how a little analysis can produce a much better figure. Inspired by my “Cost of Delay”-themed week (a hefty amount of Don time, spiced up by meeting Joshua Arnold face to face and spending some time with him), I decided to bring the Cost of Delay topic more explicitly into a management thinking workshop I ran with a HW/SW product development company this week. When I tweeted about the experience it got a couple of retweets, as well as a request for more details.

I couldn’t find a canonical description of the exercise, so here is a short facilitation guide. I hope this helps. And again, to be clear, this is just what I learned from Donald Reinertsen in the Lean Product Development 2nd Generation workshop last week.

Running the Cost of Delay Intuition Exercise

After spending a bit of time explaining the need and potential for thinking about product development as a flow-based design factory, I explained the need to quantify the Cost of Delay in order to effectively weigh tradeoffs like Holding Cost vs. Transaction Cost when deciding on Batch Size. I mentioned Don’s results when asking teams what the cost of a 1-month delay on one of their programs would be, and asked them if they wanted to try that exercise themselves. They were quite enthusiastic to try.
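For readers unfamiliar with that tradeoff, here is a toy sketch of the batch-size U-curve, using the classic economic-order-quantity formula that Reinertsen’s analysis echoes. All the numbers here are invented for illustration.

```python
import math

# Toy illustration of the batch-size U-curve: total cost rate is the sum of
# transaction cost (falls as batches grow) and holding cost (rises as batches
# grow). All numbers below are invented for illustration.

transaction_cost = 300.0   # fixed cost per batch (e.g., per release/integration)
holding_cost = 2.0         # cost of delay per item per unit time
demand_rate = 50.0         # items flowing through per unit time

def total_cost_rate(batch_size: float) -> float:
    transactions = (demand_rate / batch_size) * transaction_cost
    holding = holding_cost * batch_size / 2   # average inventory waiting
    return transactions + holding

# Minimum of the U-curve, from the economic-order-quantity formula.
optimal = math.sqrt(2 * demand_rate * transaction_cost / holding_cost)

for b in (25, 50, int(optimal), 250, 500):
    print(f"batch size {b:4d}: cost rate {total_cost_rate(b):7.1f}")
```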

We chose a relatively big project in their portfolio. We had the finance people in the room, so it was easy to get the projected total life-cycle profit for that project, which was around 16M$. I then made sure we all understood the scenario: the product will be delivered and available for the customers to integrate one month later than originally planned. I DIDN’T hold any discussion with them about profit curves over time etc., since I wanted them to think fresh from their own assumptions.

I then asked each participant to think about what the impact would be on the total life-cycle profit and to write the answer down without sharing or consulting with peers. We then collected the results in a round-robin poll of the group. The range was 0-16M$ (how’s that for a wide range???), with most answers falling between 150K$ and 2M$. So even ruling out the 0, the estimates spanned a factor of 100. At that point we asked some of the extremes to explain their assumptions and way of thinking, and I drew some accumulated cost of delay curves to help picture the different alternatives.

After a bit of discussion it became clear that the 16M$ answer was based on a “lose the opportunity window entirely” scenario, which can happen if you’re aiming for a big design win with no other market opportunities for that product (think aiming to deliver the iPhone display and delivering late…). This came from one of the senior guys in the room. The other extreme of zero impact basically assumed that the sales/evaluations ramp-up was slow anyhow, and that they could make do with hand-waving and expectations management to avoid losing any deals due to the delay. The finance and product management people estimated the number to be around 1-2M$.
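To make the two extremes concrete, here is a purely illustrative toy model. The 16M$ lifecycle profit is from the exercise; the 48-month lifecycle, the flat monthly profit shape, and the hard market window are invented assumptions.

```python
# Purely illustrative: cost of a one-month delay under two assumptions.
# The 16M$ lifecycle profit is from the exercise; everything else is invented.

LIFECYCLE_MONTHS = 48
TOTAL_PROFIT = 16.0                        # M$
monthly = TOTAL_PROFIT / LIFECYCLE_MONTHS  # flat profit curve, for simplicity

def lifecycle_profit(delay_months: int, hard_window: bool) -> float:
    if hard_window and delay_months > 0:
        return 0.0   # miss the design win, lose the opportunity entirely
    # Otherwise the profit curve just shifts right and the tail falls off.
    return monthly * (LIFECYCLE_MONTHS - delay_months)

for hard_window in (False, True):
    cod = lifecycle_profit(0, hard_window) - lifecycle_profit(1, hard_window)
    print(f"hard market window: {hard_window} -> cost of delay = {cod:.2f} M$")
```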

Summary

That was it. The exercise took around 15-30 minutes, I believe, and was well worth the time. In the retrospective at the end of the two days, Cost of Delay was mentioned as one of the a-ha moments. I found that having this discussion was also a great way for me to gain insight into their ecosystem and way of working; it surfaced issues around customer expectations and interfaces very quickly.

I’m glad I used this exercise and will probably use it often moving forward.