Interview For years the Adobe Photoshop team has been trying to get away from the traditional death march to a more agile development style. For its CS3 release, it made the jump, with the help of VP Dave Story. The result? More weekends off, and a third fewer bugs to fix. Mary Branscombe quizzed co-architect Russell Williams on how they did it.
If it's such a good idea, why did it take so long – and how did you manage to change this time?
We had been trying to make the change for a couple of versions but hadn't really been able to make it stick. Nobody on the team had experience using an incremental development process, and we kept sliding back into our old ways because of our own habits and pressures from other groups at Adobe who wanted to see an early "feature complete" milestone. We'd always successfully delivered using the old method, so when things got difficult, we'd revert to things we knew would work.
The difference Dave Story [Adobe's vice president of Digital Imaging Product Development – Ed] made was that he had successfully managed incremental development before at SGI and Intuit, and he had the commitment and experience to work through objections - from within or without.
And what actually changed about the way you're developing CS3?
The change we made was going from a traditional waterfall method to an incremental development model. Before, we would specify features up front, work on features until a "feature complete" date, and then (supposedly) revise the features based on alpha and beta testing and fix bugs. But we were scrambling so hard to get all the committed features in by the feature complete date - working nights and weekends up to the deadline - that the program was always very buggy at that point. We'd be desperately finding and fixing bugs, with little time to revise features based on tester feedback.
At the end of every cycle, we faced a huge "bugalanch" that required us to work many nights and weekends again. Of the three variables (features, schedule, and quality), the company sets the schedule, and it's only slightly negotiable. Until feature complete, we could adjust the feature knob. But when we hit that milestone, quality sucked and we had only a fixed amount of time until the end. From there to the end, cutting features was not an option and all we could do was trade off our quality of life to get the quality of the product to the level we wanted by the ship date. We've never sacrificed product quality to get the product out the door, but we've sacrificed our home lives.
We couldn't cut features to meet the schedule because by the time we realized we were in trouble, the features had been integrated and had interactions with other features, and trying to pull them out would have introduced more bugs.
Probably the most effective thing we did was institute per-engineer bug limits: if any engineer's bug count passes 20, they have to stop working on features and fix bugs instead. The basic idea is that we keep the bug count low as we go so that we can send out usable versions to alpha testers earlier in the cycle and we don't have the bugalanch at the end.
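The bug-limit rule can be sketched in a few lines of Python. This is an illustration only, not Adobe's tooling: the function names, the `(owner, bug_id)` record shape, and the sample engineers are all hypothetical. Only the threshold of 20 comes from the interview, and "passes 20" is read here as "more than 20".

```python
from collections import Counter

BUG_LIMIT = 20  # threshold quoted in the interview


def engineers_over_limit(open_bugs, limit=BUG_LIMIT):
    """Return the sorted names of engineers whose open-bug count passes the limit.

    open_bugs is an iterable of (owner, bug_id) pairs (a hypothetical shape).
    """
    counts = Counter(owner for owner, _bug_id in open_bugs)
    return sorted(name for name, n in counts.items() if n > limit)


def may_work_on_features(engineer, open_bugs, limit=BUG_LIMIT):
    """An engineer over the limit stops feature work and fixes bugs instead."""
    return engineer not in engineers_over_limit(open_bugs, limit)
```

With 21 open bugs against one engineer and 20 against another, only the first would be pulled off feature work under this reading of the rule.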
The goal is to always have the product in a state where we could say "pencils down. You have x weeks to fix the remaining bugs and ship it". The milestones are primarily measurement points, with the acceptance metric being quality instead of something like "all features in" or "UI Frozen". We can keep adding or refining features until later in the cycle, and we can cut features if things are running behind (OK, when things are running behind).
Features are developed in separate, private copies of the source, and only merged into the main product when QE [Quality Engineering - Ed] has signed off on the quality level. Since each of those "sandboxes" has only one major feature under development at a time that differs from the main copy of the source, it's practical to send copies of the sandbox version out to testers to test that specific feature - the rest of the program isn't too buggy to use. So the new feature gets reasonably tested and refined before being put into the main copy of the source. That keeps the main code base from becoming a mess of buggy and incomplete new features.
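The sandbox gate described above amounts to a simple rule: a feature enters the main copy only once QE has signed off. A minimal Python sketch follows, assuming a hypothetical `Sandbox` record and feature names; the interview does not say how Adobe tracks sign-off or performs merges.

```python
from dataclasses import dataclass


@dataclass
class Sandbox:
    """One private copy of the source carrying a single major feature."""
    feature: str
    qe_signed_off: bool = False  # set once QE accepts the quality level


def merge_into_main(main_features, sandboxes):
    """Return a new feature list for main with only signed-off sandboxes merged.

    Sandboxes without sign-off stay out, so main never picks up
    unfinished or untested features.
    """
    merged = list(main_features)
    for sandbox in sandboxes:
        if sandbox.qe_signed_off and sandbox.feature not in merged:
            merged.append(sandbox.feature)
    return merged
```

The point of the design is visible in the return value: the main list only ever grows by features that have already passed testing in isolation.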
Making life easier for the developers matters – but has it been good for the code too?
The quality of the program was higher throughout the development cycle, and there have been fewer total bugs. Instead of the bug count climbing towards a (frighteningly high) peak at "feature complete", it stayed at a much lower plateau throughout development. And we were able to incorporate more feedback from outside testers because we didn't switch into "frantic bug fix mode" so early.
Did it change the way you put out betas?
An automatic process builds the program every night and runs a set of tests before posting the build on our internal servers for QE to test. We could take almost any of those daily builds and use them for demos.
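In outline, that nightly process is build, test, then post. The Python sketch below uses hypothetical callables for each step, and it assumes the build is posted only when every test passes; the interview does not say whether a failing build is held back.

```python
def nightly(build, tests, post):
    """Run one nightly cycle and return the names of any failing tests.

    build: callable returning a build artifact (hypothetical)
    tests: dict mapping test name -> callable(artifact) -> bool
    post:  callable that publishes the artifact for QE
    """
    artifact = build()
    failures = [name for name, check in tests.items() if not check(artifact)]
    if not failures:
        # Assumption: only a clean build is posted to the internal servers.
        post(artifact)
    return failures
```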
The public beta was basically just "whatever build is ready on date X". There were only a couple of "we really gotta fix this before we send out the public beta" bugs. With past versions, we couldn't have done a public beta at all that far ahead of release - there would have been far too many bugs.
We weren't swamped with a pile of bugs from the hundreds of thousands of people who downloaded the beta - it really was in the good shape we thought it was. With several hundred thousand downloads, there were fewer than 25 new bugs found.
Overall, did you end up with fewer bugs, more bugs, the same number of bugs fixed faster? Did you have to sacrifice features to work this way?
Some people feared this would mean fewer features. That hasn't been the case. We certainly had far fewer bugs overall and fewer during mid-cycle (about a third fewer in total last time I checked). Better quality, plenty of features, fewer nights and weekends: what's not to like? ®