The team with king-sized technical debt
A reorg moved my entire team to a new product, X. Plagued with confounding features and architectural atrocities, X confused users and developers equally. Its engineering system, with unpredictable shipping cycles and an abysmal developer experience, made shipping anything a pain.
I don’t doubt that product X broke all records for engineering debt: everything that could be wrong was wrong, and it was always worse than you could have imagined!
A vicious self-reinforcing loop
One of my team’s responsibilities was driving the monthly product releases, which required a painful and error-prone process to corral other teams’ features. Like clockwork, partner teams snuck in half-baked, buggy features to meet release dates.
These secretive and hasty bypasses compromised quality, and my team bore the brunt of investigating, triaging and routing bugs.
- Round-up phase: Release drivers grumbled, mumbled and stumbled in misery; the tedious process was like getting ten toddlers to stay still for ten minutes.
- Release phase: The only predictable guarantee was that the date would slip. Murphy’s law ruled here.
- React phase: A postmortem showed that peer teams had caused 11 out of 13 regressions.
The toilsome round-up->release->react cycle
Breaking the loop via Engineering Fixes
The solution was a two-pronged approach to address the leverage points in the cycle.
Systems thinking is one of the most versatile techniques for identifying leverage points for breaking out of unwanted loops. It enables practitioners to design effective strategies and predict higher-order effects, thereby avoiding undesirable outcomes.
Release + React phases
Improve quality by shifting left and minimizing reactive work. This meant empowering all feature teams to run a battery of tests before merging changes. This system moved quality assurance (QA) in-house, replacing the org’s original reliance on paying customers to fulfil QA roles.
The solution was to set up a custom testing system to pull the latest artefacts and changes from partner repositories. Thereafter, a job would package all the changes (taking dependencies into consideration), provision a brand new environment and install the latest changes on that test environment. The full suite of release tests ran against this test environment, and results were immediately published back to establish a feedback loop.
We also let developers run this pipeline locally to generate the zip file and install it on their target machines, which simplified the local development loop.
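The pipeline steps described above (order the partner changes by dependency, install them into a fresh environment, run the release tests and publish the results) can be sketched roughly as follows. This is only an illustration, not the system we built: the component names, the `DEPENDENCIES` map, the `TESTS` checks and the list standing in for a test environment are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical partner components mapped to the components they depend on.
DEPENDENCIES = {
    "ui": {"api"},
    "api": {"storage"},
    "storage": set(),
}

def install_order(dependencies):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())

def run_pipeline(dependencies, tests):
    """Install components in dependency order, then run every test."""
    environment = []  # stands in for a freshly provisioned test machine
    for component in install_order(dependencies):
        environment.append(component)  # stands in for installing the latest build
    # Run the full suite and collect pass/fail per test, ready to publish.
    return {name: bool(check(environment)) for name, check in tests.items()}

# Hypothetical release tests; each receives the environment and returns a bool.
TESTS = {
    "all_components_installed": lambda env: set(env) == set(DEPENDENCIES),
    "storage_before_api": lambda env: env.index("storage") < env.index("api"),
}

results = run_pipeline(DEPENDENCIES, TESTS)
```

Because the same function runs locally and in the shared system, a developer would see the same pass/fail results on their own machine before merging.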
Round-up phase
No one enjoyed the cat-herding role, and the fix was to automate the dreary, monotonous tasks. This process suffered the same pitfalls as the earlier stages: it was error-prone because it required a heavy human touch (e.g. determining package version numbers, pulling in artefacts from multiple sources and running package generation).
The great thing was that this work benefited immensely from the Release + React fixes: it was trivial to repurpose the same pipeline for production releases.
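As one example of removing the human touch, determining the next package version number can be reduced to a small function. The sketch below assumes a MAJOR.MINOR.PATCH versioning scheme, which may not match what product X actually used; `next_version`, `package_name` and the example values are hypothetical.

```python
def next_version(current, change):
    """Bump a MAJOR.MINOR.PATCH string; change is 'major', 'minor' or 'patch'."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")

def package_name(product, version):
    """Build the file name for the release zip (naming scheme is an assumption)."""
    return f"{product}-{version}.zip"

# Example: a monthly release that only adds features bumps the minor version.
release_file = package_name("x", next_version("1.4.2", "minor"))
```

Computing the version in code means the release driver no longer has to remember the last shipped number or the bumping rules.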
Overcoming adoption hurdles
Getting org-wide adoption was more challenging than expected. The root cause of the release angst was misaligned incentives – peer teams wanted to get features out at all costs while my team needed to make sure releases were stable.
This scenario illustrates the Pig and Chicken fable – we were fully committed to release quality while partner teams were only involved in the release process. If things went wrong, the onus lay on my team.
The Fable of the Pig and Chicken
The Pig and Chicken are travelling and see a diner with a sign saying “Ham and Eggs”. The Chicken nudges the Pig and cackles in delight: “Look! Look Pig!! We’re famous!!!” Whereupon the Pig looks at the same sign and snorts derisively: “You’re involved! I am committed!!”
As much as we love to argue that humans are rational, humans are driven mainly by emotions. There were two major factors:
- Organizational inertia: The organization had always done things that way, and any new concept had to overcome that barrier to gain a foothold in people’s minds. This reluctance to change is normal; here is an excellent quote from the marvelous Peopleware book: “People hate change . . . and that’s because people hate change. . . . I want to be sure that you get my point. People really hate change. They really, really do.” (Steve McMenamin, Principal, The Atlantic Systems Guild). The fix was to show the orders-of-magnitude improvement that the new testing infrastructure brought.
- Reluctance to take on ancillary work: The new engineering system meant more work for peer teams; they could no longer chuck their half-finished work over the wall and expect another team to handle the last-mile delivery. I appealed to each team’s self-interest by revealing the unbearable costs of constantly ping-ponging defect ownership. Everyone was paying a high price, and the new model moved us to a more efficient engineering model.
Outcome of Efforts
This was a high-leverage activity that saved the entire organization countless hours of work. We estimated the productivity boost at roughly three developers’ worth of time across the organization every month.
Furthermore, my team shifted from perpetually fighting fires to working on more strategic business-level objectives.
This was how I coined the slogan: happy customers, happy engineers.