Over many years, “DevOps” practitioners applied Theory Of Constraints to our problems, ruthlessly optimizing our delivery pipelines and practices. Manual release management? Hell no, automate that. Deployment? Automate that too. Image management? 🔨 No thanks. Rolling back after we trebuchet a flaming dumpster into production? Automated. Whatever low value activity we could find in the process of getting code from product backlog to customer hands was a bottleneck to be removed or optimized.
The end result of this was that the slowest part of software delivery was testing. Since testing is why continuous delivery exists, that should have been good enough. Yes, we can make our tests faster, more automated, parallelized, etc. But when the highest-value activity of a given practice is the bottleneck, you’re optimal. You have achieved “the best possible problem”.
Those habits and behaviors of optimization didn’t stop there. We kept on chopping. 🪓 We squished our integration and end-to-end tests down to unit tests to parallelize. At the personnel level, we pushed out anybody who we believed could not code, indiscriminately, at the function level. We decided that testing might not be the bottleneck…the QA team was. The industry began to treat the people in these roles worse and worse. Expectations for them went up, salaries went down (while everyone else’s seemed to be going up!), we contracted the role, we offshored it, pretty much anything we could do to try and stop employing QA Engineers.
This created a self-reinforcing spiral, in which anyone “good enough at coding” or fed up with being treated poorly would leave QA. Similarly, others would assume anyone in QA wasn’t “good enough” to exit the discipline. No one recommends the field to new grads. Eventually, the whole thing seemed like it wasn’t worth it any more. Our divorce with QA was a cold one — companies just said “we’re no longer going to have that function, figure it out.” It was incredibly disrespectful and demoralizing to folks who had spent their career in QA. It also caused a lot of problems, because producing low quality software is actually a huge headache.
You can probably see where this is going by now: developers did not figure this out. Most orgs have no idea who should be doing what in terms of software quality. Those who have kept the function are struggling to find a place for it, because of the damage already done to the discipline.
It turns out, the “One Weird Trick” to faster software delivery was not “fire your testers”.
Wrecking this discipline was one of the worst kinds of management mistakes: a choice to destroy something that took decades to develop, and one where the impact might not be felt for years. By the time your org has felt it, you’re likely years away from a meaningful fix.
The parts of the broken QA role we were handed are all still broken, and on the metaphorical workbench. The division of labor simply did not happen. Unsurprisingly, developers did not readily assume the duties of the role without any additional compensation, recognition, or reward. Those of us who can still remember working with a high functioning QA team can impersonate some of those behaviors, but newer engineers and managers have no idea what any of that was about, and aren’t able to tell what’s missing. The guiding principle in the following advice is that quality assurance is work. Just like any other work, assuming “somebody” will do it, letting it be invisible, is a recipe for failure. Denial is not a strategy.
It’s 2023 and it feels silly to have to write this down, but my experience suggests that I absolutely must.
Here is the work to be done in order to manage quality in your software practice:
Defect Tracking: There needs to be a way for your users to send you information about a bug, and for your developers to log a bug. What is a bug? A bug is an individual ticket that describes what’s wrong and how bad it is. It doesn’t describe the work to fix it, only the defect itself, how to reproduce it, and its impact. In recent years I have been surprised to find that most dev teams I work with simply do not track these. These teams have an ocean of excuses: “We won’t ever fix it.” “That’s not my job.” “I don’t want to fix anything, I get rewarded for new features.” None of these is good enough to justify the low quality results that are produced by this approach.
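If it helps to make "a bug is a ticket" concrete, here is a minimal sketch of what such a record might hold. Every name in it (`BugReport`, `Severity`, `is_actionable`, the field names) is my own invention for illustration, not the schema of any particular tracker. Note what is absent: there is no field for the fix.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Severity(Enum):
    # Hypothetical three-level scale; use whatever your org has agreed on.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class BugReport:
    """One defect: what is wrong, how to reproduce it, and how bad it is."""

    title: str
    observed: str                  # what actually happened
    expected: str                  # what should have happened
    steps_to_reproduce: list[str]  # the "repro"
    impact: str                    # who is hurt, and how badly
    severity: Severity
    reported_by: str               # a user or a developer
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def is_actionable(self) -> bool:
        # Someone else can only pick this up if it has repro steps
        # and a stated impact.
        return bool(self.steps_to_reproduce) and bool(self.impact.strip())


example = BugReport(
    title="Ticket purchase fails for some customer names",
    observed="Checkout returns an error page after payment details are entered",
    expected="Purchase completes and a confirmation is shown",
    steps_to_reproduce=[
        "Sign in as a customer whose name contains a non-English character",
        "Add a movie ticket to the cart",
        "Submit the purchase form",
    ],
    impact="Affected customers cannot buy tickets at all",
    severity=Severity.HIGH,
    reported_by="support",
)
```

A report with an empty `steps_to_reproduce` list would fail `is_actionable()`, which is a reasonable signal that it still needs investigation before anyone can prioritize it.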
Triage: Bug triage is the process by which your engineering organization assigns, prioritizes, cleans up, categorizes, deduplicates, and otherwise cares for the bugs coming into your organization. Having a consistent standard for what a high/medium/low severity bug looks like will help your org in a number of ways. We used to call this “bug hygiene”. Similarly, just the task of deciding what team this bug belongs to is work. A high-functioning organization can do things like: degrade quality gracefully in the presence of layoffs, hand off a category of bugs to a new team in the event of a reorganization, or jettison all bugs for a feature that’s been cut.
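To show how mechanical some of this hygiene can be once the data exists, here is a rough sketch of a written-down severity standard, a naive duplicate finder, a team handoff, and a feature cut. The thresholds (5% and 1% of users), the `Bug` fields, and all function names are assumptions made up for the example; a real standard is whatever your org agrees on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical fields; a real tracker will have its own.
    id: int
    title: str
    feature: str
    team: str
    users_affected_pct: float
    blocks_core_flow: bool


def classify_severity(bug: Bug) -> str:
    """Apply one consistent (assumed) standard for high/medium/low."""
    if bug.blocks_core_flow and bug.users_affected_pct >= 5.0:
        return "high"
    if bug.blocks_core_flow or bug.users_affected_pct >= 1.0:
        return "medium"
    return "low"


def find_duplicates(bugs: list[Bug]) -> dict[int, list[int]]:
    """Map the oldest bug id to the ids of later bugs with the same
    title, ignoring case and extra whitespace. Deliberately naive."""
    groups: dict[str, list[int]] = defaultdict(list)
    for bug in sorted(bugs, key=lambda b: b.id):
        key = " ".join(bug.title.lower().split())
        groups[key].append(bug.id)
    return {ids[0]: ids[1:] for ids in groups.values() if len(ids) > 1}


def hand_off(bugs: list[Bug], old_team: str, new_team: str) -> int:
    """Reassign every bug owned by old_team; return how many moved."""
    moved = 0
    for bug in bugs:
        if bug.team == old_team:
            bug.team = new_team
            moved += 1
    return moved


def drop_feature(bugs: list[Bug], feature: str) -> list[Bug]:
    """Return the backlog without bugs for a feature that has been cut."""
    return [bug for bug in bugs if bug.feature != feature]


backlog = [
    Bug(1, "Checkout fails", "checkout", "payments", 12.0, True),
    Bug(2, "  checkout   FAILS ", "checkout", "payments", 12.0, True),
    Bug(3, "Typo on About page", "about", "web", 0.1, False),
]
```

None of these operations is possible if the bugs were never logged with an owner, a feature, and an impact in the first place, which is the real point.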
Defect Investigation: Reproduction, or “repro”, is a critical part of managing bugs. In order to expedite fixes, somebody has to do the legwork to translate “I tried to buy a movie ticket and it didn’t work” into “character encoding issues broke the purchase flow for a customer with a non-English character in their name”. Similarly, the questions of “how many times does this happen?” and “what is the user impact?” need a real human to spend a minute answering them.
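As an illustration of what a finished repro looks like, here is a small sketch that turns the vague report into something a developer can run. The receipt functions are hypothetical stand-ins for a purchase flow that assumes ASCII names; I am not claiming this is how any real ticketing system fails, only that this is the shape of the legwork.

```python
def build_receipt_line_buggy(customer_name: str) -> bytes:
    # Hypothetical defect: the purchase flow assumes names are ASCII.
    return f"Ticket for {customer_name}".encode("ascii")


def build_receipt_line_fixed(customer_name: str) -> bytes:
    # Same behavior, but encoded as UTF-8 so any name is accepted.
    return f"Ticket for {customer_name}".encode("utf-8")


def reproduce(build_receipt_line, names: list[str]) -> list[str]:
    """Return the names for which the purchase step fails."""
    failures = []
    for name in names:
        try:
            build_receipt_line(name)
        except UnicodeEncodeError:
            failures.append(name)
    return failures


# Sample inputs chosen to separate ASCII names from non-ASCII ones.
sample_names = ["Alice", "Zoë", "Nguyễn"]

failing = reproduce(build_receipt_line_buggy, sample_names)
still_failing = reproduce(build_receipt_line_fixed, sample_names)
```

Here `failing` contains only the two names with non-English characters and `still_failing` is empty, which narrows "it didn't work" down to a specific cause. Running the same check against real customer data is one way to start answering "how many times does this happen?".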
Focus: There is real value in having people at your company whose focus is on the quality of your end product. Quality might be “everybody’s job”…but it should also be “somebody’s job”. The push/pull of Quality vs Velocity needs an advocate for quality in the discussion — and that dynamic is vital to producing better results. Your testing tools, your test quality, test plans… all of these need an opinionated party to argue in favor of doing the best possible job.
End to End Testing: One of the biggest, most common problems I see in the engineering orgs I work with is ownership of the system as a whole. We have added architectural complexity to try and keep teams and applications small. That’s a perfectly rational strategy, but it leaves a gap around…literally the most important thing about your application: whether it works end to end. In my experience, the average team no longer tests this, because it’s too hard.
It’s easy to look at that list and say “but we’re agile, lean, dynamic, we don’t need to do these things! We’ve moved past this.” But I think that if you look harder, what you’ll find is that this work is probably already happening in your organization, just poorly. And imagine if literally any other field tried to make that claim to you. Your car, your bank, your doctor…“we don’t do quality assurance” is just not a great thing to hear or say. Failure to recognize and organize these activities will lead to a really dreadful situation. Tell me if this sounds familiar:
The most conscientious employees in your organization are the most bitter. They see the quality issues, they often address them, and they get no recognition for doing so. When they speak up about quality concerns, they get treated like mouthbreathers who want to slow down. They watch the “move fast and break things” crowd get rewarded time after time, while they run around angrily cleaning up their messes. To these folks, it feels like giving a damn is a huge career liability in your organization. Because it is.