There has been quite a bit of news lately about companies doing large-scale product recalls, particularly for stuff made in China. By now every marketeer has written his opinion on the expense of such recalls and the impact on the brand names involved.
But what I find interesting is not that QA failures happen, but that manufacturing companies do recall and even replace their products at great expense. This is in stark contrast with the software industry, where bug-riddled products seem to be the norm rather than the exception.
It doesn’t have to be that way. A defective software product doesn’t require a recall in the strict sense of the word. There’s no pile of inventory to take back. All that needs to be done is to fix the bugs, and make the new version or a patch available for download. Relative to a tangible product recall, the cost of issuing bug fixes is nil. Sure, time developers spend fixing bugs is time they don’t spend adding new bugs (under the guise of features) to the software. But that time should have been spent before the product was released. Remember that bugs are a business decision.
For typical consumer and business software, where bugs are annoying and perhaps costly to the user, but hardly life-threatening, the business decision will usually favor an early release to bring in money (unreleased products sell even less than bad products) and beat competitors to market. This doesn’t even have to be a bad thing, if each release is followed up with timely updates or service packs. Then customers can choose if they want to get version 3.0.0 hot off the server, or if they’ll wait for 3.1.0 to fix most of the bugs that plagued the initial release.
But this was going to be an article about recalls. Sometimes, a product or an update is released with a severe bug that makes it unusable to a significant portion of its user base. It doesn’t have to be a total crash or data loss bug. For example, the initial release of Delphi 2007 had a few new bugs in its Forms.pas unit, resulting in applications with improper taskbar button behavior. While that didn’t stop those applications from working, it made Delphi 2007 unsuitable for the production of “just great software”. But instead of swiftly releasing an update, and an apology, CodeGear dodged the issue with some online articles describing changes users could make to Forms.pas (not an easy proposition for newbie programmers) and an official update months later (too late to earn back goodwill).
But it doesn’t have to be that way. About a month ago, I released EditPad Pro 6.3.1. It was a minor update with only fixes and small improvements, so QA was limited to testing the changes and normal everyday use. I didn’t notice any problems, and there were no reports from the early downloaders of the new release. So a few days later the monthly Just Great Software newsletter announced the update to all our customers. To reduce the load on our server, it takes a few days to email everybody.
After the whole list had gone out, I got two support requests claiming that EditPad Pro 6.3.1 locked up completely, seemingly at random. I did a few quick tests, but I couldn’t reproduce anything, and the list of changes for 6.3.1 didn’t include anything that could cause lockups. It was getting late, so I called it a day. The next morning it hit me: I had made a change to some threading code inside the editor core for an upcoming release of RegexBuddy (which uses the same editor control as EditPad). This change didn’t have any user-perceptible impact on EditPad, so I hadn’t added it to EditPad’s changelog. Stress testing that code in EditPad quickly exposed the problem. Fixing it was tricky and took a few hours, as I didn’t want to roll back the improvements RegexBuddy needed, and I obviously wanted to get things right the second time around.
And then comes the recall. The bug only occurred in certain situations, and certainly wouldn’t affect everybody. But for those affected, it made EditPad unusable. And it must have been a significant slice of the user base, even though in the end only three people reported it before I fixed it. So EditPad Pro 6.3.2 was published right away. And then I did something that really qualifies this release as a recall rather than an update that’s simply put out there, waiting for customers to find it.
Since our download system requires users to enter their user ID and email address to generate a licensed copy, we know exactly who downloaded which version of our software, and when. That makes it trivial to compile a list of email addresses of customers who downloaded version 6.3.1. It turned out to be a pretty long list, as the newsletter had already gone out. I wrote an email explaining the problem and its severity, asking the customer to download 6.3.2, and apologizing for the whole ordeal. I sent the email immediately to all those who downloaded 6.3.1, and waited for the fallout. It could hardly be worse than having to respond to the same angry complaint of 6.3.1 locking up over and over for the next two weeks. Or worse, having people dump EditPad when it locks up, without checking if there’s an update.
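To show how little work that lookup is, here is a minimal sketch in Python. It is an illustration only, not the actual download system: the DownloadRecord fields and the emails_for_version function are hypothetical names, and it assumes the server keeps one log record per licensed download.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DownloadRecord:
    # Hypothetical log entry: one record per licensed download.
    user_id: str
    email: str
    version: str
    downloaded_at: datetime


def emails_for_version(records, version):
    """Return the sorted, de-duplicated email addresses of everyone
    who downloaded the given version."""
    emails = {
        # Normalize case and whitespace so the same customer is not
        # emailed twice for downloading the same release twice.
        record.email.strip().lower()
        for record in records
        if record.version == version
    }
    return sorted(emails)
```

Calling emails_for_version(records, "6.3.1") on such a log would yield the recipients for the recall email. De-duplicating matters because a customer may download the same release on several computers.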
While the recall did result in a bunch of extra support requests, I did not receive a single negative reply. Let me quote you a bunch of replies that people did send:
This bug hit me once yesterday but I blamed Windows XP. Thanks for the quick repair work.
[...] Thanks again for sending out the notification email recommending that I immediately upgrade to the new version. This is excellent proactive customer support and I commend you for it.
No apology necessary. “The person who never made a mistake never made anything else either.” Your prompt notification and remedy has reinforced the high regard I have for your software.
A quick response to let you know how much I appreciate your attention to product quality and your dedication to your customers. It is your thoroughness that continues to earn my highest personal respect and my loyalty as a customer.
Oh dear – things happen… But it is one of the reasons I have such faith in your software and your service – an immediate apology and an immediate fix. If only more were like that… Thanks once again for your ever efficient and outstanding service.
Jan, WOW! You are really on the ball! If Microsoft were even half as good, maybe their stuff’d be worth what they want people to pay for it… Many thanks for your kind notice! Frankly, I’d rather do text in Editpad than in Word if it’s amenable.
This is a shining example of your perfect commitment to quality! I had experienced the problem, but you solved it before I had a chance to report it. Thanks!
The first and last quotes are particularly noteworthy. There were quite a few more who wrote that they’d encountered the problem, but never emailed for support. And those are probably only the tip of the iceberg of those who encountered the problem, but didn’t ask for support, and didn’t say thanks afterwards. (Nor would I want everyone to say thanks. We’d spend all day sorting out the replies.) But the point is: you can’t rely on your customers to find your software’s bugs for you. Threading issues in particular, which seem to strike at random, often go unreported, as people don’t like to embarrass themselves with inaccurate descriptions. (Thinking: “It only crashes 10% of the time, so if I write, it won’t happen to them, and I’ll just get a boilerplate response saying there’s no problem.”)
The customer I quoted first blamed the wrong product. That, and all the other praise, is what a software publisher gets when they really put in an effort to fix bugs quickly. Fixing bugs doesn’t cost software developers money. It earns them money through increased goodwill and increased word-of-mouth, eventually leading to increased sales. Ironically, a developer with a track record of resolving problems quickly can more easily get away with shabby releases than a developer without such a track record. If 1.0.0 doesn’t work well, nobody minds waiting for 1.0.1 or even 1.0.5, if they know they’ll get it soon enough.