Wednesday, August 19, 2009

Design for debugging

After we release a product to market, it's not uncommon for customers to report bugs to us, even though we gained a lot of confidence from "thorough" testing by ourselves and the QA team. When that happens, we usually don't have the luxury of debugging in the production environment with a debugger. So it's important that we design in advance for debugging in this situation.
I read in Darin Dillon's Debugging Strategies for .NET Developers that his team always keeps several test boxes for running released products, and that installing any development tools on them is strictly prohibited. This seems like an extremely stern rule that hurts debugging efficiency, but I think it's beneficial for producing a great product. It forces us developers to think about how to make debugging easier in exactly the situation that occurs so frequently on customers' sites.
Take Microsoft Windows for example: there are plenty of ways to assist debugging in the released version, such as Dr. Watson for collecting dump files, the error reporting mechanism, the system event log, etc. Engineers at Microsoft do their best to collect as much information as possible when unexpected behavior occurs. You can hardly imagine that, when one of your applications crashes, a tech support engineer from Microsoft asks you to help him attach a remote debugger to your machine. Neither will we be able to ask that of our customers.

1. Logging
No doubt logging is the first thing that comes to mind. Logging works well on a customer's site because it doesn't require the customer to install any additional software. When using it, though, we must be careful not to simply call printf or Debug.WriteLine everywhere; otherwise we'll easily be overwhelmed by a flood of log messages.
A decent logging system should at least have an on/off toggle, a consistent style and a proper layout. It's better still to be able to set a logging level so that log floods can be avoided. Preferably, we choose a mature logging library such as log4net.
An important requirement is that the logging system can be configured without recompilation. Best of all is when it can be toggled at runtime without restarting the application. DbgView is a great tool that makes this possible. (Ken.Zhang explained how it works in this article.)
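To make the "configure without recompilation, toggle at runtime" requirement concrete, here is a minimal sketch in Python using the standard logging module. The helper names make_logger and apply_log_level and the idea of a one-line level file are assumptions for illustration only; a real product would more likely rely on the configuration support of its logging library (log4net, for example, has its own configuration files).

```python
import logging
from pathlib import Path

# One layout for every message keeps the log style consistent.
LOG_FORMAT = "%(asctime)s %(levelname)-8s %(name)s: %(message)s"

# Level names accepted in the (hypothetical) level file.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def make_logger(name="product"):
    """Create a logger that is quiet by default (WARNING and above)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    return logger


def apply_log_level(logger, config_path):
    """Re-read the level file and apply it; keep the current level
    when the file is missing or names an unknown level."""
    path = Path(config_path)
    if path.exists():
        level = LEVELS.get(path.read_text().strip().upper())
        if level is not None:
            logger.setLevel(level)
    return logger.level
```

The application could call apply_log_level periodically, or whenever the file's modification time changes. The customer then only needs to write DEBUG into the file to get verbose output, with no rebuild and no restart.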

2. Dump file
A log file has the advantage of recording what happens as time goes by, but you only see the information that you explicitly output. A dump file is a copy of a process on the customer's site that you can examine on your development machine to see the information missing from the log. However, it's a static image of the process at a fixed point in time. Usually, generating a dump file doesn't require installing any specific software either: core dumps on Linux and Dr. Watson on Windows are supported by a bare OS installation.
If we need to customize these tools, we'd better provide the customer with a friendly GUI tool that an average user can operate.
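A real dump file is produced by the OS, so the following is only a loose analogue, sketched in Python: the standard faulthandler module can write the stack of every thread to a file when the process crashes, without any development tools on the machine. The function names and the file name crash_trace.log are made up for this illustration.

```python
import faulthandler


def enable_crash_traces(path="crash_trace.log"):
    """Ask the runtime to write all thread stacks to `path` on a fatal
    error (e.g. a segmentation fault). The file must stay open for the
    lifetime of the process, so the caller keeps the returned handle."""
    trace_file = open(path, "a")
    faulthandler.enable(file=trace_file, all_threads=True)
    return trace_file


def snapshot_now(trace_file):
    """Write the current stacks on demand, e.g. when the customer
    reports that the application appears to hang."""
    faulthandler.dump_traceback(file=trace_file, all_threads=True)
    trace_file.flush()
```

Like a dump file, the result is a static picture of one moment, but unlike a dump file it contains only call stacks, not the memory of the process.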

3. Source control tag
After we release a product, it's important to keep a record of the source control tag that corresponds to the release, because we must examine exactly the right code when hunting for a bug. It's very likely that we have committed new code to the repository since the release, so the most recent code no longer matches the product. How disappointing it would be to find a clue to a bug and then be unable to retrieve the matching code. So we need to be able to find the source control tag for any release.
BTW, on Windows it's also important to keep a copy of the matching symbol files for each release. Refer to: PDB Files: What Every Developer Must Know by John Robbins
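One way to make the tag easy to find is to stamp it into the product itself and print it in the first log line, so every bug report carries it. A minimal sketch in Python follows; the BuildInfo type and all of its values are hypothetical, and in practice a build script would generate them from the source control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildInfo:
    product: str
    version: str
    scm_tag: str   # tag created in source control for this release
    revision: str  # exact revision the release was built from


# Hypothetical values; a build script would normally write this.
BUILD_INFO = BuildInfo(
    product="ExampleProduct",
    version="1.2.0",
    scm_tag="release-1.2.0",
    revision="r4711",
)


def build_banner(info):
    """One line suitable for the top of every log file."""
    return (f"{info.product} {info.version} "
            f"(tag {info.scm_tag}, revision {info.revision})")
```

With this banner in the customer's log, checking out the matching code is a single lookup instead of guesswork.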

4. Information collector
The product should at least have passed testing in the development department, so we usually need to focus on the differences between the customer's environment and our own. Those differences are very likely to be the culprit. msinfo32.exe is a good candidate for collecting this information.
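When a tool like msinfo32.exe is not available or collects more than we need, the product can ship a small collector of its own. The sketch below, in Python, gathers a few environment facts into a JSON report that the customer can send back. The function names and the choice of fields are assumptions; which differences matter depends on the product.

```python
import json
import os
import platform
import sys


def collect_environment(env_keys=("PATH", "TEMP", "TMP", "LANG")):
    """Gather facts that commonly differ between our machines and
    the customer's. Missing environment variables are reported as None."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "runtime_version": sys.version.split()[0],
        "cpu_count": os.cpu_count(),
        "environment": {key: os.environ.get(key) for key in env_keys},
    }


def write_report(path):
    """Save the collected facts as a JSON file for the bug report."""
    info = collect_environment()
    with open(path, "w", encoding="utf-8") as report:
        json.dump(info, report, indent=2)
    return info
```

Running the same collector on a development machine gives a second report, and comparing the two files points straight at the differences.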
