
Testing, and testers, in your team

Let's step back now and think about the professional place of testing and testers in your team. Note that I said "professional."

This professional, separate activity requires resources allocated to it, and it requires trained people to do it. Users also test -- that's a big part of what "user acceptance testing" is all about, after all -- but real testing requires repetitive, repeatable, exacting, painstaking attention.

Testing is an activity that should be completely separate from programming. Your own experience should bear out the assertion I made earlier in this paper: that testing for and logging bugs separately from fixing them makes each bug take less time to fix, and usually makes the fix more reliable as well.

I do not suggest that programmers should not test their own work and be responsible for their own work. In fact, some testing techniques, such as code reviews and walk-throughs, are uniquely the province of programming professionals. I only assert that testing also requires some emotional and intellectual distance, and that making testing a separate activity, or (when possible) the province of separate people, provides this distance.

A person who will make a good tester is patient. S/he has a facility with statistics and an ability to interpret them. (Testers gather results, but making sense of those results is a form of data mining that requires experience, instincts, and probably a little magic.) A person who will make a good testing professional has at least a smattering of theoretical knowledge about the subject, or a willingness to learn more. (For why you need theory, keep reading.)

Testing also requires that you, as a developer and manager, take a constructive attitude towards defects, and that you display this attitude in your relationship with testers (no matter who your testers turn out to be). This attitude rules out "carrot-and-stick" approaches to testing. By this I mean that testers:

are not blamed for finding "too many bugs," no matter how close to a deadline you are or how stressed you are; and
are not rewarded for bug counts by sheer numbers or any other meaningless metric.

Both points above probably look obvious, but in practice they are often overlooked.

To amplify the first point, I have observed that testers in many organizations are responsible for tracking a bug after they report it. This is not necessarily a bad idea, but sometimes the tracking and follow-up requirements become almost punitive, so testers have a disincentive to find bugs. I sometimes wonder whether management in these cases is trying, perhaps subconsciously, to avoid hearing bad news by creating these requirements.

To show you the gravity of the second possible error, I'll illustrate with an anecdote told by Bob Lewis, a regular columnist in InfoWorld. (I only half-recall the story, and will tell it as best I can, but I assure you that Mr. Lewis made it far more vivid. I recommend that you all take a look at his column, "IS Survival Guide," for technical management issues. Visit http://www.infoworld.com/cgi-bin/displayNew.pl?/lewis/rllist.htm.)

Imagine that your programming shop produces lawnmowers instead of applications.

Imagine that quality inspectors are rewarded by how many defects they find, regardless of the relative importance of the defects.

(Just for kicks, imagine while we're at it that the rest of the production line is treated the same way. The number of defects repaired per day is the most important thing, and the number of lawnmowers that must be withheld from sale rather than repaired and shipped out, for whatever reason, counts against the repair crew. And imagine that the customer service people are rewarded for the number of calls handled per hour, regardless of the quality of help, or even the consideration, they provide on each call.)

Now imagine that it is very, very easy to find missing paint flecks on the lawnmower handle. On the other hand, it may take hours of investigation to find a lawnmower blade that is improperly attached… and will fly off and kill somebody.

If you've got this thoroughly in mind (and even if your imagination hasn't extended to the inevitable lawsuits, let alone to the blood and gore), you can appreciate the danger of inappropriately applied performance metrics for testers and other people in your organization.

Use Multiple Testing Methodologies

Here comes the theory part.

In Steve McConnell's Code Complete (Microsoft Press, 1993), a considerable amount of space is devoted to evaluating effectiveness, and relative effectiveness, of various testing methodologies. I've read plenty of articles espousing a particular technique or touting a specific tool, and more than one other book about testing in general. Nothing, however, in all these evaluations and promotions, has impressed me as much as one argument presented by Mr. McConnell: multiple testing techniques used in concert are invariably more effective than any one technique used in isolation.

Mr. McConnell cites a chart on defect detection in ten common techniques from Capers Jones' Programming Productivity (reprinted as Table 23-1 in section 23-3, "Relative Effectiveness of Testing Techniques"), from which he draws this inference:

The most interesting fact that this data reveals is that the modal rates don't rise above 65 percent for any single technique. Moreover, for the most common kind of defect detection, unit testing, the modal rate is only 25 percent.

The strong implication is that if project developers are striving for a higher defect-detection rate, they need to use a combination of techniques.

He then supports this inference with data from a study by Glenford Myers, as follows:

When used individually, no method had a statistically significant advantage over any of the others. The variety of errors people found was so great, however, that any combination of two methods (including having two independent groups using the same method) increased the total number of defects found by a factor of almost 2….

Glenford Myers points out that the human processes (inspections and walkthroughs, for instance) tend to be better than computer-based testing at finding certain kinds of errors and that the opposite is true for other kinds of errors. ….

The upshot is that defect-detection methods work better in pairs than they do singly.

If you learn anything from the rest of this paper, and assuming you didn't already know and believe the old saw about "lies, damned lies, and statistics," it's that not all research and metrics provide accurate results. However, I found both the statistics and the conclusions reached by Steve McConnell on this subject quite convincing -- because they make common sense.

Bugs come in many species. To catch more than one kind of critter, you need more than one kind of hunting technique. To use more than one kind of hunting technique, you need different weapons and people with different aptitudes.

If you accept this assertion, the requirement for professionalism follows naturally. It's difficult to become familiar with multiple tools and practices, but it's obviously necessary that people doing testing be familiar with multiple tools and practices. These people have to be trained professionals, QED.

I don't have any personal stake in what methodologies you choose. Any combination is better than any one technique, and you should pick from the following possibilities as best suits your development practices and team. Or, perhaps I should say that you should pick the ones that least appeal to you, as a programmer, since these may represent areas in which your programming skills are weak and your code will show the most frequent bugs!

Consider adopting as many of the following practices as you can. Items marked by an asterisk require involvement by developers along with, or in some cases rather than, testers.

*Unit testing

This is the practice of testing each module and each routine or object in isolation as it is developed, and before it is integrated within a larger system.
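As a rough illustration of testing one routine in isolation, here is a minimal Python sketch using the standard unittest module. The routine net_price() and its rules are hypothetical, chosen only to show the shape of a unit test; nothing else in the application (database, forms, other modules) is involved.

```python
import unittest


def net_price(gross: float, discount_pct: float) -> float:
    """Hypothetical routine under test: apply a percentage discount."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(gross * (100 - discount_pct) / 100, 2)


class NetPriceTests(unittest.TestCase):
    """Exercises net_price() alone, with no other module involved."""

    def test_typical_discount(self):
        self.assertEqual(net_price(200.0, 25), 150.0)

    def test_no_discount(self):
        self.assertEqual(net_price(40.0, 0), 40.0)

    def test_rejects_out_of_range_discount(self):
        with self.assertRaises(ValueError):
            net_price(10.0, 120)
```

Saved as a file, these tests can be run with `python -m unittest <filename>`.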

Integration testing

This comes after unit testing. It tests how well each module, routine, or object fits into the larger application, and the connections between them.
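To make the distinction from unit testing concrete, the Python sketch below wires a (hypothetical) pricing routine to a (hypothetical) SQLite-backed OrderStore and checks the connection between them, not either piece alone. The in-memory database is an assumption made to keep the example self-contained.

```python
import sqlite3


def net_price(gross, discount_pct):
    # Hypothetical unit, assumed to have passed its own unit tests already.
    return round(gross * (100 - discount_pct) / 100, 2)


class OrderStore:
    """Hypothetical persistence layer backed by SQLite."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, net REAL)")

    def save(self, net):
        return self.conn.execute("INSERT INTO orders (net) VALUES (?)", (net,)).lastrowid

    def net_for(self, order_id):
        row = self.conn.execute("SELECT net FROM orders WHERE id = ?", (order_id,)).fetchone()
        return row[0] if row else None


def place_order(store, gross, discount_pct):
    # The connection under test: pricing output flows into storage.
    return store.save(net_price(gross, discount_pct))


def test_order_round_trip():
    """Integration check: a priced order is stored and read back intact."""
    store = OrderStore(sqlite3.connect(":memory:"))
    order_id = place_order(store, 200.0, 25)
    assert store.net_for(order_id) == 150.0
    return store
```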

Functional testing

Similar to unit testing, it tests whether each item, in effect, performs the task it was specified to do, which is a little different from simply "working" from the developer's point of view (which is: "it works, because it doesn't crash"). In my view, functional testing should take place after integration testing, because we're concerned with the ability of the system to provide a feature, regardless of the way the user (or the system) happens to initiate that feature. For example, if a fax should be sent by the system, one "function" of that system is to send the fax. If this "function" is specified to generate a fax log event, then that log should appear regardless of whether the user opts to send the fax manually, from a button on the form or a toolbar or menu item, or whether the fax is automatically sent by the system.
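The fax example can be sketched as a functional test that triggers the same feature through every entry point and checks only the specified outcome (a log event), not how each path is wired internally. FaxSystem and its methods are hypothetical stand-ins for a real application, written in Python for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FaxSystem:
    """Hypothetical system; its spec says every sent fax produces a log event."""
    log: list = field(default_factory=list)

    def _send(self, number, source):
        # Internal path that "sends" the fax and records the log event.
        self.log.append(("FAX_SENT", number, source))

    # Different ways the same function can be initiated.
    def on_toolbar_click(self, number):
        self._send(number, "toolbar")

    def on_menu_item(self, number):
        self._send(number, "menu")

    def scheduled_run(self, numbers):
        for n in numbers:
            self._send(n, "automatic")


def entry_points_missing_fax_log():
    """Return the names of entry points after which no fax log event appeared."""
    triggers = {
        "toolbar": lambda s: s.on_toolbar_click("555-0100"),
        "menu": lambda s: s.on_menu_item("555-0100"),
        "automatic": lambda s: s.scheduled_run(["555-0100"]),
    }
    failures = []
    for name, trigger in triggers.items():
        system = FaxSystem()  # fresh state for each path
        trigger(system)
        if not any(event[0] == "FAX_SENT" for event in system.log):
            failures.append(name)
    return failures
```

An empty list means the feature behaves the same way however it is initiated.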

Regression testing

The practice of re-testing all known functions and behavior after changes to the application. This is easiest to do if you have automated tools, a subject I'll cover in the next section.
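One simple way to automate regression testing is a table of known-good input/output pairs that grows over time and is re-run after every change. The Python sketch below assumes a hypothetical normalize_phone() routine; the case marked as an earlier bug is invented for illustration.

```python
def normalize_phone(raw: str) -> str:
    """Hypothetical routine under test; imagine it has just been changed."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


# Known-good pairs accumulated over time, including cases added when
# earlier bugs were fixed. The whole list is re-run after every change.
REGRESSION_CASES = [
    ("555 010 0199", "555-010-0199"),
    ("(555) 010-0199", "555-010-0199"),
    ("1-555-010-0199", "555-010-0199"),  # hypothetical earlier bug: country code
]


def run_regression(func, cases):
    """Return (input, expected, actual) for every case that no longer passes."""
    failures = []
    for inp, want in cases:
        got = func(inp)
        if got != want:
            failures.append((inp, want, got))
    return failures
```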

*Boundary value analysis

A developer has to specify and publish the boundary values that each module expects and interprets. Once this analysis is available, the boundary values should be built into unit and regression tests.
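Once a module's boundaries are published, the classic test values (just below, at, and just above each limit) can be generated from them mechanically. In this Python sketch the 1..999 quantity range and accept_quantity() are hypothetical; the second usage line shows how the cases catch an off-by-one error.

```python
# Published boundaries for a hypothetical quantity field (assumed spec: 1..999).
QTY_MIN, QTY_MAX = 1, 999


def accept_quantity(qty: int) -> bool:
    """Hypothetical module routine: True if qty lies within the published range."""
    return QTY_MIN <= qty <= QTY_MAX


def boundary_cases(lo, hi):
    """Map each boundary test value to whether it should be accepted."""
    return {
        lo - 1: False, lo: True, lo + 1: True,
        hi - 1: True, hi: True, hi + 1: False,
    }


def failing_boundaries(func, lo, hi):
    """Return the boundary values on which func disagrees with the spec."""
    return [v for v, expected in boundary_cases(lo, hi).items() if func(v) != expected]


print(failing_boundaries(accept_quantity, QTY_MIN, QTY_MAX))
# An off-by-one version that wrongly excludes the upper limit:
print(failing_boundaries(lambda q: QTY_MIN <= q < QTY_MAX, QTY_MIN, QTY_MAX))
```

The first call prints `[]`; the second prints `[999]`.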

Identification of high-yield test cases

Although this almost always includes testing for boundary values, boundary values are not the only type of high-yield test cases. Testing groups should identify potentially problematic groups of scenarios and build regression tests that focus on these groups.

Why? Because resources are limited, tests need to be run repeatedly, and these test cases will provide the most fruitful testing in the shortest time. As part of this identification process, testing groups should build checklists of known problem areas, including those that involve application behavior (such as a CONFIG.FPW not being properly included with the application files) and those that are external to the application and not under direct application control. (For example, running out of disk space, a user deleting required files, DLLs of the wrong version on the target machine, or an ODBC connection that is not properly set up.)
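A checklist of known problem areas can be turned into repeatable tests that recreate each condition. The Python sketch below covers only the "configuration file missing or damaged" item; load_settings(), its KEY=VALUE format, and the LANGUAGE default are hypothetical, and other checklist items (disk space, wrong DLLs, ODBC setup) would need their own set-up code.

```python
import os
import tempfile


def load_settings(path):
    """Hypothetical startup routine: read KEY=VALUE lines, falling back to defaults."""
    defaults = {"LANGUAGE": "EN"}
    try:
        with open(path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
    except OSError:
        return defaults, "config file missing or unreadable; using defaults"
    settings = dict(defaults)
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:  # silently skip lines that are not settings
            settings[key.strip().upper()] = value.strip()
    return settings, None


def run_checklist():
    """Each entry recreates one known problem condition and checks the app copes."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "config.txt")  # deliberately never created
        results["config file missing"] = load_settings(missing)[1] is not None

        empty = os.path.join(tmp, "empty.txt")
        open(empty, "w").close()
        results["config file empty"] = load_settings(empty)[0] == {"LANGUAGE": "EN"}

        garbage = os.path.join(tmp, "garbage.txt")
        with open(garbage, "w", encoding="utf-8") as fh:
            fh.write("\x00\x01 not a setting\n")
        results["config file garbage"] = "LANGUAGE" in load_settings(garbage)[0]
    return results
```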

Testing of randomly-generated data

Considering that user-supplied data is unpredictable, and externally-created data is unpredictable, what's an application to do? Why, assume the worst of course, and see how it works when it receives garbage input.
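A minimal form of random-data testing (often called fuzzing) feeds generated garbage to an input routine and records every input that makes it raise an exception or return something outside its contract. The Python sketch below assumes a hypothetical parse_age() routine; the fixed seed is a design choice so that any failure can be reproduced exactly.

```python
import random
import string


def parse_age(text):
    """Hypothetical input routine: return an int age, or None for invalid input."""
    text = text.strip()
    if not text.isdigit():
        return None
    age = int(text)
    return age if 0 <= age <= 130 else None


def fuzz(func, trials=1000, seed=42):
    """Feed random garbage to func; collect inputs that crash it or break its contract."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    alphabet = string.printable + "\u00e9\u4e2d\x00"
    findings = []
    for _ in range(trials):
        sample = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        try:
            result = func(sample)
        except Exception as exc:  # any exception counts as a finding
            findings.append((sample, repr(exc)))
            continue
        if result is not None and not (0 <= result <= 130):
            findings.append((sample, f"out-of-range result {result!r}"))
    return findings
```

A careless routine such as `int` on its own would produce many findings here, since most random strings are not numbers.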

Scaffolding

This is the practice of building small stub programs that allow quick entry into otherwise-complex applications or scenarios, so that they can be more easily and more quickly tested. Again, the aim is to put whatever testing resources you have to the most fruitful use.
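The idea of a scaffold can be sketched in Python as a stub that builds the application state a deep feature needs, skipping the login and menu steps a tester would otherwise repeat every time. Session, month_end_close(), and scaffold_session() are all hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical application state normally built up by login and menus."""
    user: str
    role: str
    open_invoices: list = field(default_factory=list)


def month_end_close(session):
    """Hypothetical deep feature, reachable in the real UI only after several steps."""
    if session.role != "accountant":
        raise PermissionError("only accountants may close the month")
    total = sum(amount for _, amount in session.open_invoices)
    session.open_invoices.clear()
    return total


def scaffold_session(role="accountant", invoices=None):
    """Stub: skip login and menus, return a session already in the needed state."""
    return Session(user="test-user", role=role, open_invoices=list(invoices or []))


if __name__ == "__main__":
    s = scaffold_session(invoices=[("INV-1", 120.0), ("INV-2", 80.0)])
    print(month_end_close(s))  # jump straight to the feature under test
```

Because the stub takes the role and invoices as parameters, the same scaffold can reach several scenarios (for example, a non-accountant attempting the close) with one line each.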

Defect logging

Finding bugs won't do any good if you can't track them to see whether they've been reported earlier, whether they're being fixed, whether they are reported as fixed and need re-testing, how severe their implications are, and so on. I can't tell you what your logging mechanism should look like, only that you need one! Since you are database programmers, you should be able to design a simple in-house app for this purpose if you can't find a commercial tool that suits you.
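As one possible starting point for an in-house defect log, here is a minimal Python sketch using the standard sqlite3 module. The table layout, the four status values, and the 1-to-4 severity scale are assumptions, not a recommended standard; it covers the questions listed above: reported earlier, being fixed, fixed and awaiting re-test, and how severe. Matching duplicates by exact summary text is deliberately simplistic.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS defects (
    id        INTEGER PRIMARY KEY,
    summary   TEXT NOT NULL,
    severity  INTEGER NOT NULL CHECK (severity BETWEEN 1 AND 4),  -- 1 = most severe
    status    TEXT NOT NULL DEFAULT 'open'
              CHECK (status IN ('open', 'fixed', 'verified', 'reopened')),
    reported  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def report(conn, summary, severity):
    """Log a defect, or return the id of an unverified one with the same summary."""
    row = conn.execute(
        "SELECT id FROM defects WHERE summary = ? AND status != 'verified'", (summary,)
    ).fetchone()
    if row:
        return row[0]  # already reported earlier
    return conn.execute(
        "INSERT INTO defects (summary, severity) VALUES (?, ?)", (summary, severity)
    ).lastrowid


def set_status(conn, defect_id, status):
    conn.execute("UPDATE defects SET status = ? WHERE id = ?", (status, defect_id))


def awaiting_retest(conn):
    """Defects reported as fixed that testers still need to re-test, worst first."""
    return conn.execute(
        "SELECT id, summary FROM defects WHERE status = 'fixed' ORDER BY severity"
    ).fetchall()
```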

Scenario design and use

This is not strictly a testing problem; you need scenarios to design an application properly. My point here is that the same scenarios and use-cases that you created to design the application should be used to validate the application through testing. Tests should be written specifically to evaluate whether an application fulfills those original scenarios it was designed for. This is a "macro-view" of functional testing. It tests the way functions and features fit together to fulfill the user's real tasks.
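A scenario-based test drives several features in the order a user would and then checks the user's goal, not each feature separately. The scenario text and the App class in this Python sketch are hypothetical, assumed only for illustration.

```python
# Hypothetical scenario from the design phase:
# "A clerk enters an order for a customer, applies the customer's discount,
#  and faxes a confirmation, which must appear in the fax log."


class App:
    """Hypothetical application facade exposing the features the scenario uses."""

    def __init__(self):
        self.orders = {}
        self.fax_log = []

    def enter_order(self, customer, gross):
        order_id = len(self.orders) + 1
        self.orders[order_id] = {"customer": customer, "gross": gross, "net": gross}
        return order_id

    def apply_discount(self, order_id, pct):
        order = self.orders[order_id]
        order["net"] = round(order["gross"] * (100 - pct) / 100, 2)

    def fax_confirmation(self, order_id, number):
        self.fax_log.append((order_id, number))


def run_scenario(app):
    """Drive the features as the user would, then check each part of the user's goal."""
    order_id = app.enter_order("Acme Ltd", 200.0)
    app.apply_discount(order_id, 25)
    app.fax_confirmation(order_id, "555-0100")
    return {
        "order recorded": order_id in app.orders,
        "discount applied": app.orders[order_id]["net"] == 150.0,
        "confirmation logged": (order_id, "555-0100") in app.fax_log,
    }
```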
