Thursday, July 07, 2005

Unit Testing




What is a unit?



Some standards, particularly ANSI/IEEE Std 1008-1987 (IEEE
Standard for Software Unit Testing), use a lax definition of
software unit. According to IEEE Std 1008, a software unit "...may
occur at any level of the design hierarchy from a single module to
a complete program".


Unit. The smallest compilable component. A unit is
typically the work of one programmer (at least in principle). As
defined, it does not include any called sub-components (for
procedural languages) or communicating components in general.



Unit Testing. In unit testing, called components (or
communicating components) are replaced with stubs, simulators, or trusted components. Calling components are replaced with drivers or trusted super-components. The unit is tested in isolation.




For object-oriented programs this means that the unit is usually
a class. Some classes (like a simple Stack) might be
self-contained, but most call other classes and are in turn called
by yet other classes. In an attempt to reduce confusion when things
go wrong, you should try to test each class in isolation. If you
don't test them in isolation, you are implicitly trusting all
classes used by the class you are testing. You are effectively
saying: I think all the other classes already work, and if they
don't, I'm prepared to sort out the mess myself. That's what
"trusted" means in the above definition. If you don't think the
other classes work, you should test in isolation. This is normally
more work, as writing stubs and drivers is a pain.
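

To make the terminology concrete, here is a minimal C++ sketch
(Discounter, PriceSource and StubPriceSource are invented names, not
from any real API): Discounter is the unit under test, StubPriceSource
replaces the called component, and the test's main() acts as the
driver.

    #include <cassert>

    // The component the unit under test would normally call.
    class PriceSource
    {
    public:
        virtual ~PriceSource() {}
        virtual int PriceOf(int aItemId) = 0;
    };

    // Stub: stands in for the real called component and returns a
    // canned answer.
    class StubPriceSource : public PriceSource
    {
    public:
        int PriceOf(int /*aItemId*/) { return 100; }
    };

    // The unit under test.
    class Discounter
    {
    public:
        explicit Discounter(PriceSource& aSource) : iSource(aSource) {}
        int DiscountedPrice(int aItemId)
        {
            return iSource.PriceOf(aItemId) * 90 / 100; // 10% discount
        }
    private:
        PriceSource& iSource;
    };

    // The test doubles as the driver: it plays the calling component.
    int main()
    {
        StubPriceSource stub;
        Discounter discounter(stub);
        assert(discounter.DiscountedPrice(42) == 90);
        return 0;
    }

Because the stub's answer is canned, a failure here points at
Discounter and nowhere else.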



When to test?



As the scope of unit testing narrows down from complete programs
to individual classes, so does the meaning of integration testing.
Any time you test two or more already unit-tested classes together
instead of using stubs, you are doing a little bit of integration
testing.



For some teams integration testing is a big issue, because
they wait until coding is finished before they start unit testing.
This is a big mistake: delaying unit testing means you will be doing
it under schedule pressure, making it all too easy to drop the tests
and just finish the code. Developers should expect to spend between
25% and 50% of their time writing unit tests. If they leave testing
until they have finished, they can expect to spend as much time
testing a module as they spent writing it in the first place, which
is going to be extremely painful. The idea is to spread the cost of
unit testing over the whole implementation phase. This is sometimes
called "incremental glass-box testing" (see Marc Rettig's article).



If you wait until you've finished coding before you start unit
testing, you'll have to choose an integration strategy. Will you
start with the low-level classes and work your way up to the classes
that expose functionality through a public API (bottom-up)? Will you
start from the top and write stubs for the lower-level classes
(top-down)? Or will you just test everything in one go (big bang)?



Code Coverage



The greatest doubt I had when writing the standard was not only
how much coverage to mandate but also whether to mandate any
coverage at all. It is easy enough to come up with a figure: 85%
seems to be pretty standard. But I have to agree with Brian Marick
[BM] that there is no evidence supporting this number. In my
opinion 100% is reasonable, as anything less means you haven't
tested the statements you left out at all. Of course, it is
difficult to have automatic unit tests for certain parts of the
code: hardware interaction and UI code are typical examples, as are
panics. If you have acceptance tests that cover these parts of the
code (or review them thoroughly), and if you make a sincere effort
to minimise their size and complexity, you can normally get away
with not unit testing them. But I'd rather include any known
exceptions to the coverage rule in the standard itself than
arbitrarily lower the bar for all the rest of the code.
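

One way to keep that exception small, sketched below with invented
names (BatteryLevelToBars, CBatteryIcon and ReadHardwareLevel are all
hypothetical): push every piece of logic out of the hardware/UI layer
into plain functions that tests can drive, so the uncovered layer is
too thin to hide bugs.

    #include <cassert>

    // Pure logic, no hardware or UI: fully coverable by unit tests.
    int BatteryLevelToBars(int aLevelPercent)
    {
        if (aLevelPercent <= 0)
            return 0;
        if (aLevelPercent >= 100)
            return 4;
        return 1 + (aLevelPercent - 1) / 33; // 1-33: one bar, 34-66: two, 67-99: three
    }

    // The thin, logic-free layer would be the only uncovered part, e.g.:
    // void CBatteryIcon::Draw() { DrawBars(BatteryLevelToBars(ReadHardwareLevel())); }

    int main()
    {
        assert(BatteryLevelToBars(-5) == 0);
        assert(BatteryLevelToBars(50) == 2);
        assert(BatteryLevelToBars(100) == 4);
        return 0;
    }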



Three pitfalls to consider if you mandate coverage:




  1. Don't treat tests that fail to increase coverage as redundant.
    Dropping them is a big mistake: they might not add coverage, but
    they might find bugs (see the sketch after this list). Coverage
    isn't everything. Brian Marick expressed this nicely: “Coverage
    tools don't give commands (make that evaluate true), they give
    clues (you made some mistakes somewhere around there)”. Use
    coverage metrics to improve your test design skills.


  2. If you mandate 85% coverage, what's the chance of anybody
    actually achieving significantly more coverage than that? Nil?


  3. If developers get fixated on achieving coverage, they might
    try to reach 100% from the start and keep it there. This can be
    difficult to achieve and can hurt productivity.


The standard addresses the first point by including guidelines on
test case design as an appendix. It addresses the second point by
mandating 100% code coverage. The risk here is that people will tend
to shift code into those areas (external interactions, UI) where
coverage isn't mandated; it is the responsibility of everybody
reviewing test code to watch out for this. It addresses the third
point by making coverage measures available from the start but only
requiring compliance with them at major milestones of the project.
Designing black-box test cases also improves the chances of the test
cases remaining adequate after implementation changes.
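

To illustrate the first pitfall, here is a small hypothetical C++
case (IsAdult is an invented function): the first two checks already
achieve 100% statement coverage, so a coverage tool would call the
third check redundant, yet only the third one exposes the bug.

    #include <cassert>

    // Deliberately buggy: the check should be aAge >= 18.
    bool IsAdult(int aAge)
    {
        return aAge > 18;
    }

    int main()
    {
        // These two cases already reach every statement of IsAdult:
        // 100% statement coverage.
        assert(IsAdult(30));
        assert(!IsAdult(10));
        // This case adds no coverage whatsoever, yet it is the one
        // that catches the boundary bug (the assertion fails).
        assert(IsAdult(18));
        return 0;
    }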



Picking a bigger unit



Testing each individual class can be a pain. In OO systems some
classes are very closely related, and testing each of them in
isolation might mean much more effort spent writing stubs. Taking
this to its logical conclusion, you would test only the public API
of the executable. The main argument against that is that it is
extremely difficult to get good coverage this way, even if you
settle for a relatively easy goal like statement coverage. A group
of closely related classes (a component) is a practical middle
ground: you only need to worry about stubs for external systems, and
only in those classes that actually interact with them. For classes
in your component that only interact with other classes in the same
component, you might not need stubs at all.
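

A hypothetical sketch of that middle ground (Transport, Parser and
Report are invented names): the two classes from the same component
are tested together, with a stub only at the external boundary.

    #include <cassert>
    #include <string>

    // The external system: the only boundary that gets a stub.
    class Transport
    {
    public:
        virtual ~Transport() {}
        virtual std::string Fetch() = 0;
    };

    class StubTransport : public Transport
    {
    public:
        std::string Fetch() { return "a,b,c"; } // canned payload
    };

    // Two closely related classes from the same component. They call
    // each other directly: no stub in between.
    class Parser
    {
    public:
        static int CountFields(const std::string& aData)
        {
            if (aData.empty())
                return 0;
            int fields = 1;
            for (std::string::size_type i = 0; i < aData.size(); ++i)
                if (aData[i] == ',')
                    ++fields;
            return fields;
        }
    };

    class Report
    {
    public:
        explicit Report(Transport& aTransport) : iTransport(aTransport) {}
        int FieldCount() { return Parser::CountFields(iTransport.Fetch()); }
    private:
        Transport& iTransport;
    };

    // One test covers Report and Parser together; only Transport is
    // stubbed.
    int main()
    {
        StubTransport stub;
        Report report(stub);
        assert(report.FieldCount() == 3);
        return 0;
    }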



Automatic and Self-documenting



This standard mandates much less unit test documentation than is
required by older standards such as ANSI/IEEE Std 829-1983. The
main reason is that those standards more or less imply manual
execution of tests, and a lot of documentation is required to make
that execution repeatable by somebody other than the developer.
None of this is needed here, because the standard proposes the
creation of self-documenting automatic unit tests. The unit tests
created by following this standard are self-documenting (they all
use the same command-line arguments) and don't require any manual
procedures. When manual procedures are wanted for more in-depth
testing, the standard also specifies a location for that
information (testharness -h).
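

The standard's real harness isn't reproduced here, but a minimal
sketch shows the idea (the test names, output format and messages
below are all invented): every test binary answers the same
arguments, -h prints whatever manual procedures exist, and a plain
run is repeatable by anybody.

    #include <cstdio>
    #include <cstring>

    // Placeholder test cases; real checks would go here.
    static bool TestPushPop()  { return true; }
    static bool TestOverflow() { return true; }

    static int Run(const char* aName, bool (*aTest)())
    {
        bool ok = aTest();
        std::printf("%s...%s\n", aName, ok ? "OK" : "Not OK");
        return ok ? 0 : 1;
    }

    int main(int argc, char* argv[])
    {
        // Same command-line contract for every test binary; -h is
        // the agreed place for any manual, in-depth procedures.
        if (argc > 1 && std::strcmp(argv[1], "-h") == 0)
        {
            std::printf("Usage: testharness [-h]\n");
            std::printf("Manual procedures: none for this unit.\n");
            return 0;
        }
        int failures = 0;
        failures += Run("TestPushPop", TestPushPop);
        failures += Run("TestOverflow", TestOverflow);
        return failures; // non-zero exit: raise a defect
    }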



Test Results



Another area where standards frequently require more
documentation is test results. Which test case failed? Are you sure
you followed the test procedure? What were you doing when the test
failed? All of this is redundant: tests are automatic and only give
two answers for each test case, OK or Not OK. Standards sometimes
require the following information, and here is why each item is
redundant:

Inputs: cannot be chosen; they are hard-coded.
Expected results: for each test, "Test Name…OK".
Actual results: for each test, "Test Name…[OK or Not OK]".
Anomalies: report as a defect.
Date and time: printed by the unit test.
Procedure step: only one step: run them.
Environment: no environment required for compliant tests.
Attempts to repeat: tests are automated; one failure means report a defect.
Testers: whoever reports the defect.
Observers: whoever is watching the tester (not very fun with automatic tests…).



Summing it up: “[with other approaches] extensive test
documentation is developed for each class. In contrast, our
approach minimizes the need to document each test suite by
standardizing the structure of all test suites. In other words, the
test framework is documented once and reused for each test
suite.” [CB].



Beyond unit tests



Apart from unit tests other types of tests are needed. Our
products are mainly reusable components (APIs) in contrast with
other companies that produce finished applications. This will be
even more so as the emphasis of producing GUIs for our components
shifts over to techview. Because we're producing components,
developer testing is easier as we can create programs that use our
components in order to test them. Independent testing and
validation is made more difficult for this same reason, as some
types of testing would require the testing team to be composed of
C++ programmers. This being unlikely, the approach to
validation/acceptance testing usually is: 1. Developers provide a
reference UI (either the developer's of the component or techview),
so that the functionality can be tested interactively. 2.
Developers provide a “acceptance test application”.
This application could be driven by a simple text file, allowing
testers to add their own test cases based on their interpretation
of the requirements. These approaches are still valid for testing.
The unit testing approach proposed in this standard affects them:
1. By lowering defect rates in acceptance testing. 2. By providing
build verification tests. 3. By making sure testers aren't the
first ones to run the unit tests. 4. By providing clear guidance on
which unit test failures should be raised as defect: any failure
means a defect. A reasonable cross-team integration approach might
still be necessary and is beyond the scope of this standard as it
heavily depends on project specific issues. The only recommendation
I would make in that respect is to set-up a cross-team automatic
build (with BVT as outlined in the standard).
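

A minimal sketch of that acceptance test application (the file
format, the Add function and all other names are invented for
illustration): it reads one test case per line of a plain text file,
so testers can add cases without touching C++.

    #include <cstdio>
    #include <fstream>
    #include <sstream>
    #include <string>

    // Stands in for the real component API under acceptance test.
    int Add(int aLeft, int aRight) { return aLeft + aRight; }

    // Each script line: <left> <right> <expected>, e.g. "2 3 5".
    int main(int argc, char* argv[])
    {
        if (argc < 2)
        {
            std::printf("Usage: acceptance <script file>\n");
            return 1;
        }
        std::ifstream script(argv[1]);
        std::string line;
        int failures = 0;
        while (std::getline(script, line))
        {
            std::istringstream fields(line);
            int left, right, expected;
            if (!(fields >> left >> right >> expected))
                continue; // skip blank or malformed lines
            int actual = Add(left, right);
            std::printf("%s...%s\n", line.c_str(),
                        actual == expected ? "OK" : "Not OK");
            if (actual != expected)
                ++failures;
        }
        return failures;
    }

A script line like "2 3 5" means: call Add(2, 3) and expect 5, so a
tester's interpretation of the requirements goes straight into the
text file.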
