Sunday, November 05, 2006

Life Cycle of a Software Bug


Bug tracking workflow, i.e., the life cycle of a bug or defect, describes the states a bug or defect goes through from the time it is created to the time it is closed. The following are some commonly used terms for software bug tracking (if you are in a hardware or help desk customer support situation, the terms could be completely different):

new

When a bug is newly created, it has the state 'new'. Some people split this 'new' state into two states: 'open' before the bug is assigned, and 'assigned' after it is assigned. In the case of Bugzero, since a newly created bug is always assigned, you may just call it 'new' or 'open' without a separate 'assigned' state.

You can still have an 'assigned' state if you decide to have all newly submitted bugs sent to a manager (in the 'new' state), with the manager then actually assigning the bug. You can configure the workflow such that the next allowed state for 'new' is 'assigned'.

open

Some people use 'open' as the state of a newly created bug that has not yet been assigned.

assigned

Some people use 'assigned' as the state of a newly created bug that has been assigned.

fixed

A bug that has been fixed by a developer has the state 'fixed'. Normally, this is the state before the fix is confirmed (by QA). Once a bug is confirmed to have been fixed, it should be given the state 'closed'. Some systems allow you to configure the workflow so that only developers can mark a bug as fixed.

resolved

Similar to 'fixed'.

closed

If a bug is confirmed (by QA) to have been fixed, it should be given the state 'closed'. Some systems allow you to configure the workflow so that only QA can close a bug.

reopened

A 'closed' bug can be reopened if it resurfaces or is found not to have been really fixed.

suspended

A bug can be 'suspended' if it is determined that it should not be fixed immediately or that the fix can be delayed. Some people call this state 'deferred'. Some systems allow you to configure the workflow so that only a certain person, such as a manager, can suspend a bug.

deferred

Similar to 'suspended'.

analyzed

If more information is needed to fix the bug, it can be set to the state 'analyzed'. You may want to call it a different name, such as 'update'.
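
To make the lifecycle concrete, here is a minimal sketch in Java of how the states above and their allowed transitions might be encoded. It is purely illustrative: the state names follow this post, but the transition table is an assumption, not the configuration of Bugzero or any other tracker.

import java.util.*;

public enum BugState {
    NEW, ASSIGNED, FIXED, CLOSED, REOPENED, SUSPENDED, ANALYZED;

    // Hypothetical transition table; a real tracker would load this from its workflow configuration.
    private static final Map<BugState, EnumSet<BugState>> NEXT = new EnumMap<>(BugState.class);
    static {
        NEXT.put(NEW,       EnumSet.of(ASSIGNED, SUSPENDED, ANALYZED));
        NEXT.put(ASSIGNED,  EnumSet.of(FIXED, SUSPENDED, ANALYZED));
        NEXT.put(FIXED,     EnumSet.of(CLOSED, REOPENED));
        NEXT.put(CLOSED,    EnumSet.of(REOPENED));
        NEXT.put(REOPENED,  EnumSet.of(ASSIGNED, FIXED));
        NEXT.put(SUSPENDED, EnumSet.of(ASSIGNED));
        NEXT.put(ANALYZED,  EnumSet.of(ASSIGNED, FIXED));
    }

    // Returns true if the workflow allows moving from this state to the target state.
    public boolean canMoveTo(BugState target) {
        return NEXT.getOrDefault(this, EnumSet.noneOf(BugState.class)).contains(target);
    }
}

For example, BugState.CLOSED.canMoveTo(BugState.REOPENED) returns true, while BugState.CLOSED.canMoveTo(BugState.FIXED) returns false under this sample configuration.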

Tuesday, October 17, 2006

List of Defect Tracking Tools

This is a list of defect tracking tools. Both commercial and freeware tools are included. The tools on this list are all available standalone, with the exception of a few that are integrated with a test management system.

For more details, check out http://www.testingfaqs.org/t-track.html

Saturday, August 12, 2006

Selecting the best defect tracking system

The following is a general comparison overview of bug tracking tools, focused particularly on the technical aspects. The aim is to help you select the tracking system that best meets your requirements. Click on the link for a more detailed comparison between Bugzero, Bugzilla, and Gnats.

System Architecture:

  • Many older bug or defect tracking systems are client-server based. You need to install the server, and each user needs to install the client software. If external users are involved, this can be problematic because of issues such as firewalls. Also, it is not always feasible to install client software.
  • Newer systems are more likely to be web-browser based, so no client software needs to be installed (other than a browser). A web-based bug tracking system is especially attractive if your users are spread across different locations and are connected through the internet.
  • For a web-based bug or defect tracking system, make sure it supports the browsers your users are using. Be aware that many systems support only IE.


Server Operating System:

  • Most commercial bug tracking systems are Windows based. In that case, they typically require an NT/2000/XP server and a SQL Server database. Note that Windows XP Professional may not be sufficient; a server edition may be required.
  • Most free bug or defect tracking systems are Linux/Unix based and may not work as well on Windows. They may also require more technical skill to install and set up.
  • When people say their system is cross-platform, you need to make sure they mean the server. Only a very few bug tracking systems are truly cross-platform (with the same code base). Some vendors claim to support multiple operating systems but maintain completely independent versions for each OS, which results in higher costs for the vendor and therefore a higher price for the end users.


Backend Database:

  • Most bug or defect tracking systems require a backend database, but a few are file based. In the latter case, make sure the system scales well. If someone tells you that a file-based system is better than a database, think twice.
  • For Windows-based systems, the database choice may be limited to Access and SQL Server. On the other hand, some free systems may lock you into just one database, notably MySQL. Only a very few bug tracking systems are truly cross-database.
  • Beware of any bug tracking software that uses a non-standard, proprietary database. It is unlikely to be better than the publicly available, commonly used database systems.


Language Support:

  • Many bug tracking systems do not support localization, particularly for Asian languages. Note that localization involves the web interface, the data, and the email notifications.
  • If you do need localization, you should find a system that can do that easily.


Web Server:

  • Windows-based bug tracking systems most likely require IIS as the web server.
  • For Java-based bug tracking systems, a Servlet or J2EE server is most likely required. There are many high quality servers you can download for free.


Programming Language:

  • Most bug tracking systems are written in C/C++, Perl/PHP, or Java.
  • Depending on your IT environment and skill set, the programming language may be relevant in selecting your system. For example, if you are developing Java software, it may make sense to use a Java based bug tracking system.


Version Control Integration:

  • Some bug tracking systems have the capability of integrating with source control systems, such as CVS, Source Safe, etc.
  • Be aware of the limitations, and make sure the integration does the things you want.


Installation and Configuration:

  • A bug tracking system is not a desktop application, and it rarely works out of the box. It is not uncommon to spend a few hours setting up such a system, and then more time customizing it.
  • However, if you need only a lightweight bug tracking system, a heavy, complex, can-do-everything system is certainly overkill and may do more harm than good.


Maintenance and Support:

  • A bug tracking tool is not a super complex software system, but from time to time you may need technical support. In most cases the error messages from these systems are cryptic, and you won't always be able to solve the problem on your own.
  • How errors are handled by a tool is far more important than you might think. As the administrator, you may want to select a tool that you feel comfortable working with.
  • When support is needed, it is always urgent to you, but not necessarily to the vendor. Before you purchase the software, ask what the response time for support is.


Features:

  • Simplicity is the key here. The system must be simple enough that people like to use it, not so complex that people avoid using it. You might not want to deploy a tool that requires serious end-user training. It is really not the initial training, but rather the ongoing support needed by your end users, that you should be concerned with.
  • Yet it should be flexible and configurable enough to satisfy your business needs. If you select a tool that cannot do what you intend it to do, then what is the use of it?


Cost of Ownership:

  • The initial cost of a bug tracking system varies from free to tens of thousands of dollars. But be aware that this is not the same as the total cost of ownership. Some free systems charge a hefty consulting fee for support, and you may end up paying much more than you planned.
  • You should select a bug tracking system based on your needs, not just the price. If you know what you are doing and do not need commercial support, go for a free one if it meets your requirements.
  • However, if you unfortunately selected a bad one, you had better get out of it as soon as possible, because the longer you keep it, the more money and time you will have to spend on it.
In any case, spending many days setting up a free system, or even weeks or months creating an in-house system, makes no business or economic sense, because once you account for the time spent, you are actually paying much more than you would by simply buying one.

Friday, July 21, 2006

Unit Testing with Mock Objects

Unit testing is a fundamental practice in Extreme Programming, but most non-trivial code is difficult to test in isolation. It is hard to avoid writing test suites that are complex, incomplete, and difficult to maintain and interpret. Using Mock Objects for unit testing improves both domain code and test suites. They allow unit tests to be written for everything, simplify test structure, and avoid polluting domain code with testing infrastructure.

You need to make sure that you test one feature at a time, and you want to be notified as soon as any problem occurs. Normal unit testing is hard because you are trying to test the code from outside.

There is a technique called Mock Objects in which we replace domain code with dummy implementations that emulate real code. These Mock Objects are passed to the target domain code, which they test from the inside; this is also termed Endo-Testing. The practice is similar to writing code stubs, with two interesting differences: we test at a finer level of granularity than is usual, and we use our tests and stubs to drive the development of our production code.

Developing unit tests with Mock Objects leads to stronger tests and to better structure of both domain and test code. Unit tests written with Mock Objects have a regular format that gives the development team a common vocabulary. We believe that code should be written to make it easy to test, and have found that Mock Objects is a good technique to achieve this.

An essential aspect of unit testing is to test one feature at time; you need to know exactly what you are testing and where any problems are. Test code should communicate its intent as simply and clearly as possible. This can be difficult if a test has to set up domain state or the domain code causes side effects. Worse, the domain code might not even expose the features to allow you to set the state necessary for a test.

A Mock Object is a substitute implementation to emulate or instrument other domain code. It should be simpler than the real code, not duplicate its implementation, and allow you to set up private state to aid in testing. The emphasis in mock implementations is on absolute simplicity, rather than completeness. For example, a mock collection class might always return the same results from an index method, regardless of the actual parameters.
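
As a small illustration of that last point, here is a hedged sketch in plain Java (hand-rolled, no mocking library; the interface and class names are hypothetical) of a mock that always returns the same result regardless of the index it is given, which is enough to exercise the code that depends on it:

// Hypothetical interface the production code depends on.
interface PriceTable {
    double priceAt(int index);
}

// Mock implementation: absolute simplicity, the same answer for every index.
class MockPriceTable implements PriceTable {
    private final double fixedPrice;
    MockPriceTable(double fixedPrice) { this.fixedPrice = fixedPrice; }
    public double priceAt(int index) { return fixedPrice; }
}

// The code under test accepts the interface, so a test can pass in the mock.
class OrderCalculator {
    private final PriceTable prices;
    OrderCalculator(PriceTable prices) { this.prices = prices; }
    double total(int[] itemIndexes) {
        double sum = 0;
        for (int i : itemIndexes) sum += prices.priceAt(i);
        return sum;
    }
}

A unit test can then assert that new OrderCalculator(new MockPriceTable(10.0)).total(new int[] {1, 2, 3}) returns 30.0, without needing any real price data.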


Mock Objects are not just stubs

As a technique, Mock Objects is very close to Server Stubs. The main concerns about using Server Stubs are: that stubs can be too hard to write, that the cost of developing and maintaining stubs can be too high, that dependencies between stubs can be cyclic, and that switching between stub and production code can be risky.


Why use Mock Objects?

An important aspect of Extreme Programming is not to commit to infrastructure before you have to. For example, we might wish to write functionality without committing to a particular database. Until a choice is made, we can write a mock class that provides the minimum behaviour that we would expect from our database. This means that we can continue writing the tests for our application code without waiting for a working database. The mock code also gives us an initial definition of the functionality we will require from the database.
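
A minimal sketch of that idea, assuming a hypothetical CustomerStore interface that the application will eventually back with a real database:

import java.util.*;

// The application codes against this interface; the database choice is deferred.
interface CustomerStore {
    void save(String id, String name);
    String findName(String id);
}

// In-memory mock providing the minimum behaviour we would expect from the eventual database.
class InMemoryCustomerStore implements CustomerStore {
    private final Map<String, String> rows = new HashMap<>();
    public void save(String id, String name) { rows.put(id, name); }
    public String findName(String id) { return rows.get(id); }
}

Tests of the application logic run against InMemoryCustomerStore today; once a database is chosen, a real implementation of CustomerStore replaces it without changing the structure of the tests.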

Unit tests, as distinct from functional tests, should exercise a single piece of functionality. A unit test that depends on complex system state can be difficult to set up, especially as the rest of the system develops. Mock Objects avoid such problems by providing a lightweight emulation of the required system state. Furthermore, the setup of complex state is localised to one Mock Object instead of scattered throughout many unit tests.

Some unit tests need to test conditions that are very difficult to reproduce. For example, to test server failures we can write a Mock Object that implements the local proxy for the server.
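
For instance, a hedged sketch of such a failure-simulating mock (the ReportServer interface here is hypothetical) lets a test drive the error-handling path on demand:

import java.io.IOException;

// Local proxy interface for the remote server.
interface ReportServer {
    String fetchReport(String id) throws IOException;
}

// Mock proxy that simulates a server failure so the error handling can be unit tested.
class FailingReportServer implements ReportServer {
    public String fetchReport(String id) throws IOException {
        throw new IOException("simulated server failure");
    }
}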

Domain objects often fail some time after an error occurs, which is one reason that debugging can be so difficult. With tests that query the state of a domain object, all the assertions are made together after the domain code has executed. This makes it difficult to isolate the exact point at which a failure occurred. One of the authors experienced such problems during the development of a financial pricing library. The unit tests compared sets of results after each calculation had finished. Each failure required considerable tracing to isolate its cause, and it was difficult to test for intermediate values without breaking encapsulation.
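
One way to get that earlier feedback, sketched here with hypothetical names, is a mock that checks each intermediate value at the moment it is produced, so a failure is reported at the exact calculation step rather than after the whole run:

// Hypothetical sink that the pricing code publishes intermediate results to.
interface ResultSink {
    void publish(double value);
}

// Mock sink preloaded with the expected sequence of intermediate values.
class ExpectingResultSink implements ResultSink {
    private final double[] expected;
    private int next = 0;
    ExpectingResultSink(double... expected) { this.expected = expected; }
    public void publish(double value) {
        if (next >= expected.length || Math.abs(value - expected[next]) > 1e-9)
            throw new AssertionError("unexpected value " + value + " at step " + next);
        next++;
    }
}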


Limitations of Mock Objects

As with any unit testing, there is always a risk that a Mock Object might contain errors, for example returning values in degrees rather than radians. Similarly, unit testing will not catch failures that arise from interactions between components. For example, the individual calculations for a complex mathematical formula might be within valid tolerances, and so pass their unit tests, but the cumulative errors might be unacceptable. This is why functional tests are still necessary, even with good unit tests. Extreme Programming reduces, but does not eliminate, such risks with practices such as Pair Programming and Continuous Integration.

Mock Objects reduce this risk further by the simplicity of their implementations.
In some cases it can be hard to create Mock Objects to represent types in a complex external library. The most difficult aspect is usually the discovery of values and structures for parameters that are passed into the domain code. In an event-based system, the object that represents an event might be the root of a graph of objects, all of which need mocking up for the domain code to work. This process can be costly and sometimes must be weighed against the benefit of having the unit tests. However, when only a small part of a library needs to be stubbed out, Mock Objects is a useful technique for doing so.
One important point that we have learned from trying to retrofit Mock Objects is that, in
statically typed languages, libraries must define their APIs in terms of interfaces rather than classes so that clients of the library can use such techniques.

Saturday, June 17, 2006

Checklist for Test Preparation

Listed below are questions/suggestions for systematically planning and preparing software testing.
  • Have you planned for an overall testing schedule and the personnel required, and associated training requirements?

  • Have the test team members been given assignments?

  • Have you established test plans and test procedures for

  • module testing,

  • integration testing,

  • system testing, and

  • acceptance testing?

  • Have you designed at least one black-box test case for each system function?

  • Have you designed test cases for verifying quality objectives/factors (e.g. reliability, maintainability, etc.)?

  • Have you designed test cases for verifying resource objectives?

  • Have you defined test cases for performance tests, boundary tests, and usability tests?

  • Have you designed test cases for stress tests (intentional attempts to break system)?

  • Have you designed test cases with special input values (e.g. empty files)?

  • Have you designed test cases with default input values?

  • Have you described how traceability of testing to requirements is to be demonstrated (e.g. references to the specified functions and requirements)?

  • Do all test cases agree with the specification of the function or requirement to be tested?

  • Have you sufficiently considered error cases? Have you designed test cases for invalid and unexpected input conditions as well as valid conditions?

  • Have you defined test cases for white-box-testing (structural tests)?

  • Have you stated the level of coverage to be achieved by structural tests?

  • Have you unambiguously provided test input data and expected test results or expected messages for each test case?

  • Have you documented the purpose of and the capability demonstrated by each test case?

  • Is it possible to meet and to measure all test objectives defined (e.g. test coverage)?

  • Have you defined the test environment and tools needed for executing the software test?

  • Have you described the hardware configuration an resources needed to implement the designed test cases?

  • Have you described the software configuration needed to implement the designed test cases?

  • Have you described the way in which tests are to be recorded?

  • Have you defined criteria for evaluating the test results?

  • Have you determined the criteria on which the completion of the test will be judged?

  • Have you considered requirements for regression testing?

Wednesday, May 31, 2006

Task-Based Software Testing

Introduction

There is a plethora of software testing techniques available to a development team. A survey by Zhu identified over 200 unit testing techniques. However, for the services' operational test agencies, there has been a continuing, unanswered question of how to test software's impact on a system's mission effectiveness. I propose a task-based approach as part of an integrated test strategy in an effort to answer this long-standing question.


Why Test?

From a speech by Lloyd K. Mosemann II, at the time the Deputy Assistant Secretary for the Air Force (Communications, Computers, and Support Systems), a customer’s concerns are:

They want systems that are on-time, within budget, that satisfy user requirements, and are reliable.

A report from the National Research Council refines the latter two concerns in his statement by presenting two broad objectives for operational testing:

  1. to help certify, through significance testing, that a system’s performance satisfies its requirements as specified in the ORD and related documents, and

  2. to identify any serious deficiencies in the system design that need correction before full rate production

Following the path from the system level to software, these two reasons are consistent with the two primary reasons for testing software or software intensive systems. Stated generically, these are:

  1. test for defects so they can be fixed, and

  2. test for confidence in the software

The literature often refers to these as “debug” and “operational” testing, respectively. Debug testing is usually conducted using a combination of functional test techniques and structural test techniques. The goal is to locate defects in the most cost-effective manner and correct the defects, ensuring the performance satisfies the user requirements. Operational testing is based on the expected usage profile for a system. The goal is to estimate the confidence in a system, ensuring the system is reliable for its intended use.

Task-Based Testing

Task-based testing, as I define it here, is a variation on operational testing. It uses current DoD doctrine and policy to build a framework for designing tests. The particular techniques are not new; rather, the approach leverages commonly accepted techniques by placing them within the context of current DoD operational and acquisition strategies.

Task Analysis

Task-based testing, as the name implies, uses task analysis. Within the DoD, this begins with the Uniform Joint Task List and, in the case of the Air Force, is closely aligned with the Air Force Task List (AFTL). The AFTL “...provides a comprehensive framework for all of the tasks that the Air Force performs.” Through a series of hierarchical task analyses, each unit within the service creates a Mission Essential Task List (METL). The Mission Essential Tasks (METs) are “...only those tasks that represent the indispensable tasks to that particular organization.”

METLs, however, only describe “what” needs to be done, not “how” or “who.” Further task decomposition identifies the system(s) and people required to carry out a mission essential task. Another level of decomposition results in the system tasks (i.e. functions) a system must provide. This is, naturally, the level in which developers and testers are most interested. From a tester’s perspective, this framework identifies the most important functions to test by correlating functions against the mission essential tasks a system is designed to support.

This is distinctly different from the typical functional testing or “test-to-spec” approach where each function or specification carries equal importance. Ideally, there should be no function or specification which does not contribute to a task, but in reality there are often requirements, specifications, and capabilities which do not or minimally support a mission essential task. Using task analysis, one identifies those functions impacting the successful completion of mission essential tasks and highlights them for testing.

Operational Profiles

The above process alone has great benefit in identifying which functions are the most important to test. However, the task analysis above only identifies the mission essential tasks and functions, not their frequency of use. Greater utility can be gained by combining the mission essential tasks with an operational profile: an estimate of the relative frequency of inputs that represents field use. This has several benefits:

“...offers a basis for reliability assessment, so that the developer can have not only the assurance of having tried to improve the software, but also has an estimate of the reliability actually achieved.”

“...provides a common base for communicating with the developers about the intended use of the system and how it will be evaluated.”

“When testing schedules and budgets are tightly constrained, this design yields the highest practical reliability because if failures are seen they would be the high frequency failures.”

The first benefit has the advantage of applying statistical techniques, both in the design of tests and in the analysis of the resulting data. Software reliability estimation methods are available to estimate both the expected field reliability and the rate of growth in reliability. This directly supports an answer to the long-standing question about software's impact on a system's mission effectiveness, as well as addressing the fourth of Mr. Mosemann's customer concerns (is it reliable).

Operational profiles are criticized as being difficult to develop. However, as part of its current operations and acquisition strategy, the DoD inherently develops an operational profile. At higher levels, this is reflected in such documents as the Analysis of Alternatives (AOA), the Operational Requirements Document (ORD), Operations Plans, Concept of Operations (CONOPS), etc. Closer to the tester's realm is the interaction between the user and the developer which the current acquisition strategy encourages. The tester can act as a facilitator, helping the user refine his or her needs while providing insight to the developer on expected use. This highlights the second benefit above: the communication between the user, developer, and tester.

The third benefit is certainly of interest in today’s environment of shrinking budgets and manpower, shorter schedules (spiral acquisition), and greater demands on a system. Despite years of improvement in the software development process, one still sees systems which have gone through intensive debug testing (statement coverage, branch coverage, etc.) and “test-to-spec,” but still fail to satisfy the customer’s concerns as stated by Mr. Mosemann II. By involving a customer early in the process to develop an operational profile, the most needed functions to support a task will be developed and tested first, increasing the likelihood of satisfying the customer’s four concerns.

Task-Based Software Testing

Task-based software testing, as defined herein, is the combination of a task analysis and an operational profile. The task analysis helps partition the input domain into mission essential tasks and the system functions which support them. Operational profiles, based on these tasks, are developed to further focus the testing effort.
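
As a rough illustration of how an operational profile can drive test selection, the sketch below picks system functions to exercise in proportion to their relative frequency. The function names and weights are hypothetical, not drawn from any real METL or AFTL.

import java.util.*;

// Weighted random selection of system functions according to an operational profile.
class OperationalProfile {
    private final NavigableMap<Double, String> cumulative = new TreeMap<>();
    private double total = 0;
    private final Random random = new Random();

    // Register a function with its relative frequency of use.
    void add(String function, double relativeFrequency) {
        total += relativeFrequency;
        cumulative.put(total, function);
    }

    // Draw the next function to test, proportionally to the profile.
    String nextFunctionToTest() {
        return cumulative.higherEntry(random.nextDouble() * total).getValue();
    }

    public static void main(String[] args) {
        OperationalProfile profile = new OperationalProfile();
        profile.add("plan mission", 0.2);                  // hypothetical weights
        profile.add("generate air tasking order", 0.5);
        profile.add("monitor execution", 0.3);
        for (int i = 0; i < 10; i++)
            System.out.println(profile.nextFunctionToTest());
    }
}

Test cases generated this way exercise the high-frequency tasks most often, which is exactly the property the third benefit quoted earlier relies on.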

Integrated Testing

Operational testing is not without its weaknesses. As a rather obvious example of this, one can raise the question, “What about a critical feature that is seldom executed?” Operational testing, or task-based testing as defined herein, does not address such questions well. Debug testing, with the explicit goal of locating defects in a cost-effective manner, is more suited to this.

Debug Testing

Debug testing is “...directed at finding as many bugs as possible, by either sampling all situations likely to produce failures (e.g., methods informed by code coverage or specification criteria), or concentrating on those that are considered most likely to produce failures (e.g., stress testing or boundary testing methods).” The unit testing techniques identified in Zhu's survey are examples of debug testing methods. These include statement testing, branch testing, basis path testing, etc. Typically associated with these methods are criteria based on coverage, so they are sometimes referred to as coverage methods. Debug testing is based on a tester's hypothesis of the likely types and locations of bugs. Consequently, the effectiveness of this method depends heavily on whether the tester's assumptions are correct.

If a developer and/or tester has a process in place to correctly identify the potential types and locations of bugs, then debug testing may be very effective at finding bugs. If a “standard” or “blind” approach is used, such as statement testing for its own sake, the testing effort may be ineffectual and wasted. A subtle hazard of debug testing is that it may uncover many failures, but in the process wastes test and repair effort without notably improving the software because the failures occur at a negligible rate during field use.

Integration of Test Methods

Historically, a system's developer relied on debug testing (which includes functional or “test-to-spec” testing). Testing from the perspective of how the system would be employed was not seen until an operational test agency (OTA) became involved. Even on the occasions when developmental test took on an operational flavor, this was viewed as too late in the process. This historical approach to testing amplifies the weaknesses of both operational and debug testing. I propose that task-based software testing be accelerated to a much earlier point in the acquisition process. This has the potential of countering each method's weaknesses with the other's strengths. This view is supported by the current philosophy in the test community of developing a combined test force spanning contractor, developmental, and operational test (CT/DT/OT).

Summary

Task-based software evaluation is a combination of demonstrated, existing methods (task analysis and operational testing). Its strength lies in matching well with the DoD’s current operational strategy of mission essential tasks and the acquisition community’s goal to deliver operational capability quickly. By integrating task-based software testing with existing debug testing, the risk of meeting the customer’s four concerns (on-time, within budget, satisfies requirements, and is reliable) can be reduced.

Friday, April 28, 2006

Test Efficiency Vs Test Effectiveness




Different teams define these two terms differently. Here are several common contrasting definitions:

1. Test efficiency is internal to the organization: how many resources were consumed and how much of those resources were utilized.
   Test effectiveness is about how much of the customer's requirements are satisfied by the system, how well the customer's specifications are achieved by the system, and how much effort was put into developing the system.

2. Test efficiency: number of test cases executed per unit of time (generally per hour).
   Test effectiveness: number of defects found divided by the number of test cases executed.

3. Test efficiency = (total number of defects found in unit + integration + system testing) / (total number of defects found in unit + integration + system + user acceptance testing).
   Test effectiveness = (total number of defects injected + total number of defects found) / (total number of defects escaped) * 100.

4. Test efficiency: the amount of code and testing resources required by a program to perform a function.
   Test effectiveness: judges the effect of the test environment on the application.

5. Testing efficiency = (number of defects resolved / total number of defects submitted) * 100.
   Test effectiveness = loss due to problems / total resources processed by the system.

6. Test efficiency is the ratio of bugs found by the tester to the total bugs found. When the build is sent to customer-side people for testing (alpha and beta testing), they also find some bugs.

   Test efficiency = A / (A + B)
   where A = number of bugs found by the tester and B = number of bugs found by the customer-side people.
   This test efficiency should always be greater than 90%.
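
As a quick worked example of the formula in definition 6 above: if the test team finds 90 defects before the build goes out and the customer-side people find 10 more during alpha/beta testing, then test efficiency = 90 / (90 + 10) = 0.9, i.e. 90%, which just meets the guideline stated above.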

Tuesday, March 14, 2006

Functional Testing Vs Non-Functional Testing


1. Functional testing: testing the developed application against the business requirements. Functional testing is done using the functional specifications provided by the client or the design specifications (such as use cases) provided by the design team.

   Non-functional testing: testing the application based on the client's requirements and performance requirements. Non-functional testing is done based on the requirements and test scenarios defined by the client.

2. Functional testing covers:

   · Unit Testing
   · Smoke Testing / Sanity Testing
   · Integration Testing (Top-Down, Bottom-Up)
   · Interface & Usability Testing
   · System Testing
   · Regression Testing
   · Pre User Acceptance Testing (Alpha & Beta)
   · User Acceptance Testing
   · White Box & Black Box Testing
   · Globalization & Localization Testing

   Non-functional testing covers:

   · Load and Performance Testing
   · Ergonomics Testing
   · Stress & Volume Testing
   · Compatibility & Migration Testing
   · Data Conversion Testing
   · Security / Penetration Testing
   · Operational Readiness Testing
   · Installation Testing
   · Security Testing (Application Security, Network Security, System Security)



Monday, February 27, 2006

Test Case


What is a Test Case?

Definition of Test Case

- In software engineering, a test case is a set of conditions or variables under which a tester will determine whether a requirement of an application is partially or fully satisfied. It may take many test cases to determine that a requirement is fully satisfied. In order to fully test that all the requirements of an application are met, there must be at least one test case for each requirement, unless a requirement has sub-requirements; in that situation, each sub-requirement must have at least one test case.

More Definitions of a Test Case.

- A test case is also defined as a sequence of steps to test the correct behavior of a functionality/feature of an application.

- A set of inputs, execution preconditions, and expected outcomes developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement.

- A test case is a list of the conditions or issues that the tester wants to test in the software. Test cases help the tester come up with test data. A test case has an input description, a test sequence, and an expected behavior.

The defining characteristic of a test case is that there is a known input and an expected output, which are worked out before the test is run. The known input should test a precondition and the expected output should test a postcondition.

Under special circumstances, there could be a need to run the test, produce results, and have a team of experts evaluate whether the results can be considered a pass. The first test run is then taken as the baseline for subsequent test/product release cycles.

Test cases include a description of the functionality to be tested taken from either the requirements or use cases, and the preparation required to ensure that the test can be conducted.
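
As a minimal illustration of that structure (a purely hypothetical login example, not taken from any real project):

Test case ID: TC-LOGIN-001 (hypothetical)
Precondition: a user account "demo" exists with password "demo123"; the login page is displayed.
Test steps: enter "demo" in the username field, enter "demo123" in the password field, click "Login".
Expected result: the home page is displayed and the header shows the logged-in user "demo".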

I will post Sample test cases in the next few posts...

Sunday, January 08, 2006

Smalltalk Testing With Patterns


I found this article on Smalltalk worth a quick read...


Smalltalk is an object-oriented, dynamically typed, reflective programming language.

A Smalltalk program is a description of a dynamic computational process. The Smalltalk programming language is a notation for defining such programs. From ANSI Smalltalk standard, section 3.

Smalltalk was created as the language to underpin the "new world" of computing exemplified by "human-computer symbiosis".

http://docs.python.org/lib/module-unittest.html


Simple Smalltalk Testing: With Patterns

Kent Beck,
First Class Software, Inc.
KentBeck@compuserve.com

This software and documentation is provided as a service to the programming community. Distribute it free as you see fit. First Class Software, Inc. provides no warranty of any kind, express or implied.

(Transcribed to HTML by Ron Jeffries. The software is available for many Smalltalks, and for C++, on my FTP site.)

Introduction

Smalltalk has suffered because it lacked a testing culture. This column describes a simple testing strategy and a framework to support it. The testing strategy and framework are not intended to be complete solutions, but rather a starting point from which industrial strength tools and procedures can be constructed.

The paper is divided into three sections:

  • Philosophy - Describes the philosophy of writing and running tests embodied by the framework. Read this section for general background.
  • Cookbook - A simple pattern system for writing your own tests.
  • Framework - A literate program version of the testing framework. Read this for in-depth knowledge of how the framework operates.
  • Example - An example of using the testing framework to test part of the methods in Set.

Philosophy

I don’t like user interface-based tests. In my experience, tests based on user interface scripts are too brittle to be useful. When I was on a project where we used user interface testing, it was common to arrive in the morning to a test report with twenty or thirty failed tests. A quick examination would show that most or all of the failures were actually the program running as expected. Some cosmetic change in the interface had caused the actual output to no longer match the expected output. Our testers spent more time keeping the tests up to date and tracking down false failures and false successes than they did writing new tests.

My solution is to write the tests and check results in Smalltalk. While this approach has the disadvantage that your testers need to be able to write simple Smalltalk programs, the resulting tests are much more stable.

Failures and Errors

The framework distinguishes between failures and errors. A failure is an anticipated problem. When you write tests, you check for expected results. If you get a different answer, that is a failure. An error is more catastrophic: an error condition you didn't check for.

Unit testing

I recommend that developers write their own unit tests, one per class. The framework supports the writing of suites of tests, which can be attached to a class. I recommend that all classes respond to the message "testSuite", returning a suite containing the unit tests. I recommend that developers spend 25-50% of their time developing tests.

Integration testing

I recommend that an independent tester write integration tests. Where should the integration tests go? The recent movement of user interface frameworks towards better programmatic access provides one answer: drive the user interface, but do it with the tests. In VisualWorks (the dialect used in the implementation below), you can open an ApplicationModel and begin stuffing values into its ValueHolders, causing all sorts of havoc, with very little trouble.

Running tests

One final bit of philosophy. It is tempting to set up a bunch of test data, then run a bunch of tests, then clean up. In my experience, this always causes more problems than it is worth. Tests end up interacting with one another, and a failure in one test can prevent subsequent tests from running. The testing framework makes it easy to set up a common set of test data, but the data will be created and thrown away for each test. The potential performance problems with this approach shouldn't be a big deal, because suites of tests can run unobserved.

Cookbook

Here is a simple pattern system for writing tests. The patterns are:

  • Fixture - Create a common test fixture.
  • Test Case - Create the stimulus for a test case.
  • Check - Check the response for a test case.
  • Test Suite - Aggregate TestCases.

Fixture

How do you start writing tests?

Testing is one of those impossible tasks. You’d like to be absolutely complete, so you can be sure the software will work. On the other hand, the number of possible states of your program is so large that you can’t possibly test all combinations.

If you start with a vague idea of what you’ll be testing, you’ll never get started. Far better to start with a single configuration whose behavior is predictable. As you get more experience with your software, you will be able to add to the list of configurations.

Such a configuration is called a "fixture". Examples of fixtures are:

  • 1.0 and 2.0 - Easy to predict answers to arithmetic problems
  • Network connection to a known machine - Responses to network packets
  • #() and #(1 2 3) - Results of sending testing messages

By choosing a fixture you are saying what you will and won’t test for. A complete set of tests for a community of objects will have many fixtures, each of which will be tested many ways.

Design a test fixture.

  • Subclass TestCase
  • Add an instance variable for each known object in the fixture
  • Override setUp to initialize the variables

In the example, the test fixture is two Sets, one empty and one with elements. First we subclass TestCase and add instance variables for the objects we will need to reference later:

Class: SetTestCase     superclass: TestCase     instance variables: empty full

Then we override setUp to create the objects for the fixture:

SetTestCase>>setUp
    empty := Set new.
    full := Set
        with: #abc
        with: 5

Test Case

You have a Fixture, what do you do next?

How do you represent a single unit of testing?

You can predict the results of sending a message to a fixture. You need to represent such a predictable situation somehow.

The simplest way to represent this is interactively. You open an Inspector on your fixture and you start sending it messages. There are two drawbacks to this method. First, you keep sending messages to the same fixture. If a test happens to mess that object up, all subsequent tests will fail, even though the code may be correct. More importantly, though, you can’t easily communicate interactive tests to others. If you give someone else your objects, the only way they have of testing them is to have you come and inspect them.

By representing each predictable situation as an object, each with its own fixture, no two tests will ever interfere. Also, you can easily give tests to others to run.

Represent a predictable reaction of a fixture as a method.

  • Add a method to TestCase subclass
  • Stimulate the fixture in the method

The example code shows this. We can predict that adding "5" to an empty Set will result in "5" being in the set. We add a method to our TestCase subclass. In it we stimulate the fixture:

SetTestCase>>testAdd
    empty add: 5.
    ...

Once you have stimulated the fixture, you need to add a Check to make sure your prediction came true.

Check

A Test Case stimulates a Fixture.

How do you test for expected results?

If you’re testing interactively, you check for expected results directly. If you are looking for a particular return value, you use "print it", and make sure that you got the right object back. If you are looking for side effects, you use the Inspector.

Since tests are in their own objects, you need a way to programmatically look for problems. One way to accomplish this is to use the standard error handling mechanism (Object>>error:) with testing logic to signal errors:

2 + 3 = 5 ifFalse: [self error: 'Wrong answer']

When you’re testing, you’d like to distinguish between errors you are checking for, like getting six as the sum of two and three, and errors you didn’t anticipate, like subscripts being out of bounds or messages not being understood.

There’s not a lot you can do about unanticipated errors (if you did something about them, they wouldn’t be unanticipated any more, would they?) When a catastrophic error occurs, the framework stops running the test case, records the error, and runs the next test case. Since each test case has its own fixture, the error in the previous case will not affect the next.

The testing framework makes checking for expected values simple by providing a method, "should:", that takes a Block as an argument. If the Block evaluates to true, everything is fine. Otherwise, the test case stops running, the failure is recorded, and the next test case runs.

Turn checks into a Block evaluating to a Boolean. Send the Block as the parameter to "should:".

In the example, after stimulating the fixture by adding "5" to an empty Set, we want to check and make sure it’s in there:

SetTestCase>>testAdd
    empty add: 5.
    self should: [empty includes: 5]

There is a variant on TestCase>>should:. TestCase>>shouldnt: causes the test case to fail if the Block argument evaluates to true. It is there so you don’t have to use "(...) not".

Once you have a test case this far, you can run it. Create an instance of your TestCase subclass, giving it the selector of the testing method. Send "run" to the resulting object:

(SetTestCase selector: #testAdd) run

If it runs to completion, the test worked. If you get a walkback, something went wrong.

Test Suite

You have several Test Cases.

How do you run lots of tests?

As soon as you have two test cases running, you’ll want to run them both one after the other without having to execute two do it’s. You could just string together a bunch of expressions to create and run test cases. However, when you then wanted to run "this bunch of cases and that bunch of cases" you’d be stuck.

The testing framework provides an object to represent "a bunch of tests", TestSuite. A TestSuite runs a collection of test cases and reports their results all at once. Taking advantage of polymorphism, TestSuites can also contain other TestSuites, so you can put Joe’s tests and Tammy’s tests together by creating a higher level suite.

Combine test cases into a test suite.

(TestSuite named: 'Money')
    addTestCase: (MoneyTestCase selector: #testAdd);
    addTestCase: (MoneyTestCase selector: #testSubtract);
    run

The result of sending "run" to a TestSuite is a TestResult object. It records all the test cases that caused failures or errors, and the time at which the suite was run.

All of these objects are suitable for storing with the ObjectFiler or BOSS. You can easily store a suite, then bring it in and run it, comparing results with previous runs.

Framework

This section presents the code of the testing framework in literate program style. It is here in case you are curious about the implementation of the framework, or you need to modify it in any way.

When you talk to a tester, the smallest unit of testing they talk about is a test case. TestCase is a User’s Object, representing a single test case.

Class: TestCase     superclass: Object

Testers talk about setting up a "test fixture", which is an object structure with predictable responses, one that is easy to create and to reason about. Many different test cases can be run against the same fixture.

This distinction is represented in the framework by giving each TestCase a Pluggable Selector. The variable behavior invoked by the selector is the test code. All instances of the same class share the same fixture.

Class: TestCase     superclass: Object     instance variables: selector     class variable: FailedCheckSignal

TestCase class>>selector: is a Complete Creation Method.

TestCase class>>selector: aSymbol     ^self new setSelector: aSymbol

TestCase>>setSelector: is a Creation Parameter Method.

TestCase>>setSelector: aSymbol     selector := aSymbol

Subclasses of TestCase are expected to create and destroy test fixtures by overriding the Hook Methods setUp and tearDown, respectively. TestCase itself provides Stub Methods for these methods which do nothing.

TestCase>>setUp
    "Run whatever code you need to get ready for the test to run."

TestCase>>tearDown
    "Release whatever resources you used for the test."

The simplest way to run a TestCase is just to send it the message "run". Run invokes the set up code, performs the selector, then runs the tear down code. Notice that the tear down code is run regardless of whether there is an error in performing the test. Invoking setUp and tearDown could be encapsulated in an Execute Around Method, but since they aren't part of the public interface they are just open coded here.

TestCase>>run
    self setUp.
    [self performTest] valueNowOrOnUnwindDo: [self tearDown]

PerformTest just performs the selector.

TestCase>>performTest     self perform: selector

A single TestCase is hardly ever interesting, once you have gotten it running. In production, you will want to run many TestCases at a time. Testers talk of running test "suites". TestSuite is a User’s Object. It is a Composite of Test Cases.

Class: TestSuite     superclass: Object     instance variables: name testCases

TestSuites are Named Objects. This makes them easy to identify so they can be simply stored on and retrieved from secondary storage. Here is the Complete Creation Method and Creation Parameter Method.

TestSuite class>>named: aString
    ^self new setName: aString

TestSuite>>setName: aString
    name := aString.
    testCases := OrderedCollection new

The testCases instance variable is initialized right in TestSuite>>setName: because I don’t anticipate needing it to be any different kind of collection.

TestSuites have an Accessing Method for their name, in anticipation of user interfaces which will have to display them.

TestSuite>>name     ^name

TestSuites have Collection Accessor Methods for adding one or more TestCases.

TestSuite>>addTestCase: aTestCase
    testCases add: aTestCase

TestSuite>>addTestCases: aCollection
    aCollection do: [:each | self addTestCase: each]

When you run a TestSuite, you'd like all of its TestCases to run. It's not quite that simple, though. If you have a suite that represents the acceptance test for your application, after it runs you'd like to know how long the suite ran and which of the cases had problems. This is information you would like to be able to store away for future reference.

TestResult is a Result Object for a TestSuite. Running a TestSuite returns a TestResult which records the information described above: the start and stop times of the run, the name of the suite, and any failures or errors.

Class: TestResult     superclass: Object     instance variables: startTime stopTime testName failures errors

When you run a TestSuite, it creates a TestResult which is timestamped before and after the TestCases are run.

TestSuite>>run
    | result |
    result := self defaultTestResult.
    result start.
    self run: result.
    result stop.
    ^result

TestCase>>run and TestSuite>>run are not polymorphically equivalent. This is a problem that needs to be addressed in future versions of the framework. One option is to have a TestCaseResult which measures time in milliseconds to enable performance regression testing.

The default TestResult is constructed by the TestSuite, using a Default Class.

TestSuite>>defaultTestResult
    ^self defaultTestResultClass test: self

TestSuite>>defaultTestResultClass
    ^TestResult

A TestResult Complete Creation Method takes a TestSuite.

TestResult class>>test: aTest
    ^self new setTest: aTest

TestResult>>setTest: aTest
    testName := aTest name.
    failures := OrderedCollection new.
    errors := OrderedCollection new

TestResults are timestamped by sending them the messages start and stop. Since start and stop need to be executed in pairs, they could be hidden behind an Execute Around Method. This is something else to do later.

TestResult>>start     startTime := Date dateAndTimeNow
TestResult>>stop     stopTime := Date dateAndTimeNow

When a TestSuite runs for a given TestResult, it simply runs each of its TestCases with that TestResult.

TestSuite>>run: aTestResult
    testCases do: [:each | each run: aTestResult]

#run: is the Composite selector in TestSuite and TestCase, so you can construct TestSuites which contain other TestSuites, instead of or in addition to containing TestCases.

When a TestCase runs for a given TestResult, it should either silently run correctly, add an error to the TestResult, or add a failure to the TestResult. Catching errors is simple: use the system-supplied errorSignal. Catching failures must be supported by the TestCase itself. First, we need a Class Initialization Method to create a Signal.

TestCase class>>initialize
    FailedCheckSignal := self errorSignal newSignal
        notifierString: 'Check failed - ';
        nameClass: self message: #checkSignal

Now we need an Accessing Method.

TestCase>>failedCheckSignal     ^FailedCheckSignal

Now, when the TestCase runs with a TestResult, it must catch errors and failures and inform the TestResult, and it must run the tearDown code regardless of whether the test executed correctly. This results in the ugliest method in the framework, because there are two nested error handlers and a valueNowOrOnUnwindDo: in one method. There is a missing pattern expressed here and in TestCase>>run about using ensure: to safely run the second half of an Execute Around Method.

TestCase>>run: aTestResult
    self setUp.
    [self errorSignal
        handle: [:ex | aTestResult error: ex errorString in: self]
        do:
            [self failedCheckSignal
                handle: [:ex | aTestResult failure: ex errorString in: self]
                do: [self performTest]]]
        valueNowOrOnUnwindDo: [self tearDown]

When a TestResult is told that an error or failure happened, it records that fact in one of its two collections. For simplicity, the record is just a two element array, but it probably should be a first class object with a timestamp and more details of the blowup.

TestResult>>error: aString in: aTestCase
    errors add: (Array with: aTestCase with: aString)

TestResult>>failure: aString in: aTestCase
    failures add: (Array with: aTestCase with: aString)

The error case gets invoked if there is ever an uncaught error (for example, message not understood) in the testing method. How do the failures get invoked? TestCase provides two methods that simplify checking for failure. The first, should: aBlock, signals a failure if the evaluation of aBlock returns false. The second, shouldnt: aBlock, does just the opposite.

should: aBlock
    aBlock value ifFalse: [self failedCheckSignal raise]

shouldnt: aBlock
    aBlock value ifTrue: [self failedCheckSignal raise]

Testing methods will run code to stimulate the test fixture, then check the results inside should: and shouldnt: blocks.

Example

Okay, that's how it works, how do you use it? Here's a short example that tests a few of the messages supported by Sets. First we subclass TestCase, because we'll always want a couple of interesting Sets around to play with.

Class: SetTestCase     superclass: TestCase     instance variables: empty full

Now we need to initialize these variables, so we override setUp.

SetTestCase>>setUp
    empty := Set new.
    full := Set
        with: #abc
        with: 5

Now we need a testing method. Let's test to see if adding an element to a Set really works.

SetTestCase>>testAdd
    empty add: 5.
    self should: [empty includes: 5]

Now we can run a test case by evaluating "(SetTestCase selector: #testAdd) run".

Here's a case that uses shouldnt:. It reads "after removing 5 from full, full should include #abc and it shouldn't include 5."

SetTestCase>>testRemove
    full remove: 5.
    self should: [full includes: #abc].
    self shouldnt: [full includes: 5]

Here's one that makes sure an error is signalled if you try to do keyed access.

SetTestCase>>testIllegal
    self should: [self errorSignal handle: [:ex | true] do: [empty at: 5. false]]

Now we can put together a TestSuite.

| suite |
suite := TestSuite named: 'Set Tests'.
suite addTestCase: (SetTestCase selector: #testAdd).
suite addTestCase: (SetTestCase selector: #testRemove).
suite addTestCase: (SetTestCase selector: #testIllegal).
^suite

Here is an Object Explorer picture of the suite and the TestResult we get back when we run it.

The test methods shown above only cover a fraction of the functionality in Set. Writing tests for all the public methods in Set is a daunting task. However, as Hal Hildebrand told me after using an earlier version of this framework, "If the underlying objects don't work, nothing else matters. You have to write the tests to make sure everything is working."