There is a plethora of software testing techniques available to a development team. A survey by Zhu identified over 200 unit testing techniques. However, for the services’ operational test agencies, there has been a continuing, unanswered question of how to test software’s impact on a system’s mission effectiveness. I propose a task-based approach, as part of an integrated test strategy, in an effort to answer this long-standing question.
From a speech by Lloyd K. Mosemann II, at the time the Deputy Assistant Secretary for the Air Force (Communications, Computers, and Support Systems), a customer’s concerns are:
They want systems that are on-time, within budget, that satisfy user requirements, and are reliable.
A report from the National Research Council refines the latter two concerns in his statement by presenting two broad objectives for operational testing:
to help certify, through significance testing, that a system’s performance satisfies its requirements as specified in the ORD and related documents, and
to identify any serious deficiencies in the system design that need correction before full rate production
Following the path from the system level to software, these two reasons are consistent with the two primary reasons for testing software or software intensive systems. Stated generically, these are:
test for defects so they can be fixed, and
test for confidence in the software
The literature often refers to these as “debug” and “operational” testing, respectively. Debug testing is usually conducted using a combination of functional test techniques and structural test techniques. The goal is to locate defects in the most cost-effective manner and correct the defects, ensuring the performance satisfies the user requirements. Operational testing is based on the expected usage profile for a system. The goal is to estimate the confidence in a system, ensuring the system is reliable for its intended use.
Task-based testing, as I define it here, is a variation on operational testing. It uses current DoD doctrine and policy to build a framework for designing tests. The particular techniques are not new; rather, the approach leverages commonly accepted techniques by placing them within the context of current DoD operational and acquisition strategies.
Task-based testing, as the name implies, uses task analysis. Within the DoD, this begins with the Uniform Joint Task List and, in the case of the Air Force, is closely aligned with the Air Force Task List (AFTL). The AFTL “...provides a comprehensive framework for all of the tasks that the Air Force performs.” Through a series of hierarchical task analyses, each unit within the service creates a Mission Essential Task List (METL). The Mission Essential Tasks (METs) are “...only those tasks that represent the indispensable tasks to that particular organization.”
METLs, however, only describe “what” needs to be done, not “how” or “who.” Further task decomposition identifies the system(s) and people required to carry out a mission essential task. Another level of decomposition results in the system tasks (i.e., functions) a system must provide. This is, naturally, the level in which developers and testers are most interested. From a tester’s perspective, this framework identifies the most important functions to test by correlating functions against the mission essential tasks a system is designed to support.
This is distinctly different from the typical functional testing or “test-to-spec” approach, where each function or specification carries equal importance. Ideally, there should be no function or specification which does not contribute to a task, but in reality there are often requirements, specifications, and capabilities which do not support a mission essential task, or support one only minimally. Using task analysis, one identifies those functions impacting the successful completion of mission essential tasks and highlights them for testing.
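As an illustration only, the correlation of functions against mission essential tasks might be sketched as follows. The task names, function names, and the scoring rule (counting how many tasks a function supports) are all hypothetical; they are not drawn from any real METL, and a real analysis would weight tasks by more than a simple count.

```python
from collections import defaultdict

# Hypothetical decomposition: mission essential task -> supporting system functions.
# Every name here is an illustrative placeholder.
TASK_TO_FUNCTIONS = {
    "plan mission": ["load_map", "compute_route"],
    "execute mission": ["compute_route", "track_position", "send_status"],
    "report results": ["send_status"],
}

# Functions listed in the (hypothetical) specification, including one
# that supports no mission essential task at all.
SPECIFIED_FUNCTIONS = [
    "load_map",
    "compute_route",
    "track_position",
    "send_status",
    "change_color_theme",
]


def rank_functions_by_task_support(task_to_functions, specified_functions):
    """Return (function, supporting_tasks) pairs, most-supported function first."""
    support = defaultdict(list)
    for task, functions in task_to_functions.items():
        for function in functions:
            support[function].append(task)
    # Sort by descending number of supported tasks, then by name for stability.
    ranked = sorted(
        specified_functions,
        key=lambda f: (-len(support.get(f, [])), f),
    )
    return [(f, sorted(support.get(f, []))) for f in ranked]


if __name__ == "__main__":
    for function, tasks in rank_functions_by_task_support(
        TASK_TO_FUNCTIONS, SPECIFIED_FUNCTIONS
    ):
        print(function, tasks)
```

Under a test-to-spec approach all five functions would carry equal weight; here the function with no supporting task falls to the bottom of the list and can be given correspondingly less test effort.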
The above process alone has great benefit in identifying which functions are the most important to test. However, the task analysis above only identifies the mission essential tasks and functions, not their frequency of use. Greater utility can be gained by combining the mission essential tasks with an operational profile: an estimate of the relative frequency of inputs that represent field use. This has several benefits:
“...offers a basis for reliability assessment, so that the developer can have not only the assurance of having tried to improve the software, but also has an estimate of the reliability actually achieved.”
“...provides a common base for communicating with the developers about the intended use of the system and how it will be evaluated.”
“When testing schedules and budgets are tightly constrained, this design yields the highest practical reliability because if failures are seen they would be the high frequency failures.”
The first benefit allows statistical techniques to be applied, both in the design of tests and in the analysis of the resulting data. Software reliability estimation methods are available to estimate both the expected field reliability and the rate of growth in reliability. This directly helps answer the long-standing question about software’s impact on a system’s mission effectiveness, and it addresses the fourth customer concern cited by Mr. Mosemann II (is the system reliable?).
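To make the reliability-assessment benefit concrete, here is a minimal sketch of one standard statistical bound. It is not one of the reliability growth models alluded to above. It assumes that test runs are drawn independently from the operational profile and that no failures were observed; under those assumptions, N failure-free runs support the claim that the per-run failure probability is at most 1 - (1 - C)^(1/N) at confidence C. The function names are my own.

```python
import math


def failure_probability_upper_bound(failure_free_runs, confidence):
    """Upper bound on per-run failure probability after N failure-free runs.

    Assumes independent runs sampled from the operational profile
    and zero observed failures.
    """
    if failure_free_runs <= 0:
        raise ValueError("failure_free_runs must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / failure_free_runs)


def runs_needed(target_failure_probability, confidence):
    """Smallest number of failure-free runs that supports the target bound."""
    if not 0.0 < target_failure_probability < 1.0:
        raise ValueError("target_failure_probability must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(
        math.log(1.0 - confidence) / math.log(1.0 - target_failure_probability)
    )


if __name__ == "__main__":
    n = runs_needed(0.01, 0.95)
    print(n, failure_probability_upper_bound(n, 0.95))
```

The bound is only as meaningful as the operational profile behind it: runs that do not reflect field use say little about field reliability.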
Operational profiles are criticized as being difficult to develop. However, as part of its current operations and acquisition strategy, the DoD inherently develops an operational profile. At higher levels, this is reflected in such documents as the Analysis of Alternatives (AOA), the Operational Requirements Document (ORD), Operations Plans, Concept of Operations (CONOPS), etc. Closer to the tester’s realm is the interaction between the user and the developer which the current acquisition strategy encourages. The tester can act as a facilitator in helping the user refine his or her needs while providing insight to the developer on expected use. This highlights the second benefit above: the communication between the user, developer, and tester.
The third benefit is certainly of interest in today’s environment of shrinking budgets and manpower, shorter schedules (spiral acquisition), and greater demands on a system. Despite years of improvement in the software development process, one still sees systems which have gone through intensive debug testing (statement coverage, branch coverage, etc.) and “test-to-spec,” but still fail to satisfy the customer’s concerns as stated by Mr. Mosemann II. By involving a customer early in the process to develop an operational profile, the most needed functions to support a task will be developed and tested first, increasing the likelihood of satisfying the customer’s four concerns.
Task-Based Software Testing
Task-based software testing, as defined herein, is the combination of a task analysis and an operational profile. The task analysis helps partition the input domain into mission essential tasks and the system functions which support them. Operational profiles, based on these tasks, are developed to further focus the testing effort.
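The combination can be sketched in a few lines, purely as an illustration. The profile frequencies and the task-to-function mapping below are invented placeholders, and drawing tasks with `random.choices` is just one simple way to make test selection follow the profile; the helper name `plan_tests` is hypothetical.

```python
import math
import random

# Hypothetical operational profile: relative frequency of each
# mission essential task in field use (values are invented).
OPERATIONAL_PROFILE = {
    "plan mission": 0.2,
    "execute mission": 0.7,
    "report results": 0.1,
}

# Hypothetical task analysis result: task -> supporting system functions.
TASK_TO_FUNCTIONS = {
    "plan mission": ["load_map", "compute_route"],
    "execute mission": ["compute_route", "track_position", "send_status"],
    "report results": ["send_status"],
}


def plan_tests(profile, task_to_functions, n_tests, seed=0):
    """Draw n_tests tasks according to the profile and list the functions each exercises."""
    if not math.isclose(sum(profile.values()), 1.0):
        raise ValueError("profile frequencies must sum to 1")
    rng = random.Random(seed)  # fixed seed so a test plan can be reproduced
    tasks = list(profile)
    weights = [profile[task] for task in tasks]
    drawn = rng.choices(tasks, weights=weights, k=n_tests)
    return [(task, task_to_functions[task]) for task in drawn]


if __name__ == "__main__":
    for task, functions in plan_tests(OPERATIONAL_PROFILE, TASK_TO_FUNCTIONS, 5):
        print(task, functions)
```

The task analysis supplies the partition (which functions belong to which task), and the profile supplies the weights, so test effort concentrates on the functions behind the most frequently performed mission essential tasks.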
Operational testing is not without its weaknesses. As a rather obvious example of this, one can raise the question, “What about a critical feature that is seldom executed?” Operational testing, or task-based testing as defined herein, does not address such questions well. Debug testing, with the explicit goal of locating defects in a cost-effective manner, is more suited to this.
Debug testing is “...directed at finding as many bugs as possible, by either sampling all situations likely to produce failures (e.g., methods informed by code coverage or specification criteria), or concentrating on those that are considered most likely to produce failures (e.g., stress testing or boundary testing methods).” The techniques in Zhu’s survey of unit testing methods are examples of debug testing methods. These include such techniques as statement testing, branch testing, basis path testing, etc. These methods are typically associated with criteria based on coverage, so they are sometimes referred to as coverage methods. Debug testing is based on a tester’s hypothesis of the likely types and locations of bugs. Consequently, the effectiveness of this method depends heavily on whether the tester’s assumptions are correct.
If a developer and/or tester has a process in place to correctly identify the potential types and locations of bugs, then debug testing may be very effective at finding bugs. If a “standard” or “blind” approach is used, such as statement testing for its own sake, the testing effort may be ineffectual and wasted. A subtle hazard of debug testing is that it may uncover many failures but, in the process, waste test and repair effort without notably improving the software, because those failures occur at a negligible rate during field use.
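The hazard can be seen with a small arithmetic sketch. Assume, hypothetically, that field use is described by a profile over three operations and that each operation has some probability of failing when used; the field failure probability is then the profile-weighted sum. All names and numbers below are invented for illustration.

```python
def field_failure_probability(profile, failure_given_use):
    """Profile-weighted probability that a randomly drawn field use fails."""
    return sum(
        profile[op] * failure_given_use.get(op, 0.0) for op in profile
    )


# Hypothetical usage frequencies (sum to 1) and per-use failure probabilities.
PROFILE = {
    "route_update": 0.90,
    "status_report": 0.099,
    "maintenance_export": 0.001,
}
FAILURE_GIVEN_USE = {
    "route_update": 0.01,        # rare failure in a heavily used operation
    "status_report": 0.0,
    "maintenance_export": 0.5,   # frequent failure in a seldom-used operation
}

if __name__ == "__main__":
    baseline = field_failure_probability(PROFILE, FAILURE_GIVEN_USE)
    fix_rare_op = dict(FAILURE_GIVEN_USE, maintenance_export=0.0)
    fix_frequent_op = dict(FAILURE_GIVEN_USE, route_update=0.0)
    print("baseline:", baseline)
    print("after fixing seldom-used operation:",
          field_failure_probability(PROFILE, fix_rare_op))
    print("after fixing heavily used operation:",
          field_failure_probability(PROFILE, fix_frequent_op))
```

With these invented numbers, repairing the defect that fails half the time barely moves the field failure probability, while repairing the far less conspicuous defect in the heavily used operation removes most of it. The calculation says nothing about criticality, however: a seldom-executed feature whose failure is catastrophic still warrants debug testing.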
Integration of Test Methods
Historically, a system’s developer relied on debug testing (which includes functional or “test-to-spec” testing). Testing from the perspective of how the system would be employed was not seen until an operational test agency (OTA) became involved. Even on the occasions when developmental test took on an operational flavor, this was viewed as too late in the process. This historical approach to testing amplifies the weaknesses of both operational and debug testing. I propose that task-based software testing be accelerated to a much earlier point in the acquisition process. This has the potential of countering each respective method’s weaknesses with the other’s strengths. This view is supported by the current philosophy in the test community, which is to develop a combined test force spanning contractor, developmental, and operational test (CT/DT/OT).
Task-based software evaluation is a combination of demonstrated, existing methods (task analysis and operational testing). Its strength lies in matching well with the DoD’s current operational strategy of mission essential tasks and the acquisition community’s goal to deliver operational capability quickly. By integrating task-based software testing with existing debug testing, the risk of not meeting the customer’s four concerns (on-time, within budget, satisfies requirements, and is reliable) can be reduced.