Wednesday, November 30, 2005

Load Testing

What is Load Testing?

Definition of Load Testing

- Load testing is the act of testing a system under load.

Load testing of physical equipment, such as lifting gear, is usually carried out at 1.5x the SWL (Safe Working Load); periodic recertification is required.

Load testing is a way to test the performance of an application or product.

In software engineering it is a blanket term that is used in many different ways across the professional software testing community.

Testing an application under heavy but expected loads is known as load testing. It generally refers to the practice of modeling the expected usage of a software program by simulating multiple users accessing the system's services concurrently. As such, load testing is most relevant for a multi-user system, often one built using a client/server model, such as a web server tested under a range of loads to determine at what point the system's response time degrades or fails. You could, however, perform a load test on a word processor or graphics editor by forcing it to read in an extremely large document, or on a financial package by forcing it to generate a report based on several years' worth of data, etc.
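To make the idea concrete, here is a minimal sketch of simulating multiple concurrent users with a thread pool and averaging their response times. The timedRequest() method is a hypothetical stand-in for a real client request; this is illustrative, not a production load-testing tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SimpleLoadTest {
    // Stand-in for one request to the system under test; a real load test
    // would issue an actual client call here and time it the same way.
    public static long timedRequest() {
        long start = System.nanoTime();
        Math.sqrt(12345.0); // simulated work
        return System.nanoTime() - start;
    }

    // Simulate `users` concurrent users, each issuing `requestsPerUser`
    // requests, and return the average response time in nanoseconds.
    public static double averageResponseNanos(int users, int requestsPerUser) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> perUser = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                perUser.add(pool.submit(() -> {
                    long total = 0;
                    for (int r = 0; r < requestsPerUser; r++) total += timedRequest();
                    return total / requestsPerUser;
                }));
            }
            long sum = 0;
            for (Future<Long> f : perUser) sum += f.get();
            return (double) sum / users;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.printf("avg response: %.0f ns%n", averageResponseNanos(10, 100));
    }
}
```

In a real load test you would ramp the user count upward across runs and watch for the point at which the average response time degrades.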

There is little agreement on what the specific goals of load testing are. The term is often used synonymously with performance testing, reliability testing, and volume testing.

Why is load testing important?

Increase uptime and availability of the system

Load testing increases the uptime of your mission-critical systems by helping you spot bottlenecks in your systems under large user-load scenarios before they occur in a production environment.

Measure and monitor performance of your system

Make sure that your system can handle the load of thousands of concurrent users.

Avoid system failures by predicting the behavior under large user loads

It is a failure when so much effort is put into building a system only to discover that it won't scale. Avoid project failures by testing high-load scenarios.

Protect IT investments by predicting scalability and performance

Building a product is very expensive. The hardware, the staffing, the consultants, the bandwidth, and more add up quickly. Load testing helps you avoid wasting money on expensive IT resources and ensures that the system will scale.

What is a Test Plan?

Definition of a Test Plan

A test plan can be defined as a document describing the scope, approach, resources, and schedule of intended testing activities. It identifies test items, the features to be tested, the testing tasks, who will do each task, and any risks requiring contingency planning.

In software testing, a test plan gives detailed testing information regarding an upcoming testing effort, including

  • Scope of testing
  • Schedule
  • Test Deliverables
  • Release Criteria
  • Risks and Contingencies

It can also be described as a detailing of how the testing will proceed, who will do the testing, what will be tested, how long the testing will take, and to what quality level the testing will be performed.

A few other definitions:

The process of defining a test project so that it can be properly measured and controlled. The test planning process generates a high-level test plan document that identifies the software items to be tested, the degree of tester independence, the test environment, the test case design and test measurement techniques to be used, and the rationale for their choice.

A testing plan is a methodological and systematic approach to testing a system such as a machine or software. It can be effective in finding errors and flaws in a system. In order to find relevant results, the plan typically contains experiments with a range of operations and values, including an understanding of what the eventual workflow will be.

A test plan is a document which includes an introduction, assumptions, a list of test cases, a list of features to be tested, the approach, deliverables, resources, risks, and scheduling.

A test plan is a systematic approach to testing a system such as a machine or software. The plan typically contains a detailed understanding of what the eventual workflow will be.

A record of the test planning process detailing the degree of tester independence, the test environment, the test case design techniques and test measurement techniques to be used, and the rationale for their choice.

Best practices for testing J2EE database applications

It wasn't too long ago that quality assurance (QA) teams played the leading (if not the only) role when it came to software testing. These days, however, developers are making an enormous contribution to software quality early in the development process, using automated unit-testing techniques. Most developers take for granted the need to use tools such as JUnit for comprehensive, automated testing, but there is little consensus on how to apply unit testing to the database operations associated with an application.

Let us have a look at the best practices that can help you get the most out of your database testing environment and can ensure that your applications are robust and resilient. The primary focus here is on J2EE applications interfacing with Oracle Database through JDBC, although the concepts also apply to applications written for other application environments, such as .NET.

Database Operations Tests:

Testing J2EE applications is often difficult and time-consuming at best, but testing the database operations portion of a J2EE application is especially challenging. Database operations tests must be able to catch logic errors—when a query returns the wrong data, for example, or when an update changes the database state incorrectly or in unexpected ways.

For example, say you have a USER class that represents a single USER table and that database operations on the USER table are encapsulated by a Data Access Object (DAO), UserDAO, as follows:

public interface UserDAO {
  /**
   * Returns the list of users
   * with the given name
   * or at least the minimum age.
   */
  public List listUsers(String name, Integer minimumAge);

  /**
   * Returns all the users
   * in the database.
   */
  public List listAllUsers();
}

In this simple UserDAO interface, the listUsers() method should return all rows (from the USER table) that have the specified name or the specified value for minimum age. A test to determine whether you've correctly implemented this method in your own classes must take into consideration several questions:

  • Does the method call the correct SQL (for JDBC applications) or the correct query-filtering expression (for object-relational mapping [ORM]-based applications)?
  • Is the SQL or query-filtering expression correctly written, and does it return the correct number of rows?
  • What happens if you supply invalid parameters? Does the method behave as expected? Does it handle all boundary conditions appropriately?
  • Does the method correctly populate the users list from the result set returned from the database?

Thus, even a simple DAO method has a host of possible outcomes and error conditions, each of which should be tested to ensure that an application works correctly. And in most cases, you'll want the tests to interact with the database and use real data—tests that operate purely at the individual class level or use mock objects to simulate database dependencies will not suffice. Database testing is equally important for read/write operations, particularly those that apply many changes to the database, as is often the case with PL/SQL stored procedures.
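As a rough illustration of the kinds of checks listed above, the sketch below exercises a hypothetical in-memory implementation of the UserDAO interface. The in-memory class stands in for a real JDBC-backed DAO so the example runs without a database; the "name match OR minimum age" semantics follow the description above, and all class and method names beyond UserDAO itself are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class UserDaoTest {
    public static class User {
        public final String name;
        public final int age;
        public User(String name, int age) { this.name = name; this.age = age; }
    }

    // Mirrors the UserDAO interface from the article, typed for clarity.
    public interface UserDAO {
        List<User> listUsers(String name, Integer minimumAge);
        List<User> listAllUsers();
    }

    // In-memory stand-in for the real DAO: behaves like
    // "SELECT * FROM USER WHERE name = ? OR age >= ?".
    public static class InMemoryUserDAO implements UserDAO {
        private final List<User> table;
        public InMemoryUserDAO(List<User> rows) { this.table = rows; }
        public List<User> listUsers(String name, Integer minimumAge) {
            List<User> out = new ArrayList<>();
            for (User u : table)
                if (u.name.equals(name) || (minimumAge != null && u.age >= minimumAge))
                    out.add(u);
            return out;
        }
        public List<User> listAllUsers() { return new ArrayList<>(table); }
    }

    // Query a small three-row fixture, as a database test would.
    public static List<User> queryFixture(String name, Integer minimumAge) {
        UserDAO dao = new InMemoryUserDAO(List.of(
            new User("alice", 30), new User("bob", 17), new User("carol", 45)));
        return dao.listUsers(name, minimumAge);
    }

    public static void main(String[] args) {
        // alice matches by name, carol by age >= 40; bob matches neither.
        System.out.println(queryFixture("alice", 40).size());
        // Boundary condition: minimumAge is inclusive, so carol (45) matches.
        System.out.println(queryFixture("nobody", 45).size());
        // Absent parameter: a null minimumAge should not blow up.
        System.out.println(queryFixture("nobody", null).size());
    }
}
```

A real database operations test would run the same assertions against a DAO wired to an actual schema and known test data, which is exactly where the boundary and invalid-parameter questions above earn their keep.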

The bottom line: Only through a solid regime of database tests can you verify that these operations behave correctly.

The best practices in this article pertain specifically to designing tests that focus on these types of data access challenges. The tests must be able to raise nonobvious errors in data retrieval and modification that can occur in the data access abstraction layer. The article's focus is on database operations tests—tests that apply to the layer of the J2EE application responsible for persistent data access and manipulation. This layer is usually encapsulated in a DAO that hides the persistence mechanism from the rest of the application.

Best Practices

The following are some of the testing best practices:

Practice 1: Start with a "testable" application architecture.
Practice 2: Use precise assertions.
Practice 3: Externalize assertion data.
Practice 4: Write comprehensive tests.
Practice 5: Create a stable, meaningful test data set.
Practice 6: Create a dedicated test library.
Practice 7: Isolate tests effectively.
Practice 8: Partition your test suite.
Practice 9: Use an appropriate framework, such as DbUnit, to facilitate the process.
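As a small illustration of Practice 3 (externalize assertion data), the sketch below parses expected rows from a string that stands in for an external data file and compares them against actual results. With a framework like DbUnit the external data would typically be an XML data set; the CSV-style format and helper names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExternalizedAssertions {
    // Parse "name,age" lines. In a real suite this data would live in an
    // external file, so expected values can change without recompiling tests.
    public static List<String[]> parseExpected(String csv) {
        List<String[]> rows = new ArrayList<>();
        for (String line : csv.strip().split("\n"))
            rows.add(line.split(","));
        return rows;
    }

    // Compare expected rows (from the external data) with actual rows
    // (as returned by the DAO under test), field by field.
    public static boolean matches(List<String[]> expected, List<String[]> actual) {
        if (expected.size() != actual.size()) return false;
        for (int i = 0; i < expected.size(); i++)
            if (!Arrays.equals(expected.get(i), actual.get(i))) return false;
        return true;
    }

    public static void main(String[] args) {
        String expectedData = "alice,30\nbob,17"; // stand-in for a file
        List<String[]> actual = List.of(new String[]{"alice", "30"},
                                        new String[]{"bob", "17"});
        System.out.println(matches(parseExpected(expectedData), actual));
    }
}
```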


What is BugZilla?

BugZilla is a bug tracking system (also called an issue tracking system).

Bug tracking systems allow an individual developer or a group of developers to keep track of outstanding problems with their product effectively. Bugzilla was originally written in a programming language called TCL, to replace a rudimentary bug-tracking database used internally by Netscape Communications. Its author, Terry Weissman, later ported Bugzilla from TCL to Perl, and in Perl it remains to this day. Most commercial defect-tracking software vendors at the time charged enormous licensing fees, and Bugzilla quickly became a favorite of the open-source crowd (with its genesis in the open-source browser project, Mozilla). It is now the de facto standard defect-tracking system against which all others are measured.

Bugzilla boasts many advanced features. These include:

  • Powerful searching

  • User-configurable email notifications of bug changes

  • Full change history

  • Inter-bug dependency tracking and graphing

  • Excellent attachment management

  • Integrated, product-based, granular security schema

  • Fully security-audited, and runs under Perl's taint mode

  • A robust, stable RDBMS back-end

  • Web, XML, email and console interfaces

  • Completely customisable and/or localisable web user interface

  • Extensive configurability

  • Smooth upgrade pathway between versions

Why Should We Use Bugzilla?

For many years, defect-tracking software has remained principally the domain of large software development houses. Even then, most shops never bothered with bug-tracking software, and instead simply relied on shared lists and email to monitor the status of defects. This procedure is error-prone and tends to cause those bugs judged least significant by developers to be dropped or ignored.

These days, many companies are finding that integrated defect-tracking systems reduce downtime, increase productivity, and raise customer satisfaction with their systems. Along with full disclosure, an open bug-tracker allows manufacturers to keep in touch with their clients and resellers, to communicate about problems effectively throughout the data management chain. Many corporations have also discovered that defect-tracking helps reduce costs by providing IT support accountability, telephone support knowledge bases, and a common, well-understood system for accounting for unusual system or software issues.

But why should you use Bugzilla?

Bugzilla is very adaptable to various situations. Known uses currently include IT support queues, systems administration deployment management, chip design and development problem tracking (both pre- and post-fabrication), and software and hardware bug tracking for luminaries such as Redhat, NASA, Linux-Mandrake, and VA Systems. Combined with systems such as CVS, Bonsai, or Perforce SCM, Bugzilla provides a powerful, easy-to-use solution to configuration management and replication problems.

Bugzilla can dramatically increase the productivity and accountability of individual employees by providing a documented workflow and positive feedback for good performance. How many times do you wake up in the morning, remembering that you were supposed to do something today, but you just can't quite remember? Put it in Bugzilla, and you have a record of it from which you can extrapolate milestones, predict product versions for integration, and follow the discussion trail that led to critical decisions.

Ultimately, Bugzilla puts the power in your hands to improve your value to your employer or business while providing a usable framework for your natural attention to detail and knowledge store to flourish.

Thursday, November 24, 2005

Important Considerations for Test Automation

Often when a test automation tool is introduced to a project, the expectations for the return on investment are very high. Project members anticipate that the tool will immediately narrow down the testing scope, reducing cost and schedule. However, I have seen several test automation projects fail - miserably.

The following very simple factors largely influence the effectiveness of automated testing, and if they are not taken into account, the result is usually a lot of lost effort and very expensive ‘shelfware’.

  1. Scope - It is not practical to try to automate everything, nor is the time generally available. Pick very carefully the functions/areas of the application that are to be automated.

  2. Preparation Timeframe - The preparation time for automated test scripts has to be taken into account. In general, the preparation time for automated scripts can be two to three times longer than for manual testing. In reality, chances are that initially the tool will actually increase the testing scope. It is therefore very important to manage expectations. An automated testing tool does not replace manual testing, nor does it replace the test engineer. Initially, the test effort will increase, but when automation is done correctly it will decrease on subsequent releases.

  3. Return on Investment - Because the preparation time for test automation is so long, I have heard it stated that the benefit of the test automation only begins to occur after approximately the third time the tests have been run.

  4. When is the benefit to be gained? - Choose your objectives wisely, and seriously think about when & where the benefit is to be gained. If your application is changing significantly on a regular basis, forget about test automation - you will spend so much time updating your scripts that you will not reap many benefits. [However, if only disparate sections of the application are changing, or the changes are minor - or if there is a specific section that is not changing - you may still be able to successfully utilise automated tests]. Bear in mind that you may only ever be able to do a complete automated test run when your application is almost ready for release - i.e. nearly fully tested! If your application is very buggy, then the likelihood is that you will not be able to run a complete suite of automated tests, due to the failing functions encountered.

  5. The Degree of Change - The best use of test automation is for regression testing, whereby you use automated tests to ensure that pre-existing functions (e.g. functions from version 1.0 - i.e. not new functions in this release) are unaffected by any changes introduced in version 1.1. And, since proper test automation planning requires that the test scripts are designed so that they are not totally invalidated by a simple GUI change (such as renaming or moving a particular control), you need to take into account the time and effort required to update the scripts. For example, if your application is changing significantly, the scripts from version 1.0 may need to be completely re-written for version 1.1, and the effort involved may be, at worst, prohibitive and, at best, not accounted for! However, if only disparate sections of the application are changing, or the changes are minor, you should be able to successfully utilise automated tests to regress these areas.

  6. Test Integrity - How do you know (measure) whether a test passed or failed? Just because the tool returns a ‘pass’ does not necessarily mean that the test itself passed. For example, just because no error message appears does not mean that the next step in the script successfully completed. This needs to be taken into account when specifying test script pass/fail criteria.

  7. Test Independence - Test independence must be built in so that a failure in the first test case won't cause a domino effect and either prevent, or cause to fail, the rest of the test scripts in that test suite. However, in practice this is very difficult to achieve.

  8. Debugging or "testing" of the actual test scripts themselves - Time must be allowed for this, and for proving the integrity of the tests themselves.

  9. Record & Playback - DO NOT RELY on record & playback as the SOLE means to generate a script. The idea is great: you execute the test manually while the test tool sits in the background and remembers what you do, then generates a script that you can run to re-execute the test. It's a great idea - that rarely works (and proves very little).

  10. Maintenance of Scripts - Finally, there is a high maintenance overhead for automated test scripts - they have to be continuously kept up to date, otherwise you will end up abandoning hundreds of hours of work because there have been too many changes to the application to make modifying the test scripts worthwhile. As a result, it is important that the documentation of the test scripts is kept up to date as well.
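The Test Independence point above can be sketched briefly: each test builds its own fresh fixture instead of sharing mutable state, so a failure in one test cannot domino into the next. The shopping-cart fixture and test names here are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class IndependentTests {
    // Each test gets its own fixture rather than sharing mutable state,
    // so one test's outcome cannot affect another's starting conditions.
    public static List<String> freshFixture() {
        List<String> cart = new ArrayList<>();
        cart.add("item-1");
        return cart;
    }

    public static boolean testAddItem() {
        List<String> cart = freshFixture();
        cart.add("item-2");
        return cart.size() == 2;
    }

    public static boolean testClearCart() {
        List<String> cart = freshFixture(); // unaffected by testAddItem
        cart.clear();
        return cart.isEmpty();
    }

    public static void main(String[] args) {
        // Run both tests regardless of individual outcomes and report each.
        System.out.println("addItem:   " + testAddItem());
        System.out.println("clearCart: " + testClearCart());
    }
}
```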

Monday, November 14, 2005

Educational Seminars On Software Testing

From: Director of Education <>
Date: Nov 11, 2005 12:43 PM
Subject: Educational Seminars from Int'l Institute for Software Testing

Dear Software Test and Quality Professional,

We will give your test team practical advice free of charge.

Our free educational seminars are very well received by test and
quality professionals around the country.

These seminars are packed with practical advice and fresh ideas
for the whole test team.

These seminars are taught by Dr. Magdy Hanna, an internationally
recognized speaker and practitioner in Software Testing and
quality assurance.

Nothing to lose, lots to gain. Still not sure? Read what
other test professionals said about these seminars at:

In accordance with its charter of promoting disciplined
software test practices, the International Institute
for Software Testing is offering a series of educational
seminars around the country. These seminars are offered
free of charge to software test professionals seeking
to improve their testing career.

Do not miss these purely educational opportunities.



More sessions will be added on a regular basis. Please visit for updates.

** San Francisco, CA, December 8, 2005
This is a full day free seminar
Morning session: 8:30-12:00, Disciplined Software Testing Practices
Afternoon session: 1:00-4:30, Effective Test Management Practices

** Orlando, FL, January 19, 2006
This is a full day free seminar
Morning session: 8:30-12:00, Disciplined Software Testing Practices
Afternoon session: 1:00-4:30, Effective Test Management Practices

** Minneapolis, MN, February 17, 2006
This is a half day free seminar
Morning session: 8:30-12:00, Disciplined Software Testing Practices

** Washington, DC, April 27, 2006
This is a half day free seminar
Morning session: 8:30-12:00, Disciplined Software Testing Practices

Seminar outlines and registration information at:

This is a world-class event: 98% of past attendees "recommend
this seminar" to their peers

"What a nice surprise...I thought this was just a marketing tool.
The instructor was very detailed in this seminar and makes me
proud to be in the testing profession."

"It was a great session. I was not expecting as much information
in a FREE seminar. I definitely learned new tools to use on my

"The instructor was absolutely the most dynamic speaker I have
ever heard. I thoroughly enjoyed the seminar. I would love to
attend additional training."

"The fact that the sales pitch wasn't what this was about was
invaluable to make me want to come back for more courses."

"Thank you for organizing such a beneficial event! I'll be able
to apply most of the methods in my everyday work. The methods
provided in this seminar will improve the communications between
business, software engineers and test engineers, and will reduce
the possibility of wrong interpretation of the project's
requirements. All of it leads to minimizing time, resources,
and cost for achieving the desired quality product."

To see more testimonials, visit

Questions? Please call our office at the number below.


Department of Education and Professional Development
International Institute for Software Testing
636 Mendelssohn Ave. North
Golden Valley, MN 55427

Wednesday, November 09, 2005

History's Worst Software Bugs

Last month automaker Toyota announced a recall of 160,000 of its Prius hybrid vehicles following reports of vehicle warning lights illuminating for no reason, and cars' gasoline engines stalling unexpectedly. But unlike the large-scale auto recalls of years past, the root of the Prius issue wasn't a hardware problem -- it was a programming error in the smart car's embedded code. The Prius had a software bug.

With that recall, the Prius joined the ranks of the buggy computer -- a club that began in 1947 when engineers found a moth in Panel F, Relay #70 of the Harvard Mark II system. The computer was running a test of its multiplier and adder when the engineers noticed something was wrong. The moth was trapped, removed and taped into the computer's logbook with the words: "first actual case of a bug being found."

Read the complete article here...,2924,69355,00.html?tw=wn_tophead_1


Monday, November 07, 2005

Fundamentals of Software Testing


Objectives of Testing

Finding Errors - Primary Goal

Trying to prove that software does not work, and thus indirectly verifying that software meets requirements.
Software Testing
Software testing is the process of testing the functionality and correctness of software by running it. It is usually performed for one of two reasons:
(1) defect detection
(2) reliability estimation
It can also be described as the process of executing a computer program and comparing the actual behavior with the expected behavior.

What is the goal of Software Testing?
* Demonstrate That Faults Are Not Present
* Find Errors
* Ensure That All The Functionality Is Implemented
* Ensure The Customer Will Be Able To Get His Work Done

Modes of Testing
* Static - Static analysis doesn't involve actual program execution; the code is examined and tested without being executed. Ex: reviews.
* Dynamic - In dynamic testing, the code is executed. Ex: unit testing.

Testing methods
* White box testing: uses the control structure of the procedural design to derive test cases.
* Black box testing: derives sets of input conditions that will fully exercise the functional requirements for a program.
* Integration testing: assembles parts of a system.

Verification and Validation
* Verification: Are we doing the job right? The set of activities that ensure that software correctly implements a specific function (i.e., the process of determining whether or not the products of a given phase of the software development cycle fulfill the requirements established during the previous phase). Ex: technical reviews, quality & configuration audits, performance monitoring, simulation, feasibility studies, documentation review, database review, algorithm analysis, etc.
* Validation: Are we doing the right job? The set of activities that ensure that the software that has been built is traceable to customer requirements (an attempt to find errors by executing the program in a real environment). Ex: unit testing, system testing, installation testing, etc.

What's a 'test case'?
A test case is a document that describes an input, action, or event and an expected response, to determine if a feature of an application is working correctly. A test case should contain particulars such as the test case identifier, test case name, objective, test conditions/setup, input data requirements, steps, and expected results.

What is a software error?
A mismatch between the program and its specification is an error in the program if and only if the specification exists and is correct.

Risk Driven Testing
What if there isn't enough time for thorough testing?
Use risk analysis to determine where testing should be focused. Since it's rarely possible to test every possible aspect of an application, every possible combination of events, every dependency, or everything that could go wrong, risk analysis is appropriate to most software development projects. This requires judgement skills, common sense, and experience.

Considerations can include:
- Which functionality is most important to the project's intended purpose?
- Which functionality is most visible to the user?
- Which aspects of the application are most important to the customer?
- Which parts of the code are most complex, and thus most subject to errors?
- What do the developers think are the highest-risk aspects of the application?
- What kinds of tests could easily cover multiple functionality?
Whenever there's too much to do and not enough time to do it, we have to prioritize so that at least the most important things get done. So prioritization has received a lot of attention. The approach is called Risk Driven Testing. Here's how you do it: take the pieces of your system - whatever unit you use: modules, functions, sections of the requirements - and rate each piece on two variables, Impact and Likelihood.

Risk has two components: Impact and Likelihood

Impact is what would happen if this piece somehow malfunctioned. Would it destroy the customer database? Or would it just mean that the column headings in a report didn't quite line up?

Likelihood is an estimate of how probable it is that this piece would fail. Together, Impact and Likelihood determine Risk for the piece.
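A minimal sketch of this Risk = Impact x Likelihood prioritization, with illustrative piece names and 1-5 ratings assumed:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RiskDrivenPriority {
    public static class Piece {
        public final String name;
        public final int impact;     // rated 1 (cosmetic) to 5 (catastrophic)
        public final int likelihood; // rated 1 (unlikely) to 5 (near certain)
        public Piece(String name, int impact, int likelihood) {
            this.name = name; this.impact = impact; this.likelihood = likelihood;
        }
        public int risk() { return impact * likelihood; }
    }

    // Sort pieces so the highest-risk ones are tested first.
    public static List<Piece> prioritize(List<Piece> pieces) {
        List<Piece> sorted = new ArrayList<>(pieces);
        sorted.sort(Comparator.comparingInt(Piece::risk).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Piece> plan = prioritize(List.of(
            new Piece("customer database update", 5, 3),
            new Piece("report column headings", 1, 4),
            new Piece("login module", 4, 2)));
        for (Piece p : plan)
            System.out.println(p.name + " -> risk " + p.risk());
    }
}
```

When testing time runs short, you work down this list from the top and accept that the low-risk tail may go untested.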

Test Planning

What is a test plan?
A software project test plan is a document that describes the objectives, scope, approach, and focus of a software testing effort. The process of preparing a test plan is a useful way to think through the efforts needed to validate the acceptability of a software product.

Elements of test planning
* Establish objectives for each test phase
* Establish schedules for each test activity
* Determine the availability of tools, resources
* Establish the standards and procedures to be used for planning and conducting the tests and reporting test results
* Set the criteria for test completion as well as for the success of each test

The Structured Approach to Testing

Test Planning
* Define what to test
* Identify Functions to be tested
* Test conditions
* Manual or Automated
* Prioritize to identify Most Important Tests
* Record Document References

Test Design
* Define how to test
* Identify Test Specifications
* Build detailed test scripts
* Quick Script generation
* Documents

Test Execution
* Define when to test
* Build test execution schedule
* Record test results

Bug Overview

What is a software error?
A mismatch between the program and its specification is an error in the Program if and only if the specification exists and is correct.
Example: -
* The date on the report title is wrong
* The system hangs if more than 20 users try to commit at the same time
* The user interface is not standard across programs

Categories of Software errors
* User Interface errors
* Functionality errors
* Performance errors
* Output errors
* Documentation errors

What Do You Do When You Find a Bug?
* alert the developers that a bug exists
* show them how to reproduce the bug
* ensure that if the developer fixes the bug, it is fixed correctly and the fix didn't break anything else
* keep management apprised of the outstanding bugs and correction trends

Bug Writing Tips
Ideally, you should be able to write a bug report clearly enough for a developer to reproduce and fix the problem, and for another QA engineer to verify the fix, without either of them having to go back to you, the author, for more information.
To write a fully effective report you must:
* Explain how to reproduce the problem
* Analyze the error so you can describe it in a minimum number of steps
* Write a report that is complete and easy to understand

Product Test Phase - Product Testing Cycle

Pre-Alpha is the test period during which QA, Information Development and other internal users make the product available for internal testing.

Alpha is the test period during which the product is complete and usable in a test environment but not necessarily bug-free. It is the final chance to get verification from customers that the tradeoffs made in the final development stage are coherent.
Entry to Alpha
* All features complete/testable (no urgent bugs or QA blockers)
* High bugs on primary platforms fixed/verified
* 50% of medium bugs on primary platforms fixed/verified
* All features tested on primary platforms
* Alpha sites ready for install
* Final product feature set determined

Beta is the test period during which the product should be of "FCS quality" (it is complete and usable in a production environment). The purpose of the Beta ship and test period is to test the company's ability to deliver and support the product (and not to test the product itself). Beta also serves as a chance to get a final "vote of confidence" from a few customers to help validate our own belief that the product is now ready for volume shipment to all customers.
Entry to Beta

* At least 50% positive response from Alpha sites
* All customer bugs addressed via patches/drops in Alpha
* All bugs fixed/verified
* Bug fixes regression tested
* Bug fix rate exceeds find rate consistently for two weeks
* Beta sites ready for install

GM (Golden Master)
GM is the test period during which the product should require minimal work, since everything was done prior to Beta. The only planned work should be to revise part numbers and version numbers, prepare documentation for final printing, and sanity-test the final bits.
Entry to Golden Master

* Beta sites declare the product is ready to ship
* All customer bugs addressed via patches/drops in Beta
* All negative responses from sites tracked and evaluated
* Support declares the product is supportable/ready to ship
* Bug find rate is lower than fix rate and steadily decreasing

FCS (First Customer Ship)
FCS is the period which signifies entry into the final phase of a project. At this point, the product is considered wholly complete and ready for purchase and usage by the customers.
Entry to FCS

* Product tested for two weeks with no new urgent bugs
* Product team declares the product is ready to ship


Thursday, November 03, 2005

Thread Based Integration Testing



Our organization has recently completed the development of a large-scale command and control system through the implementation and formal qualification phases of the project. This development involved over eighty software engineers developing roughly 1.5 million source lines of code using multiple languages and platforms. In order to deliver the product within the projected schedule, parallel development and rapid integration occurred over many related software functional areas. To facilitate the decomposition of our design into manageable components we chose the concept of a “functional thread” as the elementary building block for integration. In this context, a “functional thread” is defined as a logical execution sequence through a series of interfacing software components resulting from or ending in the receipt of a message, event or operator interaction.

Threads not only serve as the basis for integration, they also tend to drive the entire software development effort from scheduling to status reporting. Each thread itself represents a microcosm of the system in that each has a documented definition and general execution path, an internal design and an associated test. Thread definition intends to communicate functional background and execution details between developers and from developers to testers. More importantly, the desired independence of threads supports incremental integration and system testing while the corresponding thread definition substantiates the results. Finally, since all system development activity progresses in relation to threads, management has an accurate method of judging the status of individual tasks, functional areas and requirements.


Thread Figure 1

Keeping the goals of iterative development and testing in mind, each thread has its own lifecycle with autonomous states and a formal process for state transitions (see Figure 1). Individual team leaders usually decompose general requirements into groups of threads at the beginning of formal, six-month software builds and assign threads to developers. Developers maintain ownership of their threads and are responsible for documenting a scenario under which an integrator can verify the basic functionality, providing rudimentary definition to the thread. Following implementation and unit test, the developer releases the corresponding software components to a daily integration build, at which point the thread enters a “testable” state. After verifying the functionality in the integration build, the developer marks the thread “ready” for an integrator who performs more extensive testing and eventually “integrates” the thread and corresponding software components into the system. At the end of each formal build, a team of key engineers in conjunction with quality assurance checks all threads against requirements as a regression test and “finalizes” those threads that pass.
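The autonomous states and formal transitions described above amount to a small state machine. As a rough sketch (the state names follow the text, but the class and API are illustrative, not the project's actual database tool):

```python
class Thread:
    # Legal lifecycle: defined -> testable -> ready -> integrated -> finalized
    TRANSITIONS = {
        "defined": {"testable"},
        "testable": {"ready"},
        "ready": {"integrated"},
        "integrated": {"finalized"},
        "finalized": set(),
    }

    def __init__(self, name):
        self.name = name
        self.state = "defined"

    def advance(self, new_state):
        # Enforce the formal process: no skipping or reversing states.
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

t = Thread("track-update")
t.advance("testable")    # developer releases components to the daily build
t.advance("ready")       # developer verifies functionality in the build
t.advance("integrated")  # integrator completes extensive testing
print(t.state)           # integrated
```

Enforcing the transitions in the tool, rather than by convention, is what establishes the "clearly defined responsibilities" between developers and testers described below.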

While the development team originally tracked threads manually, we quickly developed a shared database application to serve as a central repository for thread development, maintenance and tracking. The database provides a formal mechanism for defining and documenting threads, changing thread status and reporting status to project management. Moreover, the database manages references between threads: threads can serve as preconditions to other threads and developers may incorporate thread test steps from previous threads. Most importantly, the interface helps enforce the process by demonstrating the autonomy of thread status and establishing clearly defined responsibilities among developers and testers.

Thread Test Steps

Thread test steps and other background information from the database serve as a contract between developers and integrators. Integrators use thread test steps as a simple scenario to identify the scope of a thread rather than as a rigid test case that may only rubber-stamp a developer’s unit test. Consequently, the integrators are responsible for developing several execution scenarios within the boundaries of the thread and applying appropriate testing mechanisms such as known exceptional cases and boundary checking. Furthermore, the integration team often stresses exercising subsystem interfaces during integration testing, an area that thread steps often overlook.

In addition to helping formalize the implementation process, the thread testing approach standardizes the integration testing process as well. As a result, the number of detected coding errors increased almost 250 percent over three formal builds after thread testing had been introduced. Although errors attributable to integration doubled during the first formal build during which our group used threads, that number has subsequently dropped to almost fifty percent below the level at which we started using threads.

While thread-based development certainly contributes greatly to the main goals of early, rapid integration and iterative development, we have also identified several potential areas of further process improvement. Perhaps most notably, developers and testers shared concerns that thread scope lacked uniformity among subsystems. At times, thread definitions were far too specific and a conscientious integrator could verify the basic functionality in fewer steps than the developer identified. Likewise, developers sometimes defined threads at too high a level, requiring the integrator to seek further information from the developer to ensure a meaningful test. A thread review process, perhaps as part of a design walkthrough, may answer this problem. Likewise, we recommend requiring completion of a code walkthrough as a prerequisite to thread completion due to the implications of walkthrough-initiated design and code changes.

Thread Maintenance

A related area of improvement is thread maintenance. While the process encouraged (and the database supported) threads referencing other threads, maintaining consistency was not always an easy task. Furthermore, while software that composes a thread often changes after a thread has been integrated, there is no formal update process for the thread. The changes to process here are obvious and one could modify the tool to help enforce these concerns. For example, the tool would benefit from the ability to attach references to source code units so that changes to code might trigger the need for associated thread changes.

In this project the thread process focused on the integration activities rather than the full development lifecycle. This is certainly the main difference between our thread-based approach and use-case analysis. The thread database requires references to user interface specifications where applicable, but the process did not link the thread directly to the requirements database. Thus software testing and overall system testing were somewhat disjoint in that system testers found it difficult to use the thread database as a reference when creating test cases. Though it might be desirable to shift thread definition to the requirements analysis phases of the project, such analysis usually occurs at a higher level than what we had used for our threads and almost always spans subsystem boundaries. Instead we suggest a more hierarchical approach to thread definition rooted in requirement-based parent threads. This would directly link the software thread repository to system requirements and better facilitate a similar iterative approach to system-wide testing. Finally, by linking threads directly to requirements, project management would have better insight about the status of entire requirements.

Since threads drove the software efforts and status, developers viewed threads as the most visible formal process in place. The simplicity of the process, accurate status and integration efficiency contributed to the development team’s acceptance of the process and enthusiasm to suggest improvements. In addition, the empirical results suggest that the introduction of thread-based testing exposed design and coding errors earlier and attributed fewer errors to the integration process itself, probably due to the enhanced communication between developers and testers. In short, our method appears to have synchronized the notion of task completion among developers, testers and management.


Thread-based integration testing played a key role in the success of this software project. At the lowest level, it provided integrators with better knowledge of the scope of what to test, in effect a contract between developers and testers. At the highest level, it provided a unified status tracking method and facilitated an agreement between management and the developers as to what would be delivered during each formal build. Furthermore, instead of testing software components directly, it required integrators to focus on testing logical execution paths in the context of the entire system. Because of this, it strongly supported the goals of early, rapid integration coupled with an iterative development approach. In summary, the thread approach resulted in tangible executable scenarios driving development and integration while the autonomous, well-defined thread states strengthened the use of threads as an accurate method of scheduling and tracking status.

Monday, October 31, 2005

Testing for Zero bugs

The Software Quality Myth

A widely accepted premise on software quality is that software is so complex (in combinatorial terms) that it is impossible to have defect free software. We often hear maxims like "there's always one more bug", and "software testing can reveal the existence of bugs, but never prove their absence".

While the above maxims may be true, I'm afraid they tend to lead us to a state of mind where we accept them as an inevitable evil: if there's no way to make software clean, why bother?

Having some prior experience with software testing, I'm certain that we can do much better than we do today. Once we do this right, the results would pleasantly surprise us.

Conventional testing: How we test software?

Looking at the ways we test software, we see the following methods:

  • Known scenario replay (test suites, regression tests)
  • Random early exposure (Campus alphas, selected customer betas)

The known scenario replay is the single most commonly used method to test software. Unfortunately, it is also the least effective method at uncovering bugs, as evidenced by the large number of bugs uncovered in later stages.

Regression tests and test suites are necessary but insufficient. They're a good way for conducting sanity checking and ensuring that popular and commonly compiled code runs correctly. However, they have two striking flaws:

  • Coverage is extremely limited. When you run the same suite so many times, obviously your bugs tend to hide elsewhere.

  • It is difficult to track bugs in case of failure. Test suites tend to be big pieces of code. When something goes wrong, we apply some further trial runs and search techniques until the problem is narrowed down to a single routine or line of source code.

Early exposure like alpha (and beta) testing has the advantage that it is much more random than the known-scenario replay. It provides more "real world" coverage, but it has its own flaws:
  • Since it is an informal testing method, reproducibility is a problem (e.g. asking a beta customer: "What exactly did you do to experience the problem?")
  • It relies on people's good will to investigate problems as they occur, and report bugs accurately and with enough detail when they are found.
  • It suffers from a small scope (time, number of machines, and people employed) compared to the real installed base and its usage patterns.
  • At least the alpha part doesn't really emulate our customer environment: our environment is far from heterogeneous: almost no other vendors' machines (Sun, HP, IBM, Mac, PCs) exist on campus.

Fortunately, there is a complementary testing method that covers the above flaws well.

Monkeys at work: Random Testing

It is said that if you give a zillion monkeys keyboards and let them type long enough, one of them would eventually produce (insert your favorite magnum opus here). Finding bugs is probably much easier than writing magnum opii. If you think this is absurd, just substitute the word "monkeys" with "customers".

For years we have been sending our software to much less than a zillion customers, and yet, without exception, they eventually hit hundreds of undiscovered bugs.

Optimization imperative #1 calls for applying zillions of computer cycles before shipment to try and beat the customers in the race to uncover the bugs. This is what random testing is all about. This is also obvious. The magic is in the details.

Constrained Random Testing: a zillion monkeys, optimized

Since the combinatorial space of software is so huge we would like the proverbial monkeys to be constrained in some way and not purely random. To give an example: if we had our monkeys test UNIX for us, we would want them to type stuff like ls, make, cc etc. rather than having them type stuff like %^3gkl'$#*(&% (*).

"Elementary," you may say, but read on.

A proof by example: It just works!

Before going into the details of Constrained Random Testing, let me state that from my experience applying this technique to one problem space, it works far better than any other conventional testing method I've seen.

I used to work at the National Semiconductor microprocessor design center in Tel Aviv (NSTA) where I was a member of the Compilers team who wrote the compilers for all the 32000 Series of microprocessors.

For several years, our compilers were buggy as hell. Having worked closely with hardware engineers, we were urged to test the compilers "like hardware is tested", using random "vectors" of input. Once we were on the right track in our thought process things started to improve fast. It took one engineer about 4 months to come up with a prototype for random compiler testing. He was generating random IR (Intermediate Representation) because it was easier to do than generating high-level code and we just needed a proof of concept. As a result we were testing only the back-end (optimizer, code generator, assembler, linker).

The introduction of random testing practically eliminated user bug reports on released back ends. At a certain point, we couldn't believe it ourselves, so we conducted an experiment: we ran the random program generator (called RIG, for Random IR Generator, and implemented by Avi Bloch and Amit Matatia; see acknowledgments) on a previous release of the compiler. To our amazement, RIG was able to find over half of all the bugs reported by customers on the code generator in just one night of runtime.

Thanks to RIG, the GNX Compilers for the 32000 processor family have matured in a very short period to become one of the most reliable in the industry; I remember compiling an early version of perl (I think it was perl 2.0) with full optimizations and passing the whole test suite on a Sequent Balance (a big 32032-CPU SMP machine), while the SunOS and the HP/UX compilers I used for comparison weren't anywhere near this quality at that time (mind you, that was way back, in 1990 or so).

Based on this personal experience, I have no doubt this method is well worth pursuing here.

Testing a Compiler: CRT Outline

Here's the skeleton of RIG:

  • Generate a random program subject to constraints
  • Compile it
  • Run it
  • Check that the results make sense:
    • if they do, discard the program
    • if not, save the program and results for later human inspection

Generating a random program is done using recursive descent on the grammar of the language, while applying constraints at every node on the random generator (explained below). To give a simple example: pick an operator at random, check its arity, and generate N random operands for it. Each operand can be either a constant or an arbitrary randomly generated subexpression, and so on.
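As a hedged illustration of that recursive descent (the operator set, depth bound, and leaf values here are invented for the example, not taken from RIG):

```python
import random

# Operators and their arities; a real generator would cover the full grammar.
OPERATORS = {"+": 2, "*": 2, "-": 2, "neg": 1}

def gen_expr(depth, rng):
    # At the depth limit (or with some probability), emit a leaf constant.
    if depth == 0 or rng.random() < 0.3:
        return str(rng.choice([0, 1, -1, 7, 2**31 - 1]))
    # Otherwise pick an operator and recurse once per operand.
    op, arity = random.Random(rng.random()).choice(sorted(OPERATORS.items()))
    args = [gen_expr(depth - 1, rng) for _ in range(arity)]
    if op == "neg":
        return f"(-{args[0]})"
    return "(" + f" {op} ".join(args) + ")"

rng = random.Random(42)
print(gen_expr(3, rng))  # e.g. a parenthesized integer expression
```

A full generator recurses the same way over statements, loops, and functions; the constraints discussed next are applied at each node.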

Constraint Magic: making a random program behave

The outline sounds simple enough, but a few problems immediately come to mind:

  • How can you ensure that a randomly generated program terminates?
  • How can you ensure that a randomly generated program doesn't generate an exception?
  • How can you ensure that a perfectly "safe" optimization won't change the semantics of the program (e.g. use of an undefined variable, the behavior of which is undefined)

This is where the constraints come into play. The embedded constraints are what turns the method from pure theoretical play into a practical and effective testing method.

  • To ensure termination, we artificially inject an independent counter into every randomly generated loop or recursive function call; if a configurable limit is exceeded, we just break out of the loop. Not perfect, indeed, but simple, straightforward, and does the job.

  • As for exceptions, there are two approaches: one is to simply allow them, if a randomly picked divisor happens to be zero, so be it. The output may be partial, but still deterministic. The other approach (if exceptions happen too often) is to regenerate those expressions that cause the exceptions, or to ensure by construction that they always fall within a certain range of values.

    For example, we ensure that constant divisors are never zero, and that variable divisors are checked at run time (if they are zero don't divide). Likewise we may add a check before array accesses for legal indexes into a pre-generated array. From experience, the second approach is what you want in 99% of the cases, and again, it works well.

  • To make sure the program has outputs (so we can inspect its runtime results) we simply put a hook to print all the randomly generated variables and constants of the program from a customized exit() routine that is linked with the program.

  • Likewise, to ensure there's no use of undefined variables, we simply initialize all of them (with random values of course) just before we jump to main().

This is in essence what the constraints are all about.
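A minimal sketch of how a generator might emit two of these constraints, producing C text with an injected loop counter and a guarded division (the function names and the limit are illustrative):

```python
LOOP_LIMIT = 1000  # configurable termination bound per generated loop

def emit_bounded_loop(cond, body):
    # Wrap a randomly generated loop with an independent termination counter.
    return (
        f"{{ int _guard = 0;\n"
        f"  while (({cond}) && _guard++ < {LOOP_LIMIT}) {{ {body} }} }}"
    )

def emit_safe_div(num, den):
    # Guard variable divisors at run time; constant zero divisors would be
    # regenerated earlier and never reach this point.
    return f"((({den}) != 0) ? (({num}) / ({den})) : 0)"

print(emit_bounded_loop("x < y", "x += step;"))
print(emit_safe_div("a + b", "c"))
```

The generated program stays deterministic, which is what makes the reference-output comparison in the next section possible.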

Closing the loop: proving correctness

But then there's another more fundamental question:

  • If the program is randomly generated, (i.e. not known in advance) how can you predict its output, in order to verify that it ran correctly?

It turns out that even though the answer to this question is "You can't", from a practical point of view there's a nice solution. Not only is it simple, it was also empirically proven to work very well.

Solution: Generate a reference output using another compiler or interpreter for the same language. E.g. for C, the reference may be a most vanilla compilation (without any optimizations) by a widely deployed and stable compiler like GNU cc (gcc). This can even be done on a different vendor's system.

Actually, for most cases, you don't even need an additional compiler: you may generate multiple outputs using the same compiler under test, each time with different compilation options. For example: if the result with optimization differs from the result without it, bingo: you've found a bug in the optimizer.
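Such a differential-testing driver fits in a few lines. This sketch assumes a C compiler is available as `cc` on the PATH; the flags and structure are illustrative, not RIG's actual harness:

```python
import os
import subprocess
import tempfile

def differs(source_path, cc="cc"):
    """Compile the same program at two optimization levels; compare outputs."""
    workdir = tempfile.mkdtemp()
    outputs = []
    for flags in (["-O0"], ["-O2"]):
        exe = os.path.join(workdir, "prog" + flags[0])
        subprocess.run([cc, *flags, source_path, "-o", exe], check=True)
        result = subprocess.run([exe], capture_output=True, text=True)
        outputs.append(result.stdout)
    # A difference means one of the two compilations miscompiled the program.
    return outputs[0] != outputs[1]
```

A run that returns True corresponds to the "save the program and results for later human inspection" branch of the CRT outline above.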

Practice & experience: The importance of space/time tuning

The constraints mentioned above are just part of the story. To be effective and efficient, the random program generator should be space/time tuned. For example: testing a small array for an unsigned index wrap-around case is as effective as checking a big array, and far less consuming in time and space. Thus, elements like the maximum size of a randomly generated array, the maximum number of elements in a structure, or the maximum depth of an expression should all be configurable via a configuration file.

Results (in terms of detected bugs per unit of testing time) can change dramatically by fine tuning of these space/time constraint parameters. The general guideline is: strive to use constraints that will create the smallest/simplest case to exhibit the problem.

Some golden rules:

  • Don't generate big data structures.
  • Don't generate big programs, loops, or routines
  • For every possible range of values, use some bias towards extremum and "interesting" values (MAXINT, 1, 0, -1, last index of an array) as opposed to a uniform distribution of values.
  • Think hard about the constraints, let the random generator do the rest
  • For every generated program, run it against as many cases as possible (e.g. many possible compiler options) to amortize the generation overhead, and leverage the reference results over many test cases.
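The bias toward extremum values in the third rule might look like this (the pool of "interesting" values and the bias ratio are invented for the example):

```python
import random

# Boundary and "interesting" values get most of the probability mass.
INTERESTING = [0, 1, -1, 2**31 - 1, -2**31]

def biased_int(rng, lo=-100, hi=100, bias=0.6):
    # With probability `bias`, draw from the interesting pool;
    # otherwise fall back to a uniform draw over a small range.
    if rng.random() < bias:
        return rng.choice(INTERESTING)
    return rng.randint(lo, hi)

rng = random.Random(1)
print([biased_int(rng) for _ in range(10)])
```

Skewing the distribution this way is what makes tiny generated programs keep landing on overflow, sign, and off-by-one boundaries.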

From my experience, all compiler bugs can be narrowed down and reproduced in a very small example. Come to think of it, this is exactly what makes random testing of compilers work so well. Also, the smaller the example, the easier it is to investigate and fix. This is where tuning pays big: if you can generate 100,000 tiny programs per night you'll be much more effective at covering the compiler and fixing the bugs than if you generate 1000 larger programs per night.

More Advice

Start with simple expressions, including all types and all possible casts and type conversions; these are easy to test and are a natural first step in a CRT implementation to start with and build upon.

Pick a very good random number generator. If you don't, and if you don't have much variability in the types of nodes that you randomly generate, your programs will tend to find the same bug multiple times. There are practical ways to minimize this, like changing the configuration parameters once a bug is found (and until the bug is fixed), but this requires further resources and effort.

Generate random C rather than random IR. You cannot compile proprietary formats with a reference compiler (one other than the compiler you're testing). You also get to test the language front ends in addition to the back end.

It may be easier to generate IR and upwardly generate high-level constructs from it. This is certainly a sensible strategy, especially if the mapping between IR and C is one to one. Even if it isn't, this will enable testing Fortran with almost no additional effort (since we already have these "reverse compilers").

Generating test cases, running them, and comparing results normally take less time than the many compilations with many options, so pick compilation cases that are as different as possible from each other to get better coverage. Also: try to combine several compilation options in each run (i.e. many optimizations together, vs. a vanilla compilation) to achieve good coverage of the compiler in as few compilations as possible.

Jumbo tip: An excellent way of tuning the constraints and the random program generator is to basic-block profile the compiler itself and see what parts of source code are not being exercised by the randomly generated programs. Then, tune the random generator a bit more. In other words, inject some white-box style testing into the random black-box approach.

By Ariel Faigon

Wednesday, October 26, 2005

White Box Testing

Definition of White Box Testing - A software testing technique whereby explicit knowledge of the internal workings of the item being tested is used to select the test data.

Unlike black box testing, white box testing uses specific knowledge of programming code to examine outputs. The test is accurate only if the tester knows what the program is supposed to do. He or she can then see if the program diverges from its intended goal. White box testing does not account for errors caused by omission, and all visible code must also be readable.

Contrary to black-box testing, software is viewed as a white-box, or glass-box in white-box testing, as the structure and flow of the software under test are visible to the tester.

Testing plans are made according to the details of the software implementation, such as programming language, logic, and styles. Test cases are derived from the program structure. White-box testing is also called glass-box testing, logic-driven testing or design-based testing.

There are many techniques available in white-box testing, because the problem of intractability is eased by specific knowledge of and attention to the structure of the software under test. The intention of exhausting some aspect of the software is still strong in white-box testing, and some degree of exhaustion can be achieved, such as executing each line of code at least once (statement coverage), traversing every branch (branch coverage), or covering all the possible combinations of true and false condition predicates (multiple condition coverage).
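A tiny example makes the difference between these criteria concrete. For the function below, a single negative input executes every statement, but branch coverage additionally demands an input that skips the if-body:

```python
def clamp_positive(x):
    if x < 0:
        x = 0
    return x

# x = -5 alone achieves statement coverage (every line runs),
# but branch coverage needs both outcomes of the condition:
assert clamp_positive(-5) == 0  # condition true: the if-body executes
assert clamp_positive(3) == 3   # condition false: the if-body is skipped
```

Multiple condition coverage generalizes this further: with a compound predicate like `a and b`, it requires test cases for every true/false combination of `a` and `b`, not just both outcomes of the whole condition.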

Control-flow testing, loop testing, and data-flow testing all map the corresponding flow structure of the software onto a directed graph. Test cases are carefully selected based on the criterion that all the nodes or paths are covered or traversed at least once. By doing so we may discover unnecessary "dead" code -- code that is of no use or never gets executed at all, which cannot be discovered by functional testing.

In mutation testing, the original program code is perturbed and many mutated programs are created, each containing one fault. Each faulty version of the program is called a mutant. Test data are selected based on their effectiveness at failing the mutants. The more mutants a test case can kill, the better the test case is considered. The problem with mutation testing is that it is too computationally expensive to use.

The boundary between the black-box approach and the white-box approach is not clear-cut. Many of the testing strategies mentioned above may not be safely classified as black-box testing or white-box testing. The same is true for transaction-flow testing, syntax testing, finite-state testing, and many other testing strategies not discussed in this text. One reason is that all the above techniques need some knowledge of the specification of the software under test. Another reason is that the idea of specification itself is broad -- it may contain any requirement, including the structure, programming language, and programming style, as part of the specification content.
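In miniature, and ignoring the cost problem, the mutation idea looks like this (the program, the mutant, and the test suite are all invented for illustration):

```python
def original(a, b):
    return a + b

def mutant(a, b):
    # A single injected fault: + mutated to -
    return a - b

def suite_passes(fn):
    # The test suite under evaluation; a suite that fails on the mutant
    # is said to "kill" it.
    return fn(2, 3) == 5 and fn(0, 5) == 5

print("original passes:", suite_passes(original))  # True
print("mutant killed:", not suite_passes(mutant))  # True
```

Note that a weaker suite containing only `fn(2, 2) == 4` would still pass a `*` mutant (since 2 * 2 == 4), so that mutant would survive; surviving mutants are exactly how mutation testing exposes gaps in a suite.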

We may be reluctant to consider random testing as a testing technique, since the test case selection is so simple and straightforward: test cases are randomly chosen. Yet studies indicate that random testing is more cost-effective for many programs. Some very subtle errors can be discovered at low cost. Nor is it inferior in coverage to other carefully designed testing techniques. One can also obtain a reliability estimate using random testing results based on operational profiles. Effectively combining random testing with other testing techniques may yield more powerful and cost-effective testing strategies.

Monday, October 24, 2005

What is User Acceptance Testing?



UAT goes under many names. As well as user acceptance testing, it is also known as Beta Testing (usually in the PC world), QA Testing, Application Testing, End User Testing or, as it is known in the company where I work, Model Office Testing.


Developing software is an expensive business. It is expensive in:

  • time, as the software must be analysed, specified, designed and written
  • people, as very few development projects are one man jobs
  • money, the people responsible for the analysis, specification and development of software do not come cheap (look at the current rates for contractors!)

If, having expended all this time and the company's money, the resulting software is not completely suited to the purpose required, then that time and money has not been fully utilised.

If the software is suitable to the purpose, but;

  • does not dovetail precisely with the business processes
  • makes processes more difficult to do than before
  • causes business processes to take longer than previously
  • makes additional processes necessary, without making other processes obsolete

then you may not see a return on your investment in the software until much later, or may never see a return at all.

Question : how do we ensure that we do not end up in this situation?

Answer : we test the software against objective criteria to ensure that we don't.

Previously, most testing was left in the hands of the development teams, with the end users trusting those teams to deliver applications that were not only fully functional and stable, but also applications that would dovetail into business processes, and support those processes (maybe even make things a little bit easier)

However, the testing executed by developers is to ensure that the code they have created is stable and functional. They will test that;

  • they cover all the lines and logic paths through their code
  • all the screens flow backwards and forwards in the correct order
  • the software meets the functions specified (eg calculations are correct, reports have correct columns, screens have correct validation etc)

This testing might not be done through the application itself (often because it has not been completely built while they are testing), so they will only add a few records, maybe by editing the file/table and adding the records, rather than using the 'Record Entry Screen'.



As we will see later, this does not pose us a problem, because the UAT department will cover this testing. This system testing and unit testing by the developers is still very valid and useful. I would rather take delivery of an application where the development team say "We have done this, done that, hacked this file and added a few records, ran a few test cases through here - and everything seems OK", than take an application which has not gone through any system testing.

The application that has been tested by the developers will have had most of the obvious flaws identified and ironed out, and only the types of issues the testing was designed for should be identified. The second application will be completely unknown, and some of the time allocated for UAT will be spent identifying and fixing problems that could have been easily identified and rectified by the developers.

Also, because the developers are testing their own work, there is a tendency for them to skip areas because they 'know that there won't be a problem there'.

I have spoken to developers who have come to our company from places that do not do UAT, and they are both impressed with how we do things and like the idea of an independent third party testing their software.

These people are professional software developers, and they do not want to supply something that isn't exactly what's wanted, and they feel that the UA testing gives them a large comfort zone that any problems with their work will be identified and escalated back to them for correction.

As I said, these issues do not pose a problem for the user acceptance tester.

The four issues of the software delivered not matching the business process, making things more difficult etc are circumvented by the user acceptance tester.

While the developer tests against the system specification and technical documentation, the user acceptance tester tests against the business requirements. The former tests the code, the latter the application. We will come to the test planning in a bit.

The issue of the developer testing their own work ceases to be an issue, as the UAT team will design a testing strategy that covers all areas of the business requirements, whether or not the developer feels there may be problems in a specific area.

The issue of additional processes being necessary should also not be a problem. As I said before, the UAT team tests the application against the business requirements, so all testing is done through the use of the proper system transactions.

The UAT team do not hack tables/files to create data; if a client record is needed for a test, then the UAT team will create this client by use of the formal client maintenance transaction, not by adding a record to the 'Client_Details' file.

This use of the formal application transactions serves two purposes:

  • it tests all the transactions that the business users shall run, giving complete 'business' coverage (as opposed to code coverage, or logic path coverage)
  • it will highlight any potential areas of adverse impact on the business processes. If the contents of a form (e.g. an application form for a new life assurance policy) are used as the basis for creating a new life assurance policy record, then the use of the formal 'New Life Assurance Policy' transaction will determine whether the transaction works, and also whether the form holds the requisite information to create the policy records.

The 'New Life Assurance Policy' system may require the client to declare whether they smoke or not, however, if this question is not on the application form, then the business users will have to re-contact every new client, to determine whether or not they smoke!

We can see then, that it is the role of user acceptance testing to not only prove whether or not an application works, but also to prove how it will fit with business processes.

User Acceptance Testing Processes

OK, now that we have determined what UAT is, we need to look at HOW we achieve these objectives.

The user acceptance test life cycle follows the path shown below (obviously at a very high level):

  • analysis of business requirements. We can't do anything concerning testing until we understand what the developments are supposed to achieve. This is quite an intangible step in the process, and consists mostly of thought processes, meetings etc. The end result is a clear vision, in the tester's mind, of what they are going to be expected to prove, and why it is necessary.
  • analysis of testing requirements. This is more tangible than the first stage, and consists of documenting the areas of the development that require testing, the methodologies you will need to use to test them, and the results you expect to be returned when you test them.
  • Execution of testing. Doing the business. This is what it all boils down to. Every development project will be different, and you will have had enough experience in this part of the cycle to not need any pointers from me!
  • Getting the testing signed off. There is no use going through all of these processes, raising problems to development teams, having more work done by the development teams in fixing those problems, re-testing the changes and re-doing all your regression scripts, unless at the end of the day you can get the users to sign off the changes.

Wednesday, October 19, 2005

Black Box Testing

The black box testing approach is a testing method in which test data are derived from the specified functional requirements without regard to the final program structure.

It is also termed data-driven, input/output-driven, or requirements-based testing. Because only the functionality of the software module is of concern, black-box testing also mainly refers to functional testing -- a testing method that emphasizes executing the functions and examining their input and output data.

The tester treats the software under test as a black box -- only the inputs, outputs and specification are visible, and the functionality is determined by observing the outputs for corresponding inputs. In testing, various inputs are exercised and the outputs are compared against the specification to validate correctness. All test cases are derived from the specification; no implementation details of the code are considered.

It is obvious that the more of the input space we have covered, the more problems we will find, and the more confident we will be about the quality of the software. Ideally, we would exhaustively test the input space. But as stated above, exhaustively testing the combinations of valid inputs is impossible for most programs, let alone considering invalid inputs, timing, sequence, and resource variables. Combinatorial explosion is the major roadblock in functional testing. To make things worse, we can never be sure whether the specification is correct or complete.

Due to limitations of the language used in specifications (usually natural language), ambiguity is often inevitable. Even if we use some type of formal or restricted language, we may still fail to write down all the possible cases in the specification. Sometimes the specification itself becomes an intractable problem: it is not possible to specify precisely every situation that can be encountered using limited words. And people can seldom specify clearly what they want -- they usually can only tell whether a prototype is, or is not, what they want after it has been finished. Specification problems contribute approximately 30 percent of all bugs in software.

The research in black-box testing mainly focuses on how to maximize the effectiveness of testing with minimum cost, usually measured by the number of test cases. It is not possible to exhaust the input space, but it is possible to exhaustively test a subset of it. Partitioning is one of the common techniques: if we have partitioned the input space and assume all the input values in a partition are equivalent, then we only need to test one representative value in each partition to sufficiently cover the whole input space.
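The partitioning idea can be sketched in a few lines. Suppose a function classifies an applicant by age; assuming three partitions (negative/invalid, minor, adult), one representative per partition covers the whole input space under the equivalence assumption. The `classify_age` function here is an illustrative example, not from the original text:

```python
# Sketch of equivalence partitioning: one representative input per
# partition. classify_age is a hypothetical function under test.

def classify_age(age):
    if age < 0:
        return "invalid"
    if age < 18:
        return "minor"
    return "adult"

# One representative per partition suffices if the equivalence
# assumption holds: -5, 10 and 30 stand for every value in their class.
partitions = {
    -5: "invalid",   # representative of age < 0
    10: "minor",     # representative of 0 <= age < 18
    30: "adult",     # representative of age >= 18
}

for representative, expected in partitions.items():
    assert classify_age(representative) == expected
```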

Domain testing partitions the input domain into regions and considers the input values in each region an equivalence class. Domains can be exhaustively covered by selecting one or more representative values in each domain. Boundary values are of special interest: experience shows that test cases exploring boundary conditions have a higher payoff than test cases that do not. Boundary value analysis selects one or more boundary values as representative test cases. The main difficulty with domain testing is that incorrect domain definitions in the specification cannot be efficiently discovered.

Good partitioning requires knowledge of the software structure.

A good testing plan will not only contain black-box testing, but also white-box approaches, and combinations of the two.

Tuesday, October 18, 2005

Testing Without a Formal Test Plan

A formal test plan is a document that provides and records important information about a test project, for example:

  1. Project assumptions

  2. Project background information

  3. Available resources

  4. Project schedule

  5. Entry and exit criteria

  6. Test milestones

  7. Use cases and/or test cases
For a range of reasons -- both good and bad -- many software and web development projects don't budget enough time for complete and comprehensive testing. A quality test team must be able to test a product or system quickly and constructively in order to provide some value to the project. This essay describes how to test a web site or application in the absence of a detailed test plan and facing short or unreasonable deadlines.

Identify High-Level Functions First
High-level functions are those functions that are most important to the central purpose(s) of the site or application. A test plan would typically provide a breakdown of an application's functional groups as defined by the developers; for example, the functional groups of a commerce web site might be defined as shopping cart application, address book, registration/user information, order submission, search, and online customer service chat. If this site's purpose is to sell goods online, then you have a quick-and-dirty prioritization of:

  1. Shopping cart (including credit card validation and security)

  2. Registration/user information

  3. Order submission

  4. Site search

  5. Online customer service (chat, email, etc.)
I've prioritized these functions according to their significance to a user's ability to complete a transaction. I've ignored some of the lower-level functions for now, such as the modify shopping cart quantity and edit saved address functions because they are a little less important than the higher-level functions from a test point-of-view at the beginning of testing.
Your prioritization may differ from mine, but the point here is that time is critical, and in the absence of defined priorities in a test plan, you must test something now. You will make mistakes, and you will find yourself making changes once testing has started, but you need to determine your test direction as soon as possible.

Test Functions Before Display
Any web site should be tested for cross-browser and cross-platform compatibility -- this is a primary rule of web site quality assurance. However, wait on the compatibility testing until after the site can be verified to just plain work. Test the site's functionality using a browser/OS/platform that is expected to work correctly -- use what the designers and coders use to review their work.

Concentrate on Ideal User Actions First
Ideal User Actions are those actions and steps most likely to be performed by users. For example, on a typical commerce site, a user is likely to

  1. identify an item of interest

  2. add that item to the shopping cart

  3. buy it online with a credit card

  4. ship it to himself/herself
Now, this describes what the user would want to do, but many sites require a few more functions, so the user must go through some more steps, for example:

  1. login to an existing registration account (if one exists)

  2. register as a user if no account exists

  3. provide billing & bill-to address information

  4. provide ship-to address information

  5. provide shipping & shipping method information

  6. provide payment information

  7. agree or disagree to receiving site emails and newsletters
Most sites offer (or force) an even wider range of actions on the user:

  1. change product quantity in the shopping cart

  2. remove product from shopping cart

  3. edit user information (or ship-to information or bill-to information)

  4. save default information (like default shipping preferences or credit card information)
All of these actions and steps may be important to some users some of the time (and some developers and marketers all of the time), but the majority of users will not use every function every time. Focus on the ideal path and identify those factors most likely to be used in a majority of user interactions.
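The ideal path is also the natural shape of a first automated smoke test. A minimal sketch follows; `CommerceSite` is a hypothetical in-memory stand-in for the site under test, not a real API:

```python
# Sketch: script the ideal user path first. 'CommerceSite' is a
# hypothetical in-memory stand-in for the commerce site under test.

class CommerceSite:
    def __init__(self):
        self.cart = []
        self.orders = []

    def add_to_cart(self, item):
        self.cart.append(item)

    def checkout(self, card_number, ship_to):
        # A real checkout would validate the card number; this
        # stand-in only models the flow of the transaction.
        if not self.cart:
            return {"ok": False, "error": "empty cart"}
        order = {"items": list(self.cart), "ship_to": ship_to}
        self.orders.append(order)
        self.cart = []
        return {"ok": True, "order": order}

def test_ideal_path():
    site = CommerceSite()
    site.add_to_cart("widget")                  # steps 1-2: find item, add it
    result = site.checkout("4111111111111111",  # step 3: buy with a card
                           ship_to="the user's own address")  # step 4
    assert result["ok"]
    assert result["order"]["items"] == ["widget"]

test_ideal_path()
```

Only once this straight-through path passes is it worth scripting the login, registration, and cart-editing variations.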

Concentrate on Intrinsic Factors First
Intrinsic factors are those factors or characteristics that are part of the system or product being tested. An intrinsic factor is an internal factor. So, for a typical commerce site, the HTML page code that the browser uses to display the shopping cart pages is intrinsic to the site: change the page code and the site itself is changed. The code logic called by a submit button is intrinsic to the site.
Extrinsic factors are external to the site or application. Your crappy computer with only 8 megs of RAM is extrinsic to the site, so your home computer can crash without affecting the commerce site, and adding more memory to your computer doesn't mean a whit to the commerce site or its functioning.
Given a severe shortage of test time, focus first on factors intrinsic to the site:

  1. does the site work?

  2. do the functions work? (again with the functionality, because it is so basic)

  3. do the links work?

  4. are the files present and accounted for?

  5. are the graphics MIME types correct? (I used to think that this couldn't be screwed up)
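Some of these intrinsic checks are easy to automate even without a test plan. For example, a quick MIME-type sanity check can compare each file's extension against the content type the server reports. The sketch below runs offline, with the server's answers supplied as a plain dict standing in for collected Content-Type headers:

```python
import mimetypes

# Sketch: verify that graphics files would be served with the MIME
# type their extension implies. 'served_types' stands in for the
# Content-Type headers you would collect from the live server.

def mime_mismatches(served_types):
    """Return (path, expected, served) for every file whose served
    Content-Type disagrees with the type implied by its extension."""
    mismatches = []
    for path, served in served_types.items():
        expected, _encoding = mimetypes.guess_type(path)
        if expected is not None and expected != served:
            mismatches.append((path, expected, served))
    return mismatches

served = {
    "logo.gif": "image/gif",
    "banner.jpg": "text/html",   # a misconfigured server response
}
assert mime_mismatches(served) == [("banner.jpg", "image/jpeg", "text/html")]
```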
Once the intrinsic factors are squared away, then start on the extrinsic points:

  1. cross-browser and cross-platform compatibility

  2. clients with cookies disabled

  3. clients with javascript disabled

  4. monitor resolution

  5. browser sizing

  6. connection speed differences
The point here is that with myriad possible client configurations and user-defined environmental factors to think about, think first about those that relate to the product or application itself. When you run out of time, better to know that the system works rather than that all monitor resolutions safely render the main pages.

Boundary Test From Reasonable to Extreme
You can't just verify that an application works correctly when all inputs and all actions are correct. People do make mistakes, so you must test error handling and error states. The systematic testing of error handling is called boundary testing (actually, boundary testing describes much more, but this is enough for this discussion).
During your pedal-to-the-floor, no-test-plan testing project, boundary testing refers to the testing of forms and data inputs, starting from known good values, and progressing through reasonable but invalid inputs all the way to known extreme and invalid values.

Good Values
Enter in data formatted as the interface requires. Include all required fields. Use valid and current information (what "valid and current" means will depend on the test system, so some systems will have a set of data points that are valid for the context of that test system). Do not try to cause errors.

Expected Bad Values
Some invalid data entries are intrinsic to the interface and concept domain. For example, any credit card information form will receive expired credit card dates -- and should trap for them. Every form that specifies some fields as required should trap for those fields being left blank. Every form that has drop-down menus defaulting to an instruction ("select one", etc.) should trap for that instruction being submitted. What about punctuation in name fields?
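These expected-bad-value checks translate directly into test cases. A sketch against a hypothetical card validator (the validation rules here are illustrative, and the date is pinned for determinism):

```python
import datetime

# Sketch: expected bad values for a credit card form. validate_card
# is a hypothetical validator; a real form's rules may differ.

def validate_card(number, expiry_year, expiry_month, today=None):
    today = today or datetime.date.today()
    errors = []
    if not number.strip():
        errors.append("card number is required")
    if (expiry_year, expiry_month) < (today.year, today.month):
        errors.append("card is expired")
    return errors

# Pin 'today' so the test gives the same answer every run.
fixed_today = datetime.date(2005, 10, 18)

# Expired date: the form should trap it, not pass it through.
assert "card is expired" in validate_card("4111111111111111", 2004, 12,
                                          today=fixed_today)
# Blank required field.
assert "card number is required" in validate_card("   ", 2007, 1,
                                                  today=fixed_today)
# Good values produce no errors.
assert validate_card("4111111111111111", 2007, 1, today=fixed_today) == []
```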

Reasonable and Predictable Mistakes
People will make some mistakes based on the design of the form, the implementation of the interface, or the interface's interpretation of the relevant concept domain(s). For example, people will inadvertently enter in trailing or leading spaces into form fields. People might enter a first and middle name into a first name form field ("Mary Jane").
Not a mistake, per se, but how does the form field handle case? Is the information case-sensitive? Or does the address form handle a PO address? Does the address form handle a business name?
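A normalization check for these predictable mistakes might look like this; the `normalize_name` helper is hypothetical, sketching one reasonable policy for handling stray whitespace and case:

```python
# Sketch: predictable user mistakes -- leading/trailing spaces and
# runs of internal whitespace -- should be handled the same way on
# every form. normalize_name is a hypothetical helper.

def normalize_name(raw):
    """Trim stray whitespace and collapse internal runs of
    whitespace; preserve case, since names ("McDonald") are not
    safely case-foldable."""
    return " ".join(raw.split())

assert normalize_name("  Mary Jane  ") == "Mary Jane"
assert normalize_name("Mary\tJane") == "Mary Jane"
# Case-insensitive matching, where the form requires it, belongs at
# comparison time, not in the stored value.
assert normalize_name("McDonald") == "McDonald"
```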

Compatibility Test From Good to Bad
Once you get to cross-browser and cross-platform compatibility testing, follow the same philosophy of starting with the most important (as defined by prevalence among expected user base) or most common based on prior experience and working towards the less common and less important.
Do not make the assumption that because a site was designed for a previous version of a browser, OS, or platform it will also work on newer releases. Instead, make a list of the browsers and operating systems in order of popularity on the Internet in general, and then move those that are of special importance to your site (or your marketers and/or executives) to the top of the list.

The Drawbacks of This Testing Approach
Many projects are not mature and are not rational (at least from the point-of-view of the quality assurance team), and so the test team must scramble to test as effectively as possible within a very short time frame. I've spelled out how to test quickly without a structured test plan, and this method is much better than chaos and somewhat better than letting the developers tell you what and how to test.
This approach has definite quality implications:

  1. Incomplete functional coverage -- this is no way to exercise all of the software's functions comprehensively.

  2. No risk management -- this is no way to measure overall risk issues regarding code coverage and quality metrics. Effective quality assurance measures quality over time and starting from a known base of evaluation.

  3. Too little emphasis on user tasks -- because testers will focus on ideal paths instead of real paths. With no time to prepare, ideal paths are defined according to best guesses or developer feedback rather than by careful consideration of how users will understand the system or how users understand real-world analogues to the application tasks. With no time to prepare, testers will be using a very restricted set of input data, rather than using real data (from user activity logs, from logical scenarios, from careful consideration of the concept domain).

  4. Difficulty reproducing -- because testers are making up the tests as they go along, reproducing the specific errors found can be difficult, but also reproducing the tests performed will be tough. This will cause problems when trying to measure quality over successive code cycles.

  5. Project management may believe that this approach to testing is good enough -- because you can do some good testing by following this process, management may assume that full and structured testing, along with careful test preparation and test results analysis, isn't necessary. That misapprehension is a very bad sign for the continued quality of any product or web site.

  6. Inefficient over the long term -- quality assurance involves a range of tasks and foci. Effective quality assurance programs expand their base of documentation on the product and on the testing process over time, increasing the coverage and granularity of tests over time. Great testing requires good test setup and preparation, but success with the kind of test-plan-less approach described in this essay may reinforce bad project and test methodologies. A continued pattern of quick-and-dirty testing like this is a sign that the product or application is unsustainable in the long run.