Monday, October 31, 2005

Testing for Zero bugs

The Software Quality Myth

A widely accepted premise on software quality is that software is so complex (in combinatorial terms) that it is impossible to have defect free software. We often hear maxims like "there's always one more bug", and "software testing can reveal the existence of bugs, but never prove their absence".

While the above maxims may be true, I'm afraid they tend to lead us to a state of mind where we accept them as an inevitable evil. If there's no way to make software clean - why bother.

Having some prior experience with software testing, I'm certain that we can do much better than we do today. Once we do this right, the results would pleasantly surprise us.

Conventional testing: how do we test software?

Looking at the ways we test software, we see the following methods:

  • Known scenario replay (test suites, regression tests)
  • Random early exposure (Campus alphas, selected customer betas)

The known scenario replay is the single most commonly used method to test software. Unfortunately, it is the least effective method to uncover bugs, as evidenced by the large number of bugs uncovered in later stages.

Regression tests and test suites are necessary but insufficient. They're a good way for conducting sanity checking and ensuring that popular and commonly compiled code runs correctly. However, they have two striking flaws:

  • Coverage is extremely limited. When you run the same suite so many times, obviously your bugs tend to hide elsewhere.

  • It is difficult to track bugs in case of failure. Test suites tend to be big pieces of code. When something goes wrong, we apply some further trial runs and search techniques until the problem is narrowed down to a single routine or line of source code.
Early exposure like alpha (and beta) testing has the advantage that it is much more random than the known-scenario replay. It provides more "real world" coverage, but it has its own flaws:

  • Since it is an informal testing method, reproducibility is a problem (e.g. asking a beta customer: "what exactly did you do to experience the problem?")
  • It relies on people's good will to investigate problems as they occur, and to report bugs accurately and in enough detail when they are found.
  • It suffers from a small scope (time, number of machines, and people employed) compared to the real installed base and its usage patterns.
  • At least the alpha part doesn't really emulate our customers' environment: our environment is far from heterogeneous: almost no other vendors' machines (Sun, HP, IBM, Mac, PCs) exist on campus.

Fortunately, there is a complementary testing method that covers the above flaws well.

Monkeys at work: Random Testing

It is said that if you give a zillion monkeys keyboards and let them type long enough, one of them would eventually produce (insert your favorite magnum opus here). Finding bugs is probably much easier than writing magnum opii. If you think this is absurd, just substitute the word "monkeys" with "customers".

For years we have been sending our software to much less than a zillion customers, and yet, without exception, they eventually hit hundreds of undiscovered bugs.

Optimization imperative #1 calls for applying zillions of computer cycles before shipment to try and beat the customers in the race to uncover the bugs. This is what random testing is all about. This is also obvious. The magic is in the details.

Constrained Random Testing: a zillion monkeys, optimized

Since the combinatorial space of software is so huge we would like the proverbial monkeys to be constrained in some way and not purely random. To give an example: if we had our monkeys test UNIX for us, we would want them to type stuff like ls, make, cc etc. rather than having them type stuff like %^3gkl'$#*(&% (*).

``Elementary,'' you may say, but read on.

A proof by example: It just works!

Before going into the details of Constrained Random Testing, let me state that from my experience applying this technique to one problem space, it works far better than any other conventional testing method I've seen.

I used to work at the National Semiconductor microprocessor design center in Tel Aviv (NSTA), where I was a member of the Compilers team that wrote the compilers for all the 32000 Series microprocessors.

For several years, our compilers were buggy as hell. Having worked closely with hardware engineers, we were urged to test the compilers "like hardware is tested", using random "vectors" of input. Once our thought process was on the right track, things started to improve fast. It took one engineer about 4 months to come up with a prototype for random compiler testing. He generated random IR (Intermediate Representation) because it was easier than generating high-level code, and we just needed a proof of concept. As a result, we were testing only the back end (optimizer, code generator, assembler, linker).

The introduction of random testing practically eliminated user bug reports on released back ends. At a certain point, we couldn't believe it ourselves, so we conducted an experiment: we ran the random program generator (it was called RIG, for Random IR Generator, and was implemented by Avi Bloch and Amit Matatia; see acknowledgments) on a previous release of the compiler. To our amazement, RIG was able to find over half of all the bugs reported by customers on the code generator in just one night of runtime.

Thanks to RIG, the GNX Compilers for the 32000 processor family matured in a very short period to become among the most reliable in the industry; I remember compiling an early version of perl (I think it was perl 2.0) with full optimizations and passing the whole test suite on a Sequent Balance (a big SMP machine based on 32032 CPUs), while the SunOS and HP/UX compilers I used for comparison weren't anywhere near this quality at the time (mind you, that was way back, in 1990 or so).

Based on this personal experience, I have no doubt this method is well worth pursuing here.

Testing a Compiler: CRT Outline

Here's the skeleton of RIG:

  • Generate a random program subject to constraints
  • Compile it
  • Run it
  • Check that the results make sense:
    • if they do, discard the program
    • if not, save the program and results for later human inspection

Generating a random program is done using recursive descent on the grammar of the language, while applying constraints at every node of the random generation (explained below). To give a simple example: pick an operator at random, check its arity, and generate N random operands for it to operate on. Each operand can be either a constant or an arbitrary, randomly generated new expression, and so on.
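As an illustrative sketch (not RIG's actual code), the recursive descent described above might look like this, with the operator set, leaf probability, and depth limit as invented constraint parameters:

```python
import random

# Hypothetical sketch of recursive-descent random expression generation.
# The operator set, leaf probability, and depth limit are illustrative
# constraint parameters, not RIG's actual design.

OPERATORS = {"+": 2, "-": 2, "*": 2}   # operator -> arity

def gen_expr(depth=0, max_depth=4, rng=random):
    # Constraint: at the depth limit, force a leaf (a constant),
    # which guarantees that generation terminates.
    if depth >= max_depth or rng.random() < 0.3:
        return str(rng.randint(-10, 10))
    op, arity = rng.choice(list(OPERATORS.items()))
    operands = [gen_expr(depth + 1, max_depth, rng) for _ in range(arity)]
    return "(" + (" %s " % op).join(operands) + ")"

if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(gen_expr(rng=rng))
```

Every expression this produces is a valid, terminating fragment; a real generator would extend the same recursion to statements, loops, and whole functions.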

Constraint Magic: making a random program behave

The outline sounds simple enough, but a few problems immediately come to mind:

  • How can you ensure that a randomly generated program terminates?
  • How can you ensure that a randomly generated program doesn't generate an exception?
  • How can you ensure that a perfectly "safe" optimization won't change the semantics of the program (e.g. use of an undefined variable, the behavior of which is undefined)?

This is where the constraints come into play. The embedded constraints are what turns the method from pure theoretical play into a practical and effective testing method.

  • To ensure termination, we artificially inject an independent counter into every randomly generated loop or recursive function call; if a configurable limit is exceeded, we just break out of the loop. Not perfect, indeed, but simple, straightforward, and it does the job.

  • As for exceptions, there are two approaches: one is to simply allow them; if a randomly picked divisor happens to be zero, so be it. The output may be partial, but still deterministic. The other approach (if exceptions happen too often) is to regenerate those expressions that cause the exceptions, or to ensure by construction that they always fall within a certain range of values.

    For example, we ensure that constant divisors are never zero, and that variable divisors are checked at run time (if they are zero don't divide). Likewise we may add a check before array accesses for legal indexes into a pre-generated array. From experience, the second approach is what you want in 99% of the cases, and again, it works well.

  • To make sure the program has outputs (so we can inspect its runtime results) we simply put a hook to print all the randomly generated variables and constants of the program from a customized exit() routine that is linked with the program.

  • Likewise, to ensure there's no use of undefined variables, we simply initialize all of them (with random values of course) just before we jump to main().

This is in essence what the constraints are all about.
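The divisor constraints described above might be sketched as follows; the names are invented, and in a real generator the guard would be emitted into the generated C text:

```python
import random

# Constraint 1: constant divisors are redrawn until nonzero.
def gen_divisor(rng):
    while True:
        c = rng.randint(-5, 5)
        if c != 0:
            return str(c)

# Constraint 2: variable divisors get a run-time guard in the emitted
# C-like text, so a zero value skips the divide instead of trapping.
def guarded_div(numerator, divisor_var):
    return "((%s) != 0 ? (%s) / (%s) : 0)" % (divisor_var, numerator, divisor_var)

if __name__ == "__main__":
    rng = random.Random(7)
    print("x / %s" % gen_divisor(rng))   # constant divisor, guaranteed nonzero
    print(guarded_div("x + 1", "y"))     # run-time guard around variable divisor
```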

Closing the loop: proving correctness

But then there's another, more fundamental question:

  • If the program is randomly generated, (i.e. not known in advance) how can you predict its output, in order to verify that it ran correctly?

It turns out that even though the answer is "You can't", from a practical point of view there's a nice solution to this. Not only is it simple, but it has also been empirically proven to work very well.

Solution: generate a reference output using another compiler or interpreter for the same language. E.g. for C, the reference may be a plain vanilla compilation (without any optimizations) by a widely deployed and stable compiler like GNU cc (gcc). This can even be done on a different vendor's system.

Actually, for most cases, you don't even need an additional compiler: you may generate multiple outputs using the same compiler under test, each time with different compilation options. For example: if the result with optimization is different from the one without it, bingo: you've found a bug in the optimizer.
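A toy analogue of this differential scheme: run the same generated expression through a vanilla pipeline and an "optimized" one, and flag any mismatch. Here the "optimizer" is a hand-written constant folder standing in for a compiler's -O pass:

```python
import ast
import operator

# Map AST operator nodes to the corresponding arithmetic functions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def fold(node):
    # Recursively constant-fold binary operations -- our stand-in "optimizer".
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, ast.Constant) and isinstance(right, ast.Constant):
            return ast.Constant(OPS[type(node.op)](left.value, right.value))
        node.left, node.right = left, right
    return node

def optimized(expr):
    # The "optimized" pipeline: fold constants, then evaluate.
    tree = ast.parse(expr, mode="eval")
    tree.body = fold(tree.body)
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<gen>", "eval"))

def differential_test(expr):
    # Reference run (vanilla eval) vs. optimized run: a mismatch would
    # mean an "optimizer" bug, and the expression would be saved.
    return eval(expr) == optimized(expr)

if __name__ == "__main__":
    print(differential_test("(1 + 2) * (3 - 4)"))  # prints True
```

The same harness shape works for a real compiler: replace the two pipelines with two compile-and-run invocations under different option sets.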

Practice & experience: The importance of space/time tuning

The constraints mentioned above are just part of the story. To be effective and efficient, the random program generator should be space/time tuned. For example: testing a small array for an unsigned-index wrap-around case is as effective as checking a big array, and consumes far less time and space. Thus, parameters like the maximum size of a randomly generated array, the maximum number of elements in a structure, or the maximum depth of an expression should all be configurable via a configuration file.

Results (in terms of detected bugs per unit of testing time) can change dramatically by fine tuning of these space/time constraint parameters. The general guideline is: strive to use constraints that will create the smallest/simplest case to exhibit the problem.

Some golden rules:

  • Don't generate big data structures.
  • Don't generate big programs, loops, or routines
  • For every possible range of values, use some bias towards extremum and "interesting" values (MAXINT, 1, 0, -1, last index of an array) as opposed to a uniform distribution of values.
  • Think hard about the constraints, let the random generator do the rest
  • For every generated program, run it against as many cases as possible (e.g. many possible compiler options) to amortize the generation overhead, and leverage the reference results over many test cases.
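The biased-value rule above can be sketched like this; the 50% bias and the particular value list are my own assumptions, to be tuned like any other constraint parameter:

```python
import random

MAXINT = 2**31 - 1
INTERESTING = [0, 1, -1, MAXINT, -MAXINT - 1]   # extremum and edge values

def pick_value(rng, lo=-1000, hi=1000):
    # Bias half of the draws toward "interesting" boundary values
    # instead of using a uniform distribution over [lo, hi].
    if rng.random() < 0.5:
        return rng.choice(INTERESTING)
    return rng.randint(lo, hi)

if __name__ == "__main__":
    rng = random.Random(3)
    print([pick_value(rng) for _ in range(8)])
```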

From my experience, all compiler bugs can be narrowed down and reproduced in a very small example. Come to think of it, this is exactly what makes random testing of compilers work so well. Also, the smaller the example, the easier it is to investigate and fix. This is where tuning pays big: if you can generate 100,000 tiny programs per night, you'll be much more effective at covering the compiler and fixing the bugs than if you generate 1,000 larger programs per night.

More Advice

Start with simple expressions, including all types and all possible casts and type conversions. These are easy to test, and they are a natural first step in a CRT implementation to start with and build upon.

Pick a very good random number generator. If you don't, and if you don't have much variability in the types of nodes you randomly generate, your programs will tend to find the same bug multiple times. There are practical ways to minimize this, like changing the configuration parameters once a bug is found (and until the bug is fixed), but this requires further resources and effort.

Generate random C rather than random IR. You cannot compile a proprietary format with a reference compiler (one other than the one you're testing), and you also get to test the language front ends in addition to the back end.

That said, it may be easier to generate IR and upwardly generate high-level constructs from it. This is certainly a sensible strategy, especially if the mapping between IR and C is one-to-one. Even if it isn't, this will enable testing Fortran with almost no additional effort (since we already have these "reverse compilers").

Generating a test case, running it, and comparing results normally takes less time than compiling it many times with many options, so pick compilation cases that are as different as possible from each other to get better coverage. Also: try to combine several compilation options in each run (i.e. many optimizations together, vs. a vanilla compilation) to achieve good coverage of the compiler in as few compilations as possible.

Jumbo tip: An excellent way of tuning the constraints and the random program generator is to basic-block profile the compiler itself and see which parts of its source code are not being exercised by the randomly generated programs. Then, tune the random generator a bit more. In other words, inject some white-box style testing into the random black-box approach.

By ariel faigon

Wednesday, October 26, 2005

White Box Testing

Definition of White Box Testing - A software testing technique whereby explicit knowledge of the internal workings of the item being tested is used to select the test data.

Unlike black box testing, white box testing uses specific knowledge of programming code to examine outputs. The test is accurate only if the tester knows what the program is supposed to do. He or she can then see if the program diverges from its intended goal. White box testing does not account for errors caused by omission, and all visible code must also be readable.

Contrary to black-box testing, in white-box testing the software is viewed as a white box, or glass box, as the structure and flow of the software under test are visible to the tester.

Testing plans are made according to the details of the software implementation, such as programming language, logic, and styles. Test cases are derived from the program structure. White-box testing is also called glass-box testing, logic-driven testing or design-based testing.

There are many techniques available in white-box testing, because the problem of intractability is eased by specific knowledge of, and attention to, the structure of the software under test. The intention of exhausting some aspect of the software is still strong in white-box testing, and some degree of exhaustion can be achieved, such as executing each line of code at least once (statement coverage), traversing every branch statement (branch coverage), or covering all possible combinations of true and false condition predicates (multiple condition coverage).
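As a rough illustration of statement coverage, a line tracer can record which lines of a function under test actually execute; the function f here is invented:

```python
import sys

def f(x):
    # Toy function under test: one condition and two return branches.
    if x > 0:
        return "pos"
    return "non-pos"

executed = set()

def tracer(frame, event, arg):
    # Record line events only for frames running f's code.
    if event == "line" and frame.f_code is f.__code__:
        executed.add(frame.f_lineno)
    return tracer

sys.settrace(tracer)
f(1)    # exercises the condition and the "pos" branch
f(-1)   # exercises the "non-pos" branch as well
sys.settrace(None)

print("statements executed:", len(executed))  # prints: statements executed: 3
```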

Control-flow testing, loop testing, and data-flow testing all map the corresponding flow structure of the software into a directed graph. Test cases are carefully selected based on the criterion that all the nodes or paths are covered or traversed at least once. By doing so we may discover unnecessary "dead" code -- code that is of no use, or never gets executed at all, and which cannot be discovered by functional testing.

In mutation testing, the original program code is perturbed and many mutated programs are created, each containing one fault. Each faulty version of the program is called a mutant. Test data are selected based on their effectiveness in failing the mutants: the more mutants a test case can kill, the better the test case is considered. The problem with mutation testing is that it is too computationally expensive to use.

The boundary between the black-box approach and the white-box approach is not clear-cut. Many of the testing strategies mentioned above may not be safely classified as black-box or white-box testing. The same is true for transaction-flow testing, syntax testing, finite-state testing, and many other testing strategies not discussed in this text. One reason is that all the above techniques need some knowledge of the specification of the software under test. Another reason is that the idea of specification itself is broad -- it may contain any requirement, including the structure, programming language, and programming style as part of the specification content.
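A minimal sketch of mutation testing; the function under test, the mutation set, and the tiny test suite are all invented for illustration:

```python
# Source of the function under test, kept as text so we can perturb it.
SOURCE = '''
def absdiff(a, b):
    return a - b if a > b else b - a
'''

# Each mutation introduces exactly one fault, creating one mutant.
MUTATIONS = [(" > ", " < "), (" - ", " + ")]

def suite_passes(src):
    # Run the test suite against the given source; a mutant is
    # "killed" when any assertion fails.
    env = {}
    exec(src, env)
    absdiff = env["absdiff"]
    try:
        assert absdiff(5, 3) == 2
        assert absdiff(3, 5) == 2
        return True
    except AssertionError:
        return False

killed = sum(1 for old, new in MUTATIONS
             if not suite_passes(SOURCE.replace(old, new, 1)))
print("killed %d of %d mutants" % (killed, len(MUTATIONS)))  # killed 2 of 2 mutants
```

A suite that kills every mutant gives some confidence it would also catch real faults of the same shape.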

We may be reluctant to consider random testing as a testing technique, since the test case selection is so simple and straightforward: cases are randomly chosen. Yet studies indicate that random testing is more cost-effective for many programs. Some very subtle errors can be discovered at low cost. It is also not inferior in coverage to other, carefully designed testing techniques. One can also obtain a reliability estimate using random testing results based on operational profiles. Effectively combining random testing with other testing techniques may yield more powerful and cost-effective testing strategies.

Monday, October 24, 2005

What is User Acceptance Testing?



UAT goes under many names. As well as user acceptance testing, it is also known as Beta Testing (usually in the PC world), QA Testing, Application Testing, End User Testing, or, as it is known in the company where I work, Model Office Testing.


Developing software is an expensive business. It is expensive in:

  • time, as the software must be analysed, specified, designed and written
  • people, as very few development projects are one man jobs
  • money, as the people responsible for the analysis, specification and development of software do not come cheap (look at the current rates for contractors!)

If, having expended all this time and the company's money, the resulting software is not completely suited to the purpose required, then that time and money has not been fully utilised.

If the software is suited to the purpose, but:

  • does not dovetail precisely with the business processes
  • makes processes more difficult to do than before
  • causes business processes to take longer than previously
  • makes additional processes necessary, without making other processes obsolete

then you may not see a return on your investment in the software until much later, or may never see a return at all.

Question: how do we ensure that we do not end up in this situation?

Answer: we test the software against objective criteria to ensure that we don't.

Previously, most testing was left in the hands of the development teams, with the end users trusting those teams to deliver applications that were not only fully functional and stable, but also applications that would dovetail into business processes, and support those processes (maybe even make things a little bit easier).

However, the testing executed by developers is to ensure that the code they have created is stable and functional. They will test that:

  • they cover all the lines and logic paths through their code
  • all the screens flow backwards and forwards in the correct order
  • the software meets the functions specified (e.g. calculations are correct, reports have correct columns, screens have correct validation, etc.)

This testing might not be done through the application itself (often because it has not been completely built while they are testing), so they may only add a few records, perhaps by editing the file/table directly rather than using the 'Record Entry Screen'.



As we will see later, this does not pose us a problem, because the UAT department will cover this testing. This system testing and unit testing by the developers is still very valid and useful. I would rather take delivery of an application where the development team say "We have done this, done that, hacked this file and added a few records, ran a few test cases through here - and everything seems OK", than take an application which has not gone through any system testing.

The application that has been tested by the developers will have had most of the obvious flaws identified and ironed out, and only the types of issues the testing was designed for should be identified. The second application will be completely unknown, and some of the time allocated for UAT will be spent identifying and fixing problems that could have been easily identified and rectified by the developers.

Also, because the developers are testing their own work, there is a tendency for them to skip areas because they 'know that there won't be a problem there'.

I have spoken to developers who came to our company from places that do not do UAT; they are impressed with how we do things, and they like the idea of an independent third party testing their software.

These people are professional software developers, and they do not want to supply something that isn't exactly what's wanted, and they feel that the UA testing gives them a large comfort zone that any problems with their work will be identified and escalated back to them for correction.

As I said, these issues do not prove a problem to the user acceptance tester.

The four issues of the delivered software not matching the business process, making things more difficult, etc., are circumvented by the user acceptance tester.

While the developer tests against the system specification and technical documentation, the user acceptance tester tests against the business requirements. The former tests the code, the latter the application. We will come to the test planning in a bit.

The issue of the developer testing their own work ceases to be an issue, as the UAT team will design a testing strategy that covers all areas of the business requirements, whether or not the developer feels there may be problems in a specific area.

The issue of additional processes being necessary should also not be a problem. As I said before, the UAT team tests the application against the business requirements, so all testing is done through the use of the proper system transactions.

The UAT team do not hack tables/files to create data; if a client record is needed for a test, the UAT team will create this client by use of the formal client maintenance transaction, not by adding a record to the 'Client_Details' file.

This use of the formal application transactions serves two purposes:

  • it tests all the transactions that the business users shall run, giving complete 'business' coverage (as opposed to code coverage, or logic path coverage)
  • it will highlight any potential areas of adverse impacts on the business processes. If the contents of a form (e.g. an application form for a new life assurance policy) are used as the basis for creating a new life assurance policy record, then the use of the formal 'New Life Assurance Policy' transaction will determine whether the transaction works, and also whether the form holds the requisite information to create the policy records.

The 'New Life Assurance Policy' system may require the client to declare whether they smoke or not, however, if this question is not on the application form, then the business users will have to re-contact every new client, to determine whether or not they smoke!

We can see then, that it is the role of user acceptance testing to not only prove whether or not an application works, but also to prove how it will fit with business processes.

User Acceptance Testing Processes

OK, now we have determined what UAT is, now we need to look at HOW we achieve these objectives.

The user acceptance test life cycle follows the path shown below (obviously at a very high level):

  • analysis of business requirements. We can't do anything concerning testing until we understand what the developments are supposed to achieve. This is quite an intangible step in the process, and consists mostly of thought processes, meetings, etc. The end result is a clear vision, in the tester's mind, of what they are going to be expected to prove, and why it is necessary.
  • analysis of testing requirements. This is more tangible than the first stage, and consists of documenting the areas of the development that require testing, the methodologies you will need to use to test them, and the results to expect to be returned when you test them.
  • Execution of testing. Doing the business. This is what it all boils down to. Every development project will be different, and you will have had enough experience in this part of the cycle to not need any pointers from me!
  • Getting the testing signed off. There is no use going through all of these processes (raising problems to development teams, having more work done by the development teams in fixing those problems, re-testing the changes and re-doing all your regression scripts) unless, at the end of the day, you can get the users to sign off the changes.

Wednesday, October 19, 2005

Black Box Testing

The black box testing approach is a testing method in which test data are derived from the specified functional requirements without regard to the final program structure.

It is also termed data-driven, input/output-driven, or requirements-based testing. Because only the functionality of the software module is of concern, black-box testing also mainly refers to functional testing -- a testing method that emphasizes executing the functions and examining their input and output data.

The tester treats the software under test as a black box -- only the inputs, outputs and specification are visible, and the functionality is determined by observing the outputs to corresponding inputs. In testing, various inputs are exercised and the outputs are compared against specification to validate the correctness. All test cases are derived from the specification. No implementation details of the code are considered.

It is obvious that the more we have covered in the input space, the more problems we will find and therefore we will be more confident about the quality of the software. Ideally we would be tempted to exhaustively test the input space. But as stated above, exhaustively testing the combinations of valid inputs will be impossible for most of the programs, let alone considering invalid inputs, timing, sequence, and resource variables. Combinatorial explosion is the major roadblock in functional testing. To make things worse, we can never be sure whether the specification is either correct or complete.

Due to limitations of the language used in the specifications (usually natural language), ambiguity is often inevitable. Even if we use some type of formal or restricted language, we may still fail to write down all the possible cases in the specification. Sometimes, the specification itself becomes an intractable problem: it is not possible to specify precisely every situation that can be encountered using limited words. And people can seldom specify clearly what they want -- they usually can tell whether a prototype is, or is not, what they want only after it has been built. Specification problems contribute approximately 30 percent of all bugs in software.

The research in black-box testing mainly focuses on how to maximize the effectiveness of testing with minimum cost, usually measured by the number of test cases. It is not possible to exhaust the input space, but it is possible to exhaustively test a subset of the input space. Partitioning is one of the common techniques. If we have partitioned the input space and assume all the input values in a partition are equivalent, then we only need to test one representative value in each partition to sufficiently cover the whole input space.

Domain testing partitions the input domain into regions, and considers the input values in each domain an equivalence class. Domains can be exhaustively tested and covered by selecting representative values in each domain. Boundary values are of special interest. Experience shows that test cases that explore boundary conditions have a higher payoff than test cases that do not. Boundary value analysis requires one or more boundary values to be selected as representative test cases. The difficulty with domain testing is that incorrect domain definitions in the specification cannot be efficiently discovered.
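To make this concrete, here is a small partitioning sketch for an invented spec, "a valid age is an integer in [0, 120]", with boundary values selected on each edge of the valid domain:

```python
def is_valid_age(age):
    # The (invented) implementation under test.
    return 0 <= age <= 120

# One equivalence class per region of the input domain; representatives
# include the boundary values, which experience says have a higher payoff.
partitions = {
    "below range": {"cases": [-1],         "valid": False},
    "in range":    {"cases": [0, 60, 120], "valid": True},
    "above range": {"cases": [121],        "valid": False},
}

for name, p in partitions.items():
    for age in p["cases"]:
        assert is_valid_age(age) == p["valid"], (name, age)
print("all partitions and boundaries covered")
```

A handful of representatives stands in for the whole input space, which is exactly the economy partitioning buys.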

Good partitioning requires knowledge of the software structure.

A good testing plan will not only contain black-box testing, but also white-box approaches, and combinations of the two.

Tuesday, October 18, 2005

Testing Without a Formal Test Plan

A formal test plan is a document that provides and records important information about a test project, for example:

  1. Project assumptions

  2. Project background information

  3. Available resources

  4. Project Schedule

  5. Entry and exit criteria

  6. Test milestones

  7. Use cases and/or test cases

For a range of reasons -- both good and bad -- many software and web development projects don't budget enough time for complete and comprehensive testing. A quality test team must be able to test a product or system quickly and constructively in order to provide some value to the project. This essay describes how to test a web site or application in the absence of a detailed test plan and facing short or unreasonable deadlines.

Identify High-Level Functions First
High-level functions are those functions that are most important to the central purpose(s) of the site or application. A test plan would typically provide a breakdown of an application's functional groups as defined by the developers; for example, the functional groups of a commerce web site might be defined as shopping cart application, address book, registration/user information, order submission, search, and online customer service chat. If this site's purpose is to sell goods online, then you have a quick-and-dirty prioritization of:

  1. Shopping cart - credit card validation and security.

  2. Registration/user information

  3. Taking Orders

  4. Search the site

  5. Online customer service like chat, email etc

I've prioritized these functions according to their significance to a user's ability to complete a transaction. I've ignored some of the lower-level functions for now, such as the modify-shopping-cart-quantity and edit-saved-address functions, because they are a little less important than the higher-level functions from a test point of view at the beginning of testing.

Your opinion of the prioritization may differ from mine, but the point here is that time is critical, and in the absence of defined priorities in a test plan, you must test something now. You will make mistakes, and you will find yourself making changes once testing has started, but you need to determine your test direction as soon as possible.

Test Functions Before Display
Any web site should be tested for cross-browser and cross-platform compatibility -- this is a primary rule of web site quality assurance. However, wait on the compatibility testing until after the site can be verified to just plain work. Test the site's functionality using a browser/OS/platform that is expected to work correctly -- use what the designers and coders use to review their work.

Concentrate on Ideal User Actions First
Ideal User Actions are those actions and steps most likely to be performed by users. For example, on a typical commerce site, a user is likely to

  1. identify an item of interest

  2. add that item to the shopping cart

  3. buy it online with a credit card

  4. ship it to himself/herself

Now, this describes what the user would want to do, but many sites require a few more functions, so the user must go through some more steps, for example:

  1. login to an existing registration account (if one exists)

  2. register as a user if no account exists

  3. provide billing & bill-to address information

  4. provide ship-to address information

  5. provide shipping & shipping method information

  6. provide payment information

  7. agree or disagree to receiving site emails and newsletters
Most sites offer (or force) an even wider range of actions on the user:

  1. change product quantity in the shopping cart

  2. remove product from shopping cart

  3. edit user information (or ship-to information or bill-to information)

  4. save default information (like default shipping preferences or credit card information)
All of these actions and steps may be important to some users some of the time (and some developers and marketers all of the time), but the majority of users will not use every function every time. Focus on the ideal path and identify those factors most likely to be used in a majority of user interactions.
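Put concretely, an ideal-path check can be written as one short end-to-end test. The sketch below uses a hypothetical in-memory `Cart` class as a stand-in for a real shopping cart back end; the names and steps are illustrative, not any particular site's API:

```python
# Hypothetical in-memory stand-in for a real shopping cart back end.
class Cart:
    def __init__(self):
        self.items = {}     # sku -> quantity
        self.paid = False

    def add(self, sku, qty=1):
        self.items[sku] = self.items.get(sku, 0) + qty

    def checkout(self, card_number, ship_to):
        if not self.items:
            raise ValueError("cart is empty")
        self.paid = True
        return {"card": card_number, "ship_to": ship_to,
                "items": dict(self.items)}

def test_ideal_path():
    """Covers only the steps a typical buyer takes: find, add, pay, ship."""
    cart = Cart()
    cart.add("BOOK-123")                          # steps 1-2: identify, add
    order = cart.checkout("4111111111111111",     # step 3: pay by card
                          ship_to="Mary Jane, 1 Main St")  # step 4: ship
    assert cart.paid
    assert order["items"] == {"BOOK-123": 1}

test_ideal_path()
```

Everything else (quantity edits, saved addresses, newsletter opt-ins) waits until this single path passes.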

Concentrate on Intrinsic Factors First
Intrinsic factors are those factors or characteristics that are part of the system or product being tested. An intrinsic factor is an internal factor. So, for a typical commerce site, the HTML page code that the browser uses to display the shopping cart pages is intrinsic to the site: change the page code and the site itself is changed. The code logic called by a submit button is intrinsic to the site.
Extrinsic factors are external to the site or application. Your crappy computer with only 8 megs of RAM is extrinsic to the site, so your home computer can crash without affecting the commerce site, and adding more memory to your computer doesn't mean a whit to the commerce site or its functioning.
Given a severe shortage of test time, focus first on factors intrinsic to the site:

  1. does the site work?

  2. do the functions work? (again with the functionality, because it is so basic)

  3. do the links work?

  4. are the files present and accounted for?

  5. are the graphics MIME types correct? (I used to think that this couldn't be screwed up)
Once the intrinsic factors are squared away, then start on the extrinsic points:

  1. cross-browser and cross-platform compatibility

  2. clients with cookies disabled

  3. clients with javascript disabled

  4. monitor resolution

  5. browser sizing

  6. connection speed differences
The point here is that with myriad possible client configurations and user-defined environmental factors to think about, think first about those that relate to the product or application itself. When you run out of time, better to know that the system works rather than that all monitor resolutions safely render the main pages.
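Some of the intrinsic checks above are cheap to script. A minimal sketch for points 4 and 5 (files present, graphics MIME types) using Python's standard mimetypes and os modules; the file lists passed in are hypothetical:

```python
import mimetypes
import os

def suspect_graphics(paths):
    """Return (path, guessed_type) for files whose extension does not
    map to an image/* MIME type -- a cheap check for point 5."""
    bad = []
    for path in paths:
        guessed, _ = mimetypes.guess_type(path)
        if guessed is None or not guessed.startswith("image/"):
            bad.append((path, guessed))
    return bad

def missing_files(expected_paths):
    """Return the expected files that are not present on disk (point 4)."""
    return [p for p in expected_paths if not os.path.isfile(p)]
```

For example, `suspect_graphics(["logo.gif", "page.html"])` flags `page.html` because it maps to `text/html`, not an image type.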

Boundary Test From Reasonable to Extreme
You can't just verify that an application works correctly when all input and all actions are correct. People do make mistakes, so you must test error handling and error states. The systematic testing of error handling is called boundary testing (boundary testing actually describes much more, but this is enough for this discussion).
During your pedal-to-the-floor, no-test-plan testing project, boundary testing refers to the testing of forms and data inputs, starting from known good values, and progressing through reasonable but invalid inputs all the way to known extreme and invalid values.

Good Values
Enter data formatted as the interface requires. Include all required fields. Use valid and current information (what counts as "valid and current" depends on the test system; some systems have a set of data points that are valid only in that context). Do not try to cause errors.

Expected Bad Values
Some invalid data entries are intrinsic to the interface and concept domain. For example, any credit card information form will expect expired credit card dates -- and should trap for them. Every form that specifies some fields as required should trap for those fields being left blank. Every form that has drop-down menus that default to an instruction ("select one", etc.) should trap for that instruction. What about punctuation in name fields?

Reasonable and Predictable Mistakes
People will make some mistakes based on the design of the form, the implementation of the interface, or the interface's interpretation of the relevant concept domain(s). For example, people will inadvertently enter in trailing or leading spaces into form fields. People might enter a first and middle name into a first name form field ("Mary Jane").
Not a mistake, per se, but how does the form field handle case? Is the information case-sensitive? Or does the address form handle a PO address? Does the address form handle a business name?
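Putting the progression together, here is a sketch of boundary testing a single field, moving from known good values through reasonable mistakes to expected bad values. The `clean_first_name` validator is hypothetical, just enough to show the three tiers:

```python
def clean_first_name(raw):
    """Hypothetical validator for a first-name field, illustrating the
    progression from good values to reasonable mistakes to bad values."""
    value = raw.strip()      # reasonable mistake: leading/trailing spaces
    if not value:
        return None          # expected bad value: required field left blank
    return value

# known good value
assert clean_first_name("Mary") == "Mary"
# reasonable mistakes: stray whitespace, two names in one field
assert clean_first_name("  Mary Jane ") == "Mary Jane"
# expected bad value: blank required field must be trapped
assert clean_first_name("   ") is None
```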

Compatibility Test From Good to Bad
Once you get to cross-browser and cross-platform compatibility testing, follow the same philosophy of starting with the most important (as defined by prevalence among expected user base) or most common based on prior experience and working towards the less common and less important.
Do not make the assumption that because a site was designed for a previous version of a browser, OS, or platform it will also work on newer releases. Instead, make a list of the browsers and operating systems in order of popularity on the Internet in general, and then move those that are of special importance to your site (or your marketers and/or executives) to the top of the list.

The Drawbacks of This Testing Approach
Many projects are not mature and are not rational (at least from the point-of-view of the quality assurance team), and so the test team must scramble to test as effectively as possible within a very short time frame. I've spelled out how to test quickly without a structured test plan, and this method is much better than chaos and somewhat better than letting the developers tell you what and how to test.
This approach has definite quality implications:

  1. Incomplete functional coverage -- this is no way to exercise all of the software's functions comprehensively.

  2. No risk management -- this is no way to measure overall risk issues regarding code coverage and quality metrics. Effective quality assurance measures quality over time and starting from a known base of evaluation.

  3. Too little emphasis on user tasks -- because testers will focus on ideal paths instead of real paths. With no time to prepare, ideal paths are defined according to best guesses or developer feedback rather than by careful consideration of how users will understand the system or how users understand real-world analogues to the application tasks. With no time to prepare, testers will be using a very restricted set input data, rather than using real data (from user activity logs, from logical scenarios, from careful consideration of the concept domain).

  4. Difficulty reproducing -- because testers are making up the tests as they go along, reproducing the specific errors found can be difficult, but also reproducing the tests performed will be tough. This will cause problems when trying to measure quality over successive code cycles.

  5. Project management may believe that this approach to testing is good enough -- because you can do some good testing by following this process, management may assume that full and structured testing, along with careful test preparation and test results analysis, isn't necessary. That misapprehension is a very bad sign for the continued quality of any product or web site.

  6. Inefficient over the long term -- quality assurance involves a range of tasks and foci. Effective quality assurance programs expand their base of documentation on the product and on the testing process over time, increasing the coverage and granularity of tests. Great testing requires good test setup and preparation, but success with the kind of test-plan-less approach described in this essay may reinforce bad project and test methodologies. A continued pattern of quick-and-dirty testing like this is a sign that the product or application is unsustainable in the long run.

Thursday, October 13, 2005

What is a Nightly Test Case

Why is a nightly called a nightly?

A nightly is a test case that must be run every time there is a new build. Generally, a new build is published every night, and these tests are run nightly, hence the name, nightly tests. Other names are Acceptance or Self-Host, but the concept remains the same.

What should a nightly cover?

A good nightly test case verifies the following:

  1. Is the feature usable?

  2. Can the user perform basic end-to-end scenarios?

  3. Most importantly, if a certain feature/scenario is broken, does the dev need to drop everything and immediately fix the issue?
Think of nightlies as regression tests. The nightlies cover the most important scenarios, so if a test case fails, a regression has occurred that the dev must investigate immediately. (If you’re wondering why devs don’t just run tests before checking in – they do, which is a topic for a later time. Or if you’re wondering why testers don’t test the build before check-in – we do; these are called buddy tests, also a topic for a later time.)

Consider creating a new text file. Putting the above into practice, a create text file nightly may verify the following:

  1. Is a Text File template available on the New File Dialog? Is the Text File created upon ok’ing the New File Dialog with the text file selected?

  2. Can the user insert text into the Text File?

  3. Can the user perform basic editing scenarios, like cut, copy, paste, and selection?

  4. Can the user save the Text File? Does the File – Save As dialog appear upon save?

  5. Does the text file appear dirty upon editing it?

  6. Can the user close the file?

  7. Does the file appear in the Most Recently Used list?
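A create-text-file nightly like the checklist above can be sketched as a single scripted pass. This assumes nothing about any real editor; it just exercises the create/insert/save/verify/clean-up loop against the file system:

```python
import os
import tempfile

def nightly_create_text_file():
    """Exercise only the end-to-end basics: create, insert text, save,
    reopen and verify, clean up. Deeper scenarios belong in fuller suites."""
    fd, path = tempfile.mkstemp(suffix=".txt")   # 1: create a new text file
    os.close(fd)
    try:
        with open(path, "w") as f:               # 2: insert text
            f.write("hello, nightly")
        with open(path) as f:                    # 4: save happened; verify
            assert f.read() == "hello, nightly"
    finally:
        os.remove(path)                          # 6: close and clean up
    return True

assert nightly_create_text_file()
```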

Why doesn’t a nightly cover more?

One might think, “that isn’t very much testing for a text file,” but consider how many different languages we support, how many different operating systems we support, the different SKUs, and the Express SKUs. After a while the testing matrix gets really big, especially given that we’re not only running these tests with every new build but also analyzing the results for each one. And believe me, there are more than 5 builds a week once you add in all the different flavors of runs mentioned above.

To reiterate, the point of a nightly is to ensure that the feature is “testable” (also called “Self-Test”) and to find any regressions. If nightlies are passing at 100%, the feature is said to be “Self-Host”, or testable for other scenarios. Once QA signs off on a feature or build, it is said to be shippable. Once nightlies are finished running, we are able to say, “This build or feature is Self-Host.” If the build fails (obviously implying no features are testable), the build is considered Self-Toast – a term from the build lab, and a great play on words.

Friday, October 07, 2005

Test Plan

What a test plan should contain

A software project test plan is a document that describes the objectives, scope, approach, and focus of a software testing effort. The process of preparing a test plan is a useful way to think through the efforts needed to validate the acceptability of a software product. The completed document will help people outside the test group understand the 'why' and 'how' of product validation. It should be thorough enough to be useful but not so thorough that no one outside the test group will read it.

A test plan states what the items to be tested are, at what level they will be tested, what sequence they are to be tested in, how the test strategy will be applied to the testing of each item, and describes the test environment.

A test plan should ideally be organisation-wide, applicable to all of an organisation's software development efforts.

The objective of each test plan is to provide a plan for verifying, by testing the software, that the software produced fulfils the functional or design statements of the appropriate software specification. In the case of acceptance testing and system testing, this generally means the Functional Specification.

The first consideration when preparing the test plan is the intended audience – the audience for a unit test plan, for example, differs from that of a master test plan, and the content must be adjusted accordingly.

You should begin the test plan as soon as possible. Generally it is desirable to begin the master test plan at the same time the Requirements documents and the Project Plan are being developed. Test planning can (and should) have an impact on the Project Plan. Plans written early will have to be changed during the course of development and testing, but that is important: updating them records the progress of the testing and helps planners become more proficient.

What to consider for the Test Plan:
1. Test Plan Identifier
2. References
3. Introduction
4. Test Items
5. Software Risk Issues
6. Features to be Tested
7. Features not to be Tested
8. Approach
9. Item Pass/Fail Criteria
10. Suspension Criteria and Resumption Requirements
11. Test Deliverables
12. Remaining Test Tasks
13. Environmental Needs
14. Staffing and Training Needs
15. Responsibilities
16. Schedule
17. Planning Risks and Contingencies
18. Approvals
19. Glossary

Standards for Software Test Plans
Several standards suggest what a test plan should contain, including the IEEE.

ANSI/IEEE Std 829-1983, IEEE Standard for Software Test Documentation

Wednesday, October 05, 2005

Automation Testing versus Manual Testing

I met with my team’s automation experts a few weeks back to get their input on when to automate and when to test manually. The general rule of thumb has always been to use common sense: if you’re only going to run the test once or twice, or the test is really expensive to automate, it is most likely a manual test. But then again, what good is saying “use common sense” when you need a deterministic set of guidelines on how and when to automate?

Pros of Automation
  1. If you have to run a set of tests repeatedly, automation is a huge win for you

  2. It gives you the ability to run automation against code that frequently changes to catch regressions in a timely manner

  3. It gives you the ability to run automation in mainstream scenarios to catch regressions in a timely manner (see What is a Nightly)

  4. Aids in testing a large test matrix (different languages on different OS platforms). Automated tests can be run at the same time on different machines, whereas the manual tests would have to be run sequentially.
Cons of Automation
  1. It costs more to automate. Writing the test cases and writing or configuring the automation framework you’re using costs more initially than running the test manually.

  2. You can’t automate visual verification. For example, if you can’t determine the font color via code or the automation tool, it is a manual test.
Pros of Manual
  1. If the test case only runs twice per coding milestone, it most likely should be a manual test; that costs less than automating it.

  2. It allows the tester to perform more ad hoc (random) testing. In my experience, more bugs are found via ad hoc testing than via automation, and the more time a tester spends playing with the feature, the greater the odds of finding real user bugs.
Cons of Manual
  1. Running tests manually can be very time consuming

  2. Each time there is a new build, the tester must rerun all required tests - which after a while would become very mundane and tiresome.
Other deciding factors
  1. What you automate depends on the tools you use. If the tools have any limitations, those tests are manual.

  2. Is the return on investment worth automating? Is what you get out of automation worth the cost of setting up and supporting the test cases, the automation framework, and the system that runs the test cases?
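The return-on-investment question can be reduced to a simple break-even comparison. A sketch, with hypothetical costs expressed in tester-hours:

```python
def automation_pays_off(setup_cost, auto_run_cost, manual_run_cost, runs):
    """True if automating is cheaper over the expected number of runs.
    All costs in one unit (say, tester-hours); the numbers are hypothetical."""
    return setup_cost + auto_run_cost * runs < manual_run_cost * runs

# a test run twice per milestone rarely repays a large setup cost...
assert not automation_pays_off(40, 0.1, 1, runs=2)
# ...but a test run against every nightly build does
assert automation_pays_off(40, 0.1, 1, runs=200)
```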
Criteria for automating
There are two sets of questions to determine whether automation is right for your test case:

Is this test scenario automatable?
  1. Yes, and it will cost a little

  2. Yes, but it will cost a lot

  3. No, it is not possible to automate
How important is this test scenario?
  1. I must absolutely test this scenario whenever possible

  2. I need to test this scenario regularly

  3. I only need to test this scenario once in a while
If you answered #1 to both questions – definitely automate that test
If you answered #1 or #2 to both questions – you should automate that test
If you answered #2 to both questions – you need to consider if it is really worth the investment to automate
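The two-question rule above can be encoded directly. A sketch; the answer codes and return strings are just one way to express it:

```python
def automation_decision(automatable, importance):
    """Encode the two-question rule of thumb. Answers are 1, 2, or 3:
    automatable: 1 = cheap to automate, 2 = costly, 3 = not possible
    importance:  1 = run whenever possible, 2 = regularly, 3 = rarely"""
    if automatable == 3:
        return "manual (cannot automate)"
    if automatable == 1 and importance == 1:
        return "definitely automate"
    if automatable == 2 and importance == 2:
        return "weigh the investment"
    if importance <= 2:
        return "should automate"
    return "probably manual"

assert automation_decision(1, 1) == "definitely automate"
assert automation_decision(2, 2) == "weigh the investment"
```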

What happens if you can’t automate?
Let’s say that you have a test that you absolutely need to run whenever possible, but it isn’t possible to automate. Your options are:
  1. Reevaluate – do I really need to run this test this often?

  2. What’s the cost of doing this test manually?

  3. Look for new testing tools

  4. Consider test hooks


Author - saraford

Tuesday, October 04, 2005

Call for Presentations/Papers



MAY 1 - 5, 2006

For ten years PSQT has been serving quality professionals through its conferences focusing on PRACTICAL Software Quality Techniques.

Join us for PSQT WEST in Las Vegas, May 1 and 2, 2006,

followed by Tutorials May 3 - 5, 2006.

Our theme this year: YOU CAN BET ON QUALITY!

We’ll present WINNING Keynotes, Presentations and Tutorials by leaders in the field and practitioners who will show you how to apply these practical techniques to your work.



November 1, 2005 -- Proposals for full-day Tutorials are due (See Tutorial Submission Guidelines at

November 15, 2005 -- Proposals for presentations due (no late proposals accepted)

December 12, 2005 -- Acceptance/Rejection notices begin mailing out

March 1, 2006 -- Complete presentation due (PowerPoint, and paper if you choose to include one)

May 1 - 2, 2006 -- Conference Presentations



Propose a Presentation for the Conference Concurrent Sessions on May 1 and 2, 2006. If you are accepted as a presenter, you receive a COMPLIMENTARY REGISTRATION to the Conference, May 1 and 2, 2006. You will also receive 50% OFF all full-day Tutorials offered on May 3 - 5.

Conference materials, breakfast and lunch are included. You will be responsible for all other expenses.

TOPICS OF INTEREST (Related topics are welcome)




o Testing technology, process and automation

o Agile and eXtreme approaches and testing

o Testing web, Internet, e-Commerce applications

o Security testing

o End-to-End testing

o Performance, load, stress testing

o Static testing: reviews, Inspections

o Integration, systems and regression testing

o Use Cases and testing

o Successful tool usage

o Risk-based testing

o Testable requirements

o Related topics



o Managing test function and processes

o Risk management and mitigation

o Developing, managing a test team

o Developing, implementing standards

o Software process assessment, improvement

o Requirements management, modeling

o Measurement, metrics

o Defect tracking and studies

o Models: CMM(I), ISO, Six Sigma, SPICE, TickIT

o Defect prevention techniques, methodologies

o ITIL, implementation, measurement, integration

o Related topics

Topics related to the Body of Knowledge for these certifications

*** Certified Software Test Professional (CSTP)

*** Certified Test Manager (CTM)



* Your proposal must follow the standards and procedures below completely. Non-standard proposals cannot be considered.

* Your submission for the concurrent sessions must reach us by e-mail, in the proper format, on or before November 15. Late proposals cannot be considered.

* Receiving your proposal does NOT guarantee acceptance. If your proposal is accepted, timely receipt of the full presentation is required for you to be allowed to present.

* The presentation must fully address your proposal objectives and outline.

* Presentations must be provided by March 1, 2006 in MS PowerPoint format. If you also provide a paper to accompany your presentation, it must be in MS Word or RTF format.

* All accepted proposals from practitioners must be accompanied by an authorization from your company to make the presentation. If you come from a country that requires a Visa or other travel authorization to come to the US, you must present your Visa confirmation with your accepted presentation on March 1, 2006. Without these confirmations, the decision to accept your proposal will be reversed.

Submit your Proposal to:


PSQT West 2006 Program Chair: Dr. Rebecca Staton-Reinstein at

Do not place any part of the Proposal in the body of the email.



Your Proposal must follow all of the standards and procedures exactly and completely. Non-conforming Proposals cannot be reviewed or accepted.

Submitting a Proposal implies that you will be available to make the presentation at the conference, including getting approval from your company and meeting any international travel requirements. If you are accepted and do not meet these requirements, your presentation and any accompanying paper will NOT be included in the conference documents or on the website.



1. Submit all Proposal items as MS Word or RTF documents.

2. All pages must contain the page number, your Name and E-mail address in the footer. Do not put anything else in the footer.

3. Please check your work for grammar and spelling. Don't rely on "spell check" alone!

4. COVER PAGE (Restricted to one page)

a. Presentation Title

b. Presenter Information: Name, Title, Company, Phone, Fax, E-mail

c. Will you provide a paper with the presentation?

d. Has this paper or presentation been delivered or published elsewhere? Event and/or Publication, Date

5. PRESENTATION ABSTRACT (Begin a new page; restricted to 1 page)

a. Presentation Title (Make it descriptive and

b. Description of the Presentation in this format: (You must follow the format to be considered.)

i. Background, context or rationale for the Presentation

ii. Key concepts to be presented (1 – 3 concepts)

iii. Why each concept is important or useful

iv. Learning Objectives (Results) for participants – what they will learn and/or how they will – 3 objectives)

v. How they can apply the

6. OUTLINE: Full, detailed, bulleted or numbered outline of the Presentation. This should highlight all points in the Abstract and their logical flow and development. If your presentation is chosen, this outline will be posted on the website. (Limited to 2 pages)



Your Proposal will be reviewed by a committee of experienced quality professionals, including consultants and practitioners.

If you have any questions, please address them to:

Dr. Rebecca Staton-Reinstein
Department of Education and Professional Development
International Institute for Software Testing
636 Mendelssohn Ave. North
MN 55427