MODEL-BASED TESTING
FOCUS AREA: QUALITY
Definition and Summary: Model-Based Testing is the automatic generation of efficient test procedures/vectors using models of system requirements and specified functionality.
Specific activities of the practice are (1) Build the model, (2) Generate expected inputs, (3) Generate expected outputs, (4) Run tests, (5) Compare actual outputs with expected outputs, and (6) Decide on further actions (whether to modify the model, generate more tests, or stop testing; estimate reliability (quality) of the software).
Model-Based Testing (MBT) can result in the following benefits:
§ Shorter schedules, lower cost, and better quality
§ A model of user behavior
§ Enhanced communication between developers and testers
§ Early exposure of ambiguities in specification and design
§ Capability to automatically generate many non-repetitive and useful tests
§ Test harness to automatically run generated tests
§ Eases the updating of test suites for changed requirements
§ Capability to evaluate regression test suites
§ Capability to assess software quality
These benefits all require an initial investment in tools and training.
SUMMARY DESCRIPTION
Specific activities of the practice, as shown in the Figure below, are:
§ Build the model
§ Generate expected inputs
§ Generate expected outputs
§ Run tests
§ Compare actual outputs with expected outputs
§ Decide on further actions (whether to modify the model, generate more tests, or stop testing, estimate reliability (quality) of the software)
DETAILED DESCRIPTION
The MBT process begins with requirements. A model for user behavior is built from requirements for the system. Those building the model need to develop an understanding of the system under test and of the characteristics of the users, the inputs and output of each user, the conditions under which an input can be applied, etc. The model is used to generate test cases, typically in an automated fashion. The specification of test cases should include expected outputs. The model can generate some information on outputs, such as the expected state of the system. Other information on expected outputs may come from somewhere else, such as a test oracle. The system is run against the generated tests and the outputs are compared with the expected outputs. Here, too, automation is extremely useful. The failures are used to identify bugs in the system. The test data is also used to make decisions, for example, on whether testing should be terminated and the system released.
Build the model: Forming a mental representation of the system’s functionality is a prerequisite to building a model for testing purposes. Testers need to understand not only the software, but also the environment in which it operates. The model should be a depiction of the software’s behavior, which can be described in terms of the input sequences accepted by the system; the actions, conditions, and output logic; or the flow of data through the applications, modules, and routines. In order to be useful for groups of testers and for multiple testing tasks, the description needs to be written down in an easily understandable form and be as formal as is practical. Useful models typically possess properties that make test generation effortless and, frequently, automatable. There are many formal modeling techniques (ways to depict behavior) from which to choose. For large complex systems it is often necessary for a team of testing/modeling experts to work together to derive the model. They use formal modeling techniques to communicate/coordinate their efforts.
A variety of techniques/methods exist for expressing models of user/system behavior. These include, but are not limited to:
§ Decision Tables – Tables used to show sets of conditions and the actions resulting from them
§ Finite State Machines – A computational model consisting of a finite number of states and transitions between those states, possibly with accompanying actions
§ Grammars – describe the syntax of programming and other input languages
§ Markov Chains (Markov process) – A discrete, stochastic process in which the probability that the process is in a given state at a certain time depends only on the value of the immediately preceding state
§ Statecharts – Behavior diagrams specified as part of the Unified Modeling Language (UML). A statechart depicts the states that a system or component can assume, and shows the events or circumstances that cause or result from a change from one state to another.
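As a small illustration of the first technique in the list above, a decision table can be written down as data, and every combination of conditions can then be enumerated as a test case. The Python sketch below is only illustrative: the three conditions, the responses, and the names RULES and decide are hypothetical and do not come from any particular tool or from the sources cited here.

```python
from itertools import product

# Hypothetical decision table for a phone handset.
# Conditions, in order: (handset lifted, number dialed, called party busy).
# None in a pattern means "don't care".
RULES = [
    ((False, None, None), "silence"),
    ((True, False, None), "dial tone"),
    ((True, True, True), "busy signal"),
    ((True, True, False), "ringing"),
]


def decide(conditions):
    """Return the response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if all(p is None or p == c for p, c in zip(pattern, conditions)):
            return response
    raise ValueError(f"decision table has no rule for {conditions}")


# One test case per combination of condition values: (inputs, expected output).
test_cases = [(combo, decide(combo)) for combo in product([False, True], repeat=3)]

for combo, expected in test_cases:
    print(combo, "->", expected)
```

Because decide raises an error for an unmatched combination, enumerating all combinations also checks that the table is complete.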
Table 1 describes characteristics of an application that indicate which technique is most appropriate.
Table 1: Modeling Method Guidance [Based on El-Far (2001a)]
Application Characteristics | Suggested Modeling Method |
Processes formal language (e.g., web browser processing HTML, compiler) | Grammar |
Protocol-based | Grammar |
State-rich systems (e.g., telephony systems) | Finite State Machines |
Few states, transitions caused by external conditions, as well as user inputs | Prefer statecharts over Finite State Machines |
Capable of being modeled by Finite State Machine; statistical analysis of failure data or reliability assessment is desired. | Prefer Markov chains over Finite State Machines |
Use of operational profiles to guide test generation is desired. | Markov chains |
Must ensure correctness for all combinations of input values. | A tabular model; prefer Finite State Machines over Markov chains |
Need to represent conditions under which inputs cause a particular response, Finite State Machines too awkward. | Decision tables |
Parallel system, individual components capable of being modeled by state machines | Statecharts |
Parallel system, some components not capable of being modeled by state machines | Models for different components; “one gaping hole” [El-Far 2001a] in Model-Based Testing |
Finite State Machines and Markov chains are the two most popular techniques in MBT for modeling user behavior. Finite State Machines can ensure that generated test cases cover the model. When a Markov chain model is used, a random process generates test cases, making coverage criteria more difficult to ensure in some specified number of test cases. The mathematics of Markov chains, however, provides analytical formulas to determine expected values useful in test planning.
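One expected value of this kind is the mean number of inputs in a generated test case. The Python sketch below computes it for a Markov chain over the states of the simple phone system used in the illustrative example of this document. The transition probabilities are invented for illustration and are not measured usage data; the function name expected_inputs_until is likewise hypothetical. The calculation solves E[s] = 1 + sum of P(s, s') * E[s'] over all successor states s' other than the target, by repeated substitution.

```python
# Hypothetical usage model: state -> {next state: probability}.
# The probabilities are illustrative assumptions, not an operational profile.
USAGE_MODEL = {
    "On-Hook": {"Dial Tone": 1.0},
    "Dial Tone": {"Busy": 0.2, "Ringing": 0.7, "On-Hook": 0.1},
    "Busy": {"On-Hook": 1.0},
    "Ringing": {"Talking": 0.8, "On-Hook": 0.2},
    "Talking": {"Dial Tone": 0.5, "On-Hook": 0.5},
}


def expected_inputs_until(target, model, tolerance=1e-12):
    """Expected number of inputs needed to reach `target` from each state.

    Iterates E[s] = 1 + sum(P(s, s') * E[s'] for s' != target) until the
    values stop changing by more than `tolerance`.
    """
    expected = {state: 0.0 for state in model}
    while True:
        updated = {
            state: 1.0 + sum(
                probability * expected[successor]
                for successor, probability in successors.items()
                if successor != target
            )
            for state, successors in model.items()
        }
        change = max(abs(updated[s] - expected[s]) for s in model)
        expected = updated
        if change < tolerance:
            return expected


steps = expected_inputs_until("On-Hook", USAGE_MODEL)
# A test case runs from "On-Hook" until "On-Hook" is entered again.
mean_test_case_length = steps["On-Hook"]
print(round(mean_test_case_length, 3))  # 4.417 under the assumed probabilities
```

With a figure like this, a test planner can estimate how many inputs a suite of a given number of randomly generated test cases will contain.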
Addressing these specific techniques in further detail is beyond the scope of this document. Additional information, however, is presented in [Vienneau, 2003] and in a number of references cited in the Reference section of this document.
Table 2 presents sequential heuristics useful in building a state-based model, such as a Finite State Machine or a Markov chain. These heuristics cannot replace a good understanding of the system under test; they provide guidance on how to use that understanding.
Table 2: Build the Model (Based on [El-Far 2001c] and [Whittaker, 1997])
List all inputs |
For each input, list the situations in which the input can be applied and the situations in which the input cannot be applied. |
For each input, list the situations in which the input causes different behaviors or outputs, depending on the context in which the input is applied. |
Generate expected inputs: Use the model to generate test cases, which consist primarily of specifying the inputs and expected outputs. The difficulty of generating tests depends on the nature of the model. In the case of finite state machines, it is a matter of implementing an algorithm that randomly traverses the state transition diagram (a directed graph). Tests are, by definition, the sequence of inputs along the generated paths. Thus, if the model is well defined, the tests can be generated automatically. In contrast, without automation and modeling tools, this task can be immense and near impossible to do manually for a complex system.
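The random traversal described above can be sketched in a few lines of Python. The graph used here is the simple phone system from the illustrative example later in this document, with edges inferred from the test sequences in Tables 5 and 6 (an assumption). The names MODEL and generate_test_case, the seed, and the bound on test length are hypothetical choices, not part of any MBT tool.

```python
import random

# State transition diagram as a directed graph: state -> {input: next state}.
# Assumption: edges inferred from the phone example's published test sequences.
MODEL = {
    "On-Hook": {"Pick Up": "Dial Tone"},
    "Dial Tone": {
        "Dial/Party Busy": "Busy",
        "Dial/Party Ready": "Ringing",
        "Hang Up": "On-Hook",
    },
    "Busy": {"Hang Up": "On-Hook"},
    "Ringing": {"Party Picks Up": "Talking", "Hang Up": "On-Hook"},
    "Talking": {"Party Hangs Up": "Dial Tone", "Hang Up": "On-Hook"},
}


def generate_test_case(rng, start="On-Hook", max_inputs=20):
    """Randomly walk the graph from `start` until `start` is re-entered.

    Returns a list of (input, expected state) pairs. `max_inputs` bounds the
    walk in case the random choices keep avoiding `start`.
    """
    state, steps = start, []
    while len(steps) < max_inputs:
        action = rng.choice(sorted(MODEL[state]))  # only inputs legal here
        state = MODEL[state][action]
        steps.append((action, state))
        if state == start:
            break
    return steps


rng = random.Random(2024)  # fixed seed so the generated suite is reproducible
suite = [generate_test_case(rng) for _ in range(5)]
for number, case in enumerate(suite, start=1):
    print(number, " -> ".join(action for action, _ in case))
```

A purely random walk gives no coverage guarantee for a fixed number of tests; tools that must meet a coverage criterion typically use a directed traversal instead.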
Generate expected outputs: Software testing involves execution of a program under test using some fault-revealing input data and examination of the output to determine success or failure. A fundamental assumption of this testing is that there is some mechanism, a test “oracle”, that will determine whether or not the results of a test execution are correct, that is, something that defines or identifies the expected outputs. As illustrated in the summary Figure above, expected outputs must be generated prior to running tests. A test oracle is the criterion used to check the correctness of the output. For example, the behavior of a competing product might be the basis for assessing the correctness of the product under test, i.e., “It should do what product B does”. Another example would be using a previous version of the software in which the component/feature under test did not experience significant change, i.e., “We should get the same results now that we got with this in version X.”
Form, fit and function of the test oracle are closely tied to (1) the size and complexity of the software under test, and (2) the degree of automation in the testing process. The greater the size and/or complexity of the software, the greater is the need for automation. Yet, automation itself makes writing/using a test oracle more difficult.
The test oracle needs to be developed in such a way that it can “marry” expected outputs with corresponding tests so that success or failure can be determined automatically for the millions of test cases typically generated for a complex system. Also, the oracle needs to be flexible enough to easily adjust to the dynamics of test generation.
Table 3 presents some comparisons of manual and automatic testing relative to test oracles.
Table 3: Comparison of Manual and Automated Testing
Manual Testing | Automated Testing |
Manual testing is slow | Automated testing is blazing fast (making manual/visual verification self-defeating) |
Fewer tests can be performed – tester must identify the “most important few” | Millions of tests can be performed, resulting in a much larger percentage of the functionality being covered during testing |
Oracles do not have to be as comprehensive (they only need to address a subset of significant behaviors the tester has time to perform) | The test oracle must address all functionality addressed through automated testing – a much larger portion of possible behaviors |
Oracles can be manual (There is time to visually review screen output during manual testing) | The high volume of tests prevents manual/visual assessment of results of individual test cases, necessitating automation of the test oracle implementation |
Finding/creating a test oracle can be an issue in MBT. Tests are generated automatically and in volume. Furthermore, test suites do not remain static. Thus, calculating expected outputs by hand is usually infeasible. Some work has explored the automatic generation of test oracles [Feather 1999]. In the absence of a good test oracle, one may need to settle for plausibility checks. Tests may be considered to have passed if their outputs are in certain ranges or they pass certain consistency checks. If the system is instrumented to identify its state, expected test outputs can include the system state. In many instances, this expected output would be generated automatically from the model in conjunction with test inputs.
In practice, this is often done by comparing the output, either automatically or manually, to some pre-calculated, presumably correct, output. However, if the program is formally documented it is possible to use the specification to determine the success or failure of a test execution. There is some current research relating to the development of a prototype tool that automatically generates a test oracle from formal program documentation.
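The two fallbacks mentioned above, expected states generated from the model and weaker plausibility checks, can be sketched as follows. This Python example assumes the system under test reports its state after each input and reuses the simple phone model from the illustrative example later in this document, with edges inferred from Tables 5 and 6. The function names and the particular plausibility rule (only known states, and a test ends on-hook) are hypothetical.

```python
# Assumed model of the phone example: state -> {input: next state}.
MODEL = {
    "On-Hook": {"Pick Up": "Dial Tone"},
    "Dial Tone": {
        "Dial/Party Busy": "Busy",
        "Dial/Party Ready": "Ringing",
        "Hang Up": "On-Hook",
    },
    "Busy": {"Hang Up": "On-Hook"},
    "Ringing": {"Party Picks Up": "Talking", "Hang Up": "On-Hook"},
    "Talking": {"Party Hangs Up": "Dial Tone", "Hang Up": "On-Hook"},
}


def expected_states(inputs, start="On-Hook"):
    """Derive the expected state after each input directly from the model."""
    state, states = start, []
    for action in inputs:
        state = MODEL[state][action]  # KeyError: input not legal in this state
        states.append(state)
    return states


def plausible(observed_states):
    """Weaker fallback check for when exact expected outputs are unavailable.

    Hypothetical rule: every reported state is known to the model and the
    test case ends with the phone on-hook.
    """
    return (
        bool(observed_states)
        and all(state in MODEL for state in observed_states)
        and observed_states[-1] == "On-Hook"
    )


inputs = ["Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up"]
oracle = expected_states(inputs)
print(oracle)             # ['Dial Tone', 'Ringing', 'Talking', 'On-Hook']
print(plausible(oracle))  # True
```

The plausibility check accepts many wrong behaviors that the exact comparison would reject, which is the price of not having a full oracle.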
Run tests: Most MBT environments are supported with test generation tools that generate tests (test cases) that can easily be translated into executable test scripts, or produce the test script directly from the test data contained within the tool. It is worthwhile investing some time in writing good efficient scripts, since they can be used for as long as the software needs testing (potentially through the maintenance cycle, too). Although tests can be run as soon as they are created, in most testing groups it is policy to run the tests only after a complete suite that meets certain adequacy criteria is generated. Typically there is a coverage plan that is being addressed. In some instances, only a small number of tests relating to a particular feature or component would be run, even though the complete suite has been generated. Having good test generation tools in place enhances the flexibility in scheduling and executing tests.
Compare actual with expected outputs: It is useless to have the capability to generate and run millions of tests unless you have a way to assess the results and take action based upon the results. The automation process in place should make the comparison of actual to expected outputs, and alert testers to the failures. Of course, this is dependent on the quality and completeness of the test oracle. MBT cannot make good information out of bad data; it is not a silver bullet. The automation should also provide an efficient means to drill down into the particular test cases that failed. MBT is good at verifying the state of the software and cataloging state changes. It can provide assistance to testers (but not replace them) in verifying all aspects of the software.
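A minimal Python sketch of such an automated comparison is shown below. It assumes the system under test reports its state after each input. FaultyPhone is a hypothetical stand-in for a real system, with one deliberately seeded defect so that the comparison has something to find; the model is the simple phone system from the illustrative example later in this document, with edges inferred from Tables 5 and 6.

```python
# Assumed model of the phone example: state -> {input: next state}.
MODEL = {
    "On-Hook": {"Pick Up": "Dial Tone"},
    "Dial Tone": {
        "Dial/Party Busy": "Busy",
        "Dial/Party Ready": "Ringing",
        "Hang Up": "On-Hook",
    },
    "Busy": {"Hang Up": "On-Hook"},
    "Ringing": {"Party Picks Up": "Talking", "Hang Up": "On-Hook"},
    "Talking": {"Party Hangs Up": "Dial Tone", "Hang Up": "On-Hook"},
}


class FaultyPhone:
    """Hypothetical system under test that reports its state after each input."""

    def __init__(self):
        self.state = "On-Hook"

    def apply(self, action):
        if self.state == "Busy" and action == "Hang Up":
            self.state = "Dial Tone"  # seeded defect: should return to On-Hook
        else:
            self.state = MODEL[self.state][action]
        return self.state


def run_and_compare(inputs, sut, start="On-Hook"):
    """Run one test case; return (step, input, expected, actual) on mismatch."""
    failures, expected = [], start
    for step, action in enumerate(inputs, start=1):
        expected = MODEL[expected][action]
        actual = sut.apply(action)
        if actual != expected:
            failures.append((step, action, expected, actual))
            break  # model and system have diverged; later steps are unreliable
    return failures


failures = run_and_compare(["Pick Up", "Dial/Party Busy", "Hang Up"], FaultyPhone())
print(failures)  # [(3, 'Hang Up', 'On-Hook', 'Dial Tone')]
```

Recording the step number and the input alongside the expected and actual values gives testers the drill-down information they need for each failed case.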
Decide on further actions: Outputs from MBT can include:
§ A model of user behavior from which additional tests can be constructed
§ Test cases, including expected outputs for the test cases
§ Measures of test coverage attained by the generated tests
§ Test results from which the reliability of the system under test can be estimated
MBT supports management decision making relative to:
§ When to terminate system testing and release a software system
§ Revising the model
§ Generating more tests
There are typically four kinds of criteria on which release decisions can be made:
§ White box test coverage metrics. Some test automation tools track what percentage of the statements, branches, and so on, of the code of the system under test have been executed. One might decide to stop testing when a certain percentage has been attained for one of these white box coverage criteria.
§ Black box coverage criteria are based on characteristics of the user model developed under MBT. Many tools for MBT generate test suites to satisfy some black box coverage criterion.
§ Software Reliability. Many software reliability models can be fit to test data generated to resemble usage patterns in the field. Markov chain models of user behavior, for example, can be used to generate such test data. One might decide to release a software system when its reliability exceeds some goal at some level of confidence, as calculated by some specified reliability model.
§ Cost/Economic Metrics. Suppose one supplements a reliability model with data on the cost of finding bugs in the field, of finding bugs during testing, and the cost of testing for another unit of time. Some organizations use such data to make release decisions by comparing the cost of additional testing with the expected savings of finding a bug during testing, as opposed to in the field.
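The cost comparison in the last criterion reduces to simple arithmetic. The Python sketch below is one possible formulation; the function name, its parameters, and all figures are hypothetical and serve only to show the shape of the calculation.

```python
def worth_another_test_period(expected_bugs_found, cost_per_bug_in_test,
                              cost_per_bug_in_field, cost_of_test_period):
    """Compare the cost of one more unit of testing with its expected savings.

    Savings are the bugs expected to be found in that period multiplied by
    the extra cost each would have incurred had it escaped to the field.
    """
    expected_savings = expected_bugs_found * (
        cost_per_bug_in_field - cost_per_bug_in_test
    )
    return expected_savings > cost_of_test_period


# Hypothetical figures: early in testing many bugs are still being found.
early = worth_another_test_period(0.5, 1_000, 20_000, 5_000)  # 9,500 > 5,000
# Later, a reliability model predicts far fewer bugs per test period.
late = worth_another_test_period(0.1, 1_000, 20_000, 5_000)   # 1,900 < 5,000
print(early, late)  # True False
```

In practice the expected number of bugs found per period would come from the fitted reliability model rather than from a fixed guess.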
Analysis of test results may lead to identifying flaws in the model itself and warrant its revision.
Models not only give a good picture of what tests have been run, but also give insight into what tests haven’t been run. This information provides some guidance on when to stop testing and when to continue. A manager might choose to continue testing until there are no more non-repetitive tests presented by the model.
This section presents an illustrative example of Model-Based Testing. The figure below is a Finite State Machine (FSM) model of a simple phone system. This model is of a phone that can call out. Nodes are states of the phone. Edges indicate actions that the user can take (these are inputs to the system). Test cases specify a sequence of inputs, the states the system is expected to be in after each action, and the value of all outputs of the system.
Finite State Machine for Simple Phone System
A Finite State Machine is an ordered quintuple of five sets:
1. A set of inputs, e.g., {Dial/Party Busy, Dial/Party Ready, Hang Up, Party Hangs Up, Party Picks Up, Pick Up}
2. A set of states, e.g., {Busy, Dial Tone, On-Hook, Ringing, Talking}
3. The set of initial states. Let the initial state be {On-Hook}
4. The set of final (terminal) states. Let this set be {On-Hook}
5. A partial function mapping from an ordered pair consisting of a state and an input to a subset of states. Figure 2 defines this function for the example. For example, f(Dial Tone, Dial/Party Busy) is {Busy}.
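The five components above map directly onto plain data structures. The Python sketch below encodes the phone example in that way. The transition entries are inferred from the test sequences in Tables 5 and 6 and are therefore an assumption about the figure; the constant names and the helper f are chosen for illustration.

```python
INPUTS = {
    "Dial/Party Busy", "Dial/Party Ready", "Hang Up",
    "Party Hangs Up", "Party Picks Up", "Pick Up",
}
STATES = {"Busy", "Dial Tone", "On-Hook", "Ringing", "Talking"}
INITIAL_STATES = {"On-Hook"}
FINAL_STATES = {"On-Hook"}

# Partial function: (state, input) -> subset of states.
# Assumption: entries inferred from the sequences in Tables 5 and 6.
TRANSITIONS = {
    ("On-Hook", "Pick Up"): {"Dial Tone"},
    ("Dial Tone", "Dial/Party Busy"): {"Busy"},
    ("Dial Tone", "Dial/Party Ready"): {"Ringing"},
    ("Dial Tone", "Hang Up"): {"On-Hook"},
    ("Busy", "Hang Up"): {"On-Hook"},
    ("Ringing", "Party Picks Up"): {"Talking"},
    ("Ringing", "Hang Up"): {"On-Hook"},
    ("Talking", "Party Hangs Up"): {"Dial Tone"},
    ("Talking", "Hang Up"): {"On-Hook"},
}


def f(state, action):
    """Transition function; returns an empty set where it is undefined."""
    return TRANSITIONS.get((state, action), set())


# Consistency checks: the function only mentions declared states and inputs.
assert all(s in STATES and a in INPUTS for s, a in TRANSITIONS)
assert all(targets <= STATES for targets in TRANSITIONS.values())
assert INITIAL_STATES <= STATES and FINAL_STATES <= STATES

print(f("Dial Tone", "Dial/Party Busy"))  # {'Busy'}
print(f("On-Hook", "Party Picks Up"))     # set()
```

Every subset here has exactly one element, so this particular machine is deterministic, which is what allows expected states to be read straight off the model.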
This FSM can be used to generate test cases. Table 5 shows a sequence of 15 inputs. This sequence begins and ends in the “On-Hook” state. It has the desirable property of executing every action possible at each state at least once. This property is called action coverage (the Figure can be thought of as showing a directed graph). The sequence is not unique, however. Using the model to generate test sequences, then, can result in a range of efficient test sequences, each stressing the software somewhat differently, and each achieving the same coverage criterion.
Table 5: Test Sequence Achieving Action Coverage
Action | State | | Action | State |
(start) | On-Hook | | Dial/Party Ready | Ringing |
Pick Up | Dial Tone | | Party Picks Up | Talking |
Dial/Party Busy | Busy | | Party Hangs Up | Dial Tone |
Hang Up | On-Hook | | Hang Up | On-Hook |
Pick Up | Dial Tone | | Pick Up | Dial Tone |
Dial/Party Ready | Ringing | | Dial/Party Ready | Ringing |
Hang Up | On-Hook | | Party Picks Up | Talking |
Pick Up | Dial Tone | | Hang Up | On-Hook |
One can think of a new test case starting each time the system enters the “On-Hook” state. Four test cases are shown in Table 5. Other test sequences covering all actions would have a different number of test cases. A test case should specify expected outputs, as well as the sequence of inputs. Suppose this phone system is instrumented to output its state. In this case, the expected outputs are generated from the FSM, as well. If the system has no other interesting outputs, a test oracle is not needed here.
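The claims about Table 5 can be checked mechanically by replaying the sequence against the model. In the Python sketch below, the input list is Table 5 read down the left pair of columns and then down the right pair, and the model's edges are inferred from Tables 5 and 6 (an assumption about the figure). The names MODEL, TABLE_5_INPUTS, and replay are hypothetical.

```python
# Assumed model of the phone example: state -> {input: next state}.
MODEL = {
    "On-Hook": {"Pick Up": "Dial Tone"},
    "Dial Tone": {
        "Dial/Party Busy": "Busy",
        "Dial/Party Ready": "Ringing",
        "Hang Up": "On-Hook",
    },
    "Busy": {"Hang Up": "On-Hook"},
    "Ringing": {"Party Picks Up": "Talking", "Hang Up": "On-Hook"},
    "Talking": {"Party Hangs Up": "Dial Tone", "Hang Up": "On-Hook"},
}

TABLE_5_INPUTS = [
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
]


def replay(inputs, start="On-Hook"):
    """Return the (state, input) edges exercised and the states visited."""
    state, exercised, visited = start, set(), []
    for action in inputs:
        exercised.add((state, action))
        state = MODEL[state][action]
        visited.append(state)
    return exercised, visited


all_edges = {(state, action) for state, actions in MODEL.items() for action in actions}
exercised, visited = replay(TABLE_5_INPUTS)

print(len(TABLE_5_INPUTS))       # 15 inputs
print(exercised == all_edges)    # True: action coverage is achieved
print(visited.count("On-Hook"))  # 4 test cases, one per return to On-Hook
print(visited[-1])               # On-Hook
```

The visited list is also the expected output of the test when the phone is instrumented to report its state.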
Action coverage is the easiest test coverage criterion for a FSM model. Switch coverage provides the next level of rigor in measuring tests synthesized from a FSM. Switch coverage is met when, for each state, every pair of actions leading into and out of that state is covered in the test sequence. For example, consider the “Dial Tone” state. One possible path leading in and out of the “Dial Tone” state is the sequence (“Party Hangs Up”, “Dial/Party Ready”). Table 5 does not contain this sequence. Thus, this test sequence does not achieve switch coverage.
Table 6 shows a test sequence for achieving switch coverage. Twenty-six inputs are needed to achieve switch coverage, which is a greater number than the length of the sequence of inputs needed to achieve action coverage. This sequence is not unique. Other sequences of the same length can be generated which will achieve switch coverage. If one thinks of the state “On-Hook” as initiating a new test case, six test cases are presented in Table 6. The expected sequence of states is shown for each test case.
Table 6: Test Sequence Achieving Switch Coverage
Action | State | | Action | State |
(start) | On-Hook | | Dial/Party Ready | Ringing |
Pick Up | Dial Tone | | Party Picks Up | Talking |
Hang Up | On-Hook | | Party Hangs Up | Dial Tone |
Pick Up | Dial Tone | | Hang Up | On-Hook |
Dial/Party Busy | Busy | | Pick Up | Dial Tone |
Hang Up | On-Hook | | Dial/Party Ready | Ringing |
Pick Up | Dial Tone | | Party Picks Up | Talking |
Dial/Party Ready | Ringing | | Party Hangs Up | Dial Tone |
Hang Up | On-Hook | | Dial/Party Ready | Ringing |
Pick Up | Dial Tone | | Party Picks Up | Talking |
Dial/Party Ready | Ringing | | Party Hangs Up | Dial Tone |
Party Picks Up | Talking | | Dial/Party Busy | Busy |
Hang Up | On-Hook | | Hang Up | On-Hook |
Pick Up | Dial Tone | | | |
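Switch coverage can be checked in the same mechanical way as action coverage. The Python sketch below reads "pair of actions" literally, as a pair of action labels entering and leaving a state, which is an interpretation rather than something the source spells out. The input lists are Tables 5 and 6 read down the left pair of columns and then the right pair, and the model's edges are inferred from those tables. All names are hypothetical.

```python
# Assumed model of the phone example: state -> {input: next state}.
MODEL = {
    "On-Hook": {"Pick Up": "Dial Tone"},
    "Dial Tone": {
        "Dial/Party Busy": "Busy",
        "Dial/Party Ready": "Ringing",
        "Hang Up": "On-Hook",
    },
    "Busy": {"Hang Up": "On-Hook"},
    "Ringing": {"Party Picks Up": "Talking", "Hang Up": "On-Hook"},
    "Talking": {"Party Hangs Up": "Dial Tone", "Hang Up": "On-Hook"},
}

TABLE_5_INPUTS = [
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
]
TABLE_6_INPUTS = [
    "Pick Up", "Hang Up",
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up",
    "Dial/Party Ready", "Party Picks Up", "Party Hangs Up",
    "Dial/Party Busy", "Hang Up",
]


def required_switches():
    """All (state, incoming action, outgoing action) triples in the model."""
    required = set()
    for state, outgoing in MODEL.items():
        incoming = {
            action
            for edges in MODEL.values()
            for action, target in edges.items()
            if target == state
        }
        required |= {(state, i, o) for i in incoming for o in outgoing}
    return required


def covered_switches(inputs, start="On-Hook"):
    """Triples exercised by consecutive inputs in one long sequence."""
    state, previous, covered = start, None, set()
    for action in inputs:
        if previous is not None:
            covered.add((state, previous, action))
        state = MODEL[state][action]
        previous = action
    return covered


required = required_switches()
missing_5 = required - covered_switches(TABLE_5_INPUTS)
missing_6 = required - covered_switches(TABLE_6_INPUTS)

print(len(required))       # 12 pairs under this reading
for triple in sorted(missing_5):
    print("Table 5 misses", triple)
print(len(TABLE_6_INPUTS), len(missing_6))  # 26 0
```

Under this reading, Table 5 misses three pairs at the “Dial Tone” state, including (“Party Hangs Up”, “Dial/Party Ready”), while Table 6 covers all twelve.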
SUMMARY CHARACTERISTICS
Enabling Practices: Link to Model-Based Testing Interrelationships Diagram
Enabled Practices: Link to Model-Based Testing Interrelationships Diagram
Impact Areas: Primary: Schedule Secondary: Cost; Quality
Life Cycle Phase: Production, deployment and maintenance
Scope/Authority: No consensus
Applicability: No indication
Use Indicators: Long test schedules; test development delays
Use Inhibitors: No indication
Appropriate Programs: High requirement volatility; highly complex software
Inappropriate Programs: Legacy and prototype programs
Barriers: Ignorance of model-based testing capabilities; expense in tool investment
Facilitators: Tools; training; incentives based on program quality
Model-Based Testing (MBT) can result in the following benefits:
§ Shorter schedules, lower cost, and better quality
§ A model of user behavior is one major artifact of Model-Based Testing
§ Enhanced communication between developers and testers in conjunction with developing the model
§ Early exposure of ambiguities in specification and design while developing the model [Robinson 1999]
§ Capability to automatically generate many non-repetitive and useful tests
§ Test harness to automatically run generated tests
§ Combination of MBT artifacts eases the updating of test suites for changed requirements (typically, only the model need be updated)
§ Capability to evaluate regression test suites (one can know what level of test coverage they obtain)
§ Capability to assess software quality (if tests are generated from a Markov model, the results of the tests provide inputs to typical software reliability models satisfying appropriate assumptions [El-Far 2001a])
These benefits all require an initial investment in tools and training.
DETAILED CHARACTERISTICS
Key Characteristics of the “Model-Based Testing” Gold Practice
Characteristic |
Comments |
Assumes Availability of Automation and Modeling Tools |
§ Modeling makes automatic generation of many test cases possible
§ Automation improves test coverage. Not possible to achieve the same degree of coverage with manual testing that is attainable with an automated testing system |
Formal Requirements Specifications |
§ Specification drives the model. The more complete the spec, the more likely the model will be a good reflection of the true behavior of the system
§ In some cases, the model (parts of the model) can be generated directly from the specification
§ Essential for large complex systems because many people need to develop a common understanding of the system
§ How test cases are generated depends on the notation used to record the behavior model, which is closely related to how requirements are recorded
§ Some requirements formalisms are (1) Software Cost Reduction Project by Naval Research Laboratory, (2) Specification and Description Language, used in the telecommunication industry, and (3) Unified Modeling Language Statecharts |
State Space Explosion |
§ A condition evident in finite state models in which the number of states of a system increases “beyond manageability”
§ Testers/modelers use “abstraction” and “exclusion” as two approaches to lowering the number of states in the model |
Skilled Testers |
§ Testers need to be knowledgeable about the modeling techniques and their underlying and supporting mathematics and theories
§ Working knowledge of finite state machines
§ Basic familiarity with formal languages, automata theory, graph theory, and elementary statistics
§ Temporarily assigning testing roles to failed programmers (or staff with nothing to do) will not work |
Large Complex Systems |
§ MBT is often the only way to address the volume of tests needed to ensure adequate coverage
§ MBT is adaptable to changing requirements |
Up-Front Investment in Time and Tools |
§ Sophisticated modeling tools are necessary but expensive
§ Most developers will need training in use of automation and modeling tools acquired
§ MBT not appropriate for short-lived systems – too much time and money up front is needed to reap any benefit – unless criticality of software demands it |
Requires Testing Infrastructure |
§ Requires a sustainable test bed to simulate the environment of the Software Under Test
§ Due to expense and sophistication of tools and equipment, it makes sense to share the infrastructure across programs |
The objectives of black box tests, such as acceptance testing during System Test, are:
§ Increase confidence in the system under test
§ Find bugs in the system
§ Assess reliability or other certification measures
Given the size of the input domains for large systems, all input sequences cannot be tested. Thus, test cases must be constructed to sample from the input domain. How can test cases be generated to fulfill these test objectives? Research has shown that partition testing does not generate test cases that increase confidence. In any case, the tests generated from partition testing are not representative of typical usage and are, therefore, unsuited for reliability measurement. Suppose a test suite is handcrafted. Such a test suite will generally be difficult to update as program requirements evolve over the lifecycle. How can testing processes be devised so that tests can be more easily updated with changing requirements?
Issues relating to inefficient test cases and adapting test plans to updated requirements are manifested in long test schedules and development delays relating to testing. Acquisition managers should consider implementing MBT on programs in which these difficulties arise or are expected to arise. Tests can be adapted to changing requirements through updates to the model of user behavior. MBT generates tests to fulfill coverage criteria and provides a mechanism for adjusting testing rigor by choosing more rigorous testing criteria. Finally, MBT can provide data for reliability modeling, thereby supporting product measurement and the imposition of appropriate stopping criteria on testing. In other words, MBT provides quantitative data for metrics-based scheduling and management, and for progress measurement.
Implementing MBT requires an investment and some tailoring of the development process. Costs include training and tools. Tools directly related to MBT accomplish the following:
§ Construction of behavior model from requirements
§ Generation of abstract test cases
§ Conversion of abstract test cases to concrete test cases in a format appropriate for testing infrastructure
Tools for requirements and automated testing facilitate MBT:
§ Tools for capturing and maintaining requirements in a formal model (e.g., UML or SDL-based tools)
§ Configuration Management tools
§ Test harnesses
§ Tools to measure white-box test coverage metrics
§ Regression testing
§ Defect tracking system
§ Reliability modeling tools (e.g., Computer Aided Software Reliability Estimation (CASRE) or Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS))
Activities for MBT occur during every development and maintenance phase. Tests with test data generated from a model are conducted during Integration Test, System Test, and Operations and Maintenance. These are all examples of phases in which black box testing can be appropriate. Model-Based Testing provides guidance on whether such tests should be continued, as is also shown in the figure. The testers begin constructing the model in the requirements phase and update it during design and coding phases, as needed.
The Figure below represents a high-level process architecture for the subject practice, depicting relationships among this practice and the nature of the influences on the practice (describing how other practices might relate to this practice). These relationship statements are based on definitions of specific “best practices” found in the literature and the notion that the successful implementation of practices may “influence” (or are influenced by) the ability to successfully implement other practices. A brief description of these influences is included in the table below.
Process Architecture for the "Model-Based Testing" Gold Practice
Summary of Relationship Factors
INPUTS TO THE PRACTICE |
Define requirements to be modeled |
The modeler needs to have a complete understanding of the system in order to model its behavior. Theoretically, system behavior is reflected in the body of its requirements statements. Well-documented requirements provide the best foundation for determining what to model and when it is complete. Performance-Based Specifications are, in essence, statements of requirements addressing both behavior and performance. Without sound documented requirements, the modeler must rely on less stable sources of information which may result in a skewed or inconsistent interpretation of the system’s expected behavior. For large complex systems, it may be advantageous to capture requirements in rigorous model-based notation (such as UML) because it supports automated model generation, ensuring the greater likelihood of the test model truly reflecting the desired system behavior. The practices of Requirements Management and Requirements Tradeoffs/Negotiations both are premised on accepting the fact that requirements change, and, therefore, the development approach must incorporate change management.
The MBT approach provides the testing flexibility needed to address changing requirements and, at the same time, ensure test coverage. The Goal-Question-Metric Approach (GQM), and its broader counterpart, Practical Software Measurement (PSM), are often used in high-level (sometimes strategic) planning to ensure that the envisioned product is well aligned with the business goals, and that appropriate measures are identified for assessing the success/results of a product or initiative. Thus, GQM has an important role in requirements gathering, and also in assessing the value of using the MBT approach.
|
Assess the MBT approach |
MBT is a process that implements a testing philosophy, and it is closely tied to the overall development methods in place on a project. MBT is not always a good choice for all software intensive projects. Determining the best way to test the product is not trivial. IPPD provides the appropriate mix of domain experts and general information about a product (product line) to assist the decision-makers in deciding whether MBT makes sense. Typically, IPPD is employed to mitigate the risk of volatile requirements, and to address complexity in system development. MBT is, perhaps, more suitable in these cases, but it comes at a cost of resources and time, and often necessitates changes in the software development process in order to be effective. Independent Expert Reviews and Software Capability Evaluations (SCEs) help developers assess their processes, and perhaps determine their readiness to implement MBT.
|
Establish system and test structures |
Systems design typically encompasses all of these practices. Leveraging COTS/NDI is an important consideration in designing for affordability and often results in non-functional specifications or requirements that are addressed by MBT in building a model. COTS items represent a form of reuse and have their own associated risks and costs. Reuse assessments impact the components that are specified for the subject system. Use of the Open Systems approach may be a requirement to ensure greater interoperability.
Focusing on the architecture is an important step toward building affordable systems. Therefore, these mentioned practices may actually be required for a project, and have significant impact on how a system is built (and, therefore, on how the test model is constructed).
These practices also affect how the test infrastructure itself is constructed to support MBT, with a goal of creating a test environment that can adjust to changing technology and be sustained over time at an affordable cost. These practices might influence whether a test bed (and corresponding test team) is created on a project-by-project basis, or across an entire program. Should an organization build its own tools for MBT, or acquire COTS tools? Will the test architecture address the testing needs for the life cycle of the product (product line)?
|
Implement the test models |
All of these practices address processes useful for the development team (including the testing team). Much of the “payoff” of MBT comes from being able to automate. In order to automate, the developer must be able to define the “rules”, i.e., the policies and procedures of testing that are always applied in the same way, and the expected outputs of the system (behaviors and data for each given state). Learning to establish clear goals and decision points is part of the “rule building” mentality of MBT (e.g., establishing the test stopping criteria). Binary Quality Gates at the Inch Pebble Level is a planning/tracking practice (named in the 1990s, but still relevant) that can help establish both the rigor and the details that are needed in MBT. Compiling and Smoke Testing Frequently is the practice of frequent testing of specific features or parts of a system as integration progresses. MBT makes it possible to do this “selective” testing (sometimes called regression testing) frequently, and to easily adjust the scope of testing to meet the need. A group already accustomed to frequent build and test scenarios will easily adapt to MBT. Configuration Management (CM) practices are essential for a successful MBT implementation. MBT involves thousands of artifacts and many versions of tests. The ability to use it effectively is premised on having a sound CM process in place.
|
OUTPUTS FROM THE PRACTICE |
Evaluate testing effectiveness
|
The test stopping criteria defined as part of MBT are based on quantitative metrics of test coverage, reliability, or other quality goals reflected in the requirements. Thus, MBT provides quantitative data for decision-making, scheduling and management. It supports the mental migration of the developer to a measuring and fact-based mentality. It supports the implementation of Statistical Process Control by providing test data that are used to monitor and control variation.
|
Communicate testing effectiveness
|
Since MBT is often automated, it can be used to actually perform functional tests (demonstrate the system functionality) as part of the review process. This is in contrast to manual testing, which is labor intensive and requires a lengthy time period to capture and report on test results. MBT provides objective data for use in establishing the actual accomplishments of an effort. Since tests are generated against a system model, results are well aligned with other planning and support true visibility into the development effort, making it easier to communicate progress across a program. The objective data can be used to determine when earned value should be credited as well. The most important influence of MBT is the objectivity of its test results.
|
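As a rough illustration of a quantitative stopping criterion of the kind described under "Evaluate testing effectiveness", the sketch below measures how many transitions of a model the executed tests have exercised and compares the result with a target. The function names, the toy model, and the 90% target are assumptions made for this example; they are not taken from any MBT tool.

```python
def transition_coverage(model_transitions, executed_transitions):
    """Return the fraction of model transitions exercised at least once."""
    model = set(model_transitions)
    if not model:
        raise ValueError("the model has no transitions")
    covered = model & set(executed_transitions)
    return len(covered) / len(model)


def should_stop(model_transitions, executed_transitions, target=0.9):
    """Stopping rule (assumed for illustration): stop once coverage reaches the target."""
    return transition_coverage(model_transitions, executed_transitions) >= target


# Hypothetical model: each transition is (state, input, next_state).
MODEL = [
    ("idle", "start", "running"),
    ("running", "pause", "paused"),
    ("paused", "resume", "running"),
    ("running", "stop", "idle"),
]

# Transitions observed while running the generated tests so far.
EXECUTED = [
    ("idle", "start", "running"),
    ("running", "stop", "idle"),
    ("idle", "start", "running"),
]

if __name__ == "__main__":
    print("coverage:", transition_coverage(MODEL, EXECUTED))
    print("stop testing:", should_stop(MODEL, EXECUTED))
```

A real project would more likely combine several measures (coverage, reliability estimates, open defects) before deciding to stop; a single coverage threshold is used here only to keep the example short.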
RESOURCES:
§ Websites
0 Harry Robinson of Microsoft maintains a page, (http://www.model-based-testing.org/), devoted to Model-Based Testing. He provides links to papers available online, a short bibliography of references not available on the web, and a collection of useful links.
0 The Software Productivity Consortium (SPC) is a nonprofit partnership of industry, government, and academia. SPC has a Test Automation Framework (TAF), (http://www.software.org/pub/taf/) which instantiates Model-Based Testing. SPC provides training and tools for Model-Based Testing.
0 Automated Generation and Execution of test suites for DIstributed component-based Software (AGEDIS) (http://www.agedis.de/) is a 3-year research project funded by the European Commission. AGEDIS is a consortium carrying out research and development on the automation of software testing.
0 There are various sites with general information about testing:
· (http://www.mtsu.edu/~storm/) Software Testing Online Resources
· (http://www.testingfaqs.org/) Some answers to Frequently Asked Questions about testing
· (http://www.qaforums.com/) Software Testing Tools & Quality Assurance Online Discussions Board (requires user to register with site)
0 Model-Based Testing starts with requirements specifications. Here are some resources on techniques for modeling and specifying systems:
· (http://www.omg.org/uml) Object Management Group’s UML page
· (http://www.sdl-forum.org/) SDL Forum Society
· (http://www.sosym.org/) Software and Systems Modeling (A Springer journal)
· (http://www.afm.sbu.ac.uk/) Formal Methods resources on the Web
0 The mathematics underlying Model-Based Testing includes graph theory, automata, and Markov processes. Here are some resources on graph theory.
· (http://www1.cs.columbia.edu/~sanders/graphtheory) Graph theory resources
· (http://www.math.fau.edu/locke/graphthe.htm) Stephen Locke’s graph theory page
0 Here are some applets for experimenting with simple examples of Finite State Machines and of Markov chains:
· (http://www.belgarath.org/java/fsme.html) Matt Chapman’s Finite State Machine Explorer
· (http://www.math.uah.edu/psol/applets/MarkovChainExperiment.html) Explores some simple Markov chains
§ Tools and Methods
From the viewpoint of software process automation, MBT attempts to integrate two classes of tools – requirements and testing. Inputs to an ideal tool for MBT would be requirements, as embodied in a requirements tool. Outputs would be test cases, including expected outputs of the system under test, a test harness to run tests with one of these tools, and help in analyzing test results.
State-of-the-art methods and tools that may be useful in implementing and improving the effectiveness of Model-Based Testing include:
§ Formal Requirements Specification. Some sort of formal specification of requirements expressed in an automated tool is useful for Model-Based Testing. These formal specifications may include behavior specifications from which test suites can be automatically generated, or they may be input into tools for Model-Based Testing that automatically generate behavior models.
§ The technology developed with Computer Aided Software Engineering (CASE) tools provides a useful requirements formalism. This technology includes the Software Cost Reduction (SCR) project at the Naval Research Laboratory, the Unified Modeling Language (UML), the Specification and Description Language (SDL), Entity-Relationship (ER) diagrams and other reliability formalisms, as embedded in CASE tools, such as those provided by Rational Rose, Together, or ArgoUML.
The table below provides information on tools for Model-Based Testing. Only tools implementing MBT are shown; general automation support for requirements and for testing is not covered here. For example, capture and playback tools are not shown.
Tools for Model-Based Testing
Tool Name | Organization | Inputs | Outputs | URL |
AETG Web Service | Telcordia Technologies Applied Research | Tabular definition of input parameters | Test cases | http://www.argreenhouse.com/demos/ |
Abstract State Machine Language (AsmL) | Microsoft | XML and Word | Test generation based on total transition coverage of FSM | http://research.microsoft.com/foundations/AsmL/ |
Conformiq Test Generator | Conformiq Software, Limited | UML state diagrams | Test cases in TTCN format, including expected results. Test harness. | http://www.conformiq.com/ |
Direct – To – Test (DTT) | Software Prototype Technologies | Models in custom language, cause and effect tables | Test cases including expected outputs and executable test scripts | http://www.softprot.com/ |
GOTCHA – TCBeans | IBM Research Laboratory in Haifa | Model in custom language | Test cases, including expected outputs and test translation framework | http://www.haifa.il.ibm.com/projects/verification/gtcb/index.html |
MulSaw | Massachusetts Institute of Technology | Model in Alloy modeling language or in Java Modeling Language (JML), including pre and post conditions. | Test cases based on coverage criterion. | http://mulsaw.lcs.mit.edu/ |
Reactis | Reactive Systems, Incorporated | Models in MATLAB’s Simulink and Stateflow modeling language. | Model simulator; test suite, including expected results | http://www.reactive-systems.com/ |
SDL And MSC based Test case Generation (SAMSTAG) | University of Fribourg | SDL system specifications, MSC test purposes | Test cases in TTCN format | http://diuf.unifr.ch/telecom/samstag/ (Site possibly no longer active). |
SpecTest | George Mason University | Models in SCR or UML | Test cases based on coverage criterion. | http://www.isse.gmu.edu/~aynur/rsrch/SpecTest/ |
Telelogic Tau TTCN Suite | Telelogic | SDL model | Test cases based on coverage criterion, including expected results. | http://www.telelogic.com/ |
Test Generation with Verification (TGV) | IRISA and VERIMAG Laboratories | LOTOS, SDL, or IF specification model | Test cases in TTCN format, including expected results. | http://www.irisa.fr/pampa/VALIDATION/TGV/ |
Test Vector Generation System (TVEC) | TVEC Technologies | Behavior model in proprietary language or SCR model. | Abstract test cases and executable test program. | http://www.t-vec.com/ |
The Object-oriented Software Testing Environment (TOSTER) | Warsaw University of Technology | UML state diagrams | Generates and runs test cases, generates expected outputs. | http://home.elka.pw.edu.pl/~alasota/ |
TorX | University of Twente | LOTOS, PROMELA, or SDL model | Test cases. | http://fmt.cs.utwente.nl/tools/torx/ |
Unified Testing and Specification Toolkit (UniTesK) | Institute for System Programming of the Russian Academy of Sciences (ISPRAS) | Model in custom specification languages for Java and C++. Pre and Post conditions. | Test cases, test driver satisfying branch coverage of post conditions. | http://www.ispras.ru/~RedVerst/RedVerst/Methodologies/UniTesK/Main.html |
§ Experts/Contact Points
0 Larry Apfelbaum, Teradyne Software & Systems Test
http://www.teradyne.com/sst; larry@sst.teradyne.com
0 Ibrahim K. El-Far, Florida Institute of Technology
http://testingresearch.com/Ibrahim/; ielfar@testingresearch.com or ielfar@acm.org
0 Alan Hartman, IBM Haifa Research Laboratory, hartman@il.ibm.com
0 Heiko Loetzbeyer
http://www.loetzbeyer.de/heiko/index.html; loetzbeyer@informatik.tu-muenchen.de
0 Alexander Pretschner
http://www4.in.tum.de/~pretschn/; pretschn@in.tum.de
0 Harry Robinson, Microsoft
http://www.model-based-testing.org/; harryr@microsoft.com
0 James A. Whittaker, Florida Institute of Technology
http://www.cs.fit.edu/~jw/; jw@cs.fit.edu
Harry Robinson also provides information and links to other sources through his web site (http://www.model-based-testing.org/), and visitors can raise issues concerning MBT with him via email (harryr@microsoft.com).
§ Training Opportunities:
0 International Institute for Software Testing (http://www.testinginstitute.com/courses.php)
0 Software Productivity Consortium has several courses and technical resources that relate specifically to MBT: (Access to the Software Productivity Consortium requires user registration).
· Model-Based Development and Automated Testing. This course uses lecture and exercises to discuss the motivation, conception, and implementation of a testing environment that supports automated model analysis, model-based test generation, and test execution based on the Consortium's Test Automation Framework (TAF). https://www.software.org/catalog/products/product.asp?pfid=46
· Component Assessment Using Specification-Based Analysis and Testing (A Technical Report). This report presents the results of a scientific study to explore an approach for the evaluation of components using a strategy that focuses on specification-based testing. https://www.software.org/catalog/products/product.asp?pfid=144
· Specification Transformation to Support Automated Testing. This technical report addresses the key challenge of developing transformation rules and tools for translating modeling languages (e.g., Unified Modeling Language [UML], SCR/CoRE) into a form that is suitable for automated test vector generation, specification-based test coverage analysis, requirement-to-test, and design-to-test traceability. This report provides specific guidance for developing translators that integrate specification and model-development tools with test vector generation tools. https://www.software.org/catalog/products/product.asp?pfid=147
· Test Automation Framework
http://www.software.org/pub/taf/testing.html; https://www.software.org/catalog/products/product.asp?pfid=45
§ Bibliography:
This bibliography contains many references describing applications of MBT, tools implementing MBT, or otherwise supporting a point about MBT. Some references are of more general interest. [Apfelbaum and Doyle, 1997], [Dalal et al., 1999], and [El-Far and Whittaker, 2001] provide introductions and overviews of MBT. [Robinson, 1999] provides a clear introductory explanation of test generation algorithms for some simple black box coverage criteria. [Hartman, 2002] provides a survey of tools implementing MBT. [Fujiwara et al., 1991] provide a canonical technical presentation of the approach using Finite State Machines. [Whittaker and Thomason, 1994] provide a parallel presentation of the approach using Markov chains.
[Aharon, 1995] |
A. Aharon, D. Goodman, M. Levinger, Y. Lichtenstein, Y. Malka, C. Metzger, M. Molcho, and G. Shurek, “Test Program Generation for Functional Verification of PowerPC Processors in IBM”. Design Automation Conference. 1995 |
[Al-Ghafees, 2002] |
M. Al-Ghafees and J. A. Whittaker, “Markov Chain-based Test Data Adequacy Criteria: a Complete Family”. June 2002 |
[Apfelbaum, 1997] |
L. Apfelbaum and J. Doyle, “Model-Based Testing”. Software Quality Week Conference. May 1997 |
[Berger, 1997] |
B. Berger, M. Abuelbassal, and M. Hossain, “Model Driven Testing”. March 1997 |
[Blackburn, 1997] |
M. R. Blackburn, R. D. Busser, and J. S. Fontaine, “Automatic Generation of Test Vectors for SCR-Style Specifications”. Proceedings of the 12th Annual Conference on Computer Assurance (COMPASS97). 16-19 June 1997 |
[Blackburn, 2001a] |
M. Blackburn, R. Busser, A. Nauman, R. Knickerbocker, and R. Kasuda, “Mars Polar Lander Fault Identification Using Model-Based Testing”. 26th Annual NASA Goddard Software Engineering Workshop. 27-29 November 2001 |
[Blackburn, 2001b] |
M. Blackburn, R. Busser, A. Nauman, and R. Chandramouli, “Model-Based Approach to Security Test Automation.” Proceedings of Quality Week 2001. June 2001 |
[Cheung, 1980] |
R. C. Cheung, “A User-Oriented Software Reliability Model”. IEEE Transactions on Software Engineering. V. 6, No. 2, pp. 118-125, March 1980 |
[Chow, 1978] |
T. S. Chow, “Testing Software Design Modeled by Finite-State Machines”. IEEE Transactions on Software Engineering. V. 4, No. 3, pp. 178-187, May 1978 |
[CMMI, 2002a] |
Capability Maturity Model Integration (CMMI), Version 1.1: CMMI for Systems Engineering, Software Engineering, Integrated Product and Process Development, and Supplier Sourcing (CMMI-SE/SW/IPPD/SS, V1.1), Continuous Representation. CMU/SEI-2002-TR-011, ESC-TR-2002-011. March 2002
http://www.sei.cmu.edu/pub/documents/02.reports/pdf/02tr001.pdf
|
[CMMI, 2002b] |
Capability Maturity Model Integration (CMMI), Version 1.1: CMMI for Systems Engineering, Software Engineering, Integrated Product and Process Development, and Supplier Sourcing (CMMI-SE/SW/IPPD/SS, V1.1), Staged Representation. CMU/SEI-2002-TR-012, ESC-TR-2002-012. March 2002
http://www.sei.cmu.edu/pub/documents/02.reports/pdf/02tr002.pdf
|
[Dalal, 1998] |
S. R. Dalal, A. Jain, N. Karunanithi, J. M. Leaton, and C. M. Lott, “Model-Based Testing of a Highly Programmable System”. Proceedings of ISSRE-98. 5-7 November 1998 |
[Dalal, 1999] |
S. R. Dalal, A. Jain, N. Karunanithi, J. M. Leaton, C. M. Lott, G. C. Patton and B. M. Horowitz, “Model-Based Testing in Practice”. Proceedings of the ICSE’99. May 1999 |
[DODD 5000.2] |
DoD 5000.2-R, “Mandatory Procedures for Major Defense Acquisition Programs (MDAPS) and Major Automated Information System (MAIS) Acquisition Programs”, 5 April 2002
|
[El-Far, 2001a] |
I. K. El-Far, “Enjoying the Perks of Model-Based Testing”. Proceedings of the Software Testing, Analysis, and Review Conference (STARWEST 2001). October/November 2001 |
[El-Far, 2001b] |
I. K. El-Far, H. H. Thompson, and F. E. Mottay, “Experiences in Testing Pocket PC Applications”. Proceedings of the Fifth International Internet & Software Quality Week Europe Conference (QWE 2001). November 2001 |
[El-Far, 2001c] |
I. K. El-Far and J. A. Whittaker, “Model-Based Software Testing”. Encyclopedia of Software Engineering (edited by J. J. Marciniak). Wiley, 2001 |
[Farchi, 2002] |
E. Farchi, A. Hartman, and S. S. Pinter, “Using a Model-Based Test Generator to Test for Standard Conformance”. IBM Systems Journal. V. 41, No. 1, pp. 89-110, 2002 |
[Feather, 1999] |
M. S. Feather and B. Smith, “Automatic Generation of Test Oracles – From Pilot Studies to Application”. Fourteenth IEEE Automated Software Engineering Conference (ASE-99). October 1999 |
[Frankl, 1988] |
P. G. Frankl and E. J. Weyuker, “An Applicable Family of Data Flow Testing Criteria”. IEEE Transactions on Software Engineering. V. 14, No. 10, pp. 1483-1498, October 1988 |
[Fujiwara, 1991] |
S. Fujiwara, G. v. Bochmann, F. Khendek, M. Amalou, and A. Ghedamsi, “Test Selection Based on Finite State Models”. IEEE Transactions on Software Engineering. V. 17, No. 6, pp. 591-603, June 1991 |
[Gronau, 2000] |
Gronau, A. Hartman, A. Kirshin, K. Nagin, and S. Olvovsky, A Methodology and Architecture for Automated Software Testing. IBM Research Laboratory in Haifa Technical Report, 2000 |
[Hamlet, 1990] |
D. Hamlet and R. Taylor, “Partition Testing Does Not Inspire Confidence”. IEEE Transactions on Software Engineering. V. 16, No. 12, pp. 1402-1411, December 1990 |
[Hartman, 2002] |
A. Hartman, Model-Based Test Generation Tools. Agedis Consortium, 22 November 2002 |
[Heitmeyer, 1995] |
C. Heitmeyer, A. Bull, C. Gasarch, and B. Labaw, “SCR*: A Toolset for Specifying and Analyzing Requirements”. Proceedings of the Tenth Annual Conference on Computer Assurance (COMPASS 95). 25-29 June 1995 |
[Heitmeyer, 1998] |
C. Heitmeyer, “SCR: A Practical Method for Requirements Specification”. Proceedings of the 17th Digital Avionics Systems Conference. 31 October – 7 November, 1998 |
[Horgan, 1994] |
J. R. Horgan, S. London, and M. R. Lyu, “Achieving Software Quality with Testing Coverage Measures”. IEEE Computer. V.27, No. 9, pp. 60-69, September 1994 |
[IEEE, 1990a] |
Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. 1990 |
[IEEE, 1990b] |
Institute of Electrical and Electronics Engineers. IEEE Standard Glossary of Software Engineering Terminology. IEEE Std 610.12-1990. 1990 |
[INTERIM, 2002] |
Interim Defense Acquisition Guidebook, 30 October 2002 (replaces DoD 5000.2-R, canceled 30 October 2002)
|
[Jessop, 1976] |
W. H. Jessop, J. R. Kane, S. Roy, and J. M. Scanlon, “ATLAS – An Automated Software Testing System”. Proceedings of 2nd International Conference On Software Engineering. San Francisco, CA, October 1976 |
[Liuying, 1999] |
L. Liuying, Q. Zhichang, “Test Selection from UML Statecharts”. Technology of Object-Oriented Languages and Systems (TOOLS), 1999 |
[Ntafos, 2001] |
S. C. Ntafos, “On Comparisons of Random, Partition, and Proportional Partition Testing”. IEEE Transactions on Software Engineering. V. 27, No. 10, October 2001 |
[Offutt, 1999] |
A. J. Offutt and A. Abdurazik, “Generating Tests from UML Specifications”. Second International Conference on the Unified Modeling Language (UML99). Fort Collins, CO, October 1999 |
[Offutt, 2003] |
A. J. Offutt, S. Liu, and A. Abdurazik, “Generating Test Data from State-Based Specifications”. Journal of Software Testing, Verification & Reliability. V. 13, No. 1, March 2003 |
[Poore, 1993] |
J. H. Poore, H. D. Mills, and D. Mutchler, “Planning and Certifying Software System Reliability”. IEEE Software. pp. 88-99, January 1993 |
[Pretschner, 2001] |
A. Pretschner, H. Lotzbeyer, and J. Philipps, “Model-Based Testing in Evolutionary Software Development”. 12th International Workshop on Rapid System Prototyping. 2001 |
[Prowell, 1999] |
S. J. Prowell, C. J. Trammell, R. C. Linger, and J. H. Poore, Cleanroom Software Engineering: Technology and Process. Addison-Wesley, 1999 |
[Robinson, 1999] |
Harry Robinson, “Graph Theory Techniques in Model-Based Testing”. 1999 International Conference on Testing Computer Software |
[Siegrist, 1988a] |
K. Siegrist, “Reliability of Systems with Markov Transfer of Control”. IEEE Transactions on Software Engineering. V. 14, No. 7, pp. 1049-1053, July 1988 |
[Siegrist, 1988b] |
K. Siegrist, “Reliability of Systems with Markov Transfer of Control, II”. IEEE Transactions on Software Engineering. V. 14, No. 10, pp. 1478-1480, October 1988 |
[SPMN, 1998] |
The Program Managers Guide to Software Acquisition Best Practices, Software Program Managers Network, April 1998
http://www.spmn.com/best_practices.html |
[Tahat, 2001] |
L. H. Tahat, B. Vaysburg, B. Korel, and A. J. Bader, “Requirement-Based Automated Black-Box Test Generation”. 25th Annual International Computer Software and Applications Conference (COMPSAC 2001). |
[Turner, 2002] |
Turner, R.G., “Implementation of Best Practices in U.S. Department of Defense Software-Intensive System Acquisitions”, Ph.D. Dissertation, George Washington University, 31 January 2002 |
[Vienneau, 1991] |
Vienneau, R. L., “The Cost of Testing Software”. Annual Reliability and Maintainability Symposium. Orlando, Florida, 1991 |
[Vienneau, 2003] |
Vienneau, R. L., "An Overview of Model-Based Testing for Software", Data and Analysis Center for Software, CR/TA 12, June 2003 |
[Weyuker, 1988] |
E. J. Weyuker, “The Evaluation of Program-Based Software Test Data Adequacy Criteria”. Communications of the ACM. V. 31, No. 6, pp. 668-675, June 1988 |
[Weyuker, 1998] |
E. J. Weyuker, “Testing Component-Based Software: A Cautionary Tale”. IEEE Software. pp. 54-59, September/October 1998 |
[Whittaker, 1993] |
J. A. Whittaker and J. H. Poore, “Markov Analysis of Software Specifications”. ACM Transactions on Software Engineering Methodology. V. 2, pp. 93-106, January 1993 |
[Whittaker, 1994] |
J. A. Whittaker and M. G. Thomason, “A Markov Chain Model for Statistical Software Testing”. IEEE Transactions on Software Engineering. V. 20, No. 10, pp. 812-824, October 1994 |
[Whittaker, 1997] |
J. A. Whittaker, “Stochastic Software Testing”. Annals of Software Engineering. V. 4, pp. 115-131, 1997 |
APPENDICES
Model-Based Testing (MBT) is a black-box “approach in which common testing tasks such as test case generation and test result evaluation are based on a model of the application under test”. Typically, the system’s data and user behavior are modeled using Finite State Machines, Markov processes, or decision tables.
- [El-Far, 2001c]
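To make the idea of model-based test generation and result evaluation concrete, here is a minimal sketch that assumes the application is modeled as a small Finite State Machine. The media-player model, the function names, and the "one test per transition" coverage goal are all assumptions made for illustration; real MBT tools use richer models and coverage criteria.

```python
from collections import deque

# Hypothetical model of a media player: state -> {input: next_state}.
FSM = {
    "stopped": {"play": "playing"},
    "playing": {"pause": "paused", "stop": "stopped"},
    "paused": {"play": "playing", "stop": "stopped"},
}


def shortest_inputs(fsm, start, goal):
    """Breadth-first search for the shortest input sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for symbol, nxt in fsm[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [symbol]))
    return None  # goal is unreachable from start


def generate_tests(fsm, start):
    """Build one test per transition: reach its source state, then fire it.

    Each test is a list of (input, expected_state) steps, so the model
    supplies the expected outputs as well as the inputs.
    """
    tests = []
    for source, edges in fsm.items():
        prefix = shortest_inputs(fsm, start, source)
        if prefix is None:
            continue  # skip transitions that cannot be reached
        for symbol in edges:
            state, steps = start, []
            for s in prefix + [symbol]:
                state = fsm[state][s]
                steps.append((s, state))
            tests.append(steps)
    return tests


def run_test(system_step, start, steps):
    """Run one test; system_step(state, input) stands in for the system under test."""
    state = start
    for symbol, expected in steps:
        state = system_step(state, symbol)
        if state != expected:
            return False
    return True


if __name__ == "__main__":
    for test in generate_tests(FSM, "stopped"):
        print(test)
```

Here `system_step` is a stand-in for a test harness that drives the real application and reports its observed state; comparing that state with the model's prediction plays the role of the test oracle.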
Increasing interest in MBT in the 1990s is a response to problems found in testing medium-size to large software systems. Acceptance testing became more formalized with the adoption of software engineering practices. Previously, random and partition testing were the predominant black box testing methodologies. Hamlet (1990) responded to weaknesses in partition testing by calling for the automatic generation of tests and further research on formal specifications addressing the needs of testers.
Furthermore, experience revealed additional problems in the use and maintenance of test suites generated manually. As with antibiotics treating bacteria and with pesticides treating bugs, a static test suite will eliminate susceptible faults, but leave behind a population of super-bugs. Consequently, the suite becomes less useful, and a static test suite is difficult to update as a software system evolves over the life cycle [Robinson 1999]. The foundation for addressing these needs, though, had already been developed.
MBT draws on mathematics that predates electronic computers. It is an adaptation to software of techniques originally developed for testing hardware. Finite State Machines and Markov chains are the most popular techniques for modeling user behavior, the key to successful MBT.
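As a hedged illustration of how a Markov chain can model user behavior, the sketch below walks a small usage model at random, choosing each next input according to its transition probability. The states, inputs, and probabilities are invented for this example, and the function name is hypothetical.

```python
import random

# Hypothetical usage model: state -> list of (input, next_state, probability).
# The probabilities leaving each state sum to 1.
USAGE_MODEL = {
    "start": [("login", "home", 1.0)],
    "home": [("search", "results", 0.6), ("logout", "end", 0.4)],
    "results": [("open_item", "home", 0.7), ("logout", "end", 0.3)],
}


def generate_usage_test(model, rng, start="start", end="end", max_steps=100):
    """Random walk through the chain; returns the sequence of inputs chosen.

    max_steps is a safety limit (an assumption of this sketch) so that a
    walk always terminates even if the end state is not reached.
    """
    state, inputs = start, []
    while state != end and len(inputs) < max_steps:
        arcs = model[state]
        weights = [probability for _, _, probability in arcs]
        symbol, next_state, _ = rng.choices(arcs, weights=weights)[0]
        inputs.append(symbol)
        state = next_state
    return inputs


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the generated tests are repeatable
    for _ in range(3):
        print(generate_usage_test(USAGE_MODEL, rng))
```

Because likely usage paths are generated more often than unlikely ones, failures observed under such tests can be related to what users would experience in the field, which is the basis for the statistical testing discussed in the Markov chain references in the bibliography.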
Automation in testing is also useful for MBT. Scripts for running tests have been developed, as well as more sophisticated tools to record and play back tests.
§ Capability Maturity Model Integration (CMMI), Version 1.1, Software Engineering Institute, CMU/SEI-2002-TR-011, TR-012, March 2002
The CMMI does not specifically mention the practice of “Model-Based Testing” but it does discuss establishing a testing approach as part of the “verification” key process area (KPA).
“Verification” KPA. Under CMMI the organization establishes a process for verification of the software:
Select Work Products for Verification |
§ Enables the identification of the work products to be verified, the methods to be used to perform the verification, and the requirements to be satisfied by each selected work product |
Establish the Verification Environment |
§ Enables the determination of the environment that will be used to carry out the verification |
Establish Verification Procedures and Criteria |
§ Enables the development of verification procedures and criteria that are aligned with the selected work products, requirements, methods, and characteristics of the verification environment |
Perform Verification |
§ Conducts the verification according to the available methods, procedures, and criteria |
“The verification methods address the technical approach to work product verification and the specific approaches that will be used to verify that specific work products meet their requirements.” Examples include:
0 Path coverage testing
0 Load, stress, and performance testing
0 Decision-table-based testing
0 Functional-decomposition-based testing
0 Test-case reuse
0 Acceptance tests
§ DoD 5000.2-R, “Mandatory Procedures for Major Defense Acquisition Programs (MDAPS) and Major Automated Information System (MAIS) Acquisition Programs”, 5 April 2002 [CANCELLED 30 October 2002]
§ Interim Defense Acquisition Guidebook, 30 October 2002 [INTERIM 2002]
The Program Manager (PM) shall identify and fund required M&S resources early in the acquisition life cycle, so that M&S may be integrated with the Test & Evaluation (T&E) program. The PM shall use test results to revise both the test program and test procedures. Test results shall also be used to develop and improve models and simulations. The T&E Working-level Integrated Product Team (WIPT) shall develop and document a robust, comprehensive, and detailed evaluation strategy for the Test & Evaluation Master Plan (TEMP), using both simulation and test resources, as appropriate. Operational Test Agencies (OTAs) shall develop evaluation plans consistent with the evaluation strategy.
§ Turner, R., “Implementation of Best Practices in US Department of Defense Software-Intensive Systems”, Dissertation, George Washington University, January, 2002
Turner, with the sponsorship of the Software-intensive Systems Office of the Deputy Under Secretary of Defense (Science and Technology), developed a survey on the characteristics of best practices for software acquisition. He conducted a pilot survey, with 23 respondents, on eight best practices. The best practices were chosen based on past literature and expert recommendations. Barry Boehm (University of Southern California) identified Model-Based Testing as a best practice. Survey respondents could choose which best practices to report on; three chose Model-Based Testing.
Turner found that Model-Based Testing, “while new, promises to provide significant savings” [Turner 2002]. His survey respondents described Model-Based Testing as “Adoptable” on a scale of Immature/Adoptable/Mature.
§ Cleanroom Software Engineering
Proponents of the “cleanroom software engineering methodology” recommend Model-Based Testing as a necessary practice for measuring/assessing/certifying the quality of the software. Basically, the Cleanroom certification process is Model-Based Testing tailored for reliability measurement. Cleanroom alludes to a technique used in semiconductor fabrication to prevent defects, and is a methodology for software development that integrates several software engineering technologies:
0 Incremental development
0 Formal specifications
0 Stepwise refinement
0 Structured programming
0 Formal verifications
0 Formal reviews
0 Statistical testing
0 Certification (reliability) measurement
Detailed technical guidance regarding MBT is embedded in the literature on Cleanroom Software Engineering. A detailed discussion of the Cleanroom methodology is beyond the scope of this document, but additional information is provided in [Vienneau, 2003].
GLOSSARY
Airlie Council |
Refers to a group of experts convened by the Navy’s Software Program Manager’s Network (SPMN) in 1995 who established/identified nine best practices. These practices have been augmented with other practices since 1995, and in current literature are referenced as the original Airlie best practices. |
Best Practice |
A documented practice that aims to lower an identified risk in a system acquisition and that is required or recommended by a bona fide DoD, industry, or academic source.
Methodologies and tools that consistently yield productivity and quality results when implemented in a minimum of 10 organizations and 50 software projects, and that are asserted by those who use them to have been beneficial in all or most of the projects. |
CASE |
Computer Aided Software Engineering |
CASRE |
Computer Aided Software Reliability Estimation |
CBSE |
Component Based Software Engineering |
Cleanroom Software Engineering |
A theory-based team-oriented process for development and certification of high-reliability software systems under statistical quality control [Mills 92, Linger 93, Linger 94]. A principal objective of the Cleanroom process is development of software that exhibits zero failures in use. The Cleanroom name is borrowed from hardware Cleanrooms, with their emphasis on rigorous engineering discipline and focus on defect prevention rather than defect removal. Cleanroom combines mathematically based methods of software specification, design, and correctness verification with statistical, usage-based testing to certify software fitness for use. |
CMMI |
Capability Maturity Model Integration |
COTS |
Commercial Off-The-Shelf |
Decision Table |
A table used to show sets of conditions and the actions resulting from them (IEEE 1990b) |
Entity Relationship Diagram (ERD) |
A diagram that depicts a set of real-world entities and the logical relationships among them (IEEE 1990b) |
ERP |
Enterprise Resource Planning |
Failure |
An event in which an item fails to perform one or more of its required functions within specified limits under specific conditions (DACS Software Reliability Sourcebook) |
Fault |
An incorrect step, process, or data definition in a computer program (IEEE 1990b) |
Finite State Machine (FSM) |
A computational model consisting of a finite number of states and transitions between those states, possibly with accompanying actions (IEEE 1990b). A technique for modeling user behavior (synonymous with “Finite State Automata”). |
GQM |
Goal-Question-Metric approach |
Grammar |
A behavior modeling technique that describes the syntax of programming and other input languages. |
IER |
Independent Expert Review |
IPPD |
Integrated Product and Process Development |
KPA |
Key Process Area |
M&S |
Modeling & Simulation |
MAIS |
Major Automated Information Systems |
Markov Chain |
A discrete, stochastic process in which the probability that the process is in a given state at a certain time depends only on the value of the immediately preceding state. A technique for modeling user behavior. |
Markov Process |
See Markov Chain |
MBT |
Model-Based Testing |
MDAPS |
Major Defense Acquisition Programs |
Message Sequence Chart (MSC) |
A graphical notation (ITU-T Recommendation Z.120) for describing the messages exchanged between system components; used with SDL and TTCN-based tools to record the purpose of a test |
Model |
A simplification of reality that provides a complete description of a system |
NDI |
Non-Developmental Item |
Operational Profile |
A probability distribution that describes how the software will be used when operating in the field (Weyuker 1998) |
OTAs |
Operational Test Agencies |
PM |
Program Manager |
PSM |
Practical Software Measurement |
Quality Scenarios |
Quality scenarios are descriptions that embody quality requirements and make them concrete (e.g., “Estimate the impact of a 10% increase in users per year” is a performance scenario). |
Reliability |
The ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE 1990b). |
Requirements Specification |
In systems/software engineering, a document that states the functions that software must perform, required level of performance (speed, accuracy, etc.), the nature of the required interfaces between the software product and its environment, the type and severity of constraints on design, and the quality of the final product. See also Software Requirements Specification (Richard Thayer). |
SCE |
Software Capability Evaluations |
SCR |
Software Cost Reduction |
SDL |
Specification and Description Language |
SIS |
Software-Intensive System |
SMERFS |
Statistical Modeling and Estimation of Reliability Functions for Systems |
Software Cost Reduction (SCR) |
A project at the Naval Research Laboratory (NRL) that developed and investigated methodologies for improved software production. The term refers in particular to the requirements methodology developed by this project. |
Software Requirements Specification (SRS) |
A document that clearly and precisely describes each of the essential requirements (functions, performance, design constraints, and quality attributes) of the software and the external interfaces. Each requirement is defined in such a way that its achievement can be objectively verified by a prescribed method, for example, inspection, demonstration, analysis, or test (ANSI/IEEE Standard 830-1984). |
Specification and Description Language (SDL) |
A language defined by International Telecommunication Union (ITU) Recommendation Z.100. The language is intended to be used from requirements to implementation, is suitable for real-time stimulus-response systems, is presented in a graphical form, has a model based on communicating processes (Extended Finite State Machines), and provides an object-oriented description of SDL components. |
SPMN |
Software Program Managers Network |
Statechart |
A behavior diagram specified as part of the Unified Modeling Language (UML). A statechart depicts the states that a system or component can assume, and shows the events or circumstances that cause or result from a change from one state to another (IEEE 1990b). |
Stochastic Process |
Formally, an indexed set of random variables. Typically, the index denotes time, and the random variables show how the state of a system evolves over time. |
T & E |
Test and Evaluation |
TAF |
Test Automation Framework |
TEMP |
Test and Evaluation Master Plan |
Test |
(1) An activity in which a system or component is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component. (2) To conduct an activity as in (1). (3) A set of one or more test cases (IEEE 1990b). |
Test Case |
A set of test inputs, execution conditions, and expected results developed for a particular objective, such as to exercise a particular program path or to verify compliance with a specific requirement (IEEE 1990b). |
Test Coverage |
The degree to which a given test or set of tests addresses all specified requirements for a given system or component. (IEEE 1990b) |
Test Coverage |
A measure of the proportion of a program exercised by a test suite, usually expressed as a percentage. This typically involves collecting information about which parts of a program are actually executed when running the test suite, in order to identify which branches of conditional statements have been taken. |
Test Oracle |
The criterion used to check the correctness of the output. |
Test, Acceptance |
Formal test conducted to determine whether or not a system satisfies its acceptance criteria and to enable the user, customer, or other authorized entity to accept the system (IEEE 1990b). |
Test, Alpha |
The first phase of testing in a software development process which is typically done in-house. It includes unit testing, component testing, and system testing. |
Test, Beta |
The second phase of software testing in which a sampling of the intended audience tries the product out. This term derives from early 1960s terminology for product cycle checkpoints, first used at IBM, but later standard throughout the industry. |
Test, Black Box |
(Also known as Functional Testing). Test that ignores the internal mechanism of a system or component and focuses solely on the outputs generated in response to selected inputs and execution conditions (IEEE 1990b). |
Test, Clear Box |
(Also known as Structural Testing). Test that takes into account the internal mechanism of a system or component. Types include branch testing, path testing, and statement testing (IEEE 1990b). |
Test, Integration |
Test in which software components are combined and tested to evaluate the interaction between them (IEEE 1990b). |
Test, Model-Based |
A general term that signifies an approach that bases common testing tasks such as test case generation and test result evaluation on a model of the application under test. |
Test, Partition |
Test in which the input domain of the system under test is partitioned into disjoint subdomains and test cases are constructed based on this partitioning. |
Test, Random |
Test in which test cases are selected randomly from the input domain of the system under test. |
Test, System |
Test conducted on a complete, integrated system to evaluate the system’s compliance with its specified requirements (IEEE 1990b). |
Test, Unit |
Test of individual software units or groups of related units (IEEE 1990b). |
Test, White Box |
See Test, Clear Box |
Testing and Test Control Notation (TTCN) |
A language used to write detailed test specifications. TTCN-3 is defined by the European Telecommunications Standards Institute (ETSI) ES 201 873 series and by the International Telecommunication Union (ITU) Z.140. |
Tree and Tabular Combined Notation (TTCN) |
Previous name for Testing and Test Control Notation. |
Unified Modeling Language (UML) |
A language for specifying, visualizing, and documenting models of software systems, including their structure and design. UML defines twelve types of diagrams: four structural diagrams, five behavior diagrams, and three model management diagrams. |
Universal Modeling Language (UML) |
See Unified Modeling Language. |
Use Case |
A piece of functionality in the system that gives a user a result of value. A technique for reasoning about/describing the behavior of a system in a concrete setting. A technique for capturing functional requirements and making them concrete instead of conceptual. |
WIPT |
Working-level Integrated Product Team |
XML |
Extensible Markup Language |
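Several of the terms defined above (Finite State Machine, Markov Chain, Operational Profile, Test Case) fit together in statistical, usage-based test generation. The following is a minimal sketch, not a method taken from this report: the media-player states, inputs, and probabilities in `USAGE_MODEL` are hypothetical, and `generate_test` is an illustrative helper name. Each state lists its outgoing transitions with the probability that a user takes them, so the table acts as both an FSM and a simple operational profile. A test case is one random walk from the start state to the final state.

```python
import random

# Hypothetical usage model: state -> list of (input, next_state, probability).
# All names and numbers are illustrative assumptions.
USAGE_MODEL = {
    "Stopped": [("play", "Playing", 0.9), ("exit", "Exit", 0.1)],
    "Playing": [("pause", "Paused", 0.5), ("stop", "Stopped", 0.5)],
    "Paused": [("play", "Playing", 0.7), ("stop", "Stopped", 0.3)],
}


def generate_test(model, start="Stopped", final="Exit", rng=None, max_steps=50):
    """Walk the Markov chain from start until final (or max_steps) is reached.

    Returns a list of (state, input, expected_next_state) steps. The next
    state depends only on the current state, which is the Markov property.
    """
    rng = rng or random.Random()
    state, steps = start, []
    while state != final and len(steps) < max_steps:
        choices = model[state]
        weights = [probability for _, _, probability in choices]
        user_input, next_state, _ = rng.choices(choices, weights=weights, k=1)[0]
        steps.append((state, user_input, next_state))
        state = next_state
    return steps


if __name__ == "__main__":
    # A fixed seed makes the generated test reproducible.
    for step in generate_test(USAGE_MODEL, rng=random.Random(1)):
        print(step)
```

The expected next state recorded in each step is the kind of expected output the model itself can supply; checking any other outputs would still need a separate test oracle.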
Just a few years ago researchers were calling for more empirical work with Model-Based Testing:
§ No large empirical study has been conducted to measure efficacy of this new approach… [Dalal 1999]
§ There have been no comprehensive studies of the efficiency or effectiveness of the various model coverage proposals… This is a gaping hole in the theory of MBT. [El-Far 2001c]
Now, in 2003, we have many more case studies, reflecting increased interest in Model-Based Testing. Still, there is a need for more formal experiments with Model-Based Testing, especially in industrial environments. Many experiments have been conducted with white-box test coverage metrics, investigating how effectiveness in finding faults varies as coverage increases (e.g., [Horgan 1994]). More work is needed on black-box coverage criteria; [Al-Ghafees 2002] is one such study.
The following table summarizes some case studies and experiments with Model-Based Testing that were found in the literature.
Case Studies Involving Model-Based Testing
Application | Reference | Summary | Metrics
PowerPC | | The Model-Based Test Generator is an expert system containing a formal model of a processor architecture and a heuristic database. | Found 1530 bugs in 3 PowerPC processors with 1st silicon realizations fully functional. Reduced time to market.
Clock Display, Two Microsoft Pocket PC Applications | | Defined over 20 black-box coverage criteria focused on data aspects of the model. Generated tests for the new criteria, structural criteria, and random tests for a 12-state clock display application, a 49-state Pocket PC application, and an 864-state Pocket PC application. | Number of faults detected for two applications (each contained 10 faults). Number of transitions (cost) for the third application.
Mars Polar Lander | | Test cases generated for 50 source lines of code for the Touchdown Monitor. Identified a fault that had previously caused mission failure. | 19 tests; 12 staff hours modeling requirements and building the test driver
Security | | Security requirements were expressed in the format developed in the Software Cost Reduction (SCR) project. T-VEC was used to model behavior and generate tests. Tests were run against Oracle and Interbase database engines in a fully automated process. | 40 test vectors
Telephony | | Developed a TestMaster state-based model of a digital exchange for a global switching network. | Effort in manually and automatically developing tests. 80% increase in productivity.
Telephony | | Test cases generated for three releases of Bellcore’s Intelligent Services Control Point (ISCP) and for a particular message set supported by the ISCP. The ISCP contains several million lines of code. The ISCP project will be used to generate tests at low cost for future versions. | ~6,100 tests; ~440 failures; some data on cost of testing
Two Telephony, Workforce Management and Tasking, GUI | | Four case studies with millions of lines of code. Tests generated automatically using an approach based on statistical design of experiments; test harnesses handwritten. Detected faults not detected by a parallel non-Model-Based Testing group. | Tests, failures, failure classes
Microsoft Pocket PC Applications | | FSMs used to generate test cases for 5 standalone components on the Microsoft platform for handheld devices. | None reported
Standard Conformance (aspects of POSIX and Java) | | FSMs were used to model the POSIX byte range locking API and Java exception handling standards. Conformance tests required by the standards are weaker than test suites produced by these models. | Requirements-based coverage criteria, statement coverage. 9 POSIX tests. 6,914 lines of Java code
Internet Telephony, Two Releases of Internet API, POSIX File System, Call Center API | | Five IBM experiments; one experiment was understaffed and failed. Tests were generated from a Finite State Machine model. More defects, and more severe defects, were found than development teams expected. | Defects
Cruise Control | | Cruise control system with 400 lines of C code seeded with 24 bugs, each in a separate version. Twelve test cases generated for transition coverage; 54 generated for full predicate coverage; and 54 generated randomly as a control set. Tests generated for specification-based coverage criteria outperformed random testing. | 24 faults, faults detected, block and decision (white-box) coverage criteria
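Several of the studies above generate tests from a Finite State Machine to satisfy a black-box criterion such as transition coverage. The following is a minimal sketch of that idea, not the algorithm used in any of the cited studies: the clock-display FSM, its state and input names, and the helper functions `shortest_path` and `transition_cover` are all hypothetical. For each transition not yet covered, the sketch finds a shortest path from the initial state to the transition's source state and appends the transition, so every test starts from the initial state.

```python
from collections import deque

# Hypothetical FSM: (state, input) -> next state. Illustrative only; this is
# not the clock display model from the study listed in the table.
FSM = {
    ("ShowTime", "mode"): "ShowDate",
    ("ShowDate", "mode"): "ShowTime",
    ("ShowTime", "set"): "SetTime",
    ("SetTime", "inc"): "SetTime",
    ("SetTime", "set"): "ShowTime",
}


def shortest_path(fsm, start, goal):
    """Breadth-first search for a shortest list of transitions start -> goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for (src, user_input), dst in fsm.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(src, user_input, dst)]))
    return None  # goal is not reachable from start


def transition_cover(fsm, initial):
    """Return tests (lists of transitions) that together cover every
    transition reachable from the initial state."""
    uncovered = {(src, inp, dst) for (src, inp), dst in fsm.items()}
    tests = []
    for (src, inp), dst in fsm.items():
        transition = (src, inp, dst)
        if transition not in uncovered:
            continue  # already exercised by an earlier test
        prefix = shortest_path(fsm, initial, src)
        if prefix is None:
            continue  # source state unreachable; transition stays uncovered
        test = prefix + [transition]
        uncovered.difference_update(test)
        tests.append(test)
    return tests


if __name__ == "__main__":
    for test in transition_cover(FSM, "ShowTime"):
        print(" -> ".join(f"{s} --{i}--> {d}" for s, i, d in test))
```

This greedy construction does not try to minimize the number or length of tests. A stricter criterion, such as full predicate coverage, would need a richer model than a plain transition table.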