Chapter 4 – Web Application Performance Testing Core Activities
Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn the seven core activities that are integral to the majority of performance-testing projects.
- Understand the seven core activities in sufficient detail to identify how your tasks and processes map to these activities.
- Understand various performance-testing approaches that can be built around the core activities.
Overview
This chapter provides a high-level introduction to the most common core activities involved in performance-testing your applications and the systems that support those applications. Performance testing is a complex activity that cannot effectively be shaped into a “one-size-fits-all” or even a “one-size-fits-most” approach. Projects, environments, business drivers, acceptance criteria, technologies, timelines, legal implications, and available skills and tools simply make any notion of a common, universal approach unrealistic.
That said, there are some activities that are part of nearly all project-level performance-testing efforts. These activities may occur at different times, be called different things, have different degrees of focus, and be conducted either implicitly or explicitly, but when all is said and done, it is quite rare when a performance-testing-project does not involve at least making a decision around the seven core activities identified and referenced throughout this guide. These seven core activities do not in themselves constitute an approach to performance testing; rather, they represent the foundation upon which an approach can be built that is appropriate for your project.
How to Use This Chapter
Use this chapter to understand the core activities of performance testing and what these activities accomplish. To get the most from this chapter:
- Use the “Summary Table of Core Performance Activities” section to get an overview of the core activities of performance testing, and as quick reference guide for you and your team.
- Use the various Activity sections to understand the details of the most critical performance-testing tasks, as well as considerations for each activity.
Overview of Activities
The following sections discuss the seven activities that most commonly occur across successful performance-testing projects. The key to effectively implementing these activities is not when you conduct them, what you call them, whether or not they overlap, or the iteration pattern among them, but rather that you understand and carefully consider the concepts, applying them in the manner that is most valuable to your own project context.
Starting with at least a cursory knowledge of the project context, most teams begin identifying the test environment and the performance acceptance criteria more or less in parallel. This is due to the fact that all of the remaining activities are affected by the information gathered in activities 1 and 2. Generally, you will revisit these activities periodically as you and your team learn more about the application, its users, its features, and any performance-related risks it might have.
Once you have a good enough understanding of the project context, the test environment, and the performance acceptance criteria, you will begin planning and designing performance tests and configuring the test environment with the tools needed to conduct the kinds of performance tests and collect the kinds of data that you currently anticipate needing, as described in activities 3 and 4. Once again, in most cases you will revisit these activities periodically as more information becomes available.
With at least the relevant aspects of activities 1 through 4 accomplished, most teams will move into an iterative test cycle (activities 5-7) where designed tests are implemented by using some type of load-generation tool, the implemented tests are executed, and the results of those tests are analyzed and reported in terms of their relation to the components and features available to test at that time.
To the degree that performance testing begins before the system or application to be tested has been completed, there is a naturally iterative cycle that results from testing features and components as they become available and continually gaining more information about the application, its users, its features, and any performance-related risks that present themselves via testing.
Summary Table of Core Performance-Testing Activities
The following table summarizes the seven core performance-testing activities along with the most common input and output for each activity. Note that project context is not listed, although it is a critical input item for each activity.
Activity |
Input |
Output |
Activity 1. Identify the Test Environment |
|
|
Activity 2. Identify Performance Acceptance Criteria |
|
|
Activity 3. Plan and Design Tests |
|
|
Activity 4. Configure the Test Environment |
|
|
Activity 5. Implement the Test Design |
|
|
Activity 6. Execute the Test |
|
|
Activity 7. Analyze Results, Report, and Retest |
|
|
Core Performance-Testing Activities Walkthrough
The seven core performance-testing activities can be summarized as follows.
Figure 4.1* *Core Performance Testing Activities
- Activity 1. Identify the Test Environment. Identify the physical test environment and the production environment as well as the tools and resources available to the test team. The physical environment includes hardware, software, and network configurations. Having a thorough understanding of the entire test environment at the outset enables more efficient test design and planning and helps you identify testing challenges early in the project. In some situations, this process must be revisited periodically throughout the project’s life cycle.
- Activity 2. Identify Performance Acceptance Criteria. Identify the response time, throughput, and resource utilization goals and constraints. In general, response time is a user concern, throughput is a business concern, and resource utilization is a system concern. Additionally, identify project success criteria that may not be captured by those goals and constraints; for example, using performance tests to evaluate what combination of configuration settings will result in the most desirable performance characteristics.
- Activity 3. Plan and Design Tests. Identify key scenarios, determine variability among representative users and how to simulate that variability, define test data, and establish metrics to be collected. Consolidate this information into one or more models of system usage to be implemented, executed, and analyzed.
- Activity 4. Configure the Test Environment. Prepare the test environment, tools, and resources necessary to execute each strategy as features and components become available for test. Ensure that the test environment is instrumented for resource monitoring as necessary.
- Activity 5. Implement the Test Design. Develop the performance tests in accordance with the test design.
- Activity 6. Execute the Test. Run and monitor your tests. Validate the tests, test data, and results collection. Execute validated tests for analysis while monitoring the test and the test environment.
- Activity 7. Analyze Results, Report, and Retest. Consolidate and share results data. Analyze the data both individually and as a cross-functional team. Reprioritize the remaining tests and re-execute them as needed. When all of the metric values are within accepted limits, none of the set thresholds have been violated, and all of the desired information has been collected, you have finished testing that particular scenario on that particular configuration.
Activity 1. Identify the Test Environment
The environment in which your performance tests will be executed, along with the tools and associated hardware necessary to execute the performance tests, constitute the test environment. Under ideal conditions, if the goal is to determine the performance characteristics of the application in production, the test environment is an exact replica of the production environment but with the addition of load-generation and resource-monitoring tools. Exact replicas of production environments are uncommon.
The degree of similarity between the hardware, software, and network configuration of the application under test conditions and under actual production conditions is often a significant consideration when deciding what performance tests to conduct and what size loads to test. It is important to remember that it is not only the physical and software environments that impact performance testing, but also the objectives of the test itself. Often, performance tests are applied against a proposed new hardware infrastructure to validate the supposition that the new hardware will address existing performance concerns.
The key factor in identifying your test environment is to completely understand the similarities and differences between the test and production environments. Some critical factors to consider are:
Hardware
- Configurations
- Machine hardware (processor, RAM, etc.)
Network
- Network architecture and end-user location
- Load-balancing implications
- Cluster and Domain Name System (DNS) configurations
Tools
- Load-generation tool limitations
- Environmental impact of monitoring tools
Software
- Other software installed or running in shared or virtual environments
- Software license constraints or differences
- Storage capacity and seed data volume
- Logging levels
External factors
- Volume and type of additional traffic on the network
- Scheduled or batch processes, updates, or backups
- Interactions with other systems
Considerations
Consider the following key points when characterizing the test environment:
- Although few performance testers install, configure, and administrate the application being tested, it is beneficial for the testers to have access to the servers and software, or to the administrators who do.
- Identify the amount and type of data the application must be seeded with to emulate real-world conditions.
- Identify critical system components. Do any of the system components have known performance concerns? Are there any integration points that are beyond your control for testing?
- Get to know the IT staff. You will likely need their support to perform tasks such as monitoring overall network traffic and configuring your load-generation tool to simulate a realistic number of Internet Protocol (IP) addresses.
- Check the configuration of load balancers.
- Validate name resolution with DNS. This may account for significant latency when opening database connections.
- Validate that firewalls, DNS, routing, and so on treat the generated load similarly to a load that would typically be encountered in a production environment.
- It is often appropriate to have systems administrators set up resource-monitoring software, diagnostic tools, and other utilities in the test environment.
Activity 2. Identify Performance Acceptance Criteria
It generally makes sense to start identifying, or at least estimating, the desired performance characteristics of the application early in the development life cycle. This can be accomplished most simply by noting the performance characteristics that your users and stakeholders equate with good performance. The notes can be quantified at a later time.
Classes of characteristics that frequently correlate to a user’s or stakeholder’s satisfaction typically include:
- Response time. For example, the product catalog must be displayed in less than three seconds.
- Throughput. For example, the system must support 25 book orders per second.
- Resource utilization. For example, processor utilization is not more than 75 percent. Other important resources that need to be considered for setting objectives are memory, disk input/output (I/O), and network I/O.
Considerations
Consider the following key points when identifying performance criteria:
- Business requirements
- User expectations
- Contractual obligations
- Regulatory compliance criteria and industry standards
- Service Level Agreements (SLAs)
- Resource utilization targets
- Various and diverse, realistic workload models
- The entire range of anticipated load conditions
- Conditions of system stress
- Entire scenarios and component activities
- Key performance indicators
- Previous releases of the application
- Competitor’s applications
- Optimization objectives
- Safety factors, room for growth, and scalability
- Schedule, staffing, budget, resources, and other priorities
Activity 3. Plan and Design Tests
Planning and designing performance tests involves identifying key usage scenarios, determining appropriate variability across users, identifying and generating test data, and specifying the metrics to be collected. Ultimately, these items will provide the foundation for workloads and workload profiles.
When designing and planning tests with the intention of characterizing production performance, your goal should be to create real-world simulations in order to provide reliable data that will enable your organization to make informed business decisions. Real-world test designs will significantly increase the relevancy and usefulness of results data.
Key usage scenarios for the application typically surface during the process of identifying the desired performance characteristics of the application. If this is not the case for your test project, you will need to explicitly determine the usage scenarios that are the most valuable to script. Consider the following when identifying key usage scenarios:
- Contractually obligated usage scenario(s)
- Usage scenarios implied or mandated by performance-testing goals and objectives
- Most common usage scenario(s)
- Business-critical usage scenario(s)
- Performance-intensive usage scenario(s)
- Usage scenarios of technical concern
- Usage scenarios of stakeholder concern
- High-visibility usage scenarios
When identified, captured, and reported correctly, metrics provide information about how your application’s performance compares to your desired performance characteristics. In addition, metrics can help you identify problem areas and bottlenecks within your application.
It is useful to identify the metrics related to the performance acceptance criteria during test design so that the method of collecting those metrics can be integrated into the tests when implementing the test design. When identifying metrics, use either specific desired characteristics or indicators that are directly or indirectly related to those characteristics.
Considerations
Consider the following key points when planning and designing tests:
- Realistic test designs are sensitive to dependencies outside the control of the system, such as humans, network activity, and other systems interacting with the application.
- Realistic test designs are based on what you expect to find in real-world use, not theories or projections.
- Realistic test designs produce more credible results and thus enhance the value of performance testing.
- Component-level performance tests are integral parts of realistic testing.
- Realistic test designs can be more costly and time-consuming to implement, but they provide far more accuracy for the business and stakeholders.
- Extrapolating performance results from unrealistic tests can create damaging inaccuracies as the system scope increases, and frequently lead to poor decisions.
- Involve the developers and administrators in the process of determining which metrics are likely to add value and which method best integrates the capturing of those metrics into the test.
- Beware of allowing your tools to influence your test design. Better tests almost always result from designing tests on the assumption that they can be executed and then adapting the test or the tool when that assumption is proven false, rather than by not designing particular tests based on the assumption that you do not have access to a tool to execute the test.
Realistic test designs include:
- Realistic simulations of user delays and think times, which are crucial to the accuracy of the test.
- User abandonment, if users are likely to abandon a task for any reason.
- Common user errors.
Activity 4. Configure the Test Environment
Preparing the test environment, tools, and resources for test design implementation and test execution prior to features and components becoming available for test can significantly increase the amount of testing that can be accomplished during the time those features and components are available.
Load-generation and application-monitoring tools are almost never as easy to get up and running as one expects. Whether issues arise from setting up isolated network environments, procuring hardware, coordinating a dedicated bank of IP addresses for IP spoofing, or version compatibility between monitoring software and server operating systems, issues always seem to arise from somewhere. Start early, to ensure that issues are resolved before you begin testing.
Additionally, plan to periodically reconfigure, update, add to, or otherwise enhance your load-generation environment and associated tools throughout the project. Even if the application under test stays the same and the load-generation tool is working properly, it is likely that the metrics you want to collect will change. This frequently implies some degree of change to, or addition of, monitoring tools.
Considerations
Consider the following key points when configuring the test environment:
- Determine how much load you can generate before the load generators reach a bottleneck. Typically, load generators encounter bottlenecks first in memory and then in the processor.
- Although it may seem like a commonsense practice, it is important to verify that system clocks are synchronized on all of the machines from which resource data will be collected. Doing so can save you significant time and prevent you from having to dispose of the data entirely and repeat the tests after synchronizing the system clocks.
- Validate the accuracy of load test execution against hardware components such as switches and network cards. For example, ensure the correct full-duplex mode operation and correct emulation of user latency and bandwidth.
- Validate the accuracy of load test execution related to server clusters in load-balanced configuration. Consider using load-testing techniques to avoid affinity of clients to servers due to their using the same IP address. Most load-generation tools offer the ability to simulate usage of different IP addresses across load-test generators.
- Monitor resource utilization (CPU, network, memory, disk and transactions per time) across servers in the load-balanced configuration during a load test to validate that the load is distributed.
Activity 5. Implement the Test Design
The details of creating an executable performance test are extremely tool-specific. Regardless of the tool that you are using, creating a performance test typically involves scripting a single usage scenario and then enhancing that scenario and combining it with other scenarios to ultimately represent a complete workload model.
Load-generation tools inevitably lag behind evolving technologies and practices. Tool creators can only build in support for the most prominent technologies and, even then, these have to become prominent before the support can be built. This often means that the biggest challenge involved in a performance-testing project is getting your first relatively realistic test implemented with users generally being simulated in such a way that the application under test cannot legitimately tell the difference between the simulated users and real users. Plan for this and do not be surprised when it takes significantly longer than expected to get it all working smoothly.
Considerations
Consider the following key points when implementing the test design:
- Ensure that test data feeds are implemented correctly. Test data feeds are data repositories in the form of databases, text files, in-memory variables, or spreadsheets that are used to simulate parameter replacement during a load test. For example, even if the application database test repository contains the full production set, your load test might only need to simulate a subset of products being bought by users due to a scenario involving, for example, a new product or marketing campaign. Test data feeds may be a subset of production data repositories.
- Ensure that application data feeds are implemented correctly in the database and other application components. Application data feeds are data repositories, such as product or order databases, that are consumed by the application being tested. The key user scenarios, run by the load test scripts may consume a subset of this data.
- Ensure that validation of transactions is implemented correctly. Many transactions are reported successful by the Web server, but they fail to complete correctly. Examples of validation are, database entries inserted with correct number of rows, product information being returned, correct content returned in html data to the clients etc.
- Ensure hidden fields or other special data are handled correctly. This refers to data returned by Web server that needs to be resubmitted in subsequent request, like session IDs or product ID that needs to be incremented before passing it to the next request.
- Validate the monitoring of key performance indicators (KPIs).
- Add pertinent indicators to facilitate articulating business performance.
- If the request accepts parameters, ensure that the parameter data is populated properly with variables and/or unique data to avoid any server-side caching.
- If the tool does not do so automatically, consider adding a wrapper around the requests in the test script in order to measure the request response time.
- It is generally worth taking the time to make the script match your designed test, rather than changing the designed test to save scripting time.
- Significant value can be gained from evaluating the output data collected from executed tests against expectations in order to test or validate script development.
Activity 6. Execute the Test
Executing tests is what most people envision when they think about performance testing. It makes sense that the process, flow, and technical details of test execution are extremely dependent on your tools, environment, and project context. Even so, there are some fairly universal tasks and considerations that need to be kept in mind when executing tests.
Much of the performance testing–related training available today treats test execution as little more than starting a test and monitoring it to ensure that the test appears to be running as expected. In reality, this activity is significantly more complex than just clicking a button and monitoring machines.
Test execution can be viewed as a combination of the following sub-tasks:
- Coordinate test execution and monitoring with the team.
- Validate tests, configurations, and the state of the environments and data.
- Begin test execution.
- While the test is running, monitor and validate scripts, systems, and data.
- Upon test completion, quickly review the results for obvious indications that the test was flawed.
- Archive the tests, test data, results, and other information necessary to repeat the test later if needed.
- Log start and end times, the name of the result data, and so on. This will allow you to identify your data sequentially after your test is done.
As you prepare to begin test execution, it is worth taking the time to double-check the following items:
- Validate that the test environment matches the configuration that you were expecting and/or designed your test for.
- Ensure that both the test and the test environment are correctly configured for metrics collection.
- Before running the real test, execute a quick smoke test to make sure that the test script and remote performance counters are working correctly. In the context of performance testing, a smoke test is designed to determine if your application can successfully perform all of its operations under a normal load condition for a short time.
- Reset the system (unless your scenario calls for doing otherwise) and start a formal test execution.
- Make sure that the test scripts’ execution represents the workload model you want to simulate.
- Make sure that the test is configured to collect the key performance and business indicators of interest at this time.
Considerations
Consider the following key points when executing the test:
- Validate test executions for data updates, such as orders in the database that have been completed.
- Validate if the load-test script is using the correct data values, such as product and order identifiers, in order to realistically simulate the business scenario.
- Whenever possible, limit test execution cycles to one to two days each. Review and reprioritize after each cycle.
- If at all possible, execute every test three times. Note that the results of first-time tests can be affected by loading Dynamic-Link Libraries (DLLs), populating server-side caches, or initializing scripts and other resources required by the code under test. If the results of the second and third iterations are not highly similar, execute the test again. Try to determine what factors account for the difference.
- Observe your test during execution and pay close attention to any behavior you feel is unusual. Your instincts are usually right, or at least valuable indicators.
- No matter how far in advance a test is scheduled, give the team 30-minute and 5-minute warnings before launching the test (or starting the day’s testing) if you are using a shared test environment. Additionally, inform the team whenever you are not going to be executing for more than one hour in succession so that you do not impede the completion of their tasks.
- Do not process data, write reports, or draw diagrams on your load-generating machine while generating a load, because this can skew the results of your test.
- Turn off any active virus-scanning on load-generating machines during testing to minimize the likelihood of unintentionally skewing the results of your test.
- While load is being generated, access the system manually from a machine outside of the load-generation environment during test execution so that you can compare your observations with the results data at a later time.
- Remember to simulate ramp-up and cool-down periods appropriately.
- Do not throw away the first iteration because of application script compilation, Web server cache building, or other similar reasons. Instead, measure this iteration separately so that you will know what the first user after a system-wide reboot can expect.
- Test execution is never really finished, but eventually you will reach a point of diminishing returns on a particular test. When you stop obtaining valuable information, move on to other tests.
- If you feel you are not making progress in understanding an observed issue, it may be more efficient to eliminate one or more variables or potential causes and then run the test again.
Activity 7. Analyze Results, Report, and Retest
Managers and stakeholders need more than just the results from various tests — they need conclusions, as well as consolidated data that supports those conclusions. Technical team members also need more than just results — they need analysis, comparisons, and details behind how the results were obtained. Team members of all types get value from performance results being shared more frequently.
Before results can be reported, the data must be analyzed. Consider the following important points when analyzing the data returned by your performance test:
- Analyze the data both individually and as part of a collaborative, cross-functional technical team.
- Analyze the captured data and compare the results against the metric’s acceptable or expected level to determine whether the performance of the application being tested shows a trend toward or away from the performance objectives.
- If the test fails, a diagnosis and tuning activity are generally warranted.
- If you fix any bottlenecks, repeat the test to validate the fix.
- Performance-testing results will often enable the team to analyze components at a deep level and correlate the information back to the real world with proper test design and usage analysis.
- Performance test results should enable informed architecture and business decisions.
- Frequently, the analysis will reveal that, in order to completely understand the results of a particular test, additional metrics will need to be captured during subsequent test-execution cycles.
- Immediately share test results and make raw data available to your entire team.
- Talk to the consumers of the data to validate that the test achieved the desired results and that the data means what you think it means.
- Modify the test to get new, better, or different information if the results do not represent what the test was defined to determine.
- Use current results to set priorities for the next test.
- Collecting metrics frequently produces very large volumes of data. Although it is tempting to reduce the amount of data, always exercise caution when using data-reduction techniques because valuable data can be lost.
Most reports fall into one of the following two categories:
Technical Reports
- Description of the test, including workload model and test environment.
- Easily digestible data with minimal pre-processing.
- Access to the complete data set and test conditions.
- Short statements of observations, concerns, questions, and requests for collaboration.
Stakeholder Reports
- Criteria to which the results relate.
- Intuitive, visual representations of the most relevant data.
- Brief verbal summaries of the chart or graph in terms of criteria.
- Intuitive, visual representations of the workload model and test environment.
- Access to associated technical reports, complete data sets, and test conditions.
- Summaries of observations, concerns, and recommendations.
The key to effective reporting is to present information of interest to the intended audience in a manner that is quick, simple, and intuitive. The following are some underlying principles for achieving effective reports:
- Report early, report often.
- Report visually.
- Report intuitively.
- Use the right statistics.
- Consolidate data correctly.
- Summarize data effectively.
- Customize for the intended audience.
- Use concise verbal summaries using strong but factual language.
- Make the data available to stakeholders.
- Filter out any unnecessary data.
- If reporting intermediate results, include the priorities, concerns, and blocks for the next several test-execution cycles.
Summary
Performance testing involves a set of common core activities that occur at different stages of projects. Each activity has specific characteristics and tasks to be accomplished. These activities have been found to be present — or at least to have been part of an active, risk-based decision to omit one of the activities — in every deliberate and successful performance-testing project that the authors and reviewers have experienced. It is important to understand each activity in detail and then apply the activities in a way that best fits the project context.