Chapter 10 – Quantifying End-User Response Time Goals
Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn how to identify the difference between performance requirements and performance goals.
- Learn how to apply several methods for capturing subjective performance requirements and goals.
Overview
Ultimately, there is only one end-user response-time metric that matters: the percentage of application users who are frustrated by poor performance. Your application’s users do not know or care about the values in your performance test results, the number of seconds it takes a screen to display beyond their threshold for “too long,” or what the throughput value is. However, users do notice whether the application seems slow, and their impressions can be based on anything from their mood to their prior experience with other applications. This chapter describes a method for converting these user perceptions into testable numbers.
Determining what your users will deem “acceptable” in terms of performance can be challenging, and their preferences are subject to significant change over short periods of time. Most software-development companies do not conduct regular usability studies with groups of representative users because such studies cost time and money. For the most part, these companies have neither the resources nor the training to conduct usability studies, even if they wanted to.
Experience reports from leading performance testers, shared during peer workshops such as the Workshop on Performance and Reliability (WOPR, http://www.performance-workshop.org/), suggest that simply verbalizing your application’s performance goals and requirements enables teams to overcome quantification, technical, logical, logistical, and managerial challenges and achieve a successfully performing application. These same performance testers report that quantified goals and requirements are sometimes met and frequently ignored. Even when they are met, the goals and requirements rarely correlate to satisfied users unless qualitative requirements and goals also serve as a reference point.
How to Use This Chapter
Use this chapter to understand how to establish performance-testing goals and apply several methods for capturing subjective performance requirements and goals. To get the most from this chapter:
- Use the “Terminology” section to understand the common terms used to describe performance-testing goals and requirements so that you can apply those terms correctly in the context of your project.
- Use the “Approach for Quantifying End-User Response Time” section to get an overview of the approach to determining performance-testing goals, and as a quick reference guide for you and your team.
- Use the various activity sections to understand the details of the most critical tasks for quantifying end-user response-time goals.
Terminology
This chapter uses the following terms.
| Term / Concept | Description |
| --- | --- |
| Performance requirements | Performance requirements are those criteria that are absolutely non-negotiable due to contractual obligations, service level agreements (SLAs), or fixed business needs. Any performance criterion that will not unquestionably lead to a decision to delay a release until the criterion passes is not absolutely required, and therefore not a requirement. |
| Performance goals | Performance goals are the criteria that your team wants to meet before product release, although these criteria may be negotiable under certain circumstances. For example, if a response time goal of three seconds is set for a particular transaction but the actual response time is 3.3 seconds, it is likely that the stakeholders will choose to release the application and defer performance tuning of that transaction to a future release. |
Approach for Quantifying End-User Response Time
Quantifying end-user response time goals can be thought of in terms of the following activities:
- Determine application functionality and usage.
- Verbalize and capture performance requirements and goals.
- Quantify performance requirements and goals.
- Record performance requirements and goals.
These activities are discussed in detail in the following sections.
Determine Application Functionality and Usage
Before you can effectively determine the desired performance characteristics of an application, you need to identify the scenarios for which you want to characterize performance. When identifying the business scenarios that have a critical need for performance requirements and goals, it may be useful to think in terms of the following four categories:
- Frequently used scenarios
- Performance-intensive scenarios
- Business-critical scenarios
- Scenarios of special interest (possibly due to contractual obligations or stakeholder visibility)
Once you have identified the scenarios that need performance requirements and/or goals, you can engage the entire team, from executive sponsor to end-user, to determine exactly what those requirements and/or goals should be. In general, all you need to do is get the team to informally tell you how each scenario or group of scenarios should perform. Once you have collected this information, it becomes your job to convert the subjective data into a testable form and then document these testable requirements and/or goals for the purposes of traceability and progress monitoring.
Verbalize and Capture Performance Requirements and Goals
Although it is generally desirable to conduct this activity early in the software development life cycle, it is also valuable to revisit it periodically throughout the project. No matter how well you conduct this activity, contracts, perceptions, business drivers, and priorities change as new information becomes available. Keep this in mind as you traverse the project life cycle. For example, if you do not find out that the terms of a contract have changed until you are presenting what you believe to be your final report, it will appear as though your testing was never based on the current terms of the contract.
Throughout this activity, it is important to distinguish between requirements and goals (see “Terminology” above). Identifying requirements is far from difficult. To determine requirements, focus on contracts and legally binding agreements or standards related to the software under development, and get the executive stakeholders to commit to any performance conditions that would cause them to refuse to release the software into production. The resulting criteria may or may not be related to any specific business scenario or condition. If they are, however, you must ensure that those scenarios or conditions are included in your performance testing.
Performance goals are more challenging to capture and to subsequently quantify, which is why it is important to treat the capture and quantification as separate activities. An extremely common mistake related to performance testing is to begin quantification without first verbalizing the goals subjectively or qualitatively.
Review Project Documentation and Related Contracts
This activity is conceptually straightforward. Regulatory and compliance documents may be challenging to obtain because they often are not readily available for review by non-executives. Even so, it is important to review these standards. The specific language and context of any statement related to testing are critical to determining a compliant process. For example, the difference between “transactions will” and “on average, transactions will” is tremendous. The first case implies that every transaction will comply every single time. The second case is completely ambiguous, as becomes obvious when you try to quantify these criteria.
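To see why the wording matters, consider a purely hypothetical set of response times (the numbers below are invented for illustration): if 99 transactions complete in one second and a single transaction takes 100 seconds, the “on average” phrasing passes easily while the unqualified phrasing fails badly.

```python
# Hypothetical response times (seconds) for 100 transactions; illustration only.
response_times = [1.0] * 99 + [100.0]

average = sum(response_times) / len(response_times)                  # ~1.99 s
worst_case = max(response_times)                                     # 100.0 s
share_within_3s = sum(t <= 3.0 for t in response_times) / len(response_times)

print(f"average: {average:.2f} s")          # satisfies "on average, transactions will ..."
print(f"worst case: {worst_case:.1f} s")    # violates the unqualified "transactions will ..."
print(f"share within 3 s: {share_within_3s:.0%}")
```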
Frequently, the most important performance-related statements can be found in vision and marketing documents. Vision documents often hold subjective performance goals such as “at least as fast as the previous release,” “able to support a growing customer base,” and “performance consistent with the market.” Marketing documents, however, are notorious for containing unintentional performance requirements.
In the United States, declarations made in publicly available marketing statements are generally treated as legally binding, which can make every claim about performance (or anything else) a non-negotiable requirement. This is not well known across the software industry and has caused significant challenges when marketing materials included words like “fast,” “instant,” and “market-leading performance.” For each such claim, the terms must be publicly and reasonably defined and supported, which is where performance testing comes in.
To complete this activity, all you need to do is highlight statements in these published materials that are even loosely related to the application’s speed, scalability, and/or stability and set them aside until you are ready to quantify them. Alternatively, you could transpose these statements directly into your requirements-management system just as they are, with the understanding that they are likely to be revised later.
Interview Stakeholders Who Will Influence the “Go Live” Decision
Stakeholders always have an opinion when it comes to performance, and frequently they express those opinions in terms that appear to be already quantified and absolute, although they are rarely well understood. The key to interviewing stakeholders is not only to capture their statements, but also to determine the intent behind those statements.
For example, a stakeholder with a background in telecommunications who says that she expects the application to have “five 9s of availability” probably does not realize that this equates to the near-impossible standard of a Web site being unavailable for roughly five minutes per year (or roughly one second per day). The truth is that many Web sites could be down for an hour per day, if it is the “right” hour, without customers even noticing.
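To put the “five 9s” figure in perspective, the downtime budget follows directly from the availability percentage. The following is a minimal sketch of that arithmetic; whether scheduled maintenance counts against the budget is a separate question for the stakeholder.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
MINUTES_PER_YEAR = 365.25 * 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

for availability in (0.99, 0.999, 0.9999, 0.99999):   # "two 9s" through "five 9s"
    unavailable = 1 - availability
    print(f"{availability:.3%} available -> "
          f"{unavailable * MINUTES_PER_YEAR:7.1f} min/year, "
          f"{unavailable * SECONDS_PER_DAY:7.2f} s/day")
```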
In fact, it is hard to imagine that Web users would notice a one-second delay, even if it did happen once a day. So while one second of mid-conversation silence each day on a land line is absolutely unacceptable to users, it is probably an unnecessarily strict standard for a Web site. The key is to ask good questions in order to determine the real intent behind statements stakeholders make related to performance. The following are some sample starting questions, along with potential follow-up questions, to help you capture the intent of the stakeholder:
- How do you expect this application to perform relative to other similar applications/Web sites? How much better? Ten percent? Noticeably? Dramatically? Which application/Web site in particular exhibits the kind of performance you would like this application/Web site to have? You said “x” seconds; how did you decide on that number, and what does it indicate to you?
- How much disruption are you willing to accept due to downtime? Does that include scheduled maintenance that users are notified about beforehand? Does it matter if the user simply cannot access the Web site/application, or if they are given a message acknowledging that the site is down? What if the users can still accomplish their tasks, but the speed is degraded during downtime?
- How do you expect the application/Web site to respond to unexpectedly high traffic volumes? Do you prefer dramatic performance degradation for all users or a “system is temporarily unavailable, please try again later” message for all users in excess of the supported volume? Is it more important to you that the application/Web site demonstrates consistent performance, or variable performance that may be up to 50 percent faster or slower than average based on current usage volume?
To complete this activity, it is most important to record the questions and the answers and not quantify the answers or comment on them unless the stakeholder specifically asks you to explain. The general rule is to ask questions that have answers that do not specifically require quantifications, and to follow up with questions that help you qualify the initial responses subjectively. If the stakeholder does provide numbers, take note of them, but do not assume that they are the right numbers.
Determine if There Are Relevant Standards and/or Competitive Baselines Related to the Application
There are very few performance-related standards outside of safety-critical devices and applications, but there are some. More frequently, market expectations and competition create de facto standards. Every application in every vertical industry will have different methods and sources for determining the competitive landscape. The bottom line: do not assume that you have completely captured goals and requirements until you have checked to see if your application is likely to be compared against an official or de facto standard.
Quantify Performance Requirements and Goals
After capturing the requirements and goals, the next step is to quantify them. While it is not strictly necessary, it is useful to distinguish between requirements and goals prior to quantification. Unlike goals, requirements need to be much more carefully and completely quantified. A goal of “approximately three seconds to render the requested Web page,” for example, is a perfectly respectable performance goal, but a completely non-testable performance requirement. Requirements should be specific, such as “the response time must not exceed 10 seconds.” Additionally, requirements need to specify the conditions or state of the system to which they apply.
Separate Requirements from Goals
At first glance, this activity seems purely mechanical. If the captured item is legally or contractually binding, or if a stakeholder with the influence to keep the software from being released mandates that an item is required, it is a requirement. The challenge is when an item identified as a requirement is more stringent than other items identified as goals.
In these cases, it is important to bring these conflicting items to stakeholders for additional clarification. It may be the case that the goal is superseded by the requirement — in which case you should simply remove the goal from the list of items. Also, the stakeholders may determine that the requirement is overly aggressive and needs to be modified. Regardless, the sooner these apparent conflicts are resolved, the less confusion they will cause later.
It is worth noting that conflicts between goals and requirements may not become obvious until after both are quantified, making this another important activity to revisit – both after quantification, and periodically during testing to ensure that priorities have not changed.
Quantify Captured Performance Goals
Some goals are at least conceptually easy to quantify. For example, a goal of “no slower than the previous release” is quantified by either referencing the most recent production performance monitoring report, or by executing single-user, light-load, and heavy-load tests against the previous release and recording the results to use as a baseline for comparison. Similarly, to quantify a goal of “at least as fast as our competitors,” you can take a series of single-user performance measurements of the competitor’s application — perhaps by scheduling a performance test script to execute a common scenario against the application once an hour over a period of a week.
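As a rough illustration of the competitive-baseline idea, the sketch below times a handful of requests against a placeholder URL using only the Python standard library; in practice you would script the full business scenario in your load-testing tool and schedule it (for example, hourly) over a week. The URL, sample count, and timeout are assumptions made for the example.

```python
# Minimal sketch: time a few sequential requests against a baseline site.
import time
import urllib.request

BASELINE_URL = "https://www.example.com/"   # placeholder; substitute the page you care about

def measure_response_times(url: str, samples: int = 5) -> list[float]:
    """Return elapsed seconds (including full download) for several requests to url."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=30) as response:
            response.read()
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    results = measure_response_times(BASELINE_URL)
    print(f"min {min(results):.2f} s, max {max(results):.2f} s, "
          f"avg {sum(results) / len(results):.2f} s")
```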
Most of the captured goals that need to be quantified, however, are not comparative goals; they are user satisfaction goals, otherwise known as quality-of-service (QoS) goals. Quantifying end-user satisfaction and/or frustration is more challenging, but far from impossible. To quantify end-user satisfaction, all you need is an application and some representative users. You do not need a completed application; a prototype or demo will do for a first pass at quantification.
For example, with just a few lines of code in the HTML of a demo or prototype, you can control the load time for each page, screen, graphic, control, or list. Using this method, you can create several versions of the application with different response characteristics, then have the users try each version and tell you, in their own terms, whether they find it unacceptable, slow, reasonable, fast, or whichever descriptors correspond to the goals you were given. Since you know the actual response times, you can then start pairing those numbers with the users’ reported degrees of satisfaction. It is not an exact science, but it works well enough for goals, especially if you continue asking the same questions during functional testing, user acceptance testing, beta testing, and so on, while measuring response times in the background as users interact with the system. This allows you to collect more data and refine your performance goals as the application evolves.
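The delay can live in the page itself, as described above; alternatively, if the prototype is served by a small script, the delay can be injected on the server side. The following is a minimal sketch of that alternative; the paths, delay values, and page content are placeholders chosen for illustration.

```python
# Sketch: a throwaway prototype server that injects a configurable artificial delay
# per "version" of a page, so representative users can react to different response times.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder delays (seconds) for each prototype version; tune per experiment.
ARTIFICIAL_DELAY = {"/fast": 0.5, "/medium": 2.0, "/slow": 5.0}

class DelayedPrototypeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(ARTIFICIAL_DELAY.get(self.path, 0.0))    # simulate slow rendering
        body = f"<html><body><h1>Prototype page {self.path}</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DelayedPrototypeHandler).serve_forever()
```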
While quantifying goals, consider distinguishing the goals based on how the application will be used. For instance, a data-entry page that is used 2,000 times a day and a once-a-year comprehensive report over 40 million transactions warrant very different performance goals.
Quantify Captured Performance Requirements
If you are lucky, most of the performance requirements that you captured are already quantified and testable. If you are a little less lucky, the requirements you captured are not quantified at all, in which case you can follow the process described above for quantifying performance goals. If you are unlucky, the performance requirements that you collected are partly quantified and non-testable.
The challenge is that if a requirement is extracted from a contract or existing marketing document, it likely cannot be changed. When you are faced with a requirement such as “three-second average response time,” or “2,500 concurrent users,” you have to figure out what those requirements mean and what additional information you need in order to make them testable.
There is no absolute formula for this. The basic idea is to interpret the common-language requirement as precisely as possible, supplement it with the most common or expected state of the application, and then get your extended, testable requirement approved by the stakeholder(s). That way, the stakeholders, not you, bear the responsibility if someone challenges legal compliance with the requirement after the product goes live. To illustrate, consider the following examples:
Requirement: Direct quote from a legal contract: “The Website shall exhibit an average response time of not greater than three (3) seconds.”
Extended quantification: This requirement is particularly challenging. The literal, and therefore most likely legal, interpretation is that “Over the life of the Website, the arithmetic mean of all response times, at any point in time, will not exceed 3 seconds.” While that is hard enough to determine, response time has not been defined either. Response time could mean “end-user-perceived response time,” “server response time,” or something else entirely. The following breaks this down systematically:
- Without any information to the contrary, it is probably safe to assume that the only reasonable way to test the three-second average response time is either “with all pages being accessed equally often” or “under the most likely workload distribution.”
- Again, without any information to the contrary, you are left to determine the load conditions for the test. In this case, your best bet is probably to average across multiple volumes. For instance, you could get 30 percent of your data from low-load tests, 50 percent from expected-load tests, and 20 percent from high-load tests, and then report a weighted average (as sketched after this list), assuming that distribution of load is a reasonable approximation of the anticipated production load profile. Alternatively, you could make a case for testing this requirement exclusively under expected-load conditions.
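Continuing the 30/50/20 weighting from the second bullet, the arithmetic is simple; the per-load averages below are invented solely to show the calculation.

```python
# Illustrative weighted average for the 3-second contractual requirement.
# The measured averages are made-up numbers; substitute your own test results.
measured_avg_response_s = {"low_load": 1.8, "expected_load": 2.6, "high_load": 4.1}
weights = {"low_load": 0.30, "expected_load": 0.50, "high_load": 0.20}
assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"

weighted_avg = sum(measured_avg_response_s[k] * weights[k] for k in weights)
print(f"weighted average response time: {weighted_avg:.2f} s")   # 2.66 s in this example
print(f"meets the 3-second requirement: {weighted_avg <= 3.0}")
```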
Requirement: Direct quote from sales brochure: “This application will support up to 2,500 concurrent users.”
Extended quantification: The challenge here is similar because “concurrent user” is not technically accurate for Web applications and therefore can mean several different things.
- Since it is unlikely that you will have the opportunity to determine the intention of the person who chose the term “concurrent,” you have to use your best judgment based on the application. Generally, the safest interpretation is “overlapping, active sessions,” where an “active session” is one user’s activity from the time they access the application until the time they complete their task, without stopping to do something else, whether or not the application technically tracks sessions.
- Using this interpretation, if a user typically has a session duration of 15 minutes, it would statistically take a total of about 5,000 users over a 30-minute period, with a realistic ramp-up/ramp-down model, to simulate 2,500 overlapping active sessions (see the worked arithmetic after this list).
- Also, in this example you have no information about the expected activity of those users. As in the previous example, it is probably safe to assume that the only reasonable ways to test this requirement are either “with all pages being accessed equally often” or “under the most likely workload distribution,” although in this case, “under the most likely workload distribution” is more likely to be the original author’s intent.
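The 5,000-user figure in the second bullet can be sanity-checked with a Little’s Law style calculation (concurrent sessions = arrival rate × session duration); a minimal sketch follows, with the real ramp-up/ramp-down modeling left to your load-testing tool.

```python
# Sanity check of the "2,500 concurrent users" interpretation using Little's Law:
# concurrent sessions (L) = arrival rate (lambda) * session duration (W).
target_concurrent_sessions = 2_500        # L, from the sales brochure
session_duration_min = 15                 # W, typical session length in minutes
test_window_min = 30                      # period over which simulated users arrive

arrival_rate_per_min = target_concurrent_sessions / session_duration_min   # ~166.7 users/min
total_users_needed = arrival_rate_per_min * test_window_min                # ~5,000 users

print(f"arrival rate: {arrival_rate_per_min:.1f} users/min")
print(f"total users over {test_window_min} minutes: {total_users_needed:.0f}")
```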
See Chapter 12 – Modeling Application Usage for more information on defining concurrent users.
Record Performance Requirements and Goals
The preferred method for recording goals and requirements will necessarily be particular to your team and tools. However your team manages requirements and goals, you must remember to record both the quantitative and the qualitative versions of the goals and requirements together. By doing so, when it is late in the project and someone tries to decide if the application is performing well enough to be released, you can quickly refer not just to the numbers, but to the intent behind the numbers to help you and your team make a more informed decision.
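However you choose to store them, a record that keeps the original wording next to its quantified counterpart might look something like the following sketch; the field names and values are illustrative, not a prescribed schema.

```python
# Illustrative record that pairs each qualitative statement with its quantified form.
performance_criteria = [
    {
        "id": "GOAL-07",                       # hypothetical identifier
        "type": "goal",                        # "goal" or "requirement" (contractual/SLA)
        "source": "stakeholder interview, VP of Sales",
        "qualitative": "Checkout should feel at least as fast as our main competitor.",
        "quantified": "Checkout page renders in 3 seconds or less for 95% of requests "
                      "under expected load.",
    },
]
```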
Summary
Quantifying response-time goals is ultimately about expressing the user’s perception of the application’s performance in testable numbers. Most often, users of your application cannot articulate how long it should take to display data onscreen, what the application throughput should be, or how many transactions per second a database must support. However, users do notice whether the application feels fast or slow, and their impressions are shaped by several factors: previous experience, the criticality of the task, and how their expectations have been set.