Six design principles of continuous performance testing
Oct 11, 2016
In the course of delivering many successful Continuous Performance Testing (CPT) implementations for enterprise customers, Grid Dynamics engineering teams have developed a number of basic design principles to guide their actions. Your requirements may be unique, but just as all custom race cars have a chassis, suspension, and wheels, all CPT implementations need to follow the six design principles we talk about in this post.
Principle #1: Divide and conquer: split your tests into easy-to-handle stages
Organize your performance testing into stages. Start with “cheap” tests that can identify performance problems on small datasets early in the pipeline to provide developers with instant feedback when they commit; place tests that require massive data sets and infrastructure later in the pipeline. Integrate all test executions with CI infrastructure.
Principle #2: One throat to choke: target APIs for all testing goals
The API is your best friend when it comes to performance testing. Focus on testing of the performance of programmatic interfaces rather than human experience. An API is your single throat to choke, as an API can be hit from different angles to measure the system’s response to various scenarios.
Principle #3: Keep your eyes on the prize: get your risks and KPIs right
Deciding what to measure is half the battle. Define metrics that represent business KPIs, the risk of system slowdowns or breakages, and the impact of the system’s performance on these potential problems. Identify performance thresholds that constitute acceptable or faulty performance results. Get the business side of the house to sign off on them before you go to work on your CPT implementation.
Principle #4: Automate everything: you need one-click automation end-to-end
100% of the testing process should be performed by software, not humans. This includes provisioning test environments; deploying relevant middleware and application code; configuring the environment and setting up the right connections to 3rd party interfaces; loading the test data; running the tests; collecting metrics; and cleaning up after the test run. 100% automation is necessary to assure that the tests can be run continuously on each commit, build, and release candidate. A single manual step cripples the pipeline and defeats its purpose.
Principle #5: Analyze this: discover, visualize, and deliver performance insights
Generate reports that track performance metrics of each build against the targets, monitor regression between builds, and analyze performance trends over time. Apply modern visualization techniques to make the data readable and actionable. Deliver performance data to the right people using modern dashboard tools.
Principle #6: Retain all results: the value of data grows over time
Some insights can only be seen as a part of the broader trend. Store all test run results in a persistent data store, so that various analytical techniques can be applied to mine the data for patterns and discover performance trends over time. Historic data about performance test results is also invaluable for troubleshooting production issues down the line by comparing the performance shown in testing to actual production experience.
Let’s discuss each principle in detail:
Principle #1: Divide and conquer: split your tests into easy-to-handle stages
This is the central idea behind merging performance testing into the CI pipeline in the first place. In our previous blog post, we showed you the following visual diagram:
To make our performance tests both automated and efficient, we segregate different types of performance testing by the type of environment, test dataset, and type of test queries we want to run. Each test is then automated and integrated into the CI pipeline. “Cheap” tests, like joint queries executed on modest synthetic datasets, run first; more “expensive” ones, with more data and more complex workflows, run later in the pipeline.
Typical staged performance tests involve:
- Smoke tests – standalone verification of primary business services at the API level; executed on a nightly basis.
- Integration tests – complex verification tests, where several services are invoked at the same time; characterized by dataset changes during the test period and long test durations; executed on a weekly basis.
- End-to-end tests – verification of simulated user end-to-end scenarios in a fully-integrated environment; executed as part of the CD pipeline.
When a group of tests is broken down like this, it is possible to write automated sequences that provision the environment, load the data, and execute that group of tests end-to-end, including the quick analysis of test results for pass/fail logic within the CI process.
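As a rough illustration of this staging idea, here is a minimal Python sketch that runs the cheapest stage first and stops at the first failure. The `Stage` class, the `cost` ranking, and `run_pipeline` are hypothetical names of our own for this example, not part of any CI product; a real pipeline would delegate the ordering to the CI server.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    cost: int                 # hypothetical relative cost; lower runs earlier
    run: Callable[[], bool]   # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages cheapest-first and stop at the first failing stage."""
    executed = []
    for stage in sorted(stages, key=lambda s: s.cost):
        executed.append(stage.name)
        if not stage.run():
            break  # fail fast: skip the more expensive stages
    return executed


# Example: the integration stage fails, so end-to-end never runs.
stages = [
    Stage("end-to-end", cost=3, run=lambda: True),
    Stage("smoke", cost=1, run=lambda: True),
    Stage("integration", cost=2, run=lambda: False),
]
print(run_pipeline(stages))
```

Failing fast on the cheap stages is what keeps the expensive environments from being provisioned for builds that are already known to be broken.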
Principle #2: One throat to choke: target APIs for all testing goals
When we consider the best approaches to test the system’s performance automatically, APIs are our best friends for two reasons:
- Every API is a potential bottleneck whose performance under stress must be tested and validated. Therefore, the specification for each API must include non-functional SLAs on performance given edge conditions on concurrent load, size of dataset transferred, and such. The goal of the performance tests is to validate that these SLAs are met within acceptable norms.
- Testing the performance of a complex scenario, such as the “checkout” process in a web store, is best simulated by a sequence of API calls that are performed as a part of the checkout workflow.
Splitting the full system into a number of API calls gives us the ability to reduce test complexity and duration, increase stability, and simplify investigation of any issues we find.
Let’s take the “checkout” scenario as an example. Our virtual user goes through several steps: “go to shopping cart” > “select products” > “select quantity” > “apply offers” > “enter shipping details” > “enter payment details” > “confirm taxes and shipping costs” > “place order”. At every step there are several UI actions and multiple API calls, both sequential and asynchronous. To collect proper performance metrics, we measure the response time of every UI action and API call, and the duration of each complete step in the scenario. This allows us to check each step’s duration against our acceptance criteria, as well as review the impact of every action and call.
We can use the same type of scenario for stress tests and endurance testing by adjusting the number of virtual users and the test duration.
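To make the per-step measurement concrete, the sketch below times each scenario step and reports the steps that exceed their acceptance limit. `ScenarioTimer`, its methods, and the 2-second limit are illustrative assumptions; a real load-testing tool would record these timings for you.

```python
import time
from contextlib import contextmanager


class ScenarioTimer:
    """Records the wall-clock duration of each named scenario step."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock      # injectable so timings can be faked in tests
        self.durations = {}     # step name -> seconds

    @contextmanager
    def step(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the duration even if the step raised an error.
            self.durations[name] = self.clock() - start

    def violations(self, limits):
        """Return the steps whose duration exceeded their limit (seconds)."""
        return {
            name: seconds
            for name, seconds in self.durations.items()
            if name in limits and seconds > limits[name]
        }


timer = ScenarioTimer()
with timer.step("apply offers"):
    time.sleep(0.01)  # stand-in for the real UI actions and API calls
print(timer.violations({"apply offers": 2.0}))  # hypothetical 2-second limit
```

The same timer can wrap a single API call or a whole step, which gives both levels of detail described above.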
Principle #3: Keep your eyes on the prize: get your risks and KPIs right
All performance tests boil down to validating how your system will behave in a certain configuration under a certain load scenario, and at what points it will start slowing down — and eventually break down. You are basically performing automated risk management: am I at risk of my system breaking due to poor performance? The key to successful risk management starts with knowing your risks, such as:
- What operations (queries, API calls, workflows) are most likely to slow the service down under load?
- What user load scenarios might create problems?
- What parts of your system are most likely to become bottlenecks?
These potential “risk areas” will become the targets of our performance tests.
Once you have identified the riskiest parts of the system, the next step is to define the critical thresholds that the business side finds acceptable — or not. It could be:
- The number of checkout transactions per second should support expected Cyber Monday traffic, plus 20%
- The number of incoming messages from IoT devices could reach 100,000/second
- A checkout can be completed within 30 seconds, 99% of the time
- The size of the shopping cart may contain up to 150 items
It is critically important to involve business analysts in documenting critical thresholds so that the tests can be engineered to recreate the right test conditions. Get your performance KPIs right in advance, and get your business people to buy into them. If the business risks can be formalized in these KPIs and their thresholds, the performance engineering team can usually design a test to validate them.
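Once a threshold is agreed, it can usually be written down as an executable check. Here is a minimal sketch for the “30 seconds, 99% of the time” checkout example above; the function name `meets_sla` and its parameters are our own illustrative choices.

```python
def meets_sla(durations_s, limit_s=30.0, required_fraction=0.99):
    """True if at least `required_fraction` of samples finish within `limit_s`."""
    if not durations_s:
        raise ValueError("no samples to evaluate")
    within = sum(1 for d in durations_s if d <= limit_s)
    return within / len(durations_s) >= required_fraction


# 99 fast checkouts and one slow one: exactly 99% are within the limit.
samples = [10.0] * 99 + [45.0]
print(meets_sla(samples))
```

Expressing the KPI this way means the business sign-off and the CI pass/fail gate refer to the same numbers.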
Principle #4: Automate everything: you need one-click automation end-to-end
When we design our performance tests, we need to think about how we:
- Create the right distributed test environment
- Load the right application configuration and test datasets
- Orchestrate the correct load scenarios over the selected period of time
- Collect performance data (CPU, network, API calls, and so on) from all nodes
- Store the results for future processing
- Build models to correlate data and interpret results
- Detect thresholds on selected tests that constitute “pass” or “failure” of the CI tollgate
- Deliver results via reports or data visualization to the human performance engineers for further investigation
We need powerful frameworks and tools that can perform all these functions, and do so repeatedly, automatically, and reliably, as a part of the CI process. While this might seem like a tall order, we will present you with a complete set of 100% free open source tools to achieve this goal.
The importance of complete test execution automation cannot be overstated. Any manual step requires human involvement, and once a human is required, the performance tests can no longer be part of automated CI pipelines.
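The end-to-end sequence can be sketched as a single entry point that runs every step in order and always cleans up, even when a step fails. This is only an outline under our own assumptions: `run_performance_test`, the step names, and the stub actions are hypothetical placeholders for real provisioning and test tooling.

```python
def run_performance_test(steps, cleanup):
    """Run (name, action) steps in order; cleanup runs whether or not they succeed."""
    log = []
    ok = True
    try:
        for name, action in steps:
            log.append(name)
            action()
    except Exception as exc:
        ok = False
        log.append(f"failed: {exc}")
    finally:
        cleanup()               # tear the environment down in every case
        log.append("cleanup")
    return ok, log


def load_data():
    # Stub that simulates a failing step.
    raise RuntimeError("test dataset missing")


steps = [
    ("provision environment", lambda: None),
    ("load test data", load_data),
    ("run tests", lambda: None),
]
print(run_performance_test(steps, cleanup=lambda: None))
```

Returning a plain pass/fail flag alongside the log lets the CI server treat the whole run as one tollgate.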
Principle #5: Analyze this: discover, visualize, and deliver performance insights
When we design our performance tests, we need to think about how we will analyze our results. Some results are easier to interpret than others. For example, you may need to correlate the data from latency, transaction-per-second throughput, and CPU utilization in typical load scenarios to answer a question like, “How well does this test result predict the performance of our application in production?” Here are some common approaches we use to evaluate performance results:
- Baseline-relative performance of different builds against the same tests. Thanks to automation and continuous integration, we always have the same environment and constant loads in our performance lab so we can not only measure absolute values and compare them to specified thresholds, but also compare our results to previous runs to spot deviations, detect regressions, and raise alarms.
- Mathematical analysis. By collecting and analyzing data from multiple performance test runs, we are able to calculate our main parameters’ trends and predict likely system behavior with different setups (e.g. different system configurations).
- Correlation of metrics between production and our performance lab. Collection of test results for smoke, integration and end-to-end tests helps us get accurate predictions for production load and capacity planning, and helps us calculate performance coefficients and ratios.
- Issue investigation. Investigating performance issues requires all available data: logs, system metrics, and app metrics.
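The first two approaches above can be sketched in a few lines of Python: a baseline-relative regression check and a least-squares trend over successive builds. The function names and the 10% tolerance are assumptions made for illustration, not recommended values.

```python
def regression_pct(baseline, current):
    """Percentage change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100.0


def is_regression(baseline, current, tolerance_pct=10.0):
    """Flag a build whose metric (e.g. latency) grew beyond the tolerance."""
    return regression_pct(baseline, current) > tolerance_pct


def trend_slope(values):
    """Least-squares slope of a metric over build index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two builds to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


latencies_ms = [100.0, 110.0, 120.0, 130.0]   # made-up latencies per build
print(is_regression(latencies_ms[0], latencies_ms[-1]))
print(trend_slope(latencies_ms))
```

A positive slope that stays under the per-build tolerance is exactly the kind of slow drift that a single build-to-build comparison would miss.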
Again, we need powerful tools for the analysis of the data. In later blog posts we will recommend the toolset that we know and love, and that has proved itself in many successful implementations.
Principle #6: Retain all results: the value of data grows over time
Some insights can only be evaluated accurately as part of a broader trend. Store all test run results in a persistent data store, so that various analytical techniques can be applied to mine the data for patterns and discover performance trends over time. Historic data about performance test results is also invaluable for troubleshooting production issues down the line by comparing the performance shown in testing to the actual production experience.
This requires a robust backend designed specifically to store test results and conveniently access them for further analysis. In later blogs we will present you with our approach for implementing such capabilities.
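As a toy illustration of the idea (not the backend we will describe later), the sketch below keeps every result in a SQLite table using Python’s standard `sqlite3` module. The table layout and the helper names `open_store`, `record`, and `history` are assumptions made for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the results store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               build TEXT NOT NULL,
               test TEXT NOT NULL,
               metric TEXT NOT NULL,
               value REAL NOT NULL,
               recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def record(conn, build, test, metric, value):
    """Append one measurement; nothing is ever overwritten."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO results (build, test, metric, value) VALUES (?, ?, ?, ?)",
            (build, test, metric, value),
        )


def history(conn, test, metric):
    """Return (build, value) pairs for one metric in insertion order."""
    rows = conn.execute(
        "SELECT build, value FROM results WHERE test = ? AND metric = ? ORDER BY rowid",
        (test, metric),
    )
    return rows.fetchall()


conn = open_store()
record(conn, "build-101", "checkout", "p99_latency_ms", 420.0)
record(conn, "build-102", "checkout", "p99_latency_ms", 455.0)
print(history(conn, "checkout", "p99_latency_ms"))
```

Because rows are only ever appended, the full history stays available for trend analysis and for later comparison with production behavior.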
In the rest of this blog series, we will cover specific tools and frameworks for achieving all six of these principles using open source technologies.