What is mobile app performance testing, exactly? What should we care about in terms of app performance, and how can we test app performance as part of our automated test suite? This is a topic I've been thinking a lot about recently, and so I convinced Brien Colwell (CTO of HeadSpin) to sit down with me so I could pester him with some questions. I took our conversation and turned it into a sort of FAQ of mobile app performance testing. None of this is specifically tied to Appium--stay tuned for future editions where we look at how to achieve some of the goals discussed in this article, using Appium by itself or in conjunction with other tools and services.
Really, we could think of performance testing as a big part of a broader concept: UX testing (or User Experience Testing). The idea here is that a user's experience of your app goes beyond the input/output functionality of your app. It goes beyond a lot of the things we normally associate with Appium (though of course it includes all those things too--an app that doesn't work does not provide a good experience!). It reflects the state of the app market, which is so crowded with nearly identical apps that small improvements in UX can mean the difference between life and death for the startups behind them.
Years ago, I considered performance testing to be exclusively in the domain of metrics like CPU or memory usage. You only did performance testing when you wanted to be sure your app had no memory leaks, that kind of thing. And this is still an important part of performance testing. But more broadly, performance testing now focuses on attributes or behaviors of your application as they interface with psychological facts about the user, like how long they are prepared to wait for a view to load before moving on to another app. One quick way of summarizing all of this is to define performance testing as ensuring that your app is responsive to the directions of the user, across all dimensions.
It's true that classic specters of poor software performance, like memory leaks or spinning CPUs, can plague mobile app experience. And there is good reason to measure and profile these metrics. However, the primary cause of a bad user experience these days tends to be network-related. So much of the mobile app experience depends on loading data over the network that any inefficiency there can cause painfully frustrating delays for the user. Also, metrics like memory or CPU usage can often be tested adequately on a local machine during development, whereas network metrics need to be tested in a variety of conditions in the field.
To this end, we might track metrics like the following:
Beyond network metrics, there are a number of other UX metrics to consider:
When developing an application, we are often doing so on top-of-the-line desktop or laptop computers and devices, with fast corporate internet. The performance we experience during development may be so good that it masks issues experienced by the average set of users. Here are a few common mistakes (again, largely network-oriented) that developers make which can radically impact performance:
In general, it can often be more useful to track performance relative to a certain baseline, whether that is an accepted standard baseline, or just the first point at which you started measuring performance of your app. However, tracking relative performance can also be a challenge when testing across a range of devices or networks, because relative measures might not be comparing apples to apples. In these cases, looking at absolute values side-by-side can be quite useful as well.
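To make the relative-versus-absolute distinction concrete, here is a minimal Python sketch. The device names and timings are invented for illustration, and `relative_change` is a hypothetical helper, not part of Appium or any other tool.

```python
def relative_change(measured_ms: float, baseline_ms: float) -> float:
    """Return the fractional change of a measurement versus its baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (measured_ms - baseline_ms) / baseline_ms


# Hypothetical load times in milliseconds (made-up numbers).
baselines = {"device_a": 800.0, "device_b": 1500.0}
current = {"device_a": 1000.0, "device_b": 1650.0}

# Keep the absolute value next to the relative change for each device.
report = {
    device: (current[device], relative_change(current[device], baselines[device]))
    for device in baselines
}

for device, (absolute_ms, change) in report.items():
    print(f"{device}: {absolute_ms:.0f} ms ({change:+.0%} vs. baseline)")
```

In this made-up data, `device_a` regressed more in relative terms, yet `device_b` is still the slower experience in absolute terms, which is why it helps to look at both numbers side by side.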
It's true that each team defines UX for their own app. Acceptable TTI measures for an e-commerce app might differ by an order of magnitude or more from acceptable measures for AAA gaming titles. Still, there are some helpful rules of thumb based on HCI (Human-Computer Interaction) research:
These are not hard-and-fast truths that make sense in every case. And of course nobody really has a universal answer, but again, it's helpful to treat that 500ms number as a good target for any interaction we want to feel "snappy".
In other words, when should we be really concerned about differences in performance between different devices or networks? Actually, it's fairly common to see differences of about 30% between devices. This level of difference doesn't usually indicate a severe performance issue, and can (with all appropriate caveats) be regarded as variance.
True performance problems can cause differences of 10-100x the baseline measurements--just think how long you've waited for some app views to load when they are downloading too much content over a slow network!
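As a rough sketch, these rules of thumb could be encoded as a simple triage function. The 30% and 10x thresholds come from the discussion above; the function name, the labels, and the middle "investigate" band are my own assumptions, since the conversation didn't cover what to do with differences that fall between the two.

```python
def classify_difference(
    measured_ms: float,
    baseline_ms: float,
    variance_tolerance: float = 0.30,  # ~30% is treated as ordinary variance
    severe_factor: float = 10.0,       # 10x or more suggests a true problem
) -> str:
    """Triage a measurement against its baseline using rough thresholds."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    ratio = measured_ms / baseline_ms
    if ratio >= severe_factor:
        return "problem"
    if ratio <= 1.0 + variance_tolerance:
        return "variance"
    # Assumption: anything between the two thresholds deserves a human look.
    return "investigate"


print(classify_difference(1250.0, 1000.0))   # 1.25x the baseline
print(classify_difference(3000.0, 1000.0))   # 3x the baseline
print(classify_difference(15000.0, 1000.0))  # 15x the baseline
```

The thresholds are parameters rather than constants because the right tolerance will depend on the app and on how noisy its measurements are.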
The answer here is simple if not practical: test on the devices that bring in the greatest revenue! Obviously this implies that you have some kind of understanding of your userbase: where are they located? What devices do they use? What is their typical network speed? And so on. If you don't have this information, try to start tracking it so you can cross-reference with whatever sales metrics are important for your product (items purchased, time spent in app, whatever).
At that point, if you can pick the top 5 devices that meet these criteria in a given region, you're well positioned to ensure a strong UX.
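If you already have usage data that you can join with revenue, picking those devices is a small aggregation. Here's a sketch in Python; the record fields (`region`, `device`, `revenue`) and the sample data are hypothetical stand-ins for whatever your analytics system actually exports.

```python
from collections import defaultdict


def top_devices_by_revenue(records, region, n=5):
    """Return the n device names with the highest total revenue in a region."""
    totals = defaultdict(float)
    for record in records:
        if record["region"] == region:
            totals[record["device"]] += record["revenue"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [device for device, _ in ranked[:n]]


# Made-up analytics rows; real data would come from your own tracking.
records = [
    {"region": "US", "device": "Phone A", "revenue": 1200.0},
    {"region": "US", "device": "Phone B", "revenue": 800.0},
    {"region": "US", "device": "Phone A", "revenue": 300.0},
    {"region": "IN", "device": "Phone C", "revenue": 950.0},
    {"region": "US", "device": "Phone D", "revenue": 1000.0},
]

print(top_devices_by_revenue(records, "US", n=2))
```

Swap `revenue` for whichever sales metric matters for your product (items purchased, time spent in app, and so on).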
"Performance" turns out to be quite a broad subcategory of UX, and of course, what we care about at the end of the day is UX, in a holistic way. The more elements of the UX we can begin to measure, the more we will be able to understand the impact of changes in our application. Eventually we'll even get to the point where we've identified solid app-specific metric targets, and can fail builds that don't meet these targets, guaranteeing a minimum level of UX quality for our users. Oh, and our users? They won't know any of this is happening, but they'll love you for it.