Edition 121

What Is Testing?

Appium Pro is usually known for Appium-related automation tutorials and advanced techniques. But sometimes it's good to step back and take a look at the bigger picture of how these techniques are used, which is most frequently in the context of "testing". In my forthcoming course on HeadSpin University, Appium and Selenium Fundamentals, I talk a lot about testing, and what it is, before diving into any automation training. I think it's always useful to consider our practices from a philosophical perspective before just plowing ahead, and so this article is a snippet of the course which has to do with precisely this question: what is testing?

Ancient Roman pot

I think we would all have a fairly good intuitive idea of what testing is. It's making sure something works, right? Indeed! But as a former linguist, I think it'd be great to dive a bit into the etymology of testing, the background and history of the word itself, to get a sense of the roots of the concept.

The Etymology of Testing

The English word "test" originally came from the Latin word testum, which referred to a kind of container or pot made of earth. These were pots designed to hold very hot metal. People wanted to heat metal up for a number of reasons, but one was to determine whether there were any impurities in the metal.

So, by association, the pot itself became tied up with the act of checking the quality of the metal, and so now we can say that to put something to the test, i.e., to heat it up in the earthen pot called a "testum", is to determine the quality of whatever that thing is! You might also have heard someone talk about a "crucible", or maybe heard the expression "going through the crucible". This is another word that has a very similar history. A "crucible" is another kind of container for heated metal! To take our analogy a bit deeper, we can say that "testing" is the process of applying some kind of energy in some kind of container to determine the quality of something we care about.

Testing Software

Now, what we care about is software. Our job as testers is to get some kind of indication of the quality of the software that sits in front of us. Maybe we developed it ourselves, or maybe someone else did. How do we even start thinking about the process of testing software? There are a huge number of methods and tools that we can use in the process. But one simple way of thinking about it is realizing that our responsibility is to design and bring about the specific conditions (like heating a pot) that will reveal the quality of the software. One common way of bringing about these conditions is to use the software in its normal modes, as a normal user. The software is meant to run in certain conditions, and we can simply run it in those conditions, as any normal user of the software would, to determine whether it works the way it's supposed to in those conditions.

We could also try to imagine conditions for the software that its developers might not have thought about. These are often called "edge cases". The trouble with software is that there can be a near infinite number of conditions a piece of software runs in, and a near infinite number of situations or states it can get itself into, based on user input or other factors. When developers write software, they usually assume a fairly simple and minimal set of conditions. It is the job of a tester to design crucibles, or test scenarios, that show how the software behaves in all kinds of situations.

Obviously, if there are an infinite number of situations that software can get run in, it doesn't make sense to try and test every possible case. So testing becomes a real mix of engineering and creative, out-of-the-box thinking, just to figure out what to test. And that is only the beginning! Testers don't have an infinite amount of time to evaluate each potential release of software. Sometimes software gets released multiple times a day! Choices have to be made about what's most important to test, and which additional tools and strategies to adopt to make sure that as much as possible can be tested.

Imagine if you were an ancient Roman tester, and people brought their metal to you to heat up and determine its quality. Once the metal starts piling up faster than you can gather logs and heat up your pots, you've got a problem, and you need to figure out not just new and clever methods for determining the quality of the metal, but also the mechanics of how to scale your testing practice. Maybe you buy more pots, or invest in technology that starts fires faster. Maybe you get super advanced and figure out that by mixing certain reagents in with the metal, you can get an instant read on one aspect of the quality of that metal! Anyway, I don't want to push this analogy too far. But this is very much the situation encountered by testers. Almost all testing of software nowadays is done in a context that requires great breadth, depth, speed, and scale. It's an impossible set of requirements, unless testers adopt some of the same modes of thinking that have propelled software development to huge levels of scale.

And yes, by that I mean automation, which is a conversation for another time. The main takeaway for now is this: testing is a practice which uses creativity and technology to build and execute special crucibles for software, with the goal of determining the quality of the software to as accurate a degree as possible!

Kinds of Testing

So, we defined testing as follows: the act of defining, building, and executing processes that determine application quality. This is a good definition, but it's pretty general. We can get a lot more specific about the various types and subtypes of testing, so we know what kinds of options we have when it comes to defining and building these test processes.

The way I like to think about types of testing is as a 3-dimensional matrix. Each dimension corresponds to one facet or characteristic of a test. Here are the three dimensions that I see:

  1. Test targets. A test target is basically the thing you want to test! We might call it the "object under test" to be general. In software testing, it's usually called the "App Under Test", sometimes written AUT. But target means more than saying what app you're testing. Usually we're only responsible for testing one app anyway, so the App Under Test is kind of a given. What I mean by "target" is, what component or layer of the app is being tested. When we develop software, the software usually has a number of distinct components, and a number of distinct levels. We can consider the levels independently, and even test them independently if we want. One component or level of an application could be an individual code function. Users don't execute individual code functions---that happens behind the scenes, so to speak. But we could test a specific function in isolation! Another layer could be the layer of an application's API. Again, users don't typically try to access an app's backend API. That's something that the app itself would do in order to provide the appropriate information to the user. Yet another layer could be the user interface or UI of the app. This is the part that a user actually interacts with! And we could test all of these different layers. We could test individual units of code. We could test APIs. We could test integrations between various backend services. And we could test the UI. That's what I mean by "target"---the test target is the answer to the question, what aspect of the app itself are you trying to test?

  2. Test types. Now here I mean something very specific by the word "type". In this sense, I mean, what aspect of the app's quality are you trying to test? Apps can have good or poor quality on a number of different dimensions. One app might be fast but perform calculations incorrectly. So we would say that it has good performance but poor functionality. Another app might have good performance and functionality, but be hard for people who are hard of hearing to use. Then we would say that app has poor accessibility. Performance, functionality, and accessibility are all aspects of the quality of an app. As apps become more and more important in our lives, we keep coming up with more and more aspects of app quality that become important! If we want to test all of these categories, I would say we want to implement a number of different test types. So that is what I mean by "type".

  3. Test methods. This category is a bit more straightforward. Basically, method just refers to the procedure used to implement the test. How will the test be performed? That's the method. What are some methods? Well, what we call "manual testing" is one method. Manual testing is simply the act of using an app the way a user would, or exercising an app component in a one-off way. In other words, it's done by hand, by a human being. You can manually test units of code. You can manually test APIs. And of course, the most common instance of manual testing is manual testing of app UIs. But manual testing isn't the only method. We can also write software to test our app under test. In this case, our tests are considered to be using the automated method.

So this is the structure I use to make sense of the various kinds of tests. These are three dimensions, meaning that in principle the test targets are independent from the test types, and they're both independent from the test methods. In practice, not every possible combination of those three dimensions ends up making sense, but a lot do. The small sketch below shows one way you could represent the matrix in code, and after that, let's look at some named examples!
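Just as a quick illustration (this is purely my own sketch, not something from any particular library), here's a minimal way the three-dimensional matrix could be expressed as plain Java types. The names here, like TestKind, are hypothetical; the point is simply that any given test can be described by picking one value along each dimension.

```java
// A hypothetical representation of the 3-dimensional test matrix.
public class TestMatrix {

    enum Target { UNIT, API, INTEGRATION, UI }               // what part of the app is tested
    enum Type { FUNCTIONALITY, PERFORMANCE, ACCESSIBILITY }  // what aspect of quality is tested
    enum Method { MANUAL, AUTOMATED }                         // how the test is performed

    // A test "kind" is just a point in the matrix: one value per dimension.
    record TestKind(Target target, Type type, Method method) {}

    public static void main(String[] args) {
        TestKind unitTest = new TestKind(Target.UNIT, Type.FUNCTIONALITY, Method.AUTOMATED);
        TestKind uiTest = new TestKind(Target.UI, Type.FUNCTIONALITY, Method.AUTOMATED);
        System.out.println(unitTest);
        System.out.println(uiTest);
    }
}
```

The named kinds of testing below are just particularly common points in this matrix.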

Unit Testing

The name "unit testing" refers to the idea that we are testing a specific unit of code. Unit testing can be described using the different test dimensions in this way: the "target" of a unit test is typically a single function in the codebase. The "type" is usually understood to be functionality. Usually when writing unit tests, we are making sure that the specific bit of code in question functions appropriately, in terms of its inputs and outputs. We could run other types of tests on single code functions, for example we could test their performance. But usually unit tests are written to test functionality. And the "method" for a unit test is usually automation.

UI Testing

UI Testing goes under many different names. Sometimes it's called "end-to-end" testing. Sometimes it's just called "functional testing," and some people even refer to UI tests when they talk about "integration tests." I think most of these other names are a bit confusing, so I prefer to use the term "UI Test." This makes clear what the target of our testing is: it's the UI of an application. As we talked about earlier, an app has many different layers, but users typically only interact with the UI. When you are doing UI testing, you are running tests from the perspective of the user! Now, "UI" just refers to the target of testing, and it doesn't specify the type or method involved. Typically, if we don't say anything else, UI testing is assumed to be functional in nature. That is, we are testing the UI's functionality. And of course, we can do this using different methods, whether manual or automated.
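Since this is Appium Pro, here's a hedged sketch of what an automated UI test might look like with the Appium Java client (version 8+ and a locally running Appium server are assumed). The app path, the accessibility IDs (username, password, loginBtn, welcome), and the credentials are all made-up placeholders for illustration; the point is that the test drives the app through its UI, exactly as a user would.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import io.appium.java_client.AppiumBy;
import io.appium.java_client.android.AndroidDriver;
import io.appium.java_client.android.options.UiAutomator2Options;
import java.net.URL;
import org.junit.jupiter.api.Test;

class LoginUiTest {

    @Test
    void userCanLogIn() throws Exception {
        // Assumed: an Appium server at localhost:4723 and an Android app whose
        // path and accessibility IDs are hypothetical placeholders.
        UiAutomator2Options options = new UiAutomator2Options()
                .setApp("/path/to/TheApp.apk");

        AndroidDriver driver = new AndroidDriver(new URL("http://localhost:4723"), options);
        try {
            // Drive the app purely through its UI, the way a user would.
            driver.findElement(AppiumBy.accessibilityId("username")).sendKeys("alice");
            driver.findElement(AppiumBy.accessibilityId("password")).sendKeys("mySecretPassword");
            driver.findElement(AppiumBy.accessibilityId("loginBtn")).click();

            // Functional check: did we land on the logged-in screen?
            String banner = driver.findElement(AppiumBy.accessibilityId("welcome")).getText();
            assertTrue(banner.contains("Welcome"));
        } finally {
            driver.quit();
        }
    }
}
```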

Integration Testing

Another type of testing which is commonly discussed is "integration testing". Sometimes, when people say "integration testing" they might mean "UI testing," so it's good to make sure everyone's working with the same definition. When I say integration testing, what I mean is testing the communication, or "integration", between two different services. Let's go back to the 3-dimensional framework we've been using to unpack this a little bit. In terms of a target or targets, integration testing is focused on two components of a larger system. In the world of modern software development, the most obvious candidate for these components would be a set of microservices that all communicate in order to power an application. But the components don't necessarily need to be free-standing services; they could also just be two chunks of a much larger monolithic codebase, which are responsible for communicating in some way. The test "type" I mean when I talk about "integration testing" is usually "functional". We could do performance integration testing, but it's usually assumed that we're just talking about making sure the integration works at all. Finally, the test "method" can be manual or automated, just like any other form of testing we've discussed, but this kind of testing is almost always automated.

A good example of integration testing would be asserting that two services communicate with one another correctly, even if the entire stack is not in the picture. Maybe you have one service that reads customer orders from a task queue, does some processing on them, and then sends certain orders to a system that is responsible for physical order fulfilment (in other words, shipping), and sends other orders to a system that is responsible for digital order fulfilment (in other words, e-mailing digital products or something like that). You could imagine setting up a test environment that just runs these few services, and has no frontend, and no actual real database or task queue. Fake orders can be created in a simple in-memory queue, and orders received by upstream services can simply be included in a correctness assertion, but not actually acted on. This would be an example of an integration test.
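Here's a hedged, extremely compressed sketch of what that scenario might look like as an automated integration test in Java with JUnit 5. The Order, OrderRouter, and RecordingFulfilmentService names are all hypothetical; the "services" are stand-in classes that only record what they receive, and a plain in-memory queue stands in for the real task queue, just as described above.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import org.junit.jupiter.api.Test;

class OrderRoutingIntegrationTest {

    // Hypothetical order model: an id plus a flag for digital vs. physical fulfilment.
    record Order(String id, boolean digital) {}

    // Stand-ins for the real fulfilment services: they only record what they receive.
    static class RecordingFulfilmentService {
        final List<Order> received = new ArrayList<>();
        void fulfil(Order order) { received.add(order); }
    }

    // The integration under test: reads orders from a queue and routes each one
    // to the physical or digital fulfilment service.
    static class OrderRouter {
        void route(Queue<Order> queue,
                   RecordingFulfilmentService physical,
                   RecordingFulfilmentService digital) {
            Order order;
            while ((order = queue.poll()) != null) {
                if (order.digital()) {
                    digital.fulfil(order);
                } else {
                    physical.fulfil(order);
                }
            }
        }
    }

    @Test
    void ordersAreRoutedToTheCorrectFulfilmentService() {
        // No real broker or database: a simple in-memory queue stands in for the task queue.
        Queue<Order> queue = new ArrayDeque<>();
        queue.add(new Order("A-1", false)); // physical order
        queue.add(new Order("A-2", true));  // digital order

        RecordingFulfilmentService physical = new RecordingFulfilmentService();
        RecordingFulfilmentService digital = new RecordingFulfilmentService();
        new OrderRouter().route(queue, physical, digital);

        // Correctness assertion: each order ended up with the right fulfilment service.
        assertEquals(List.of(new Order("A-1", false)), physical.received);
        assertEquals(List.of(new Order("A-2", true)), digital.received);
    }
}
```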

Conclusion

There are lots of other kinds of testing out there, but I hope that you remember the three dimensions we can use to define any type of testing: the test target, the test type, and the test method. The target specifies what thing we're trying to ensure quality for. The type specifies what sort of quality we care about (speed, correctness, visual appeal, etc.). The method specifies how we go about the test (do we use automated means, or manual means, or some specific subcategory of those?). Take a moment and think about the various tests you've written in your career, and try to map them onto this schema.