Edition 22

Making Your Appium Tests Fast and Reliable, Part 4: Dealing With Unfindable Elements

This article is the fourth in a multi-part series on test speed and reliability, inspired by a webinar I gave on the same subject (you can watch the webinar here). You might also want to check out Part 1: Test Flakiness, Part 2: Finding Elements Reliably, and Part 3: Waiting for App States.

One of the sad truisms of mobile testing is that we can't always get what we want. The stack is so complex, the tools so nascent, that we are often in a position of having to try weird hacks just to get our tests to work. Appium does everything it can to canonize the most reliable weird hacks as official endpoints, so you can just go about your day writing straightforward Appium client code. But sometimes, the world doesn't cooperate.

One common occurrence in this category is when an element is clearly visible on the screen, and yet cannot be found. You've printed out the page source. You've gone hunting in Appium Desktop, and sure enough, there's no sign of the element! Why could this be? There are a number of reasons, but usually it has to do with how your app is developed. For iOS and Android, Appium is fundamentally at the mercy of the automation engines it builds on (XCUITest and UiAutomator2, respectively). These engines use the accessibility layer of the OS to determine which elements are available for automation. Oftentimes, apps which employ custom UI components have neglected to register these custom components with the accessibility service. This means that said components are essentially invisible from the perspective of XCUITest/UiAutomator2, and hence un-automatable from the perspective of Appium.

If this describes your problem, and if you have the ability to change the app source code (or can convince someone else to), the solution is actually pretty straightforward: simply turn on accessibility for your custom UI controls! Both Apple and Google mention how in their docs, but it's easy to understand why a developer might forget to do this, since it doesn't affect the functionality of the app. Anyway, you can check out the technique at these links:

If changing the app code isn't an option, or if there is some other reason that your element is not findable (for example, React Native doing something goofy when you set your element's position to absolute), you might need to pull out the big guns. (Sorry, that sounds violent: I should say, you might need to pull out the big keyboard! Because coding happens on keyboards!) Regardless of metaphor, what am I referring to? Tapping by coordinates. That's right: with Appium you can always tap the screen at any x/y position you like, and the tap will (/should) work. If the UI control you are targeting is underneath, the net result will be the same as having called a findElement command and then calling click() on the resulting object.

There's a problem, though, which is that tapping by coordinate is notoriously brittle---hence why my suggestions on how to do it right are showing up here, in our series on test speed and reliability. But first, why is tapping by coordinates brittle? For all the same reasons XPath is probably not a good locator strategy, and more:

  • Different versions of your app might put the element in a different place on screen
  • Different dynamic data on the screen might shove the element around even within the course of a single automation session
  • Different screen sizes might make the element fall under a different coordinate

One tantalizing solution would be to find the element visually, meaning by comparison with a reference image. In fact, Appium can do this! But it's pretty new and advanced, and will be the subject of a future Appium Pro edition once all the pieces are firmly in place.

In the meantime, a first stab at making tapping at a coordinate more reliable would be to ensure our coordinate itself is dynamically calculated based on conditions. For example, if we know that our app renders in completely the same proportions across different screen sizes, we can solve the last problem in the list above by storing coordinates as percentages of screen width and height in our code, rather than storing actual x and y values. Then we determine the specific coordinates for a given test run only after making a call to driver.manage().window().getSize(), so we can turn the percentages back into coordinates.

This is a good start, but it still doesn't help us in the case where the element might have moved around for a variety of reasons. Another idea (which I recently saw discussed as a full blog post by Wim Selles) is to find the coordinates of your unfindable element via reference to an element which is findable.

The key insight of this tactic is that it's often the case that the position of the desired element is always fixed in relationship to some other element, which may be found using the normal findElement methods. We can use the size and position of the reference element to calculate absolute coordinates for the desired element. Of course, you'll have to observe your app's design closely to determine which features of the reference element are useful. For example, you might decide that the desired element is always 20 pixels to the right of the reference element. Or, it could be halfway between the right edge of the reference element and the right edge of the screen. Using a combination of the reference element's bounds and the screen dimensions, pretty much any position for your desired element can be targeted dynamically and reliably.

Let's take a look at a concrete example using The App. Currently, the initial screen of The App looks like this:

The App

In reality, I can find all the elements I care about via Accessibility ID. But let's pretend that the "Login Screen" list item is not accessible, and I can't see it in the hierarchy at all. What I'm going to do is gather information about the element right below it ("Clipboard Demo"), and use that to tap the Login Screen element dead center. I will rely on the facts I know about my app, which is that "Login Screen" is a list item of the same dimensions as "Clipboard Demo", and appears directly before it in the list. All I need to do, then, is find the midpoint of "Clipboard Demo", and then subtract the height of the list item, and I'll have the coordinate for the middle of "Login Screen". Basic math, right? So let's have a look at the code (which assumes we've got driver and wait objects all ready for us):

// first, find our reference element
WebElement ref = wait
    .until(ExpectedConditions.presenceOfElementLocated(referenceElement));

// get the location and dimensions of the reference element, and find its center point
Rectangle rect = ref.getRect();
int refElMidX = rect.getX() + rect.getWidth() / 2;
int refElMidY = rect.getY() + rect.getHeight() / 2;

// set the center point of our desired element; we know it is one row above the
// reference element so we simply have to subtract the height of the reference element
int desiredElMidX = refElMidX;
int desiredElMidY = refElMidY - rect.getHeight();

// perform the TouchAction that will tap the desired point
TouchAction action = new TouchAction<>(driver);
action.press(PointOption.point(desiredElMidX, desiredElMidY));
action.waitAction(WaitOptions.waitOptions(Duration.ofMillis(500)));
action.release();
action.perform();

// finally, verify we made it to the login screen (which means we did indeed tap on
// the desired element)
wait.until(ExpectedConditions.presenceOfElementLocated(username));

This is a direct implementation of what I phrased in words above. The important commands are of course the commands for getting the bounds of the reference element (element.getRect()), and the commands for setting up an Appium TouchAction to natively tap on a coordinate. There are a few lines responsible for this latter purpose, but they're straightforward: press on a coordinate, wait for 500ms, and release. That's it!

Happily, this technique is cross-platform, so the above code will run on both the iOS and Android versions of my app without modification. And it is relatively robust and reliable. It will never be as reliable as clicking on an element found by Accessibility ID, of course, because in that case we have a guarantee of calling an underlying tap method on a platform-specific element reference. In most cases, that element reference will exist even if the app is reorganized spatially. But if we are forced out into the cold dark, beyond all ability to find elements, then the technique discussed here will help you at least retain a modicum of sanity.

If you're interested in a runnable example showing the above code working on both iOS and Android, simply check out the code for this edition! And check back soon for the latest in this mini-series on speed and reliability.