Edition 46

Sending Arbitrary Keystrokes With The Actions API

If you're anything like me, sometimes you just want to send arbitrary keystrokes to an app, without necessarily having found a text input field yet. Actually, just kidding; this is not a genuine hobby of mine, but it's a nice way to introduce this edition's topic! It is actually possible to use the new W3C Actions API not only to send pointer actions, but also to send key input. At the moment, this is supported in Appium's Android drivers.

The use cases for this feature are varied and probably not too common. Some apps require keyboard input on non-text fields, or have elements that can't be directly accessed (maybe as part of a game that implements its own keyboard, for example).

For the simple purpose of sending keystrokes, regardless of whether an element is 'focused' or not, the Java client has a nice, easy method for doing this, utilizing the Actions class:

Actions a = new Actions(driver);
a.sendKeys("foo");
a.perform();

We construct an instance of Actions by passing in our driver, register a sendKeys action on it with the characters we want to type, and then call perform() to make it all happen. The fact that we have to explicitly call perform() means we could register a number of key inputs, perhaps with a series of waits in between, or perhaps mixed together with some pointer inputs as well.

This is all well and good, but what if we want lower-level control over the typing? What if we want to press multiple characters at once, or hold down a meta key (like SHIFT) while typing another character? In that case we'll need to explore the KeyInput class. The way that we use it is very similar to the way we use the PointerInput class in the gesture actions guide. First, we define a Sequence to contain our actions. Then, we define a KeyInput we use to generate the key actions (KeyDown or KeyUp on specific keys) we will register. Then, we register our actions with the overall sequence, and finally perform() that sequence with our driver.

In the following example, we type "Foo" into the app, and get the capital "F" by using a combination of keystrokes that overlap in time:

KeyInput keyboard = new KeyInput("keyboard");
Sequence sendKeys = new Sequence(keyboard, 0);

sendKeys.addAction(keyboard.createKeyDown(Keys.SHIFT.getCodePoint()));
sendKeys.addAction(keyboard.createKeyDown("f".codePointAt(0)));
sendKeys.addAction(keyboard.createKeyUp("f".codePointAt(0)));
sendKeys.addAction(keyboard.createKeyUp(Keys.SHIFT.getCodePoint()));

sendKeys.addAction(keyboard.createKeyDown("o".codePointAt(0)));
sendKeys.addAction(keyboard.createKeyUp("o".codePointAt(0)));

sendKeys.addAction(keyboard.createKeyDown("o".codePointAt(0)));
sendKeys.addAction(keyboard.createKeyUp("o".codePointAt(0)));

driver.perform(Arrays.asList(sendKeys));

As you can see, this strategy is quite a bit more verbose, since we have to register both the down and up state of every key we want to add to the sequence. (Of course, in real test code we would probably create a helper method to reduce the boilerplate here.) It's important to note that these low-level methods use character code points rather than the characters themselves. And we're making good use of the built-in Keys class from the Selenium client, which lets us access the SHIFT key without needing to look up its code point ourselves.
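
To give a rough idea of what such a helper could look like, here is a minimal sketch in plain Java. The KeyPlan class, its Step type, and the forText method are all hypothetical names invented for this illustration; they are not part of the Appium or Selenium clients. The sketch assumes that only uppercase letters need SHIFT (it does not handle shifted symbols like "!"), and it hard-codes U+E008, which is the code point behind Keys.SHIFT.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands text into the key-down/key-up steps the low-level API expects. */
final class KeyPlan {

    /** Code point of the SHIFT key (U+E008), the value behind Selenium's Keys.SHIFT. */
    static final int SHIFT = 0xE008;

    /** One low-level key action: a press (down == true) or a release of a single code point. */
    static final class Step {
        final boolean down;
        final int codePoint;

        Step(boolean down, int codePoint) {
            this.down = down;
            this.codePoint = codePoint;
        }

        @Override
        public String toString() {
            // Code points are printed in hex, e.g. "down:e008" for pressing SHIFT.
            return (down ? "down:" : "up:") + Integer.toHexString(codePoint);
        }
    }

    /** Builds the steps for typing the text, wrapping each uppercase letter in SHIFT down/up. */
    static List<Step> forText(String text) {
        List<Step> steps = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            // Assumption: only uppercase letters require SHIFT to be held.
            boolean shifted = Character.isUpperCase(cp);
            int key = Character.toLowerCase(cp);
            if (shifted) {
                steps.add(new Step(true, SHIFT));
            }
            steps.add(new Step(true, key));
            steps.add(new Step(false, key));
            if (shifted) {
                steps.add(new Step(false, SHIFT));
            }
        });
        return steps;
    }
}
```

In a test, we would loop over KeyPlan.forText("Foo") and turn each step into keyboard.createKeyDown(step.codePoint) or keyboard.createKeyUp(step.codePoint), adding the result to the Sequence exactly as in the example above. Keeping the planning separate from the driver calls also means the expansion logic can be checked without a running Appium session.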

That's all there is to it! You can get pretty fancy, of course, because you're not limited to the usual ASCII keys and can press multiple keys at once, or in a time sequence of your choosing. Have a look at the full code sample below, where both strategies discussed above are represented:

import io.appium.java_client.AppiumDriver;
import io.appium.java_client.MobileBy;
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.interactions.KeyInput;
import org.openqa.selenium.interactions.Sequence;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class Edition046_W3C_Keys {

    private String APP = "https://github.com/cloudgrey-io/the-app/releases/download/v1.8.0/TheApp-v1.8.0.apk";
    private By loginScreen = MobileBy.AccessibilityId("Login Screen");
    private By username = MobileBy.AccessibilityId("username");

    private AppiumDriver driver;
    private WebDriverWait wait;

    @Before
    public void setUp() throws IOException {
        DesiredCapabilities caps = new DesiredCapabilities();

        caps.setCapability("platformName", "Android");
        caps.setCapability("deviceName", "Android Emulator");
        caps.setCapability("automationName", "UiAutomator2");

        caps.setCapability("app", APP);
        driver = new AppiumDriver(new URL("http://localhost:4723/wd/hub"), caps);
        wait = new WebDriverWait(driver, 10);
    }

    @After
    public void tearDown() {
        try {
            driver.quit();
        } catch (Exception ign) {}
    }

    @Test
    public void testSendKeysAction() {
        wait.until(ExpectedConditions.presenceOfElementLocated(loginScreen)).click();
        WebElement usernameField = driver.findElement(username);
        usernameField.click();
        Actions a = new Actions(driver);
        a.sendKeys("foo");
        a.perform();
        Assert.assertEquals("foo", usernameField.getText());
    }

    @Test
    public void testLowLevelKeys() {
        wait.until(ExpectedConditions.presenceOfElementLocated(loginScreen)).click();
        WebElement usernameField = driver.findElement(username);
        usernameField.click();

        KeyInput keyboard = new KeyInput("keyboard");
        Sequence sendKeys = new Sequence(keyboard, 0);

        sendKeys.addAction(keyboard.createKeyDown(Keys.SHIFT.getCodePoint()));
        sendKeys.addAction(keyboard.createKeyDown("f".codePointAt(0)));
        sendKeys.addAction(keyboard.createKeyUp("f".codePointAt(0)));
        sendKeys.addAction(keyboard.createKeyUp(Keys.SHIFT.getCodePoint()));

        sendKeys.addAction(keyboard.createKeyDown("o".codePointAt(0)));
        sendKeys.addAction(keyboard.createKeyUp("o".codePointAt(0)));

        sendKeys.addAction(keyboard.createKeyDown("o".codePointAt(0)));
        sendKeys.addAction(keyboard.createKeyUp("o".codePointAt(0)));

        driver.perform(Arrays.asList(sendKeys));

        Assert.assertEquals("Foo", usernameField.getText());
    }

}

(Don't forget to check out the full code sample inside the runnable project on GitHub)