Edition 110

Controlling Appium via raw HTTP requests with curl

Appium plus curl

Did you know that Appium is just a web server? That's right! I'm going to write a full edition at some point on the WebDriver Protocol, which is the API that Appium implements (and which, as you can tell by the name, was invented for the purpose of browser automation). But for now, let's revel in the knowledge that Appium is just like your backend web or API server at your company. In fact, you could host Appium on a server connected to the public internet, and give anybody the chance to run sessions on your devices! (Don't do this, of course, unless you're a cloud Appium vendor).

You may be used to writing Appium code in Java or some other language, that looks like this:

driver = new IOSDriver<WebElement>(new URL("http://localhost:4723/wd/hub"), capabilities);
WebElement el = driver.findElement(locator);
System.out.println(el.getText());
driver.quit();

This looks like Java code, but what's happening under the hood is that each of these commands is actually triggering an HTTP request to the Appium server (that's why we need to specify the location of the Appium server on the network, in the IOSDriver constructor). We could have written all the same code, for example, in Python:

driver = webdriver.Remote("http://localhost:4723/wd/hub", capabilities)
el = driver.find_element(locator)
print(el.text)
driver.quit()

In both of these cases, while the surface code looks different, the underlying HTTP requests sent to the Appium server (and the responses coming back) are the same! This is what allows Appium (and Selenium, where we stole this architecture from) to work in any programming language. All someone needs to do is code up a nice little client library for that language, that converts that language's constructs to HTTP requests.

What all this means is that we technically don't need a client library at all. It's convenient to use one, absolutely. But sometimes, we want to just run an ad-hoc command against the Appium server, and creating a whole new code file and trying to remember all the appropriate syntax might be too much work. In this case, we can just use curl, which is a command line tool used for constructing HTTP requests and showing HTTP responses. Curl works on any platform, so you can download it for your environment if it's not there already (it comes by default on Macs, for example). There are lots of options for using curl, and to use it successfully on your own, you should understand all the components of HTTP requests. But for now, let's take a look at how we might encode the previous four commands, without any Appium client at all, just by using curl!

# 0. Check the Appium server is online
> curl http://localhost:4723/wd/hub/status

# response:
{"value":{"build":{"version":"1.17.0"}},"sessionId":null,"status":0}

You can see above that we can make sure we have a connection to the Appium server just by running curl and then the URL we want to retrieve, in this case the /status endpoint. We don't need any parameters to curl other than the URL, because we're making a GET request, and so no other parameters are required. The output we get back is a JSON string representing Appium's build information. Now, let's actually start a session:

# 1. Create a new session
> curl -H 'Content-type: application/json' \
       -X POST \
       http://localhost:4723/wd/hub/session \
       -d '{"capabilities": {"alwaysMatch": {"platformName": "iOS", "platformVersion": "13.3", "browserName": "Safari", "deviceName": "iPhone 11"}}}'

# response:
{"value":{"capabilities":{"webStorageEnabled":false,"locationContextEnabled":false,"browserName":"Safari","platform":"MAC","javascriptEnabled":true,"databaseEnabled":false,"takesScreenshot":true,"networkConnectionEnabled":false,"platformName":"iOS","platformVersion":"13.3","deviceName":"iPhone 11","udid":"140472E9-8733-44FD-B8A1-CDCFF51BD071"},"sessionId":"ac3dbaf9-3b4e-43a2-9416-1a207cdf52da"}}

# save session id
> export sid="ac3dbaf9-3b4e-43a2-9416-1a207cdf52da"

Let's break this one down line by line:

  1. Here we invoke the curl command, passing the -H flag in order to set an HTTP request header. The header we set is the Content-type header, with value application/json. This is so the Appium server knows we are sending a JSON string as the body of the request. Why do we need to send a body? Because we have to tell Appium what we want to automate (our "capabilities")!
  2. -X POST tells curl we want to make a POST request. We're making a POST request because the WebDriver spec defines the new session creation command in a way which expects a POST request.
  3. We need to include our URL, which in this case is the base URL of the Appium server, plus /session because that is the route defined for creating a new session.
  4. Finally, we need to include our capabilities. This is achieved by specifying a POST body with the -d flag. Then, we wrap up our capabilities as a JSON object inside of an alwaysMatch and a capabilities key.

Running this command, I see my simulator pop up and a session launch with Safari. (Did the session go away before you have time to do anything else? Then make sure you set the newCommandTimeout capability to 0). We also get a bunch of output like in the block above. This is the result of the new session command. The thing I care most about here is the sessionId value of ac3dbaf9-3b4e-43a2-9416-1a207cdf52da, because I will need this to make future requests! Remember that HTTP requests are stateless, so for us to keep sending automation commands to the correct device, we need to include the session ID for subsequent commands, so that Appium knows where to direct each command. To save it, I can just export it as the $sid shell variable.

Now, let's find an element! There's just one element in Appium's little Safari welcome page, so we can find it by its tag name:

# 2. Find an element
> curl -H 'Content-type: application/json' \
       -X POST http://localhost:4723/wd/hub/session/$sid/element \
       -d '{"using": "tag name", "value": "h1"}'

# response:
{"value":{"element-6066-11e4-a52e-4f735466cecf":"5000","ELEMENT":"5000"}}

# save element id:
> export eid="5000"

In the curl command above, we're making another POST request, but this time to /wd/hub/session/$sid/element. Note the use of the $sid variable here, so that we can target the running session. This route is the one we need to hit in order to find an element. When finding an element with Appium, two parameters are required: a locator strategy (in our case, "tag name") and a selector (in our case, "h1"). The API is designed such that the locator strategy parameter is called using and the selector parameter is called value, so that is what we have to include in the JSON body.

The response we get back is itself a JSON object, whose value consists of two keys. The reason there are two keys here is a bit complicated, but what matters is that they each convey the same information, namely the ID of the element which was just found by our search (5000). Just like we did with the session ID, we can store the element ID for use in future commands. Speaking of future commands, let's get the text of this element!

# 3. Get text of an element
> curl http://localhost:4723/wd/hub/session/$sid/element/$eid/text

# response:
{"value":"Let's browse!"}

This curl command is quite a bit simpler, because retrieving the text of an element is a GET command to the endpoint /session/$sid/element/$eid/text, and we don't need any additional parameters. Notice how here we are using both the session ID and the element ID, so that Appium knows which session and which element we're referring to (because, again, we might have multiple sessions running, or multiple elements that we've found in a particular session). The response value is the text of the element, which is exactly what we were hoping to find! Now all that's left is to clean up our session:

# 4. Quit the session
> curl -X DELETE http://localhost:4723/wd/hub/session/$sid

# response:
{"value":null}

This last command can use all the default curl arguments except we need to specify that we are making a DELETE request, since that is what the WebDriver protocol requires for ending a session. We make this request to the endpoint /session/$sid, which includes our session ID so Appium knows which session to shut down.

That's it! I hope you've enjoyed learning how to achieve some "low level" HTTP-based control over your Appium (and Selenium) servers!