How testers coded a mobile farm for iOS
When a problem gets annoying enough, even testers can build something (or break the damn thing so it doesn’t bother anybody anymore). This article is about exactly such a case: our problem became critical, and we had to solve it by building our own tool.
Our whole team is spread across several countries, so we know firsthand the advantages and disadvantages of remote work. At some point, the disadvantages began to get in the way and hurt the testing quality, so we dealt with them by building a tool of our own. Below I will explain how we ended up here, what we did, what we went through and, of course, how it turned out. Spoiler alert: at the end, there will be a link to GitHub where our work can be found.
Disclaimer
Our intention was not to make cloud farms look bad. In most cases, they do their job well and there are no problems with them. Nor was our intention to promote our own development. The farm we created is an open-source solution. We want to share it with everyone who may find it useful in their work, and to engage those who may be interested in improving and refining it. Our farm is already up and running: it does not just show signs of life, it is fully functional. But there is still room for improvement. Don’t forget, we are not developers, we are testers.
What made the testers code
First of all, let’s go through the terminology: cloud farms (AWS, BrowserStack, etc.) provide remote access to a variety of devices and browsers, which can be used for comprehensive testing, automation, logging the device state, etc.
Then there is a logical question: why bother inventing something new when we already have solutions such as the BrowserStack we mentioned above? At first, like many testers, we used BrowserStack, a popular and simple solution for testing iOS. But it (and other similar tools) has limitations that impact the testing quality and drastically reduce the number of nerve cells in a tester.
The problems we encountered when working with BrowserStack:
1. These are not live devices, just emulation.
That is why the product performance in BrowserStack can be displayed in one way, while it will behave completely differently on a live device.
2. Slow speed of cloud farms.
The servers of cloud farms are often located outside the country, so there is too much lag.
3. Limited functionality.
Since this is an emulation, not all native services can be used. For example, you can’t use phone numbers, insert your SIM card, or sign in with an Apple ID.
So, here is how we came to realize the problem:
1. iOS testing in BrowserStack was increasingly failing, with customers coming in and complaining that we were missing bugs. And we didn’t even see those bugs; none of them showed up in BrowserStack.
2. The team had to queue up to test iOS. Initially, we were a very small company, and then we started to grow, but due to limited resources, the number of accounts in BrowserStack remained the same.
3. To shorten the queue somehow, we occasionally purchased live devices and sent them out to our folks. This is quite expensive, and there weren’t always enough resources available.
4. It became clear that something had to be done, so we decided to try making something of our own.
Now onto the technical part:
Let’s start with what the farm consists of:
- iOS devices connected to the server and having access to the network;
- A server;
- A user who connects to the server, sees the devices through some simple web interface and chooses the needed one.
Now, let’s go over the hardware and discuss the technical part in more detail:
Phone state management
To transmit actions from the user to the phone, we wanted to find a simple and convenient tool. Here is what we were looking for:
- A tool in a server format, which would receive and process user actions in real time;
- The possibility to work with it in a popular programming language, so that we wouldn’t accidentally start writing in Jython;
- Not limited to macOS development.
Only Appium fit our criteria. Yes, we could have created a Frankenstein of our own by putting together a lot of tools to satisfy all our desires. But let us be honest, we were looking for a simple and free option. However, Appium has a problem that we still haven’t been able to solve in a simple way: it doesn’t allow more than one person to be on the farm at a time. We know for a fact that this is because of the ports in Appium itself, which can only work with one connection at a time. One obvious solution we considered was containerizing multiple Appium servers on different ports, but that requires another level of knowledge and hardware, so we’re still looking for a simpler option.
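To make the "multiple Appium servers with different ports" idea a bit more concrete, here is a minimal sketch of the port bookkeeping only. It is not part of our farm and we have not tried it there. It assumes one Appium server per phone, the default ports (4723 for Appium, 8100 for WebDriverAgent, 9100 for the MJPEG stream) and the XCUITest capabilities `appium:wdaLocalPort` and `appium:mjpegServerPort`. The names `DeviceSlot` and `plan_slots` and the UDIDs are made up for illustration.

```python
from dataclasses import dataclass

# Default ports; each extra phone gets the next free number after these.
BASE_APPIUM_PORT = 4723  # Appium server
BASE_WDA_PORT = 8100     # WebDriverAgent on the host side
BASE_MJPEG_PORT = 9100   # MJPEG screen stream


@dataclass(frozen=True)
class DeviceSlot:
    """Hypothetical record: one phone and the ports reserved for it."""
    udid: str
    appium_port: int
    wda_port: int
    mjpeg_port: int

    def server_command(self) -> list[str]:
        """Command line for a dedicated Appium server on this slot's port."""
        return ["appium", "--port", str(self.appium_port)]

    def capabilities(self) -> dict:
        """Capabilities that keep this phone's helper ports separate."""
        return {
            "platformName": "iOS",
            "appium:automationName": "XCUITest",
            "appium:udid": self.udid,
            "appium:wdaLocalPort": self.wda_port,
            "appium:mjpegServerPort": self.mjpeg_port,
        }


def plan_slots(udids: list[str]) -> list[DeviceSlot]:
    """Give the i-th phone the i-th port after each base port."""
    return [
        DeviceSlot(udid, BASE_APPIUM_PORT + i, BASE_WDA_PORT + i, BASE_MJPEG_PORT + i)
        for i, udid in enumerate(udids)
    ]


if __name__ == "__main__":
    # Placeholder UDIDs; real ones come from the connected phones.
    for slot in plan_slots(["UDID-PHONE-A", "UDID-PHONE-B"]):
        print(" ".join(slot.server_command()), slot.capabilities())
```

Whether each server then runs in a container or as a plain process is a separate decision; the sketch only shows that every phone needs its own set of ports.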
Okay, back to the user actions. So, what actions can we take? Actually, all the basic user actions: swipes, taps, double taps, etc. We also plan to add the ability to control physical buttons like volume, mute, screen lock.
The process of controlling the phone’s state looks like this: we convert a user action (for example, a click) into a Python script that uses the Appium libraries, and we simply pass this script to the phone through Appium.
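As an illustration of the conversion step, here is a minimal sketch that turns a click on the browser preview into a touch gesture. It is not the farm’s actual code. It assumes the gesture is expressed in the W3C WebDriver Actions format, which Appium accepts at the session’s `actions` endpoint. The function names and the sizes in the example are made up.

```python
def scale_point(x, y, view_size, device_size):
    """Map a click on the browser preview to device screen coordinates."""
    view_w, view_h = view_size
    dev_w, dev_h = device_size
    return round(x * dev_w / view_w), round(y * dev_h / view_h)


def touch_sequence(steps):
    """Wrap low-level steps into a single-finger W3C Actions payload."""
    return {
        "actions": [
            {
                "type": "pointer",
                "id": "finger1",
                "parameters": {"pointerType": "touch"},
                "actions": steps,
            }
        ]
    }


def tap(x, y, hold_ms=100):
    """Finger down at (x, y), a short pause, finger up."""
    return touch_sequence([
        {"type": "pointerMove", "duration": 0, "x": x, "y": y},
        {"type": "pointerDown", "button": 0},
        {"type": "pause", "duration": hold_ms},
        {"type": "pointerUp", "button": 0},
    ])


def swipe(start, end, duration_ms=300):
    """Finger down at start, move to end over duration_ms, finger up."""
    (x1, y1), (x2, y2) = start, end
    return touch_sequence([
        {"type": "pointerMove", "duration": 0, "x": x1, "y": y1},
        {"type": "pointerDown", "button": 0},
        {"type": "pointerMove", "duration": duration_ms, "x": x2, "y": y2},
        {"type": "pointerUp", "button": 0},
    ])


if __name__ == "__main__":
    # A click at (100, 200) on a 200x400 preview of a 400x800 screen.
    x, y = scale_point(100, 200, (200, 400), (400, 800))
    print(tap(x, y))
```

The resulting dictionary is what would be sent to Appium for the phone in question. Double taps and other gestures can be built from the same steps.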
Transmitting an image from the phone screen to the user
Since we settled on Appium, we tried to set up image transmission based on its built-in tools. But at the time we started development, image transfer through Appium from the device to the user was too slow. So, we went looking for other solutions.
Our path was not an easy one, ranging from transmitting screenshots to setting up an HLS stream:
Attempt #1
The first idea was simple: take screenshots of the device’s screen as fast as the processor allows and send them to the user. The screenshots were taken quickly and looked good, but since they were in the JPG format, the files were too big and reached the user with an average delay of 5-10 seconds.
Attempt #2
We switched to low-quality PNG. We lost quality, but almost doubled the transfer speed. Yet even this speed was not good enough to properly use the phone, let alone to test products.
Attempt #3
We decided to try HLS streaming. This is what HLS streaming looks like: it collects pictures into a buffer, compiles a video file from the buffer and transmits it endlessly to the user.
The plan was good. But it failed at the first run. The delay between the current state of the phone and the broadcast was as much as 25 seconds. This was because the video was being compiled frame by frame, every second.
While we were searching, the Appium developers finished their own image transmission. The delay was literally one second. So we returned to Appium.
The process of transmitting images from the phone screen to the user looks like this: Appium, which is on the server, retransmits the screen to the user with a one second delay.
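For readers who want to handle such a stream themselves, here is a minimal sketch of cutting individual frames out of the incoming bytes. It assumes the screen arrives as an MJPEG stream (a sequence of JPEG images), which is our assumption for this example and not a description of the farm’s code. The splitting is deliberately naive: it looks only for the JPEG start and end markers and ignores the multipart headers between frames. `extract_frames` is a made-up name.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split complete JPEG frames out of buffer.

    Returns (frames, leftover). The leftover holds an unfinished frame
    and should be prepended to the next chunk read from the stream.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            # No frame started yet; keep the last byte in case it is
            # the first half of a start marker split across chunks.
            return frames, buffer[-1:]
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame started but is not complete yet.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]


if __name__ == "__main__":
    chunk = b"--frame\r\n" + SOI + b"AAA" + EOI + b"\r\n--frame\r\n" + SOI + b"BB"
    frames, rest = extract_frames(chunk)
    print(len(frames), rest)
```

A JPEG with an embedded thumbnail contains an inner end marker, so a more careful version would read the frame length from the multipart headers instead.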
Note: do not forget that the image transmission speed still depends on the ping to the server. So, keep the servers with your farms in your own city or region if you can or, at worst, in your country.
Two-way interactive connection
So, we have a phone, and we have a user who needs to see the phone and poke at the screenshot. There is only one thing missing: all this has to happen at the same time. Hence, the only possible solution was WebSockets. This technology allows you to establish and maintain a two-way interactive connection between the client and the server in real time. And, what is very important, the connection is protected. Without protection, intruders who gain access to your network could simply steal the phones by changing the IDs.
How the farm actually works: we have a phone, and we transmit its image to the user via Appium. We have a user who performs actions on these screenshots, and those actions are sent to the Appium server. Appium, via its tools, simulates these actions on the phone. All this is protected by the WebSockets connection.
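To illustrate the "changing the IDs" risk, here is a minimal sketch of one possible guard. It assumes each WebSocket text message is JSON carrying a device ID, a secret token and an action; this message format is our assumption for the example, not the farm’s actual protocol. The server hands out a token when a user takes a phone and rejects any message whose token does not match that phone. `DeviceLeases` and `handle_message` are made-up names.

```python
import hmac
import json
import secrets


class DeviceLeases:
    """Tracks which secret token currently controls which device."""

    def __init__(self):
        self._tokens = {}  # device_id -> token

    def acquire(self, device_id: str) -> str:
        """Reserve a free device and return the token that controls it."""
        if device_id in self._tokens:
            raise RuntimeError(f"{device_id} is already in use")
        token = secrets.token_urlsafe(32)
        self._tokens[device_id] = token
        return token

    def is_owner(self, device_id: str, token: str) -> bool:
        """Constant-time check that token controls device_id."""
        expected = self._tokens.get(device_id)
        if expected is None:
            return False
        return hmac.compare_digest(
            expected.encode("utf-8"), token.encode("utf-8", "replace")
        )

    def release(self, device_id: str, token: str) -> None:
        """Free the device, but only for its current owner."""
        if self.is_owner(device_id, token):
            del self._tokens[device_id]


def handle_message(leases: DeviceLeases, raw: str) -> dict:
    """Validate one JSON text message and return the reply to send back."""
    try:
        msg = json.loads(raw)
        device_id, token, action = msg["device"], msg["token"], msg["action"]
    except (ValueError, KeyError, TypeError):
        return {"ok": False, "error": "malformed message"}
    if not (isinstance(device_id, str) and isinstance(token, str)):
        return {"ok": False, "error": "malformed message"}
    if not leases.is_owner(device_id, token):
        return {"ok": False, "error": "not the owner of this device"}
    # At this point the action would be passed on to Appium.
    return {"ok": True, "device": device_id, "action": action}


if __name__ == "__main__":
    leases = DeviceLeases()
    token = leases.acquire("phone-1")
    stolen = json.dumps({"device": "phone-2", "token": token, "action": "tap"})
    print(handle_message(leases, stolen))  # rejected: wrong device for this token
```

With a check like this, swapping the device ID in a message is not enough to take over somebody else’s phone.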
And that’s it. Yes, we will hardly be able to watch any 4k videos, but it’s exactly the right thing for testing the basics. Unless, of course, you are testing video streams.
What we got out of it all
What we ended up with is a live device farm on your side, which will obviously have less lag than cloud farms since it’s located right next to you.
Among its advantages:
- Linking to native services;
- Installation of any apps;
- The behavior of tested products is real, as we interact with a live device, not an emulator, as was the case with BrowserStack;
- Availability of a server, which allows you to further configure network traffic analysis tools, change VPN, location, etc.
Some disadvantages:
- Only one person can operate it at a time; this is still the biggest drawback;
- It’s untested (we’re the testers, after all), and we would like to get an outside opinion;
- No audio broadcasting. But this is rather a common drawback of all cloud farms. As far as we know, it has not been implemented anywhere so far. If it has, please write about it in the comments.
The main thing: what you need to run a farm
Concerning the hardware:
- A server where it will run. It is even better to buy some kind of a USB hub so you can connect phones in advance;
- Phones running iOS.
Concerning the skills:
- Basic command line knowledge, whether it’s Linux, PowerShell or whatever;
- You will have to work with the network: you need to be able to open a port, start Nginx, code something in Vim, save and, if you’re lucky, get out;
- You have to understand that the code that relays user actions is written in Python. You can rework anything you find inconvenient, and add more if you come up with new features;
- And naturally, any programming is simply about knowing how to google.
And here’s a link to GitHub that has a guide on how to start up this farm at your place. If you have any questions/suggestions or anything else, feel free to write to us.