Friday 10 April 2020

Can the Covid-19 tracking app be trusted?

A research team at King's College London have been attempting to track UK and USA Covid-19 cases via a self-reporting app, available for iOS and Android at https://covid.joinzoe.com/. They would like as many people as possible to download the app and self-report every day, so that we have a better data set. Some 2 million people have already joined the programme, and results are already being published.

But the ultra-cautious among us might be asking: since the app's source code is not available, how can we be sure it's really doing what it says it is, and not for example spying on our phones?

The development of the app was outsourced to a commercial startup company, Zoe Global Ltd, who unfortunately have been unable to "open-source" the app. This is probably because, like many commercial companies, they are afraid that disclosing their source code will somehow harm their future business. In fact there are many good examples of success with disclosed source code, such as the Android platform itself and the Jitsi alternative to Zoom, but not everyone has yet fully grasped how to harness such "open-source" business models, and Zoe Global Ltd seems to be one of the companies that cannot do it yet. Nevertheless they were able to quickly produce the app in a crisis, and the research team felt that good positive press coverage by The Guardian, the BBC, King's College London and Professor Tim Spector should be enough assurance of the app's good behaviour without also having to disclose its source code.

Although we were not given the source code when we asked for it, we were nevertheless able to perform a limited independent investigation into how the app works by downloading the Android APK file for the app (package name com.joinzoe.covid_zoe) from a third-party mirror site called APKFab, and looking inside it.

The download from APKFab is a file called "COVID Symptom Tracker_v0.9_apkfab.com.xapk" which can be unpacked using the "unzip" command on GNU/Linux. It contains a "manifest.json" file that lists the following permissions requested by the app:
  • Receive messages from the Android Cloud to Device Messaging (C2DM) service. This lets the research team send you messages (which they'll probably do only if they want to clarify an exceptional case).
  • Access the Internet. This is obviously required to send the results. The app also requests permission to read the state of the Internet and Wi-Fi connections, which could be used for example to check that the Internet connection is good before trying to use it.
     
  • Display a window over the top of another app. It's not clear why they want this permission: most likely the request was left in the code by mistake. This permission (which is requested by 10% of apps on the Google Play store) can in theory be abused by "overlay malware" to pop up a fake login screen on another app, so you think you have to enter login details into the other app but you are actually giving them to the malware author.
     
  • Modify audio settings. This allows the app to change the volume, and to change from headphones to speaker or vice versa. Again, there is no clear reason why the Covid-19 app should have this permission (unless they want to be able to shout at you in an emergency); most likely it was left in by mistake.
     
  • Run a "service" and start automatically after the phone is restarted. This is used to give you a notification reminding you to report if you haven't done so for a while.
     
  • Prompt to install new packages (apps). This permission is usually requested if an app wants to bundle its own update mechanism. It won't be able to actually install other apps unless you say "yes" when asked.
     
  • Access the phone's storage, i.e. shared files. This is not strictly necessary unless the app wants to store its data in a place that will persist even after it has been uninstalled and reinstalled, or if it wants to use the external storage card. Apps that request this permission can read files you leave in the shared filespace, so we normally want the app to have a good reason to ask for it.
     
  • A permission called "bind get install referrer service" which is automatically added by Google Play Services unless the developer overrides it; it's believed to be harmless analytics on where the app came from.
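
For anyone who wants to repeat this step, the permission list can also be read out with a few lines of Python instead of unpacking the download by hand. The following is only a minimal sketch: it assumes "manifest.json" sits at the top level of the .xapk archive and that the permissions are stored as a list of strings under a "permissions" key (the key name is our assumption about the XAPK layout, not something documented by the app's developers). The function name xapk_permissions is made up for this illustration.

```python
import json
import zipfile


def xapk_permissions(xapk_path):
    """Return the permission names listed in an XAPK's manifest.json, sorted."""
    # An .xapk is an ordinary zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(xapk_path) as archive:
        with archive.open("manifest.json") as handle:
            manifest = json.load(handle)
    # Assumption: permissions live under a "permissions" key as a list of
    # strings. If the key is absent we return an empty list, not an error.
    return sorted(manifest.get("permissions", []))
```

Pass it the path of the downloaded .xapk file and print each returned name on its own line; the names can then be looked up in the Android developer documentation.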

Looking further into the app, the "config.*" files mostly contain graphics and messages in various languages, although the file "config.arm64_v8a.apk" also contains various library files, including one whose purpose is to detect faces in images. Hopefully this was left in by mistake, since there's no reason for this app to go looking through your album picking out faces (it doesn't have permission to use the camera, but apps with the Storage permission could in theory look at pictures you've already taken). Given that the company had to write the app so quickly, they probably copy/pasted a lot of code from another project and this app ended up with large chunks of unused code which is what we are seeing here; if they'd had more time to develop it properly they'd have hopefully taken this out.

The main part of the app is in a file called "com.joinzoe.covid_zoe.apk" which contains more face-detection resources (in the "assets/models" directory, hopefully left in by mistake), supporting files for Google Play Services and crash-analytics libraries (they want to know what happens if their app crashes), a Soundex library (for matching mis-spelled words), and 16 megabytes of compiled Kotlin code.
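
An APK is itself a zip archive, so a quick way to get an overview like the one above is to add up the uncompressed size of everything under each top-level directory. The sketch below does only that; the function name size_by_top_level is our own invention, and it says nothing about whether any of the files it counts are actually used by the app.

```python
import zipfile
from collections import Counter


def size_by_top_level(apk_path):
    """Total uncompressed bytes per top-level directory (or root file) in an APK."""
    totals = Counter()
    with zipfile.ZipFile(apk_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            # "assets/models/x.bin" -> "assets"; a root-level file keeps its own name.
            top = info.filename.split("/", 1)[0]
            totals[top] += info.file_size
    return totals
```

Calling totals.most_common() on the result puts the largest entries first, which is where bundled models and compiled code would be expected to show up.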

It's hard to analyse the code without source code, but there are some tools that can try to back-translate ("decompile") it into code that is partly readable by a developer. We used the "d2j-dex2jar" utility to back-translate the code, although it did fail on some of it, most notably on a couple of libraries from Facebook, one of which is responsible for managing Facebook advertisements. Again we hope this was left in the app by mistake and not actually used.

After "d2j-dex2jar" has done its work, the resulting "jar" files can be inspected using a tool called "jd-gui". We browsed through it and found various user-interface support libraries (not all of which are used), the Amplitude library for tracking what users do in the app, the Bumptech Glide image loading library, and libraries from both Facebook and Google that include advertising functions. The app also bundled a library called Expo that includes a barcode scanner, camera handler (although the app does not have permission to use this), SMS, speech and printing functions. Obviously much of this is unused and they didn't have time to take it out (that's how apps end up taking more space on your phone than they should). It also contains an open-source cryptography library which can help send data to their servers securely.

What we were really looking for, though, was the main part of the program—not all libraries are used, so it would be nice to start at the entry point and see which of them actually are used. In future it would help if unused libraries were completely removed, so we wouldn't need to have conversations like “yes we have code in our app that tracks faces and Facebook, but it's switched off and we forgot to take it out”, but obviously the developers had to rush and their method does seem to be “copy everything from another project and use just some of it”.

Unfortunately the all-important main-program classes under "com.joinzoe" do not seem to have been listed as such by the "dex2jar" utility, so finding them would require reading through the best part of a million lines of support code, which would take quite some time. (Perhaps some companies have a deliberate practice of dumping lots of extra unused code in their work just to confuse anyone who tries to analyse it.)
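
One way to check for such classes without browsing by hand is to list the entries of the converted "jar" file, which is also a zip archive, and filter them by package. This is a minimal sketch under that assumption; the function name classes_under is hypothetical, and an empty result would only tell us the classes are not in that particular jar under that name, not that they don't exist (the conversion may have failed on them, or they may have been renamed).

```python
import zipfile


def classes_under(jar_path, package):
    """List class names in a jar that belong to `package` or one of its subpackages."""
    # Jar entries use "/" as the separator: com.joinzoe -> com/joinzoe/
    prefix = package.replace(".", "/") + "/"
    found = []
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith(prefix) and name.endswith(".class"):
                # "com/joinzoe/Foo.class" -> "com.joinzoe.Foo"
                found.append(name[: -len(".class")].replace("/", "."))
    return sorted(found)
```

Running it with the package "com.joinzoe" against each jar that the conversion produced would show at a glance whether any of them contains the app's own classes.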

So all we can say at this stage is: we haven't confirmed the app won't contact Facebook, mess with your volume control, pop up misleading boxes over other apps, or look for faces in your photo album, but at least it can't use your camera or microphone, can't read your precise location and can't access the phone, contacts or messages. And the researchers who founded the startup do seem to have a track record of not being particularly devious, so let's apply Hanlon's Razor (“never attribute to malice what is sufficiently explained by stupidity”) and assume the suspect functions were left in by mistake and are most likely turned off. Moreover, we need this data, so do please use the app and self-report every day if you can.

One final note that may be of interest: The app requires at least version 5 of Android to run. Android 5 was released in 2014, and Google figures say 90% of Android users are on Android 5 or above (this is a global figure; they don't say how it's distributed by country). If you are in the 10% who are still using Android phones more than 6 years old, or if you have a non-Android non-iOS phone, then you cannot use the app, which may give the results a certain selection bias. The app asks for your age, so they could in principle correct for the over-representation of younger people (the elderly are less likely to be smartphone users), but it's harder to correct for differences in income: a low-income person stuck on a 7-year-old phone (and therefore excluded from the study) is more likely to have to catch a bus to work in a supermarket every day instead of staying safe at home doing office work. If this bus user with a very old phone is therefore more likely to encounter the virus, then the data is not completely representative if we're not counting them. But some low-income families do in fact have up-to-date smartphones because they've saved up for them and made it a priority, for example because they wanted to use religious or learning apps that required smartphones, and these will now be able to participate in the Covid-19 study if they wish.

(written with help from Silas S. Brown)
