What is the goal of the Android Network Traces website?
Many smartphone apps collect and send a great deal of data to third-party websites. The goal of this site is to help people understand Who knows What about us and Why. By Who, we mean these third-party websites. By What, we mean what data they are collecting about us. By Why, we mean the purpose of this data collection; e.g. for advertising, for maps, for backups, and so on.
What types of data can I learn about from Android Network Traces?
General ID data: MAC address, IMEI number, UUID, and other identifiers. Used for analytics, advertising, authentication, and anti-fraud activities.
Device data: Phone model, screen size, manufacturer info etc. Used for analytics, advertising, and interface customization.
Account data: demographics such as email, age, gender, zip and settings for tasks like syncing data from different services. Used for analytics, advertising, and login purposes.
Network data: IP address and network connectivity details like WiFi, LTE, 3G, 4G etc. Used for analytics, network optimization, and advertising.
Location data: GPS coordinates from the phone and other geospatial data. Used for advertising, analytics, personalization, and nearby searches.
What keywords would help me understand Android Network Traces?
Network trace: a single piece of data sent across the internet (i.e. packet) that we were able to intercept (e.g. a latitude of 45 would be a coordinate for location data).
Network request: a request may contain multiple network traces (e.g. a latitude of 45 and a longitude of 90). The most common types of network request are HTTP and HTTPS.
UI monkey: a bot that randomly clicks through an app.
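To make the relationship between a network request and its network traces concrete, here is a minimal Python sketch. It assumes the traces travel as query-string parameters of an HTTP(S) request; the URL, the host name, and the function name split_into_traces are hypothetical and are not taken from our actual tooling.

```python
from urllib.parse import urlsplit, parse_qsl


def split_into_traces(url):
    """Split one request URL into its destination host and its traces.

    Each (key, value) pair in the query string is treated as one
    network trace. This is an illustrative assumption, not the exact
    parsing logic used for the dataset.
    """
    parts = urlsplit(url)
    return parts.hostname, parse_qsl(parts.query)


# Hypothetical request carrying two location traces.
host, traces = split_into_traces(
    "https://ads.example.com/track?latitude=45&longitude=90"
)
for key, value in traces:
    print(host, key, value)
```

In this example a single request yields two traces, latitude and longitude, both sent to the same third-party host.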
Who are the people behind this web site?
We are researchers at Carnegie Mellon University. See our About Us page for more info.
About the collected data.
How much data was collected?
We accessed approximately 15 thousand apps, which sent 6.3 million network traces to 12 thousand domains. You can read more about our research in our MobiPurpose paper, and download the data here.
How did you collect data from apps?
We collected data using 8 phones, an automation tool for clicking through apps, and a custom VPN for intercepting data.
Phones and apps: In November 2016, we crawled the Google Play store to find free apps visible to US users. Out of the 185,000 free apps updated after 2015, 30,000 were compatible with Android 7 and were thus installed on eight phones. We observed network traces from about half of these apps.
Automation tool: Each phone contained an automation tool called a “UI monkey”. The UI monkey could install an app, randomly click through it for 3 minutes, and then uninstall it. We ran the UI monkeys for 50 days.
Custom VPN: As the UI monkeys clicked through the apps, the phones sent data across the internet. The most common way for smartphone apps to communicate with cloud servers is through HTTP(S) requests. Therefore, we built a custom tool (or “man-in-the-middle VPN”) to intercept these outgoing HTTP(S) traffic requests.
This approach is novel in its ability to not only assess which data leaves the phones but also collect millions of data points. A limitation is that UI monkeys miss some screens in an app. Additionally, these bots were generally not logged into the apps.
How did you process the collected data for analysis?
The network traces that our VPN intercepted were parsed as key-value pairs. For each pair, we performed a taxonomy look-up to assign a data type. For example, a key of latitude could have a value of 40.460865 and be classified as “Location data”. Our algorithm predicted the data type with 95% precision. The five data types supported by our current dataset are shown above.
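The look-up step can be pictured with the short Python sketch below. It is a simplified illustration, not our actual classifier: the table KEY_TO_TYPE, its entries, and the function names are hypothetical, and it only matches keys exactly, whereas the real taxonomy covers far more key names.

```python
# Hypothetical mapping from normalized key names to the five data types.
KEY_TO_TYPE = {
    "latitude": "Location data",
    "longitude": "Location data",
    "imei": "General ID data",
    "mac_address": "General ID data",
    "model": "Device data",
    "screen_size": "Device data",
    "email": "Account data",
    "gender": "Account data",
    "ip_address": "Network data",
    "connection_type": "Network data",
}


def classify_key(key):
    """Return the data type for a key, or None if the key is unknown."""
    return KEY_TO_TYPE.get(key.strip().lower())


def classify_pairs(pairs):
    """Attach a data type to every (key, value) pair."""
    return [(key, value, classify_key(key)) for key, value in pairs]


# Two pairs from a hypothetical trace; the second key is not in the table.
for row in classify_pairs([("latitude", "40.460865"), ("session", "abc")]):
    print(row)
```

Keys that are missing from the table come back as None rather than being guessed, which keeps unknown data separate from the five supported types.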
How often do you refresh your data?
Our predictive results are based on the one-time analysis of the dataset from a project called MobiPurpose. We assume the data type sent to the cloud services and the data collection purposes to be quite static. If we update the raw dataset in the future, we will put up a notice.
Where can I download the data used in the Android Network
Our team built both PrivacyGrade.org and this website. PrivacyGrade assigns a privacy score to an app based on user expectations, while Android Network Traces provides transparency on where our data is going and why. That is, Android Network Traces focuses more on the third parties that are gathering data about us.
What are examples of the purposes?
Analytics: A general ID could be used to avoid redundant device counting in marketing, and location and account data could be used for marketing analysis.
Advertising: Network data can be used for personalizing advertisements, and general ID could support ad targeting/evaluation.
Interface customization: Device information can be used to customize an interface based on the screen resolution.
Anti-fraud activities: Device information could be used to enforce limits on both free content and advertisement.
Third-party login: Account data could help a user log into a new app with their Facebook or Google account.
Network optimization: Network data could be used to download low-resolution images when on LTE.
For researchers interested in our full purpose taxonomy, it is available on GitHub as part of CMU's contribution to the Brandeis project.