The Data Quantity Report page visualizes the number of data records each participant has uploaded per unit of time during their participation period, for each data source. It provides a quick way to monitor data quantity and spot non-compliance that might require an intervention.
One of the main uses of these graphs is to show whether data is available for a given time window and, if not, whether data is missing only for specific data sources or for all of them. This information can be interpreted as follows:
- If data is available for all data sources, the app was functioning as expected during that time window (though the quality of the data should be investigated separately).
- If data is missing for only some data sources, the app was functioning properly, but an external factor prevented it from collecting data from those sources. For example, while the app was running, the participant had manually turned off her GPS.
- If data is missing from all data sources, the app was very likely not operational during that time window.
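The three rules above can be expressed as a small function. This is a minimal sketch, assuming you already have the record count of each selected data source for one time window (for example, read off the plot or computed from exported raw data); the function name `classify_window` and its labels are hypothetical and not part of Ethica.

```python
from typing import Dict, List, Tuple


def classify_window(counts: Dict[str, int]) -> Tuple[str, List[str]]:
    """Apply the three interpretation rules to one time window.

    counts maps each selected data source to the number of records it
    uploaded in the window (0 when nothing was uploaded).
    Returns a label and the sorted list of sources with no data.
    """
    if not counts:
        raise ValueError("select at least one data source")
    missing = sorted(src for src, n in counts.items() if n == 0)
    if not missing:
        # Every source has data: the app was likely working as expected.
        return "all_present", missing
    if len(missing) == len(counts):
        # Nothing from any source: the app was likely not operational.
        return "all_missing", missing
    # Only some sources are empty: likely an external factor, e.g. GPS off.
    return "some_missing", missing
```

For instance, with illustrative counts `{"Bluetooth": 40, "GPS": 0, "Wi-Fi": 312}` the function reports `"some_missing"` with `["GPS"]`, which matches the "GPS turned off while the app kept running" case.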
The graphs presented here do not represent the quality of the data provided. For example, assume your study records GPS data. The data quantity report can easily show how often the participant provided GPS data and how often they did not. However, it does not distinguish between cases where the participant forgot their phone on a desk and cases where they carried it with them at all times, because both can produce GPS records.
You can monitor the data quality of each data source separately in its relevant section of the dashboard.
Plotting the report
To access the data quantity graphs, open the Data Quantity Report page by selecting it in the left panel. The page lets you select a participant, a time period, one or more data sources, and a unit of time. For example, in the image below we request the compliance report for participant #231, from May 8th to June 7th 2016, for the Bluetooth, GPS, and Wi-Fi data sources. We also ask for the data to be aggregated per day.
Pressing Go extracts the number of records uploaded by participant 231 for each of these data sources, for each day in the specified period, as shown in the image below. You can see that each data source generates a different number of records per day. For example, Wi-Fi generated between 4,000 and 7,000 records per day (7,000 records means the participant's device recorded 7,000 nearby Wi-Fi access point observations, many of which are likely repeated sightings of the same access points), GPS can reach up to 10,000 records per day, and Bluetooth stays mostly below 1,500 records.
It's important to note that the numbers shown for one data source cannot be compared with those of other data sources. For example, recording 100 Bluetooth records in 1 hour indicates that the participant's device was in proximity of at most 100 distinct Bluetooth devices (many might have been reported multiple times), and it cannot be compared in any way to recording 1,000 GPS locations in the same time window.
In each graph, hovering your cursor over a bar shows the time and the number of records represented by that data point. You can also drag your cursor over a specific period to zoom in, or double-click the graph surface to zoom out.
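If you export the raw data and want to reproduce the per-unit counts described in this section, the aggregation amounts to grouping record timestamps by hour, day, or week and counting them. This is a hedged sketch, not Ethica's implementation: it assumes you have one list of timestamps per data source, already converted to the participant's local time, and `bucket_key` and `count_per_unit` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def bucket_key(ts: datetime, unit: str) -> str:
    """Label the hour, day, or ISO week that a local timestamp falls in."""
    if unit == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if unit == "day":
        return ts.strftime("%Y-%m-%d")
    if unit == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    raise ValueError(f"unsupported unit: {unit!r}")


def count_per_unit(timestamps: Iterable[datetime], unit: str = "day") -> Counter:
    """Count the records of a single data source per time unit."""
    return Counter(bucket_key(ts, unit) for ts in timestamps)
```

Running `count_per_unit` once per data source gives one series per source, which keeps the sources separate, in line with the warning above that counts from different sources are not comparable.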
Data source-specific interpretation
The following image shows the number of GPS records uploaded by a given participant on May 8th.
The cursor in this graph points to 7am, which shows there were 217 GPS records in that hour. This means the participant uploaded 217 GPS records from May 8th, 2016 7:00am to May 8th, 2016 7:59am, inclusive. These times are in the participant's local time zone.
The graph also shows that from 7am until 4pm the participant's device uploaded considerably more GPS data, after which it again reported a modest number of GPS records. This is consistent with the GPS data collection logic (explained in the GPS data source description), which monitors the participant's mobility and records GPS data only when necessary.
Therefore, this graph can be interpreted as follows:
The participant visited many locations during the day (from 7am to 4pm), while after 4pm and before 7am she mostly stayed in one place and moved little.
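Each hourly bar covers a whole local hour, so the 7am bar includes everything from 7:00am up to and including 7:59am in the participant's local time zone. A minimal sketch of that bucketing, assuming raw timestamps in UTC and a fixed UTC offset for the participant (the -6 hours used in the check below is purely illustrative; use `zoneinfo` where daylight-saving rules apply); `local_hour_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def local_hour_start(ts_utc: datetime, utc_offset_hours: float) -> datetime:
    """Return the start of the participant-local hour containing ts_utc.

    ts_utc must be timezone-aware. The hour bucket runs from HH:00:00 to
    HH:59:59.999999 local time, matching the inclusive 7:00-7:59 reading.
    A fixed UTC offset is assumed here.
    """
    if ts_utc.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.replace(minute=0, second=0, microsecond=0)
```

Bucketing in local time rather than UTC matters for interpretation: a "quiet between 4pm and 7am" pattern only lines up with the participant's day if the bars are in their own time zone.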
The following image shows the number of Wi-Fi access points observed by the participant's device per hour on May 8th.
The selected point in the graph shows that the participant's device scanned 456 Wi-Fi access points from 4:00pm to 4:59pm inclusive. Note that these are not unique access points, so it does not mean the participant was in proximity of 456 distinct access points. Ethica scans for nearby access points on average every 5 minutes, so if the participant stays stationary for the whole hour, the same access point can be scanned 12 times (60 / 5 = 12). In other words, a given access point can be counted on average up to 12 times per hour.
Although the number shown here does not count unique access points, a higher number usually indicates denser Wi-Fi networks nearby, which is a good indication of a densely populated, mostly commercial, area.
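The 12-sightings-per-hour reasoning above gives a rough range for how many distinct access points are behind an hourly count. This is a back-of-the-envelope sketch under the stated assumption of one scan every 5 minutes; because the interval is only an average, the lower bound is approximate. The same arithmetic applies to Bluetooth devices. `unique_device_bounds` is a hypothetical helper.

```python
import math
from typing import Tuple


def unique_device_bounds(records_in_hour: int, scan_interval_min: float = 5.0) -> Tuple[int, int]:
    """Rough bounds on how many distinct devices produced an hourly count.

    With one scan every scan_interval_min minutes, a device that stays in
    range for the whole hour is recorded at most 60 / scan_interval_min
    times (12 for 5-minute scans). If every device was seen only once,
    the count itself is the upper bound.
    """
    max_sightings_per_device = 60 / scan_interval_min
    lower = math.ceil(records_in_hour / max_sightings_per_device)
    return lower, records_in_hour
```

For the 456 records between 4:00pm and 4:59pm, this gives a range of roughly 38 to 456 distinct access points.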
The survey responses plot shows the number of responses recorded in each hour. In this graph, responses that were fully or partially Completed by the participant are shown in blue, survey prompts that were Cancelled are shown in green, and prompts that Expired are shown in orange. You can read more about what counts as a completed, expired, or cancelled survey in the Surveys section.
Unlike other data sources, you should not expect survey responses in every hour. It therefore makes more sense to change the time unit for this plot to day or even week. The following image shows the number of responses provided by a participant from May 8th to June 7th 2016:
The mouse pointer is on May 27th, and you can see that this participant answered 6 surveys and cancelled 1 prompt, while 7 prompts expired before he answered them.
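The daily stacked bars amount to tallying prompts per day by outcome. A minimal sketch, assuming you have extracted (day, status) pairs from the raw survey data and mapped them to the three outcomes described above; the status strings and `survey_status_per_day` are hypothetical, not Ethica's export format.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import DefaultDict, Dict, Iterable, Tuple

# Outcomes as described above; "completed" covers full and partial responses.
STATUSES = ("completed", "cancelled", "expired")


def survey_status_per_day(prompts: Iterable[Tuple[date, str]]) -> Dict[date, Counter]:
    """Tally survey prompts per day by outcome, like the stacked daily bars."""
    per_day: DefaultDict[date, Counter] = defaultdict(Counter)
    for day, status in prompts:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        per_day[day][status] += 1
    return dict(per_day)
```

Feeding it the May 27th prompts from the example would yield 6 completed, 1 cancelled, and 7 expired for that day.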
Selecting Bluetooth in the data sources section plots the number of Bluetooth devices scanned in proximity per selected unit of time. As with Wi-Fi access points, this number does not count unique devices: each device can be scanned on average up to 12 times per hour, once for every 5-minute interval during which it is in proximity of the participant's device.
The following graph shows the number of devices recorded by a participant on May 7th 2016. You can see that no devices were recorded from 3:00am to 2:00pm. This can have several causes:
- There were no Bluetooth devices in proximity.
- The participant turned off Bluetooth on her device to save battery.
- The phone or the app was turned off and not operating at that time.
To distinguish between these scenarios, we can cross-check with another data source. Battery is usually a good choice in these cases, as it is always available and cannot be disabled by the participant. The following graph shows the number of records per hour for both the Battery and Bluetooth data sources.
You can see that during exactly the period in which no Bluetooth data was recorded, no Battery data was recorded either. Since Battery data is otherwise consistently available, we can be confident the app was not operational at that time.
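The cross-check above can be automated for longer periods: find each run of empty hours in the data source of interest and look at Battery over the same hours. A hedged sketch, assuming you have hourly record counts for both sources covering the same hours; `explain_gaps` and its labels are hypothetical, and "mixed" marks gaps where the app appears to have stopped for only part of the run.

```python
from typing import List, Optional, Sequence, Tuple


def explain_gaps(target: Sequence[int], battery: Sequence[int]) -> List[Tuple[int, int, str]]:
    """Find runs of empty hours in target and label each run using Battery.

    Both sequences hold hourly record counts for the same hours. Returns
    (first_hour, end_hour_exclusive, label) for every run of zero counts.
    """
    if len(target) != len(battery):
        raise ValueError("both series must cover the same hours")
    gaps: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    # A trailing sentinel of 1 closes a gap that runs to the last hour.
    for i, n in enumerate(list(target) + [1]):
        if n == 0 and start is None:
            start = i
        elif n != 0 and start is not None:
            window = battery[start:i]
            if all(b == 0 for b in window):
                label = "app not operational"  # Battery silent too
            elif all(b > 0 for b in window):
                label = "source-specific cause"  # e.g. Bluetooth off, no devices
            else:
                label = "mixed"  # app stopped for part of the gap
            gaps.append((start, i, label))
            start = None
    return gaps
```

With hourly series indexed from midnight, the May 7th case would appear as a single gap covering hours 3 to 13, labelled "app not operational" because Battery is empty over the same hours.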
Other data sources
You can plot the number of records uploaded by each participant from each data source in the Data Quantity Report section. Keep in mind that the number of records plotted here is aggregated per time unit and should be interpreted relative to the number expected from that data source. Some data sources, such as accelerometer, upload thousands of records per hour, while others, such as survey responses, upload very few or none.
The following image shows the number of records uploaded from the accelerometer and light sensor over the course of one day for one participant. You can see that although both are automatic sensors, they upload data in very different quantities.
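One way to read each source "relative to the number expected" is to compare every hour with that source's own typical hour. The sketch below uses the participant's median hourly count per source as the baseline; this choice of baseline is an illustrative assumption, not something the report does, and `relative_to_baseline` is a hypothetical helper.

```python
from statistics import median
from typing import Dict, List


def relative_to_baseline(hourly: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each source's hourly counts by that source's own median hour.

    A value near 1.0 means a typical hour for that source; values near 0
    flag hours with unusually little data. Each source is only compared
    with itself, never with other sources.
    """
    scaled: Dict[str, List[float]] = {}
    for source, counts in hourly.items():
        base = median(counts) if counts else 0
        if base == 0:
            # No usable baseline (e.g. surveys, which are often 0 per hour).
            scaled[source] = [float("nan")] * len(counts)
        else:
            scaled[source] = [c / base for c in counts]
    return scaled
```

A median baseline is less sensitive than a mean to a few very busy or empty hours; sparse sources such as surveys are better checked per day or week, as described earlier.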
Common causes of non-compliance
Using these plots, we can find out how much data each participant uploaded in any period of time. Below we describe the main causes of gaps in a participant's data.
Turning off specific sensors to save battery
Some sensors, such as Wi-Fi, Bluetooth, or GPS, can be turned off outside the app, which prevents data collection from that sensor. In this case, Ethica shows the participant a notification letting them know the study is partially interrupted and that they need to turn the sensor back on to resume data collection. You can see the different notifications participants might receive in the Participants section.
While the status of each sensor (whether a given device provides the sensor and, if so, whether it is enabled) is recorded in Ethica's operational logs, as of now there is no way to access these logs directly. You need to access the raw collected data to view this information.
In the meantime, you can check the status of other data sources. If other data sources were providing data during the same period but the data source you are interested in was not, it usually indicates that an external factor prevented Ethica from operating that particular data source.
Snoozing the app
To protect participants' privacy, Ethica allows participants to snooze the app. Snoozing puts the app to sleep for 1 hour, and participants can snooze the app as many times as needed. You can read about this feature in the Pausing Study Participation section.
While the app is paused, no data is collected from any data source. Participants can still start and respond to any user-triggered survey, but no automatically triggered surveys are issued during this time. Ethica explicitly records each participant's pausing behaviour, but as of now the only way to access this information is through the raw data.
Turning off the sensor permanently from the app
When selecting data sources for a new study, you can mark each data source as optional or mandatory. If a data source is marked as optional, participants can opt out of it completely and not provide the requested data. You can read the steps participants follow to opt out of a data source here.
If a participant opts out of a data source, no data from that source is recorded or uploaded, so the data quantity plot for that source will be blank.
Device running out of battery
One of the most common reasons a participant does not upload data is that their device ran out of battery and was off for some time. This obviously means no data is collected or uploaded during that period.
If a device turns off for any reason, none of the data collected before it turned off is lost. Any surveys in progress resume when the device turns back on, and participants can continue answering them. When the device turns on, Ethica starts automatically, so participants don't have to remember to start the app.