Background: There is increasing interest in reusing person-generated wearable device data for research purposes, which raises concerns about data quality. However, the literature on data quality challenges, specifically those for person-generated wearable device data, is sparse.
Objective: This study aims to systematically review the literature on factors affecting the quality of person-generated wearable device data and their associated intrinsic data quality challenges for research.
Methods: The literature was searched in the PubMed, Association for Computing Machinery, Institute of Electrical and Electronics Engineers, and Google Scholar databases by using search terms related to wearable devices and data quality. By using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, studies were reviewed to identify factors affecting the quality of wearable device data. Studies were eligible if they included content on the data quality of wearable devices, such as fitness trackers and sleep monitors. Both research-grade and consumer-grade wearable devices were included in the review. Relevant content was annotated and iteratively categorized into semantically similar factors until a consensus was reached. If any data quality challenges were mentioned in the study, those contents were extracted and categorized as well.
Results: A total of 19 papers were included in this review. We identified three high-level factors that affect data quality: device- and technical-related factors, user-related factors, and data governance-related factors. Device- and technical-related factors include problems with hardware, software, and the connectivity of the device; user-related factors include device nonwear and user error; and data governance-related factors include a lack of standardization. The identified factors can potentially lead to intrinsic data quality challenges, such as incomplete, incorrect, and heterogeneous data. Although missing and incorrect data are widely known data quality challenges for wearable devices, the heterogeneity of data is another aspect of data quality that should be considered for wearable devices. Heterogeneity in wearable device data exists at three levels: heterogeneity in data generated by a single person using a single device (within-person heterogeneity); heterogeneity in data generated by multiple people who use the same brand, model, and version of a device (between-person heterogeneity); and heterogeneity in data generated from multiple people using different devices (between-device heterogeneity), which would apply especially to data collected under a bring-your-own-device policy.
Conclusions: Our study identifies potential intrinsic data quality challenges that could occur when analyzing wearable device data for research and three major contributing factors for these challenges. As poor data quality can compromise the reliability and accuracy of research results, further investigation is needed on how to address the data quality challenges of wearable devices.
Emerging Biomedical Data—Person-Generated Wearable Device Data
With the recent movement toward people (patient)-centered care and the widespread routine use of devices/technologies, person-generated health data (PGHD) have emerged as a promising data source for biomedical research. A survey conducted in 2019 reported that 38% of Americans currently use technologies such as mobile apps or wearables to track their health data, and 28% have used them in the past [ ]. Examples of PGHD include data collected passively through sensors, such as step count, heart rate, and sleep quality; data entered directly by people, such as diet, stress levels, and quality of life; and social or financial information that is not specifically health related but could potentially provide health-related insights [ ]. Among the different PGHD, data generated through wearable devices are unique in that they are passively, continuously, and objectively collected in free-living conditions; such data are different from those generated through other technologies that require the manual input of data (eg, dietary tracking mobile apps) [ - ]. Therefore, person-generated wearable device data are becoming a valuable resource for biomedical researchers to provide a more comprehensive picture of the health of individuals and populations.
Use of Person-Generated Wearable Device Data for Research Purposes
There are two ways to use wearable device data for research purposes. Typically, researchers collect wearable device data for a specific study by recruiting eligible participants and asking them to use the device for a certain period. For example, Lim et al issued Fitbit devices to 233 participants and asked them to use the device for 5 days. Collecting data with this traditional method can be beneficial in that researchers can collect data that fit their needs, but it can be costly to recruit and follow a large number of participants for an extended period.
Researchers can also reuse existing data, which is a timely and cost-effective way to conduct research. Previous studies have used existing wearable device data collected for other research studies for their own research [, ]. For example, McDonald et al [ ] used a data set collected as part of the SingHEART/Biobank study to investigate the association between sleep and body mass index. In addition, Cheung et al [ ] used data collected from a study by Burg et al [ ] to develop a novel methodology to reduce the dimension of data while maintaining core information.
More recently, real-world wearable device data collected through routine use of devices have been reused for research purposes [, , ]. For example, the All of Us research program, which is the precision medicine initiative launched by the National Institutes of Health (NIH), initiated a Fitbit Bring-Your-Own-Device project, which allows participants to connect their Fitbit account to share data, such as physical activity, sleep, and heart rate [ ]. In addition, multiple studies have shown the potential of routinely collected wearable device data for use in large-scale longitudinal multinational studies. Menai et al [ ] used Withings Pulse activity tracker data of 9238 adults from 37 countries collected from 2009 to 2013 to examine the association between step counts and blood pressure. Kim et al [ ] used data of more than 50,000 individuals from 185 countries collected over a month, with nearly 17 million measurements generated by Nokia Health Wireless blood pressure monitors, to characterize blood pressure variability. These studies underscore the potential secondary uses of person-generated wearable device data for generating health insights from large real-world populations that might not have been possible using traditional methods of data collection. Furthermore, the studies demonstrate how wearable device data add value by expanding the scope of biomedical research that can be conducted, which would not have been feasible if relying on electronic health record (EHR) data alone.
Data Quality Challenges in the Use of Person-Generated Wearable Device Data
Data used in research studies, even data originally collected to support research, may not meet the ideal level of quality [, , ]. For instance, data collected daily through consumer wearables are intended to support routine personal use rather than research. Therefore, although the quality of collected data may be sufficient for an individual’s health management, it may be insufficient for research purposes. Hicks et al [ ] presented the best practices for reusing large-scale consumer wearable device data that were collected through routine use. The study describes challenges with data quality, such as missing data or inaccuracy of sensor data, as these data are collected from individuals through their daily use of wearables (not through a research study). Thus, as recommended for the use of any data set, the study recommends assessing the quality of a wearable device data set before conducting research. Once the research question and the data set to be analyzed are identified, it is important to assess the data set's fitness-for-use to ensure that it would produce valid analytical results that answer the research question [ ].
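As a rough illustration of what a basic fitness-for-use check for missing data might look like, the following minimal Python sketch computes the share of days with a usable value in a daily series. The function name `daily_completeness`, the step counts, and the dates are hypothetical and are not taken from any of the reviewed studies; completeness is only one of several quality aspects a researcher might assess.

```python
from datetime import date, timedelta

def daily_completeness(records, start, end):
    """Return (fraction of days with a non-missing value, list of missing days).

    `records` maps a date to a value; a day that is absent from the mapping
    or mapped to None is counted as missing.
    """
    n_days = (end - start).days + 1
    expected = [start + timedelta(days=i) for i in range(n_days)]
    missing = [d for d in expected if records.get(d) is None]
    return (n_days - len(missing)) / n_days, missing

# Hypothetical step counts for one week; None marks a day with no usable data,
# and June 4 is absent altogether (for example, the device was never synced).
steps = {
    date(2019, 6, 1): 8200,
    date(2019, 6, 2): 10400,
    date(2019, 6, 3): None,
    date(2019, 6, 5): 7600,
    date(2019, 6, 6): 9100,
    date(2019, 6, 7): 6900,
}

fraction, missing_days = daily_completeness(steps, date(2019, 6, 1), date(2019, 6, 7))
print(f"complete days: {fraction:.0%}; missing: {missing_days}")
```

Whether a given level of completeness is acceptable depends on the research question, so the sketch reports the fraction rather than applying a threshold.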
There have been previous efforts to understand the data quality challenges for wearable device data. For example, Codella et al identified the data quality dimensions that influence the analysis of PGHD. The concerns and expectations of PGHD stakeholders were identified through a literature review and mapped to the relevant data quality dimensions of an established framework [ ]. However, the review does not describe how the literature was screened and selected or what information was extracted from the studies. Another systematic review by Abdolkhani et al [ ] identified factors influencing the quality of medical wearable device data and their corresponding dimensions from the literature. However, this review did not include literature on data from nonmedically approved wearables (eg, consumer wearable devices). As such, there is a research gap in understanding data quality challenges that arise from consumer wearables, specifically those from passively collected data, as there might be unique quality challenges associated with these types of data.
While assessing data quality, having a full understanding of the types of data quality challenges and the factors associated with them can be useful in implementing additional analytic procedures to ameliorate potential negative impacts or false conclusions. However, one barrier is the lack of studies investigating the data quality challenges of wearable device data specifically for research purposes. Therefore, this study aims to (1) identify factors influencing the quality of person-generated wearable device data and potential intrinsic data quality challenges (data quality in its own right or, in other words, data quality challenges inherent to the data itself) for research, and (2) discuss implications for the appropriate use of person-generated wearable device data for research purposes based on the findings.
Data Sources and Search Strategy
We performed a rapid review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guidelines. The literature search was conducted in four scholarly databases (PubMed, Association for Computing Machinery [ACM] Digital Library, Institute of Electrical and Electronics Engineers [IEEE] Xplore Digital Library, and Google Scholar) in June 2019. In PubMed, we used a combination of MeSH terms and keywords related to wearable devices and data quality. Terms related to mobile health were not searched because they cover mobile apps and telemedicine, whereas the scope of this review was limited to data passively collected through wearable devices. The search results were limited to studies published within the past 5 years, conducted in humans, and written in English. The search was limited to 2014 onward because the characteristics of devices may change with advances in technology, and this may result in changes in data quality challenges. Thus, the search focused on recent publications, using the year with the largest increase in the emergence of new consumer fitness trackers as a heuristic cutoff for determining recent studies. The publications were sorted by best match, which is appropriate for searching studies that meet the informational needs on a topic [ ].
In the ACM Digital Library and IEEE Xplore Digital Library, we used a query that combined search terms related to data quality and wearable devices. The search results were limited to studies published since 2014. To complement the search results from the 3 scholarly databases, we performed an additional literature search on Google Scholar. In total, 4 searches were conducted using different queries. The search excluded patents and citations, examined studies published since 2014, and sorted the results by relevance. Although all of the search results were reviewed for the other scholarly databases, only the first 100 results for each of the 4 queries in Google Scholar were reviewed. To prevent the filter bubble effect, in which search results are customized based on the search history of users, we logged out of Google accounts when conducting the literature search. The full query used in each database can be found in Table S1 in .
Inclusion criteria were as follows: (1) papers that contained content on the data quality of wearable devices or sensor data; (2) papers whose scope covered wearable devices such as fitness trackers, sleep monitors, continuous glucose monitors, and remote blood pressure trackers; (3) papers on research-grade or consumer-grade devices; and (4) peer-reviewed studies as well as conference proceedings and book chapters, which were included to expand the search space.
Although smartphones can passively collect health data, studies that exclusively focused on smartphones were excluded, as they are not worn on the body. In addition, as we were interested in passively collected person-generated wearable device data being used for research, studies were excluded if (1) the study was on wearable device data that were generated by providers in a clinical setting (eg, device being used for clinician or surgical training), (2) the study was on wearable device data being used for clinical care of patients, or (3) the study was on data that were manually recorded (eg, food logging by user). Device validation studies, such as those testing the accuracy, reliability, or validity of a device, were also excluded, as those studies were about testing the accuracy of the device rather than conducting analyses on data.
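The eligibility criteria above can be read as a simple decision rule. The following Python sketch encodes them under that reading; it is only an illustration, and the `Study` class, its field names, and the example records are hypothetical rather than a description of how the reviewers or Covidence recorded screening decisions.

```python
from dataclasses import dataclass

@dataclass
class Study:
    # Hypothetical screening flags, one per criterion described in the text.
    year: int
    addresses_wearable_data_quality: bool
    smartphone_only: bool = False
    provider_generated: bool = False       # eg, devices used for surgical training
    used_for_clinical_care: bool = False
    manually_recorded: bool = False        # eg, food logging by the user
    device_validation_only: bool = False

def is_eligible(study: Study) -> bool:
    """Apply the inclusion criteria first, then every exclusion criterion."""
    if study.year < 2014 or not study.addresses_wearable_data_quality:
        return False
    excluded = (
        study.smartphone_only
        or study.provider_generated
        or study.used_for_clinical_care
        or study.manually_recorded
        or study.device_validation_only
    )
    return not excluded

# Two made-up records: the first passes, the second is a device validation study.
print(is_eligible(Study(year=2017, addresses_wearable_data_quality=True)))
print(is_eligible(Study(year=2018, addresses_wearable_data_quality=True,
                        device_validation_only=True)))
```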
One reviewer (SC) screened the retrieved literature based on the title and abstract. After filtering based on titles and abstracts, the full text of the remaining studies was reviewed based on the same selection criteria by two reviewers (SC and KN). The reviewers discussed any discrepancies to reach a consensus on the final set of studies. The literature selection process was conducted using Covidence (Veritas Health Innovation), which is a web-based systematic review production tool.
Data Extraction and Categorization
Overall, two reviewers (SC and KN) examined the papers to extract sentences about the factors affecting data quality. Although our focus was on wearable device data, sentences that apply to both mobile app and wearable device data were extracted as long as the content did not exclusively apply to mobile app data. The reviewers extracted the sentences and annotated the relevant factors. In addition, intrinsic data quality challenges associated with those factors were extracted if any were mentioned. Microsoft Excel was used to manage qualitative data. Codes were assigned to phrases that indicated factors influencing data quality by 1 reviewer (SC). Coded concepts were reviewed, and semantically similar concepts were consolidated into the same category. The categories were iteratively refined to derive core categories. The categories were then iteratively reviewed by domain experts (one data quality expert [KN] and one wearable device expert [IE]) to refine and validate the results. Domain experts commented on whether they agreed with the categorization and names used for each category. The discussion continued until a consensus among the reviewers and domain experts was reached.
Literature Search and Selection Results
A total of 1290 publications were retrieved for screening. Among the retrieved publications, 139 duplicates were removed, leaving 1151 unique publications to be screened by title and abstract. The screening of titles and abstracts resulted in 131 studies after removing 1020 publications that did not meet the eligibility criteria. The full texts of the remaining 131 publications were reviewed. After removing 112 irrelevant publications, 19 studies remained. The literature selection process is depicted in, and a summary of the included studies can be found in Table S2 in .
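The counts reported for the selection process can be checked with simple arithmetic. The short Python sketch below recomputes each stage from the numbers given in the paragraph above; the variable names are illustrative only.

```python
# Counts as reported in the text.
retrieved = 1290
duplicates = 139
excluded_title_abstract = 1020
excluded_full_text = 112

# Each stage is the previous stage minus the records removed at that step.
screened = retrieved - duplicates
full_text_reviewed = screened - excluded_title_abstract
included = full_text_reviewed - excluded_full_text

print(screened, full_text_reviewed, included)
```

The three results match the 1151 screened publications, 131 full-text reviews, and 19 included studies reported above.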
Data Extraction and Categorization Results
Some extracted sentences were specifically related to wearable device data. For instance, sentences within a study by Wright et al describe the challenges associated with using consumer fitness trackers in biomedical research:
The algorithms used in consumer physical activity monitors to determine steps taken, distance traveled, and energy expenditure are typically not shared with researchers due to proprietary concerns.
On the other hand, there were sentences that could apply to both wearable devices and mobile apps. For example, Bietz et al examined the data quality challenges of data from routine device use and explicitly stated the challenges that researchers face:
Researchers also reported being concerned with the kinds of data they may get from companies, including the lack of standardization, potential problems with proprietary algorithms, and that most of the consumer-level health devices have not gone through a validation process.
Not all concerns regarding wearable device data were extracted from these studies. For example, Bietz et al mentioned selection bias, which was not extracted, as we believe that bias is not an intrinsic data quality challenge but is a byproduct of data quality and a universal challenge to research design:
A related concern is the potential bias in PGHD that derives from who uses personal health devices and who does not.
After 5 iterations of categorizing the factors influencing data quality with domain experts, 3 broad categories emerged, which are summarized in. The mappings between the factors and the intrinsic data quality challenges are presented in .
Factors influencing data quality and the themes identified in selected literature.
Device- and technical-related factors
- Hardware issues [ ]
  - Malfunction [ , - ]
  - Quality of sensor [ , , , - ]
  - Sensor degradation over time [ ]
  - Device update makes older models outdated [ ]
  - Limited storage space [ ]
- Software issues [ , , , , , ]
  - Quality (accuracy) of algorithm [ , , ]
  - Proprietary algorithm or system [ , , , ]
  - Wearable device companies change and update their algorithms [ ]
  - Software updates may change settings to default setting or affect data [ ]
- Network and Bluetooth issues [ - , , ]
  - Lost satellite connection [ , , , , ]
  - Delay and error in synchronization and data upload [ , , , ]
User-related factors
- User nonwear [ , , , , , , ]
  - Forget to wear [ , ]
  - Nonwear during battery charging [ , , , , ]
  - User’s health condition prevents device use [ ]
  - Discomfort of wearing the device [ , ]
  - Unsatisfied with the appearance of device [ ]
  - User’s lifestyle or not wearing for certain everyday activities [ ]
  - Concerns over privacy and security of data [ ]
  - Poor usability experience [ ]
- User error [ , - , , , ]
  - Device not synced by users [ ]
  - Poor calibration of the device [ ]
  - Quality of skin contact [ ]
  - Misplacement of device on the body [ , , ]
Data governance-related factors
- Lack of standardization [ , , , , , ]
  - No industry standards for data formats, range of values, and sample rates [ , - ]
  - Different devices use different algorithms for the same variable [ , , ]
  - Different type or placement of sensors on the body for the same variable [ ]
  - Different data definition for the same variable [ , ]
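For readers who want to tag their own data quality notes with these categories, the taxonomy can be held in a small lookup structure. The Python sketch below uses the three categories and six factors named in this review; the dictionary name `FACTORS` and the helper `category_of` are hypothetical, and the themes listed under each factor are omitted for brevity.

```python
from typing import Optional

# Categories and factors as named in this review.
FACTORS = {
    "Device- and technical-related factors": [
        "Hardware issues",
        "Software issues",
        "Network and Bluetooth issues",
    ],
    "User-related factors": [
        "User nonwear",
        "User error",
    ],
    "Data governance-related factors": [
        "Lack of standardization",
    ],
}

def category_of(factor: str) -> Optional[str]:
    """Return the high-level category a factor belongs to, or None if unknown."""
    for category, factors in FACTORS.items():
        if factor in factors:
            return category
    return None

print(category_of("User nonwear"))
print(category_of("Lack of standardization"))
```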