European Portuguese Version of the User Satisfaction Evaluation Questionnaire (USEQ): Transcultural Adaptation and Validation Study

Background: Wearable activity trackers have the potential to encourage users to adopt healthier lifestyles by tracking daily health information. However, usability is a critical factor in technology adoption. Older adults may be more resistant to accepting novel technologies. Understanding the difficulties that older adults face when using activity trackers may be useful for implementing strategies to promote their use.
Objective: The purpose of this study was to conduct a transcultural adaptation of the User Satisfaction Evaluation Questionnaire (USEQ) into European Portuguese and validate the adapted questionnaire. Additionally, we aimed to provide information about older adults’ satisfaction regarding the use of an activity tracker (Xiaomi Mi Band 2).
Methods: The USEQ was translated following internationally accepted guidelines. The psychometric properties of the final version of the translated USEQ were assessed based on structural validity using exploratory and confirmatory factor analyses. Construct validity was examined using convergent and divergent validity analysis, and internal consistency was evaluated using Cronbach α and McDonald ω coefficients.
Results: A total of 110 older adults completed the questionnaire. Confirmatory factor analysis supported the conceptual unidimensionality of the USEQ (χ²₄=7.313, P=.12, comparative fit index=0.973, Tucker-Lewis index=0.931, goodness of fit index=0.977, root mean square error of approximation=0.087, standardized root mean square residual=0.038). The internal consistency showed acceptable reliability (Cronbach α=.677, McDonald ω=0.722). Overall, 90% of the participants reported excellent satisfaction with the Xiaomi Mi Band 2.
Conclusions: The findings support the use of this translated USEQ as a valid and reliable tool for measuring user satisfaction with wearable activity trackers in older adults, with psychometric properties consistent with the original version.
(JMIR Mhealth Uhealth 2021;9(6):e19245) doi: 10.2196/19245


Introduction
The use of mobile health (mHealth) technology has greatly increased over the past decade. These technologies enable health promotion and self-monitoring of health-related behaviors [1][2][3] and show potential in disease treatment and prevention in a cost-efficient and widely accessible manner [4]. Currently, mHealth covers a wide range of technologies, such as wearable devices and smartphone apps, for tracking different types of health-related data including physical activity [4,5].
Activity trackers are sensor-based wearable devices that automatically track and monitor various indicators of physical activity (eg, steps, calories burned, and distance traveled), with some also able to record heart rate and sleep measures [2,6-8]. The technology has the potential to help older adults with health self-management and self-efficacy by improving lifestyle behaviors and motivating compliance or attainment of daily activity goals [3,6,9]. Despite this potential, most older adults do not use activity trackers for their health-tracking needs. A possible explanation may be a matter of usability. Although usability is a critical factor that can determine technology adoption, these devices have been mainly developed for a younger target group; thus, older adults may have difficulties due to usability barriers [2,9].
Published by the International Organization for Standardization, ISO-9241-11 defines usability in terms of effectiveness, efficiency, and user satisfaction rating of a product in a specific environment, by a specific user, for a specific purpose [10,11]. Accordingly, user satisfaction can be thought of as a usability component, but it cannot be evaluated in the same manner as efficiency and effectiveness. Satisfaction is the user's attitude toward the system they use, affecting behavior intention for continuous or future use [12][13][14]. Moreover, satisfaction depends on how comfortable the user feels using the system [10,11,14,15]. Despite the importance of user satisfaction to ensure usability and the improved development of mHealth solutions, there is a gap in research assessing user satisfaction with mHealth [4].
Currently, there are a few validated and widely used questionnaires to collect targeted user feedback for the evaluation of a system's usability. These include the System Usability Scale (SUS) [15], the Post-Study System Usability Questionnaire (PSSUQ) [10,16-18], and the Usefulness, Satisfaction and Ease of Use (USE) questionnaire [10,17,19]. Of these, only the SUS [20] and the USE [21] questionnaires are available in European Portuguese for generalized evaluation of usability. Regarding user satisfaction questionnaires, a study by Melin et al [4] presented the development of the mHealth Satisfaction Questionnaire, the Questionnaire for User Interaction Satisfaction (QUIS) [22], and the User Satisfaction Evaluation Questionnaire (USEQ). Nonetheless, there is no validated questionnaire for measuring user satisfaction with technologies available in European Portuguese.
Therefore, this study aimed to provide a valid questionnaire to specifically evaluate user satisfaction with technologies in Portuguese older adults. For this, the USEQ was selected since it is a short but comprehensive questionnaire with a reasonable number of questions; importantly, it is clear and easy to understand [23]. We also aimed to evaluate user satisfaction with the Xiaomi Mi Band 2 among older adults. Regardless of spoken language, and despite a growing number of studies on subjective experiences of user satisfaction with and the usability and usefulness of wearable technologies, only a few studies have focused on older adults [3,8,24]. Understanding whether a cohort of older adults is satisfied with these activity trackers is important to ensure devices can be successfully implemented in clinical and research settings [25].

Methods
The study was divided into two phases. Phase 1 addressed the translation of the USEQ questionnaire into European Portuguese and its cross-cultural adaptation. Phase 2 involved the assessment of the USEQ psychometric properties and its validation in the new context.

Translation and Cultural Adaptation of the USEQ
The USEQ is composed of 6 questions and uses a 5-point Likert scale for responses. The total score ranges from 6 (poor satisfaction) to 30 (excellent satisfaction). All questions are affirmative, except question 5, which is negatively worded. For the affirmative questions, the numerical value of the response is added directly to the score. For the negative question, the numerical value of the response is subtracted from 6, and the result is added to the total score [23]. Since the USEQ was designed to evaluate user satisfaction with virtual rehabilitation systems, the questionnaire comprises one item that specifically measures the perceived usefulness of using the technology for rehabilitation. Thus, in this study, we adapted this item so that it could be applied to general-purpose systems or across different types of mHealth.
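The scoring rule can be sketched in Python as follows. This is a minimal illustration rather than the authors' own code: the function name `useq_total` and the list-based input are assumptions made here, and only the 6-item, 5-point format and the reverse scoring of item 5 come from the description above.

```python
def useq_total(responses, negative_items=(5,)):
    """Return the USEQ total score for one respondent.

    responses: six integers in 1..5, in item order (item 1 first).
    negative_items: 1-based positions of negatively worded items
                    (item 5 in the USEQ, per the text above).
    """
    if len(responses) != 6:
        raise ValueError("the USEQ has 6 items")
    total = 0
    for position, value in enumerate(responses, start=1):
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {position}: response must be 1-5")
        # Negative items are reverse-scored: 6 minus the response.
        total += 6 - value if position in negative_items else value
    return total


print(useq_total([5, 5, 5, 5, 1, 5]))  # most favorable answers: 30
print(useq_total([1, 1, 1, 1, 5, 1]))  # least favorable answers: 6
```

The two calls reproduce the bounds of the scale: the most favorable answer pattern yields 30 and the least favorable yields 6.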
The adapted English version of the USEQ was culturally and linguistically adapted to European Portuguese after obtaining formal authorization from the original author (Gil-Gómez [23]). The process of translation followed the general guidelines provided by Lenz et al [26] and the World Health Organization [27]. Briefly, it comprised the following steps: forward translation, translation review and reconciliation of content, back translation, preliminary version, pretesting, and final version (Multimedia Appendix 1). In step 1 (forward translation), the USEQ was translated to European Portuguese by two independent English-proficient translators, whose native language is European Portuguese. In step 2 (translation review and reconciliation of content), the independent translations were reviewed by both forward translators as well as by an independent team member. A reconciled version of the document was obtained. In step 3 (back translation), the reconciled version was translated from European Portuguese into English by independent translators fluent in both languages, who were blinded to the original USEQ version of the document. The back translation was done as a quality control step and to verify that both versions were equivalent. In step 4 (preliminary version), all team members performed a comparative analysis between the back translation and the original version. A preliminary version was prepared after items were reviewed and a consensus was reached. Finally, in step 5 (pretesting), the preliminary version of the document was tested in a pilot study with a sample of 20 independent and representative individuals selected from those who would be administered the final questionnaire. The individuals were asked to provide feedback on their understanding of the questions, the preferred use of alternative words for a given expression, terms deemed unacceptable or offensive, and their opinion of the questionnaire. The information collected was used to improve and develop the final version of the USEQ.

Psychometric Properties and Validation of the USEQ
The psychometric validation of the final version of the USEQ was based on real data collection after the users' experience with the wearable activity tracker. The analyses included assessment of structural validity, construct validity, and internal consistency.
To provide evidence of construct validity, the participants' responses on the USEQ were correlated with pre-existing instruments that measure similar concepts, namely the SUS and the technology acceptance model 3 (TAM 3), to assess convergent validity. For divergent validity, the participants' responses on the USEQ were correlated with Mini-Mental State Examination (MMSE) scores [28]. Briefly, the SUS is the most widely used standardized questionnaire to measure perceived usability [10,15,29,30], while the TAM is the most applied theoretical model for evaluating or predicting users' acceptance of new technologies [31]. Lastly, the MMSE is a widely accepted questionnaire to assess cognitive function in older adults [32].

Participants and Data Collection
Following the application of inclusion and exclusion criteria, a total of 120 community-dwelling older adults (aged 64-75 years) from Northern Portugal were recruited to the study. The primary exclusion criteria were an inability to understand informed consent and neuropsychiatric and neurodegenerative disorders. Among the recruited participants, a final sample of 110 participants completed the usability test of the wearable activity tracker.
A baseline characterization was performed through a sociodemographic questionnaire and a standardized clinical interview. Moreover, since individual differences, including demographics, cognitive state, and emotional state, influence individuals' perceptions regarding the technology [33], we included a neuropsychological evaluation to obtain mood (Geriatric Depression Scale [GDS]) [34] and global cognitive profiles (MMSE) [35].
For usability testing, the Xiaomi Mi Band 2 was selected among several commercially available wearable activity trackers, since it is ergonomic, accessible, easy to operate, and offers the best price-quality ratio. The device provides general health monitoring, combining sensors that allow objective assessment of activity levels, heart rate, and sleep patterns [5,36,37]. The participants used a Xiaomi Mi Band 2 over 15 days while performing their normal daily activities. They were instructed to wear the activity tracker continuously. After concluding the usage testing period, participants were asked to provide information about their experience. This was attained through application of the USEQ [23] to evaluate user satisfaction, the TAM 3 [38] to collect information about technology acceptance, and the SUS for perceived usability [20].

Statistical Analysis
Data analysis was performed using IBM SPSS Statistics (version 26; IBM Corp) and JASP (version 0.11.1). Descriptive statistics (mean, median, standard deviation, minimum, maximum, skewness, and kurtosis) were calculated for each variable. Normality was considered adequate if absolute values for skewness and kurtosis were below 2.0 and 7.0, respectively [39,40].
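As a rough illustration of this screening rule, the sketch below computes skewness and kurtosis and compares their absolute values with the 2.0 and 7.0 cutoffs. It rests on assumptions not stated in the text: it uses simple population-moment formulas and reports excess kurtosis, whereas SPSS and JASP apply sample-adjusted estimators, so results may differ slightly from the software output. The function names are hypothetical.

```python
from statistics import fmean


def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("variance is zero")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normality_adequate(skew, kurt, max_skew=2.0, max_kurt=7.0):
    """True when both absolute values fall below the cutoffs."""
    return abs(skew) < max_skew and abs(kurt) < max_kurt


# Hypothetical responses, for illustration only.
sk, ku = skewness_kurtosis([1, 2, 3, 4, 5])
print(normality_adequate(sk, ku))
```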

Structural Validity
The creators of the original scale analyzed the factor structure through principal component analysis (PCA) [23]. Results indicated two components with an eigenvalue greater than 1. The first component had all six items and explained 43% of the variance, and the second component had only four items (items 1, 4, 5, and 6), only two of which had factor loadings greater than 0.5. Therefore, after the analysis of the scree plot, they considered a one-factor solution explaining 42.9% of the variance to be appropriate.
Before conducting exploratory factor analysis (EFA) with our data, the "parameters" R package was used to decide the number of factors to extract. The solution for one dimension was supported by 6 (42.9%) methods of 14 (acceleration factor, standard error scree, Tucker-Lewis index [TLI], root mean square error of approximation [RMSEA], adjusted root mean square residual, and Bayesian information criterion) [41]. As PCA is a data reduction technique that does not yield a latent variable model, principal axis factoring was used to extract the latent factor.
Variables with factor loadings above 0.4 were extracted. Before the analysis, the appropriateness of the data for factor analysis was examined using the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett test of sphericity [42,43]. Regarding the sample size for the EFA, there is no consensus on the number of participants required to perform the analysis [44,45]. Hatcher et al [46] recommend a subject-to-item ratio of at least 5:1, with a minimum of 100 subjects. The number of components was determined by the Kaiser criterion (retaining factors with eigenvalues greater than 1), scree plot inspection [43], and parallel analysis [47].
Subsequently, confirmatory factor analysis (CFA) was performed using maximum likelihood estimation. The factor loadings were used as local indices of goodness of fit as well as the following goodness-of-fit indices and thresholds for a good fit: chi-square test (χ²; P>.05), chi-square value divided by degrees of freedom (a ratio of ≤3), comparative fit index (CFI≥0.90), Tucker-Lewis index (TLI≥0.90), goodness of fit index (GFI≥0.90), root mean square error of approximation (RMSEA<0.08), and standardized root mean square residual (SRMR≤0.08) [48][49][50].
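The thresholds listed above can be applied mechanically once the indices are available. The sketch below is a hypothetical helper (the name `evaluate_fit` and the dictionary keys are assumptions, not part of SPSS, JASP, or the original analysis); it takes already-computed indices as inputs and, as an example, uses the values reported in the abstract of this paper.

```python
def evaluate_fit(chi2, df, p_value, cfi, tli, gfi, rmsea, srmr):
    """Compare each fit index with the good-fit threshold listed above."""
    return {
        "chi2_p": p_value > 0.05,          # nonsignificant chi-square test
        "chi2_df_ratio": chi2 / df <= 3,   # chi-square divided by df
        "cfi": cfi >= 0.90,
        "tli": tli >= 0.90,
        "gfi": gfi >= 0.90,
        "rmsea": rmsea < 0.08,
        "srmr": srmr <= 0.08,
    }


# Index values as reported in the abstract.
checks = evaluate_fit(chi2=7.313, df=4, p_value=0.12, cfi=0.973, tli=0.931,
                      gfi=0.977, rmsea=0.087, srmr=0.038)
for name, passed in checks.items():
    print(f"{name}: {'meets threshold' if passed else 'outside threshold'}")
```

With these inputs, the chi-square to degrees-of-freedom ratio is about 1.83, and every index meets its threshold except the RMSEA (0.087, against a cutoff of 0.08).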
Ideally, a CFA should be performed in a subsequent study using another sample (validation sample) [51]. However, due to having an insufficient sample size to perform the analysis in two subsamples, the same sample was used for both approaches (EFA and CFA).

Internal Consistency
Internal consistency was assessed using both Cronbach α and McDonald ω coefficients [52,53]. Cronbach α is the most widely used measure of reliability. However, it underestimates the true composite reliability and is negatively biased when used to measure the reliability of ordinal variables [52,54]. Given the ordinal response format of USEQ items, McDonald ω was used to overcome these limitations, providing a more accurate approximation of the scale reliability [55]. Cronbach α and McDonald ω coefficients over .70 are considered indicators of satisfactory item homogeneity [55,56]. Item-total and inter-item correlations were analyzed, considering cutoff values over .30 and under .70, respectively [57].
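For readers who want to see how the two coefficients are obtained, the following sketch implements the textbook formulas. It is not the SPSS or JASP routine used in the study. Cronbach α is computed from item and total-score variances; McDonald ω is computed from the loadings and residual variances of a one-factor model with uncorrelated residuals, which are assumed here to have been estimated elsewhere. Function names and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha; rows is a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item and of the summed score.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def mcdonald_omega(loadings, uniquenesses):
    """McDonald omega from one-factor loadings and residual variances."""
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


# Hypothetical data: three respondents, two perfectly consistent items.
print(cronbach_alpha([[1, 2], [2, 3], [3, 4]]))
print(mcdonald_omega([0.5, 0.5], [0.75, 0.75]))
```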

Construct Validity: Convergent and Divergent Validity
Convergent and divergent validity were estimated using Spearman correlation (r). It was hypothesized a priori that the USEQ score would be positively correlated with the SUS score and TAM 3 score, while no correlation was expected with the MMSE score. A coefficient of r≥0.3 was taken as evidence of convergent validity, and a coefficient of r<0.3 as evidence of divergent validity [58,59].
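A Spearman coefficient is the Pearson correlation of the ranked scores, with tied values sharing their mean rank. The sketch below shows the computation in plain Python under that definition; the function names and the example scores are hypothetical, and the study itself relied on statistical software rather than this code.

```python
from statistics import fmean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical satisfaction and usability scores for four respondents.
r = spearman([20, 23, 25, 18], [70, 80, 95, 60])
print(r, "meets the 0.3 cutoff" if r >= 0.3 else "below the 0.3 cutoff")
```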

USEQ Scores
Descriptive statistics and normality were assessed for the USEQ's total score. Correlations between USEQ scores and demographic as well as mood and global cognitive characteristics were estimated using Spearman correlation.

Ethics Statement
The study was conducted in accordance with the Declaration of Helsinki and was approved by the local and national ethics committees (approval number . The study goals and assessments were explained to potential participants. All participants provided written informed consent before study enrollment.

Results

Study Participants
A total of 110 participants completed the USEQ. Demographic, mood, and global cognitive characteristics of the sample are presented in Table 1. Participants had an average age of 68.41 (SD 3.11) years with a mean of 7.95 (SD 5.38) years of education. Study participants responded to all items of the USEQ. Descriptive statistics for USEQ items are presented in Table 2. Given the ordinal nature of the variables assessed, item distribution demonstrates some degree of nonnormality. In fact, most data collected in behavioral research do not follow univariate normal distributions [40,60]. For the USEQ total score, the results reveal acceptable values for both skewness (sk=-2.02) and kurtosis (k=3.96), showing no severe violation of normality. Regarding the individual items, the kurtosis value was not acceptable for item 1; thus, it was excluded from further analysis.

Structural Validity
An exploratory factor analysis was conducted on the 5 items of the USEQ. The Kaiser-Meyer-Olkin measure demonstrated adequacy for the analysis (KMO=0.629), as a value above 0.6 indicates an adequate sample size [43]. The Bartlett sphericity test (χ²₁₀=126, P<.001) indicated that the correlation between the items is sufficient to perform the analysis. The analysis showed that the one-factor solution explains 47% of the variance (Table 3) and comprises all items with factor loadings higher than 0.3. Our findings corroborated the decision of the original questionnaire authors, who considered a one-factor solution to be appropriate.
Table 3. Factor matrix containing obliquely unrotated factor loadings of principal axis factoring (forcing one-factor solution). The eigenvalue and the percentage of variance explained by the factor are also shown.
[61]. The final model is presented in Figure 1.

Internal Consistency
Analysis of the internal consistency showed acceptable reliability (Cronbach α=.677; McDonald ω=0.722), indicating adequate homogeneity of the items for the one-factor solution.
The corrected item-total correlation values ranged from 0.080 to 0.654, showing an adequate correlation of each item and suggesting adequate scale homogeneity. The inter-item correlations were all below 0.70, indicating nonredundancy of items (Table 5).

Discussion
Physical activity is associated with health benefits, a decreased burden of disease, and a decrease in all-cause mortality in adults [2]. Nowadays, wearable activity trackers provide the opportunity to increase physical activity levels through continuous monitoring [8], which may be especially beneficial for older adults. However, over 75% of the over-65 age group state that they require assistance to use new technologies [2]. Usability studies on wearable activity trackers are needed to better understand the barriers that older adults face when using these technologies. In a cohort of older adults, this study aimed to provide a valid questionnaire to evaluate user satisfaction using an activity tracker (Xiaomi Mi Band 2).
The results from the translation phase show that the items were easy to understand and that there were no semantic problems. Moreover, the translated items were considered equivalent to the original version. In particular, the USEQ was found to be a simple and easy-to-understand questionnaire with an appropriate number of questions to apply in older populations. Validity evidence was obtained with 5 questions, maintaining the original one-factor structure. In the original study by Gil-Gómez et al [23], the one-factor solution explained about 43% of the variance; similarly, in this study, the one-factor solution explained approximately 47% of the variance.
Confirmatory factor analysis showed acceptable fit indexes.
No demographic variables were found to be significantly correlated to user satisfaction with the device. However, a higher depressive mood, as evaluated by the GDS, was negatively associated with the satisfaction perceived by participants using the device. Similarly, a recent study investigating the impact of depressive symptoms on measures of web user experience found a significant association between depressive symptoms and subjective user experience [63]. These results indicate that mood may be a factor influencing technology usability and may warrant further guidance and/or targeted approaches in the use of these technologies by specific populations.
Regarding user satisfaction with the Xiaomi Mi Band 2, the device achieved a score of 23.30 (SD 2.40) in the USEQ, demonstrating an excellent reported level of satisfaction and thus suggesting suitability for older adults. Furthermore, results showed that item 4 ("Is the information provided by the system clear?"), which is related to the perceived ease of use, yielded the lowest score. This result may suggest that older adults could have difficulties in understanding the information provided by the activity tracker, which should be noted by manufacturers. The perceived ease of use refers to the degree to which a person perceives how easy it is to use the technology and is one of the primary factors that affect an individual's intention to use new technology [7,9,31]. This kind of difficulty is especially interesting considering the age of the participants enrolled in the study. Older adults tend to perceive technologies as difficult to use due to usability problems related to poor memory, decreased vision, and poor literacy [64], but this may not necessarily be the case for all older individuals. Thus, results should be interpreted with caution, and future studies should include cohorts with different characteristics (for instance, higher school levels or those that have been [early] adopters of different types of technologies).
Concerning limitations, the study was conducted using a convenience sample; therefore, the participants may not represent the entire older population. Moreover, if the sample used in this study is more homogeneous than the wider population on the common factors, this can lead to attenuation in correlations and can influence the strength or bias of correlations among variables [51]. Future studies should have a larger sample, and it would be beneficial to maximize variance on measured variables relevant to the constructs of interest [51]. It would also be of value to evaluate the long-term use of this device and motivations for long-term use. Moreover, studies combining quantitative and qualitative methods, such as interviews, would also be valuable to explore older adults' perceptions and experiences, and to provide details about user behaviors, user needs, and specific problems that quantitative measures cannot address. Finally, further user satisfaction studies of older adults using activity trackers should include other devices. This would ensure that such devices can be effectively implemented in clinical and research settings to promote physical activity.
In conclusion, the European Portuguese version of the USEQ has adequate psychometric properties consistent with the original version, supporting its use as a valid and reliable tool for measuring user satisfaction in older adults. Furthermore, we adapted the USEQ into a generic questionnaire for user satisfaction that can be used with several mHealth technologies, including smartphones, patient monitoring devices, tablets, mobile health apps, personal digital assistants, and other wireless devices. Finally, this study has contributed to the currently available and growing body of information on the usability of wearable technologies among older adults.