This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright and license information must be included.
Advances in voice technology have opened new possibilities for apps related to daily health maintenance. However, the usability of such technologies for older users remains unclear and requires further investigation.
We designed and evaluated two innovative mobile voice-added apps for food intake reporting, namely voice-only reporting (VOR) and voice-button reporting (VBR). Each app features a distinct interactive procedure for reporting food intake. With VOR, users verbally report the main contents of each dish, while VBR combines voice input with conventional touch-screen input for food intake reporting. The relative usability of the two apps was assessed through the metrics of accuracy, efficiency, and user perception.
The two mobile apps were compared in a head-to-head parallel randomized trial evaluation. A group of 57 adults aged 60-90 years (12 men and 45 women) was recruited from a retirement community and randomized into two experimental groups: a VOR group (n=30) and a VBR group (n=27). Both groups were tested using the same set of 17 food items (dishes and beverages) selected and allocated to compose distinct breakfast, lunch, and dinner meals. All participants used a 7-inch tablet computer for the test. The resulting data were analyzed to evaluate reporting accuracy and time efficiency, and the system usability scale (SUS) was used to measure user perception.
For eight error types identified in the experiment, the VBR group participants were significantly (
Experimental results showed that VOR outperformed VBR, suggesting that voice-only food input reporting is preferable for elderly users. Voice-added apps offer a potential mechanism for the self-management of dietary intake by elderly users. Our study contributes an evidence-based evaluation of prototype design and selection under a user-centered design model. The results provide a useful reference for selecting optimal user interaction design.
International Standard Randomized Controlled Trial Registry ISRCTN17335889; http://www.isrctn.com/ISRCTN17335889.
Older people are at increased risk of malnutrition [
Older adults typically experience reduced physical and physiological functioning that can increase the challenge involved in operating mobile apps, such as tapping buttons and scrolling down the screen [
We previously provided a proof of concept for a combinatorial approach of dietary recording that accounts for a wide range of dish variations [
We developed our voice-added design based on a user-centered design model [
The two apps were implemented in the Android operating system for use on 7-inch tablet computers. The VOR app allows users to simultaneously verbally report food names and food attributes, whereas the VBR app allows users to verbally report food names and then select food attributes by clicking the optional buttons. The Google speech cloud service (Google, Inc) was used for continuous speech recognition in both apps. The developed interfaces included senior-friendly design elements, such as bigger buttons and text, a simple layout, and high-contrast colors. Based on recommended design guidelines for seniors [
As shown in
Voice-only reporting operation of a dish with a single ingredient, using steamed rice as an example.
For dishes with two ingredients, the user verbally inputs the first ingredient followed by its associated food attribute, and then repeats the process for the second ingredient (
Voice-only reporting operation of a dish with two or three ingredients, using stir-fried broccoli with carrot as an example.
Using VBR for a dish with a single ingredient, users first verbally input the name of the dish (
Voice-button reporting operation of a dish with a single ingredient, using steamed rice as an example.
For dishes with two or three ingredients, the user begins by verbally inputting the first food ingredient and follows the first four steps (
Voice-button reporting operation for a dish with two or more ingredients, using stir-fried broccoli with carrot as an example.
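The two reporting flows described above differ mainly in how many user actions each dish requires. The following sketch (illustrative only, not the apps' actual code) counts the actions implied by the procedures in the text; the per-ingredient step counts, and the assumption of one attribute tap per ingredient in VBR, are simplifications introduced here:

```python
# Illustrative action-count model of the two reporting designs.
# Step counts are simplifying assumptions based on the procedures
# described in the text, not measurements from the apps.

def vor_actions(n_ingredients: int) -> int:
    """Voice-only reporting: one utterance (food name plus attribute)
    per ingredient, then one tap on the 'complete' button."""
    return n_ingredients + 1

def vbr_actions(n_ingredients: int, attribute_taps: int = 1) -> int:
    """Voice-button reporting: per ingredient, one utterance for the
    food name plus button taps for its attributes; multi-ingredient
    dishes also need the 'mix' button before completion."""
    per_ingredient = 1 + attribute_taps
    mix_tap = 1 if n_ingredients >= 2 else 0
    return n_ingredients * per_ingredient + mix_tap + 1

for k in (1, 2, 3):
    print(f"{k} ingredient(s): VOR {vor_actions(k)}, VBR {vbr_actions(k)}")
```

Under these assumptions the action gap grows with dish complexity, which is consistent with the widening time difference for two- and three-ingredient dishes reported in the Results.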
A parallel two-group randomized trial was designed to evaluate the relative effectiveness of the two apps in terms of reporting accuracy, task time, and user acceptance. The study protocol was reviewed by the Ethics Committee of Chang Gung Memorial Hospital and received Institutional Review Board approval (201900324B0). Recruitment was conducted through notices placed on designated bulletin boards in the Chang Gung Health and Culture Village retirement community located in northern Taiwan. Registration, schedule arrangement, and collection of background information were conducted through an online form. Demographic data were used to allocate participants into the VOR or VBR group. Self-reported baseline information included gender, age, BMI, experience in nutrition education, use of nutrition-related apps, cooking experience, and experience using mobile phones/tablets. Eligible participants were (1) aged from 60 to 90 years and (2) capable of reading and operating the app on their mobile phone. Participants currently under any form of dietary control, currently engaged in deliberate weight loss, or following a vegetarian diet were excluded. The assessment was conducted in a public area inside the community.
Dishes and beverages for the experiment were selected under the supervision of a senior nutritionist. The dishes were typical local Asian and Western-style foods. Three set meals involving 17 food items were used to represent breakfast, lunch, and dinner. Each set meal contained five food items (ie, a staple food, a main course, a dish with two ingredients, a dish with three ingredients, and a beverage). These set meals were presented on life-size colored food-photo boards (30 cm × 42 cm; photographed from above). Following previous research [
The sample size was based on our previous experience of customized dietary recording [
A total of 57 senior participants were recruited and completed informed consent. SAS [
App evaluation flow using a randomized design. SUS: system usability scale.
The following three outcome types were assessed to evaluate the respective effectiveness of the two mobile apps for food reporting: accuracy, user operation time, and perception of efficacy.
An error was defined as any operating step outside those required to obtain the predefined answer. The possible error types for dish reporting are described in the “Error Types” subsection. The rate of a specific error type was expressed as the error count divided by the total count, where the error count was the number of participants with at least one incorrect response of that type (multiple incorrect responses within a single participant’s reporting task were counted once), and the total count was the number of participants multiplied by the number of dishes. The accuracy for an error type was defined as the total count minus the error count, divided by the total count.
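The metric definitions above reduce to a few lines of arithmetic; a minimal sketch (the function names are ours, not from the study):

```python
# Accuracy metric as defined in the text:
# total_count = participants x dishes; at most one error per
# participant per dish is counted for a given error type.

def error_rate(error_count: int, participants: int, dishes: int) -> float:
    total = participants * dishes
    return error_count / total

def accuracy(error_count: int, participants: int, dishes: int) -> float:
    total = participants * dishes
    return (total - error_count) / total

# Example: error type #4 in the VOR group, 4 errors out of 30 x 14 = 420
print(round(accuracy(4, 30, 14), 3))  # -> 0.99, the 99.0% reported
```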
The error types were derived thematically for data analysis [
For VOR and VBR, the operating duration covered the time from when the participant began to input a food item until the participant tapped the “complete” button on the screen. For the VOR group, the task duration was calculated from the time the participant clicked the “voice” button to begin speaking to the time the participant clicked the “complete” button (
The system usability scale (SUS) [
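The standard SUS scoring procedure, together with the usability/learnability decomposition reported in the Results, can be sketched as follows. The item grouping (items 4 and 10 as learnability) follows Lewis and Sauro’s commonly cited two-factor decomposition, which we assume was used here:

```python
# Standard SUS scoring: odd items contribute (response - 1), even items
# (5 - response); the adjusted sum is scaled by 2.5 to a 0-100 range.
# Learnability = items 4 and 10 (Lewis & Sauro decomposition, assumed);
# usability = the remaining 8 items.

def sus_scores(responses):
    assert len(responses) == 10
    adj = [(r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so i=0,2,... are items 1,3,...
           for i, r in enumerate(responses)]
    overall = sum(adj) * 2.5                    # 0-100
    learnability = (adj[3] + adj[9]) * 12.5     # items 4 and 10, rescaled to 0-100
    usability = sum(a for i, a in enumerate(adj) if i not in (3, 9)) * 3.125
    return overall, usability, learnability

print(sus_scores([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible -> (100.0, 100.0, 100.0)
```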
The experiment was carried out by two research assistants. Informed consent was explained to and obtained from each participant. All participants utilized the same hardware (ie, a 7-inch Android tablet). All participant trials were conducted on a single day. Each participant session was scheduled by appointment and implemented individually. Each participant was first trained by watching an instructional video demonstrating how the food reporting app could be used. The researchers then spent several minutes teaching each participant how to navigate the interface, to ensure familiarity with app operation and features. The experiment was arranged on the basis of a set meal, with each set meal involving a staple food, a main course, a dish with two ingredients, a dish with three ingredients, and a beverage. Having understood the meal concept, each participant conducted a “dry run,” which involved voice reporting of five food items (porridge, sausage, chicken egg, gluten with peanuts, and soy milk) on a photo board (
Participants were asked to report three set meals (breakfast, lunch, and dinner). The first set meal, representing breakfast, featured boiled rice porridge, grilled pork sausage, stir-fried chicken egg, wheat gluten stewed with peanuts, and soy milk. The second set meal, representing lunch, featured steamed rice, deep-fried chicken, stir-fried broccoli with carrots, stir-fried tofu with green beans, stir-fried cabbage with bacon and black mushrooms, and green tea. The third set meal, representing dinner, featured fried noodles, pan-fried mackerel, stir-fried bitter melon with bell peppers and carrots, and tea with milk. The meal tests were performed in sequence, with a 1- to 3-minute rest between tests. The total test time for each participant was about 1 hour, beginning from when the participant first clicked the voice record button, following the procedure shown in the “General Overview of the Approach” section. All participants completed the assessment.
The chi-square test and
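A 2×2 group comparison of this kind can be reproduced with the standard library alone. The sketch below uses the error type #7 counts reported in the Results (VOR: 0 of 420 incorrect; VBR: 39 of 378); it assumes a Pearson chi-square with 1 df and no continuity correction, which may differ in detail from the software the authors used:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value for a 2x2 table
    [[a, b], [c, d]], df = 1, without Yates continuity correction."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function reduces to erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Error type #7: rows = group, columns = (correct, incorrect)
stat, p = chi2_2x2(420, 0, 339, 39)
print(round(stat, 2), p < .001)
```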
Following the study by Bree and Gallagher [
A total of 68 participants were registered. Of these, 57 participants were scheduled and completed the experiment (
Participant characteristics in the voice-only reporting and voice-button reporting groups.
| Variable | Total (N=57), n (%) or mean (SD) | Voice-only reporting group (n=30), n (%) or mean (SD) | Voice-button reporting group (n=27), n (%) or mean (SD) | P value |
|---|---|---|---|---|
| **Gender** | | | | .39 |
| Male | 12 (21%) | 5 (17%) | 7 (26%) | |
| Female | 45 (79%) | 25 (83%) | 20 (74%) | |
| **Age (years)a** | | | | .15 |
| ≤64 | 7 (12%) | 6 (20%) | 1 (4%) | |
| 65-74 | 23 (40%) | 10 (33%) | 13 (48%) | |
| ≥75 | 27 (48%) | 14 (47%) | 13 (48%) | |
| **BMI (kg/m²)a** | 22.55 (2.25) | 22.63 (2.37) | 22.45 (2.15) | .77 |
| **Education** | | | | >.99 |
| Junior high school | 12 (21%) | 6 (20%) | 6 (22%) | |
| Senior high/vocational school | 11 (19%) | 6 (20%) | 5 (19%) | |
| Bachelor’s degree | 28 (49%) | 14 (47%) | 14 (52%) | |
| Master’s degree | 5 (9%) | 3 (10%) | 2 (7%) | |
| Others | 1 (2%) | 1 (3%) | 0 (0%) | |
| | | | | .40 |
| Yes | 18 (32%) | 8 (27%) | 10 (37%) | |
| No | 39 (68%) | 22 (73%) | 17 (63%) | |
| | | | | .58 |
| Yes | 17 (30%) | 8 (27%) | 9 (33%) | |
| No | 40 (70%) | 22 (73%) | 18 (67%) | |
| | | | | .60 |
| Yes | 54 (95%) | 29 (97%) | 25 (93%) | |
| No | 3 (5%) | 1 (3%) | 2 (7%) | |
| | | | | >.99 |
| Yes | 4 (7%) | 2 (7%) | 2 (7%) | |
| No | 53 (93%) | 28 (93%) | 25 (93%) | |
| | | | | .38 |
| Yes | 39 (68%) | 19 (63%) | 20 (74%) | |
| No | 18 (32%) | 11 (37%) | 7 (26%) | |

aAge and BMI data were analyzed with analysis of variance.
Overall accuracy comparison of error types in the voice-only reporting and voice-button reporting groups.
| Error type (correct/incorrect)a | Total (N=57), n (%) | Voice-only reporting group (n=30), n (%) | Voice-button reporting group (n=27), n (%) | P value |
|---|---|---|---|---|
| **#1 Missing first food name/syllable(s)b** | | | | .50 |
| Correct | 796 (99.7%) | 418 (99.5%) | 378 (100.0%) | |
| Incorrect | 2 (0.3%) | 2 (0.5%) | 0 (0.0%) | |
| **#2 Missing last food name/syllable(s)c** | | | | .69 |
| Correct | 792 (99.2%) | 416 (99.0%) | 376 (99.5%) | |
| Incorrect | 6 (0.8%) | 4 (1.0%) | 2 (0.5%) | |
| **#3 No desirable choicesd** | | | | .79 |
| Correct | 743 (93.1%) | 392 (93.3%) | 351 (92.9%) | |
| Incorrect | 55 (6.9%) | 28 (6.7%) | 27 (7.1%) | |
| **#4 Missing cooking method(s)e** | | | | <.001 |
| Correct | 766 (96.0%) | 416 (99.0%) | 350 (92.6%) | |
| Incorrect | 32 (4.0%) | 4 (1.0%) | 28 (7.4%) | |
| **#5 Repeated pronunciationsf** | | | | .13 |
| Correct | 794 (99.5%) | 416 (99.0%) | 378 (100.0%) | |
| Incorrect | 4 (0.5%) | 4 (1.0%) | 0 (0.0%) | |
| **#6 Incorrect selections in the listg** | | | | .48 |
| Correct | 771 (96.6%) | 404 (96.2%) | 367 (97.1%) | |
| Incorrect | 27 (3.4%) | 16 (3.8%) | 11 (2.9%) | |
| **#7 Did not select the ‘mix’ buttonh** | | | | <.001 |
| Correct | 759 (95.1%) | 420 (100.0%) | 339 (89.7%) | |
| Incorrect | 39 (4.9%) | 0 (0.0%) | 39 (10.3%) | |
| **#8 Incorrect operationsi** | | | | .64 |
| Correct | 775 (97.1%) | 409 (97.4%) | 366 (96.8%) | |
| Incorrect | 23 (2.9%) | 11 (2.6%) | 12 (3.2%) | |

aThe three beverage items were not counted, as no error types were found for them; 14 of the 17 food items were included.
b#1 Missing first food name/syllable(s): After verbal reporting, the presented answer list did not include the first food name or the first syllable(s) of the food names.
c#2 Missing last food name/syllable(s): After verbal reporting, the presented answer list did not include the last food name or the last syllable(s) of the food names.
d#3 No desirable choices: After verbal reporting, the presented answer list did not present the desired food name or cooking method.
e#4 Missing cooking method(s): After verbal reporting, the presented answer list did not include the desired cooking method(s).
f#5 Repeated pronunciations: The presented answer list showed repeated pronunciations of food names and/or food attributes after voice reporting.
g#6 Incorrect selections in the list: Participant had trouble accurately tapping the desired choice (click interaction), leading to incorrect selection in the answer list.
h#7 Did not select the ‘mix’ button: Trouble before dish completion (click interaction). The user did not tap the “mix” button to complete dishes with two or three ingredients.
i#8 Incorrect operations: Incorrect operation procedure.
The results are presented in terms of dish complexity (ie, number of ingredients) (
Accuracy comparison of each food item in the voice-only reporting and voice-button reporting groups.
| Food item and error type | Total (N=57), n (%) | Voice-only reporting group (n=30), n (%) | Voice-button reporting group (n=27), n (%) |
|---|---|---|---|
| **Staple food** | | | |
| *Boiled rice porridge* | | | |
| #1a | 0 (0%) | 0 (0%) | 0 (0%) |
| #2b | 0 (0%) | 0 (0%) | 0 (0%) |
| #3c | 1 (2%) | 0 (0%) | 1 (4%) |
| #4d | 3 (5%) | 1 (3%) | 2 (7%) |
| #5e | 0 (0%) | 0 (0%) | 0 (0%) |
| #6f | 0 (0%) | 0 (0%) | 0 (0%) |
| #7g | 0 (0%) | 0 (0%) | 0 (0%) |
| #8h | 0 (0%) | 0 (0%) | 0 (0%) |
| *Steamed rice* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 3 (5%) | 2 (7%) | 1 (4%) |
| #4 | 6 (11%) | 0 (0%) | 6 (22%) |
| #5 | 1 (2%) | 1 (3%) | 0 (0%) |
| #6 | 0 (0%) | 0 (0%) | 0 (0%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 0 (0%) | 0 (0%) | 0 (0%) |
| *Stir-fried noodle* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 2 (4%) | 0 (0%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 0 (0%) | 0 (0%) | 0 (0%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 1 (2%) | 0 (0%) | 1 (4%) |
| **Main course** | | | |
| *Grilled pork sausage* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 2 (4%) | 1 (3%) | 1 (4%) |
| #4 | 3 (5%) | 1 (3%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 2 (4%) | 1 (3%) | 1 (4%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 2 (4%) | 1 (3%) | 1 (4%) |
| *Deep-fried chicken egg* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 2 (4%) | 0 (0%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 3 (5%) | 2 (7%) | 1 (4%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 1 (2%) | 0 (0%) | 1 (4%) |
| *Fried chicken leg* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 2 (4%) | 0 (0%) | 2 (7%) |
| #4 | 2 (4%) | 0 (0%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 3 (5%) | 0 (0%) | 3 (11%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 1 (2%) | 0 (0%) | 1 (4%) |
| *Pan-fried mackerel* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 10 (18%) | 6 (20%) | 4 (15%) |
| #4 | 2 (4%) | 0 (0%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 1 (2%) | 1 (3%) | 0 (0%) |
| #7 | 0 (0%) | 0 (0%) | 0 (0%) |
| #8 | 1 (2%) | 1 (3%) | 0 (0%) |
| **Dishes with two ingredients** | | | |
| *Stewed wheat gluten with peanuts* | | | |
| #1 | 1 (2%) | 1 (3%) | 0 (0%) |
| #2 | 1 (2%) | 1 (3%) | 0 (0%) |
| #3 | 19 (33%) | 12 (40%) | 7 (26%) |
| #4 | 3 (5%) | 0 (0%) | 3 (11%) |
| #5 | 2 (4%) | 2 (7%) | 0 (0%) |
| #6 | 12 (21%) | 10 (33%) | 2 (7%) |
| #7 | 14 (25%) | 0 (0%) | 14 (52%) |
| #8 | 6 (11%) | 2 (7%) | 4 (15%) |
| *Stir-fried broccoli with carrot* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 1 (2%) | 0 (0%) | 1 (4%) |
| #4 | 1 (2%) | 0 (0%) | 1 (4%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 2 (4%) | 0 (0%) | 2 (7%) |
| #7 | 3 (5%) | 0 (0%) | 3 (11%) |
| #8 | 0 (0%) | 0 (0%) | 0 (0%) |
| *Stir-fried tofu with green bean* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 1 (2%) | 1 (3%) | 0 (0%) |
| #3 | 2 (4%) | 1 (3%) | 1 (4%) |
| #4 | 1 (2%) | 0 (0%) | 1 (4%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 0 (0%) | 0 (0%) | 0 (0%) |
| #7 | 7 (12%) | 0 (0%) | 7 (26%) |
| #8 | 1 (2%) | 1 (3%) | 0 (0%) |
| *Stir-fried chicken egg with tomato* | | | |
| #1 | 1 (2%) | 1 (3%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 1 (2%) | 0 (0%) | 1 (4%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 1 (2%) | 0 (0%) | 1 (4%) |
| #7 | 5 (9%) | 0 (0%) | 5 (19%) |
| #8 | 2 (4%) | 0 (0%) | 2 (7%) |
| *Stir-fried bitter melon with salted duck egg* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 1 (2%) | 0 (0%) | 1 (4%) |
| #3 | 7 (12%) | 2 (7%) | 5 (19%) |
| #4 | 1 (2%) | 0 (0%) | 1 (4%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 0 (0%) | 0 (0%) | 0 (0%) |
| #7 | 3 (5%) | 0 (0%) | 3 (11%) |
| #8 | 0 (0%) | 0 (0%) | 0 (0%) |
| **Dishes with three ingredients** | | | |
| *Stir-fried cabbage with bacon and black fungus* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 1 (2%) | 0 (0%) | 1 (4%) |
| #3 | 5 (9%) | 2 (7%) | 3 (11%) |
| #4 | 4 (7%) | 2 (7%) | 2 (7%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| #6 | 2 (4%) | 2 (7%) | 0 (0%) |
| #7 | 5 (9%) | 0 (0%) | 5 (19%) |
| #8 | 5 (9%) | 4 (13%) | 1 (4%) |
| *Stir-fried dry bean curd with bell pepper and carrot* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 2 (4%) | 2 (7%) | 0 (0%) |
| #3 | 3 (5%) | 2 (7%) | 1 (4%) |
| #4 | 1 (2%) | 0 (0%) | 1 (4%) |
| #5 | 1 (2%) | 1 (3%) | 0 (0%) |
| #6 | 1 (2%) | 0 (0%) | 1 (4%) |
| #7 | 2 (4%) | 0 (0%) | 2 (7%) |
| #8 | 3 (5%) | 2 (7%) | 1 (4%) |
| **Beverage** | | | |
| *Soymilk* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 0 (0%) | 0 (0%) | 0 (0%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| *Green tea* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 0 (0%) | 0 (0%) | 0 (0%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
| *Milk tea* | | | |
| #1 | 0 (0%) | 0 (0%) | 0 (0%) |
| #2 | 0 (0%) | 0 (0%) | 0 (0%) |
| #3 | 0 (0%) | 0 (0%) | 0 (0%) |
| #4 | 0 (0%) | 0 (0%) | 0 (0%) |
| #5 | 0 (0%) | 0 (0%) | 0 (0%) |
a#1 Missing first food name/syllable(s): After verbal reporting, the presented answer list did not include the first food name or the first syllable(s) of the food names.
b#2 Missing last food name/syllable(s): After verbal reporting, the presented answer list did not include the last food name or the last syllable(s) of the food names.
c#3 No desirable choices: After verbal reporting, the presented answer list did not present the desired food name or cooking method.
d#4 Missing cooking method(s): After verbal reporting, the presented answer list did not include the desired cooking method(s).
e#5 Repeated pronunciations: The presented answer list showed repeated pronunciations of food names and/or food attributes after voice reporting.
f#6 Incorrect selections in the list: Participant had trouble accurately tapping the desired choice (click interaction), leading to incorrect selection in the answer list.
g#7 Did not select the ‘mix’ button: Trouble before dish completion (click interaction). The user did not tap the “mix” button to complete dishes with two or three ingredients.
h#8 Incorrect operations: Incorrect operation procedure.
These food items included three staple foods and four main courses. The VOR group featured fewer “missing cooking method(s)” errors (n=2) than the VBR group (n=18). The two groups showed similar results for error types #3 and #6. Both groups showed elevated error rates for error type #3 for pan-fried mackerel (n=6 in the VOR group; n=4 in the VBR group). In the VBR group, the incidence of error type #4 was higher for steamed white rice (n=6), but low for boiled rice porridge (n=2), stir-fried noodle (n=2), grilled pork sausage (n=2), stir-fried chicken egg (n=2), and deep-fried chicken leg (n=2). The incidences of other error types were relatively low.
Five dishes included two ingredients. In the VOR group, error type #3 was more frequent for stewed wheat gluten with peanuts (n=12, 40%) and error type #6 was more frequent for stewed wheat gluten with peanuts (n=10, 33%). In the VBR group, error type #7 was more frequent for stewed wheat gluten with peanuts (n=14, 52%), stir-fried tofu with green bean (n=7, 26%), and stir-fried chicken egg with tomato (n=5, 19%). In the VBR group, the frequency of error #3 was also relatively high for stir-fried bitter melon with salted duck egg (n=5, 19%). The incidences of other error types were relatively low in both groups.
Three dishes were tested. In both the VOR and VBR groups, error type #3 occurred for two dishes, that is, stir-fried cabbage with bacon and black fungus (n=2 and n=3, respectively) and stir-fried dry bean curd with bell pepper and carrot (n=2 and n=1, respectively). Error type #7 had a higher incidence in the VBR group for stir-fried cabbage with bacon and black fungus (n=5), while error type #8 occurred frequently in the VOR group for stir-fried cabbage with bacon and black fungus (n=4). The incidences of other error types were relatively low in both groups.
Reporting time in the voice-only reporting and voice-button reporting groups.
| Food item | Total (N=57), mean (SD), s | Voice-only reporting group (n=30), mean (SD), s | Voice-button reporting group (n=27), mean (SD), s | P value |
|---|---|---|---|---|
| **Staple food** | | | | |
| Boiled rice porridge | 20.44 (16.20) | 10.50 (4.57) | 31.49 (17.36) | <.001 |
| Steamed rice | 14.37 (8.19) | 10.11 (4.98) | 19.11 (8.53) | <.001 |
| Stir-fried noodle | 14.20 (10.75) | 8.67 (3.45) | 20.35 (12.69) | <.001 |
| **Main course** | | | | |
| Grilled pork sausage | 26.39 (38.58) | 12.20 (6.70) | 42.15 (51.63) | .006 |
| Deep-fried chicken egg | 16.54 (10.47) | 11.46 (8.82) | 22.17 (9.32) | <.001 |
| Fried chicken leg | 16.80 (13.53) | 8.99 (2.97) | 25.48 (15.36) | <.001 |
| Pan-fried mackerel | 20.24 (12.69) | 15.01 (11.23) | 26.06 (11.81) | <.001 |
| **Dishes with two ingredients** | | | | |
| Stewed wheat gluten with peanuts | 51.73 (32.09) | 42.38 (28.05) | 62.11 (33.58) | .02 |
| Stir-fried broccoli with carrot | 36.20 (36.30) | 12.68 (4.17) | 62.34 (38.34) | <.001 |
| Stir-fried tofu with green bean | 30.98 (27.86) | 10.80 (4.90) | 53.41 (25.56) | <.001 |
| Stir-fried chicken egg with tomato | 32.32 (28.73) | 12.32 (6.32) | 54.55 (27.54) | <.001 |
| Stir-fried bitter melon with salted duck egg | 33.90 (32.36) | 12.39 (5.11) | 57.80 (33.16) | <.001 |
| **Dishes with three ingredients** | | | | |
| Stir-fried cabbage with bacon and black fungus | 44.39 (35.08) | 21.23 (19.68) | 70.13 (30.20) | <.001 |
| Stir-fried dry bean curd with bell pepper and carrot | 41.81 (33.76) | 16.62 (8.42) | 69.80 (28.82) | <.001 |
| **Beverage** | | | | |
| Soymilk | 10.82 (7.20) | 9.91 (6.31) | 11.86 (8.11) | .31 |
| Green tea | 8.76 (2.66) | 8.86 (3.07) | 8.66 (2.16) | .78 |
| Milk tea | 9.64 (6.26) | 10.81 (8.35) | 8.35 (1.84) | .13 |
In the VOR group, the operation time ranged from 8 to 15 seconds per task, with pan-fried mackerel taking the longest (mean 15.01, SD 11.23 s). In the VBR group, the operation time ranged from 19 to 42 seconds per task (overall mean 26.70 s), with grilled pork sausage taking the longest (mean 42.15, SD 51.63 s). On average, the VOR group completed reporting roughly twice as fast as the VBR group.
In the VOR group, four of the five dishes took 11 to 13 seconds, while stewed wheat gluten with peanuts took more than 42 seconds. In the VBR group, the operation time ranged from approximately 53 to 62 seconds, with stewed wheat gluten with peanuts among the longest (over 60 seconds).
The operation time in the VOR group ranged from approximately 17 to 21 seconds, as opposed to approximately 70 seconds in the VBR group.
Both groups showed similar reporting operation time performance for beverages, with the VOR group taking 9 to 11 seconds per task, as opposed to 9 to 12 seconds in the VBR group.
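The per-item time comparisons above can be sanity-checked from the summary statistics alone using a Welch t statistic. A minimal sketch using the steamed rice row (means and SDs from the table; only the t statistic is computed, since the t-distribution CDF is not in the Python standard library, and the authors' exact test may differ):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic for two independent samples,
    given means, standard deviations, and sample sizes."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se

# Steamed rice: VOR mean 10.11 (SD 4.98, n=30); VBR mean 19.11 (SD 8.53, n=27)
t = welch_t(10.11, 4.98, 30, 19.11, 8.53, 27)
print(round(t, 2))  # a large negative t: VOR is substantially faster
```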
System usability scale and subjective perception in the voice-only reporting and voice-button reporting groups.
| Scorea,b,c | Voice-only reporting group (n=30), mean (SD) | Voice-button reporting group (n=27), mean (SD) | P value |
|---|---|---|---|
Overall score | 83.80 (9.49) | 80.44 (10.25) | .20 |
Usability score | 83.58 (9.57) | 81.57 (9.69) | .43 |
Learnability score | 84.67 (14.56) | 75.93 (20.24) | .06 |
aQuestionnaires were presented in Chinese.
bThe mean SUS scores corresponding to the adjective ratings are as follows: 35.7 (“poor”), 50.9 (“ok”), 71.4 (“good”), and 85.5 (“excellent”).
cThe questionnaire’s Cronbach α for voice-only reporting (α=.77) and voice-button reporting (α=.78) exceeded .70, indicating acceptable internal consistency.
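The Cronbach α reported in the footnote can be computed directly from per-item responses. A self-contained sketch (toy data, population variances used consistently; this is the standard formula, not the study's actual analysis code):

```python
# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(totals)).

def cronbach_alpha(items):
    """items: one list per questionnaire item, each aligned across
    respondents. Uses population variances throughout."""
    def pvar(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]  # per-respondent totals
    item_var_sum = sum(pvar(it) for it in items)
    return k / (k - 1) * (1 - item_var_sum / pvar(totals))

# Toy data: two perfectly correlated items -> alpha = 1
print(round(cronbach_alpha([[1, 2, 3], [1, 2, 3]]), 2))  # -> 1.0
```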
Two different voice-reporting designs were compared to investigate their respective effectiveness for food reporting among elderly users. VOR was designed to use verbal inputs for food names and attributes. VBR was operated through a sequential process of voice input and button tapping to report dietary intake. Experimental results showed the respective advantages and disadvantages of the two design concepts for authentic food reporting by older people. Our evidence-based findings provide insights into the relative usability of voice input in the food intake reporting context. The implications of these findings are discussed below, along with suggestions for further system improvement through the integration of voice input in the mobile health domain.
The eight error types identified in this research provide a useful reference for potential types of errors that will be encountered in voice-enabled user dietary intake interactions. The better performance of VOR for error types #4 and #7 indicates that VOR has the potential to provide greater accuracy in food reporting. Participants experienced error type #3 in both the VOR and VBR groups. This error is related to phoneme and syllable-based speech recognition issues, and food names or food attributes with similar phonemes tend to have lower recognition accuracy. For instance, in the VOR and VBR groups, error type #3 was most prevalent for “stewed wheat gluten with peanuts,” and the Chinese term for “wheat gluten” (miàn cháng) was frequently misunderstood as “miàn chá.” Participants also experienced a higher incidence of recognition errors for cooking methods, for example, lǔ (stew) was misrecognized as rǔ (milk), zhǔ (boil), and fǔ (rotten), contributing to the system’s difficulty in accurately recognizing “stewed wheat gluten with peanuts.” In addition, incorrect recognition results were found for food names such as “peanut,” “bacon,” and “salted duck egg,” possibly because seniors have greater difficulty articulating nasal vowels [
The error type “did not select the ‘mix’ button” (#7) had the highest frequency among all errors in the VBR group and was specific to the item categories “dishes with two ingredients” (23.7%) and “dishes with three ingredients” (13.0%). This error may result from the app imposing cognitive overload, as advanced age is associated with a decline in working memory [
The error “incorrect selections in the list” (#6) occurred with relatively high frequency in both groups (ranked second in the VOR group and third in the VBR group). This error is related to the user selecting the correct answer from a list of one to five possible choices, and could be explained by issues related to multimodal interaction in hand-eye coordination and speech input [
For dishes with two ingredients, the need to tap buttons in the VBR group resulted in time efficiency up to five times worse than that in the VOR group (eg, 53.41 vs 10.80 s for “stir-fried tofu with green bean”). Aside from the beverage items, VOR consistently outperformed VBR in terms of time efficiency. The slower response time of VBR may be due to the additional button taps needed to move between pages, and to time spent on trial and error to obtain the correct food names or food attributes.
The overall SUS score exceeded the adjective rating of “good,” indicating that participants considered both apps useful. The high accuracy rates achieved by the two groups are consistent with the high overall SUS scores. The significant time difference (
Some previous studies [
The experiments were conducted under laboratory conditions using a predetermined list of dishes and beverages. Participants were recruited from a retirement community; thus, further tests are required using different target populations (eg, seniors with specific chronic illnesses) whose results may differ from those of the groups tested here. The intended use case [
Experimental results showed that, while users assessed VOR and VBR as having similar utility, VOR had better accuracy and time efficiency, making it the better candidate for food reporting by seniors. The VOR design is superior to the VBR design in that it relies solely on voice input for food intake reporting and requires no additional button taps. The results also showed that speech recognition accuracy was reduced for certain food items, and both groups had difficulty selecting the desired items from the post-voice-input suggestion menu. The user experience assessment of the two apps developed for this research provides a useful empirical reference for the development of high-usability consumer apps for dietary monitoring among elderly people. Further studies are required, including investigations in authentic dining environments with real-world meal options, along with full-scale randomized controlled trials to assess efficacy.
Mobile app voice-only reporting of a dish with one ingredient.
Mobile app voice-only reporting of a dish with two or more ingredients.
Mobile app voice-button reporting of a dish with one ingredient.
Mobile app voice-button reporting of a dish with two or more ingredients.
Images of three set meals.
Error types of dietary intake using a voice reporting approach.
CONSORT eHealth checklist.
SUS: system usability scale
VBR: voice-button reporting
VOR: voice-only reporting
The authors are grateful to all study participants and to the Food Interaction Design Lab and the Health Informatics Lab of the College of Management, Chang Gung University. This research was funded by the Research Fund of Chang Gung Memorial Hospital and Chang Gung University (BMRPD67 and BMRPB81) and the Ministry of Science and Technology, Taiwan (MOST-108-2221-E-182-008-MY3 and MOST-109-2314-B-182-038-MY3).
None declared.