Carbohydrate Counting App Using Image Recognition for Youth With Type 1 Diabetes: Pilot Randomized Control Trial

Background: Carbohydrate counting is an important component of diabetes management, but it is challenging, often performed inaccurately, and can be a barrier to optimal diabetes management. iSpy is a novel mobile app that leverages machine learning to allow food identification through images and that was designed to assist youth with type 1 diabetes in counting carbohydrates. Objective: Our objective was to test the app's usability and potential impact on carbohydrate counting accuracy. Methods: Iterative usability testing (3 cycles) was conducted involving a total of 16 individuals aged 8.5-17.0 years with type 1 diabetes. Participants were provided a mobile device and asked to complete tasks using iSpy app features while thinking aloud. Errors were noted, acceptability was assessed, and


Introduction
Type 1 diabetes is among the most common chronic diseases of childhood, and its incidence is rising [1]. The management of diabetes in youth is complex and impacted by numerous factors including numeracy skills, education, socioeconomic status, family dynamics, engagement with treatment regimens, and use of technologies such as pumps and continuous sensors. Among these factors, insulin administration remains the cornerstone of type 1 diabetes management, but its optimal dosing is often complicated by the need to count carbohydrates [2-4]. Carbohydrate counting allows individuals with type 1 diabetes to match their insulin doses to planned food consumption, and accurate carbohydrate counting can improve blood glucose control (measured by hemoglobin A1c; HbA1c) [2]. For example, in one study [5] focused on parents of children with type 1 diabetes, more accurate parental carbohydrate counting was associated with 0.8% lower HbA1c values in their children. Among adults with type 1 diabetes, a meta-analysis [6] of 5 studies showed that HbA1c levels improved by an average of 0.6% with improved carbohydrate counting.
Despite its importance, up to two-thirds of individuals with diabetes report having trouble with carbohydrate counting [7]. It has been reported that only a quarter of youth can routinely count carbohydrates within 10 g of the true net carbohydrate value, even for commonly eaten foods [8], and that carbohydrate counting is a barrier to diabetes management [9]. Carbohydrate counting often requires multiple training sessions with experienced dietitians or educators and ongoing efforts by patients and families to maintain competency. Estimating carbohydrate intake can be difficult when the portions of food being consumed are not the same as those listed in an exchange system or on the food label, requiring youth to adjust the carbohydrate count to the appropriate portion size. Accuracy of carbohydrate counting can be further limited by low nutritional literacy and poor numeracy skills [10].
Technologies such as mobile health apps that address these barriers have the potential to ease this burden and improve blood glucose control. Unfortunately, most diabetes-related mobile health apps have not undergone formal evaluation and lack evidence of clinical effectiveness, making it difficult for prospective users to assess the value of a particular app to their self-management [11]. The same is true of carbohydrate counting apps.
To help address these gaps, a mobile app was designed and developed to assist youth in counting carbohydrates. The app (iSpy) uses image recognition and artificial intelligence to identify foods and report their carbohydrate content. Here we addressed the question of how iSpy would perform during usability and pilot testing. We hypothesized that it would be well-accepted and that its use would be associated with improved carbohydrate counting accuracy.

Description of iSpy
The iSpy app (see Multimedia Appendix 1 for screenshots) was developed and evaluated in sequential phases [12,13]. The image recognition algorithm that identifies foods from images uses a convolutional neural network. The interface for iSpy was initially co-designed with intended users including certified diabetes educators, registered dietitians, and individuals (aged 12-75 years) living with type 1 diabetes. Once the app was developed, cycles of refinement were conducted after assessing how users functionally navigated it. The app was then tested for accuracy on a sample of 200 commonly consumed food items (169 items from the Youth Adolescent Food Frequency Questionnaire [14] and 31 commonly consumed complex items containing 2 or more components) selected by a registered dietitian and diabetes educator. To pass the accuracy test for an item, iSpy had to report a carbohydrate content within 10 g of the food item's true net (total minus fiber) carbohydrate value [15]. Revisions were made until iSpy achieved this degree of accuracy for ≥90% (180/200) of the items. Current overall accuracy is 94.5% (189/200). The app was then moved into clinical testing, described herein.
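The accuracy criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to evaluate iSpy: the `FoodItem` fields, the function names, and the example values are hypothetical, and treating "within 10 g" as an inclusive bound is an assumption.

```python
from dataclasses import dataclass

TOLERANCE_G = 10.0        # from the text: within 10 g of the true net value
TARGET_PASS_RATE = 0.90   # from the text: at least 90% of test items


@dataclass
class FoodItem:
    # Hypothetical record layout, chosen for illustration only.
    name: str
    total_carbs_g: float
    fiber_g: float
    app_estimate_g: float


def net_carbs(item: FoodItem) -> float:
    """Net carbohydrate = total carbohydrate minus fiber."""
    return item.total_carbs_g - item.fiber_g


def passes(item: FoodItem) -> bool:
    """True if the app estimate is within the tolerance (assumed inclusive)."""
    return abs(item.app_estimate_g - net_carbs(item)) <= TOLERANCE_G


def pass_rate(items: list[FoodItem]) -> float:
    """Fraction of items whose estimate passes the accuracy test."""
    return sum(passes(i) for i in items) / len(items)


# Made-up example values, not study data.
items = [
    FoodItem("apple", total_carbs_g=25.0, fiber_g=4.0, app_estimate_g=19.0),
    FoodItem("pasta with sauce", total_carbs_g=60.0, fiber_g=3.0, app_estimate_g=40.0),
]
print(pass_rate(items), pass_rate(items) >= TARGET_PASS_RATE)
```

Under this criterion, the reported 189/200 items corresponds to a pass rate of 0.945, which exceeds the 0.90 target.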

Setting
The usability testing and pilot randomized controlled trial were approved by the research ethics board and conducted within the diabetes program at The Hospital for Sick Children. Informed consent was obtained from all participants, and the pilot randomized controlled trial was registered with clinicaltrials.gov (NCT04354142).

Usability Testing Procedures
Inclusion criteria were (1) age 8.0-18.0 years, (2) a diagnosis of type 1 diabetes per Diabetes Canada guidelines [16], (3) use of carbohydrate counting as part of treatment regimen, and (4) fluency in English (iSpy is only available in English). The sole exclusion criterion was cognitive impairment.
Iterative cycles of testing and app refinement (3 cycles) were utilized. Testing consisted of 4 scenario-based tasks that were developed using standardized guidelines [17], a semistructured interview, and app acceptability measured by the 5-point Acceptability E-Scale [13]. The tasks focused on user performance (ie, ease of use, navigation among screens, functions, errors, and efficiency); the semistructured interview and Acceptability E-Scale focused on overall satisfaction with the app. Participants were purposively selected to achieve a range of age, gender, and duration of type 1 diabetes. The participants were asked to think aloud during use of the app, and the dialog was audiorecorded.
Participants were provided with an Android or iOS mobile device, depending on the participant's preference. Scenario-based tasks included use of app features such as photo taking, portion sizing, and food identification. Errors, efficiency (time taken to complete a task), acceptability (ease of use), and suggestions for improvements were logged, and tasks were classified into 1 of 3 categories (successfully completed, completed with minor issues, incomplete due to usability issues). Following each cycle, refinements were made to the user interface based on problems and recommendations, with the revised interface being evaluated in the subsequent cycle [13]. iSpy was moved to pilot testing (pilot randomized controlled trial) when no further issues were identified in the third cycle.

Pilot Randomized Controlled Trial Procedures
Inclusion criteria were (1) age 10.0-17.0 years (adjusted after usability testing because those under 10 years of age had difficulty navigating the app), (2) ≥6 months since diagnosis with type 1 diabetes, (3) completion of initial carbohydrate counting classes, (4) incorporation of carbohydrate counting into treatment regimen, and (5) access to a smartphone and data plan. Exclusion criteria were (1) cognitive impairment, (2) comorbid physical or psychiatric conditions that might impact ability to use iSpy, (3) diagnosis of a condition that affects dietary exposure, and (4) participation in usability testing.
A convenience sample was enrolled (n=46) and randomly assigned to either the usual care (control) group or the usual care and iSpy (intervention) group using a 2-group randomized block design in blocks of 4 and 6, where the block sizes were not known to the investigator. The randomization schedule was created using SAS (version 9.4; SAS Institute). Data from previous work in our clinic [18] were used to estimate the sample size, indicating that 20 participants per group would be sufficient to detect a mean accuracy difference of 7.1 g in carbohydrate counting (which fit with our aim of assessing accuracy within 10 g), assuming 80% power (β=.2), α=.05, and using a 2-sided paired t test; therefore, 23 participants were recruited per group to allow for potential dropout over the 3-month trial.
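For readers unfamiliar with permuted block randomization, the allocation scheme described above can be illustrated with a short Python sketch. The trial's schedule was generated in SAS; the code below is not that program. The function name, arm labels, seed, and the choice to truncate the final block are all assumptions made for illustration.

```python
import random


def block_randomize(n, block_sizes=(4, 6), arms=("control", "iSpy"), seed=None):
    """Build an allocation list of length n from randomly sized permuted blocks.

    Each complete block holds equal numbers of each arm in shuffled order.
    Truncating the last block (an assumption of this sketch) can leave a
    small imbalance between arms.
    """
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n:
        size = rng.choice(block_sizes)              # block size varies at random
        block = list(arms) * (size // len(arms))    # balanced within the block
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n]


schedule = block_randomize(46, seed=2020)
print(schedule.count("control"), schedule.count("iSpy"))
```

Varying the block size at random, and concealing it from the investigator, makes the next assignment harder to predict than it would be with a single fixed block size.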
Duration of diabetes (time since diagnosis) and HbA1c levels were obtained from chart review. Accuracy and efficiency (time taken) of carbohydrate counting were based on a performance task. Participants counted carbohydrates for 10 foods, consisting of 2 foods from each of 5 food groups: the 4 main food groups (vegetables and fruit, grain products, milk and alternatives, and meat and alternatives) and desserts. In each of the 5 food groups, a simple food item (eg, a single item such as an apple) as well as a complex food item (eg, an item containing 2 or more components but with the base food from the selected food group, such as pasta with tomato sauce) were included. Two sets of foods (Diet A and Diet B) of similar difficulty were utilized, with half of the participants in each group counting foods from Diet A at baseline and foods from Diet B at 3 months, and vice versa for the other half. This methodology allowed us to control for confounding from participants educating themselves on test items or from any unanticipated differences between test diets. The net carbohydrate value for each food item was determined by either the nutrition label for packaged foods, the United States Department of Agriculture's National Nutrient Database for Standard Reference [19], the Canadian Nutrient File [20], or by our dietitian (VP) who specializes in diabetes care. We chose to utilize the performance task metric to assess carbohydrate counting instead of using tools such as the PedCarbQuiz [21] so that we could assess the effect of iSpy on counting the carbohydrate content of foods as opposed to its effect on domains such as nutrition label reading or insulin dosing, which are part of the PedCarbQuiz.
Additional measures were also collected. At baseline, comfort with technology was assessed. Quality of life, measured by a subset of questions from quality of life for youth [22,23]; self-care, measured by a subset of questions from the Self Care Inventory [24-26]; and patient or parent responsibility, measured by a subset of questions from the Diabetes Family Responsibility Questionnaire, were assessed at baseline and 3 months postintervention. We also assessed factors related to usability of the app including fidelity (tracking of technical difficulties, errors within the app); levels of engagement; and acceptability using a 7-item Acceptability E-Scale (5-point scale) [27]. Qualitative feedback was obtained via postintervention, semistructured interviews among all iSpy users.
At the start of the study, participants in the intervention group downloaded the app on their phone, and a demonstration of iSpy and its functionality was provided. iSpy participants were instructed to use the app at their discretion and when they thought its use would be beneficial. We recognized, for example, that participants may know the carbohydrate counts of the food items that they regularly consume. Thus, they may only want to use iSpy occasionally to assess the counts of some of these food items, whereas they may want to use the app more frequently for food items that they do not regularly consume. Given these instructions instead of a recommended number of uses per day, engagement levels were assessed based on the frequency of using the app to log foods, categorized as high (logging ≥2 meals per week), medium (logging ≥1 meal every 2 weeks but <2 meals per week), or low (logging <1 meal every 2 weeks). This structure is similar to that used by others to assess app use [28]. In addition, iSpy participants were asked to contact the team should they encounter technical difficulties, and they received a phone call 6 weeks postbaseline for general troubleshooting. As this was a pilot study, we strove to encourage the use of iSpy by sending a maximum of 3 automated alerts to participants not accessing iSpy at least once every 2 weeks.
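The engagement categories above reduce to two thresholds on the logging rate: 2 meals per week, and 1 meal per 2 weeks (0.5 meals per week). A minimal Python sketch follows; the function name and the use of a simple average over the observation window are assumptions, since the text does not specify how the rate was computed within each time period.

```python
def engagement_level(meals_logged: int, weeks: float) -> str:
    """Classify engagement from the average number of meals logged per week.

    Thresholds follow the text: high is >=2 meals/week, medium is >=1 meal
    every 2 weeks (0.5/week) but <2 meals/week, low is anything below that.
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    rate = meals_logged / weeks
    if rate >= 2:
        return "high"
    if rate >= 0.5:
        return "medium"
    return "low"


# Example: 3 meals logged over a 2-week window averages 1.5 meals per week.
print(engagement_level(3, 2))
```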
Statistical analysis for the pilot randomized controlled trial was conducted using R (version 3.6.0) statistical software. Descriptive statistics of participant characteristics for the intervention and control groups are presented as means and standard deviations for continuous variables, and counts and proportions for categorical variables. Differences in these characteristics between the intervention and control groups were tested using 2-sided independent t tests for continuous variables, and chi-square tests for categorical variables.
Differences between the intervention and control groups at baseline on the primary outcome variables (accuracy, time taken for counting, and the percentage of food items for which participants estimated the carbohydrate content within 10 g of the true net carbohydrate value), secondary outcomes (quality of life for youth, self-care, and patient or parent responsibility), and HbA1c level were examined using 2-sided independent t tests. Differences in these variables between the intervention and control groups at the follow-up visit were assessed using multiple linear regression models, which included the baseline value as a covariate. P values <.05 were considered to be statistically significant.
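The baseline-adjusted comparison described above corresponds to a regression of the follow-up value on the baseline value and a group indicator, where the group coefficient estimates the adjusted difference between arms. The analysis itself was run in R; the pure-Python sketch below is only an illustration of the model form, computes the point estimate without a standard error or P value, and uses synthetic numbers that are not study data. The function names are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def baseline_adjusted_effect(baseline, followup, group):
    """Group coefficient from followup ~ intercept + baseline + group (0/1)."""
    X = [[1.0, b, float(g)] for b, g in zip(baseline, group)]
    _intercept, _slope, effect = ols(X, followup)
    return effect


# Synthetic data built so that the true adjusted group effect is exactly 2.0.
baseline = [10, 20, 30, 40, 15, 25, 35, 45]
group = [0, 0, 0, 0, 1, 1, 1, 1]
followup = [1 + 0.5 * b + 2 * g for b, g in zip(baseline, group)]
print(round(baseline_adjusted_effect(baseline, followup, group), 6))
```

Adjusting for baseline in this way compares follow-up values between arms among participants with the same starting value, which generally gives a more precise estimate than comparing follow-up means alone.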

Usability Testing
Youth (total: n=16; cycle 1: n=6, cycle 2: n=4, cycle 3: n=6) ranging in age from 8.5 to 17.0 years (mean 13.5, SD 2.6 years) participated in iSpy's iterative usability testing. Scenarios consisting of multiple tasks were used; based on how the participant responded to iSpy or how image recognition classified the food within each scenario, follow-up tasks were required, with the total number of tasks across 4 scenarios varying between 35 and 41 per participant. Errors within each cycle were tracked (Figure 1). In cycle 1, a total of 27 errors preventing successful completion of tasks occurred (mean 4.5, SD 4.4 per participant), representing 12.2% (27/222) of the total tasks. In response, modifications were made such as simplifying the user interface and changing wording so that the app flow was more intuitive. In cycle 2, errors were made on 9.6% of the tasks (15/157). Additional changes were made to the app including simplifying input requirements, making only one action possible at a time, improving graphics, and clarifying instructions. In cycle 3, no errors (0/224, 0%) preventing task completion occurred, and only 2.7% of tasks (6/224) had minor incidents. Acceptability E-Scale scores were positive (mean 4.6, SD 0.7) on domains that included helpfulness in carbohydrate counting and food identification, ease of use, time taken, and overall satisfaction across all 3 cycles of testing. After cycle 3, minor modifications such as aesthetic changes to the user interface were made prior to pilot testing (pilot randomized controlled trial).
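The per-cycle error rates quoted above follow directly from the reported counts, as the short check below shows. The counts are taken from the text; the function name and the rounding to 1 decimal place are choices made for this sketch.

```python
# (errors preventing task completion, total tasks) per cycle, as reported above
CYCLES = {1: (27, 222), 2: (15, 157), 3: (0, 224)}


def error_rate_pct(errors: int, tasks: int) -> float:
    """Percentage of tasks with an error, rounded to 1 decimal place."""
    return round(100 * errors / tasks, 1)


for cycle, (errors, tasks) in CYCLES.items():
    print(f"cycle {cycle}: {errors}/{tasks} = {error_rate_pct(errors, tasks)}%")

# Minor incidents in cycle 3 (6 of 224 tasks)
print(f"cycle 3 minor incidents: {error_rate_pct(6, 224)}%")
```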

Pilot Randomized Controlled Trial
Of the 46 participants who were enrolled and randomly allocated into the 2 arms of the pilot study, 43 participants completed the study (Figure 2). All participants reported being comfortable using computers and smart devices, and there were no significant differences between the 2 groups for any of the baseline characteristics (gender: P=.22; age: P=.99; duration since diagnosis: P=.79; regulation method: P=.62; confidence in counting: P=.39; Table 1).
At baseline, there was also no difference in carbohydrate counting accuracy or time taken to complete the task between the 2 groups. At the 3-month follow-up visit, the iSpy group displayed a statistically significant increase in carbohydrate counting accuracy (P=.008) and a statistically significant decrease in counting errors (P=.047) compared to the control group (Table 2). None of the secondary outcome variables, such as quality of life measures (P=.64), self-care measures (P=.17), or patient/parent responsibility (P=.69), differed between the groups at baseline or at 3 months postintervention. Although not a main outcome variable for this pilot study, HbA1c values were assessed at baseline and at the 3-month follow-up visit, with the iSpy group displaying statistically significantly lower HbA1c values (P=.03) compared to those of the control group. No major technical challenges were identified. App engagement was assessed over 4 time periods (first 2 weeks of study, 2 weeks to 1 month, first to second month, and second month to end of study), with 43% (9/21) of participants indicating medium or high use at the end of study (Figure 3). Over the course of the study, a mean of 1.9 (SD 0.94) reminder emails were sent to iSpy users. Acceptability E-Scale results were positive (Multimedia Appendix 2). Of the 7 questions asked, iSpy respondents ranked iSpy positively in 6 out of 7 categories. The highest rankings were related to ease of understanding and ease of use. The weakest responses were related to how helpful iSpy was in food identification.
Semistructured interviews were conducted among all iSpy participants, and results mirrored those of the questionnaires. Most participants (18/21, 86%) found the iSpy app fairly to very easy to use. To identify foods, participants preferred the photo feature, followed by the text feature, with few respondents utilizing the voice function. Participants also valued the speed of image recognition results, and delays due to misidentification of foods were viewed negatively. Participants provided suggestions for improvement, such as including additional options for portion sizing, refining the identification features so that foods within a complex meal do not need to be added one-by-one, expanding the database of known foods, and including optional reminder notifications about logging foods.

Principal Results
With few exceptions [28,29], apps used to facilitate diabetes care have not undergone formal testing [11,30], making it difficult to assess their utility and limiting the advancement of digital health apps for diabetes care. Here we report iterative usability testing and pilot testing of iSpy showing that use of the app was associated with improved carbohydrate counting accuracy and high acceptability and satisfaction scores. Areas for further refinement were also identified, such as increased speed and more focus on image and text recognition features.
Carbohydrate counting is an important component of diabetes care [31-33], and use of iSpy was associated with fewer counting errors of >10 g. Although not measured directly, this degree of improved accuracy would theoretically lead to improved postprandial glucose, and the improvement we observed in HbA1c levels may suggest an effect on overall glycemic control, a finding that warrants further study in future trials. Errors of >10 g are considered clinically important [8,15,34], with one study reporting that children who received prandial insulin boluses based on carbohydrate estimates within 10 g had minimal changes in blood glucose postprandially [34]. On the other hand, when insulin boluses were based on carbohydrate estimates that were off by 20 g, more instances of hypo- and hyperglycemia occurred 2-3 hours after the meal [15].

Comparison With Prior Work
iSpy is not the only app that has been developed to assist patients or caregivers with carbohydrate counting; however, the majority of the other apps, such as MyFitnessPal or Samsung Health, are general nutrition tracking apps. Data are limited, but available studies report conflicting information regarding the difference between output nutrient data from these apps compared to reference data [35,36]. Furthermore, these apps have limited input modalities (generally limited to manual text searching) [37]. Other diabetes-related apps address multiple aspects of diabetes self-management, including tracking of glucose data, physical activity, diet, and insulin doses; such apps may also include assistance with carbohydrate counting. When tested, many of these apps have not demonstrated significant improvements in their primary outcomes, which have mainly been centered on glycemic control [28,38]. It is also worth noting that while these apps are all-encompassing for diabetes self-management, some recommend educational features such as carbohydrate counting to improve their usage [39]. However, apps that have been developed specifically to assist with carbohydrate counting are limited, and few of these have been formally evaluated, with studies having only been conducted over a duration of a few days or weeks and lacking successful comparisons with controls [40-44].
Thus, our phased development of iSpy, along with usability testing and the 3-month pilot study, is relatively unique, as are the promising results. It is possible that use of iSpy was associated with more accurate carbohydrate counting because iSpy reinforced a structured approach to carbohydrate counting: identifying each food item, determining the portion size being consumed, and asking about any "hidden" carbohydrates, such as barbeque sauce under a bun. Whether this step-by-step approach to carbohydrate counting with real-time feedback underlies the observed improvement can be tested in subsequent studies.
Participant engagement is an additional measure of an app's usability. Although use was declining, at the end of the 3-month trial 9/21 (43%) participants were still medium to high users.
It is difficult to know how to interpret this degree of usage. One could view this percentage of individuals logging ≥1 meal every 2 weeks as evidence of engagement that is diminishing too rapidly. However, it is our experience that app usage often wanes over time, even when users rate the app quite favorably [28]. It is this expected degree of dropoff that led us to define ongoing usage of ≥1 meal every 2 weeks as medium engagement. Moreover, despite the dropoff, iSpy use was associated with improved carbohydrate counting at 3 months, suggesting that use of an educational app may have a long-lasting impact even after the period of high use has ended. Nevertheless, it will be important to consider options to further improve engagement such as push alert notifications, reminders, and ensuring ease of data entry.
Although randomized controlled trials are considered the gold standard for evaluating efficacy, there is concern that such trials may not be optimal for the assessment of apps. Thus, when considering future testing of iSpy and other apps, one must acknowledge the long timeline for recruitment and conduct of such trials within a rapidly and continually evolving technology-based environment [28,45]. User testing and product revision often occur on shorter timelines, necessitating consideration of adaptive clinical trials that allow for continual modifications while data are being collected [46]. Furthermore, potential barriers to incorporating app use into ongoing clinical care should be evaluated as a component of such trials. Assessing and addressing topics such as workflow integration and patient (or family)-provider communication will be needed to continue to support effective advancement of digital health [28,45].

Limitations
While our results are encouraging, we acknowledge that our studies have limitations. The studies were conducted at a single tertiary pediatric center, and the results may not be generalizable. A larger trial and a wider clinical implementation study are important next steps to verify our findings. In addition, although based on databases of commonly consumed foods [14], the number of foods recognized by iSpy is not all-encompassing. The database was not identified as a limiting factor by our participants, but we will continue to expand iSpy's ability to recognize foods eaten around the world among different cultures. Though no differences were found between the intervention and control groups for baseline technology familiarity and use, we did not acquire detailed information about other factors that can influence care such as education level, socioeconomic status, family dynamics, or details of treatment regimen, all of which could have accounted for some differences. Finally, we did not provide text reminders to the control subjects in this pilot study. Although these reminders were brief texts that occurred at most 3 times over the trial, it is feasible that they could have motivated change and thus represent a confounding variable affecting our results.

Conclusion
Carbohydrate counting remains a challenge for youth with type 1 diabetes and their families, and errors in counting can have clinical impact. We have developed and conducted rigorous pilot testing of an app designed to assist youth with carbohydrate counting. The data suggest that use of iSpy is associated with improved carbohydrate counting and that usability and acceptability of the app are quite positive. Further testing is now warranted to verify these pilot data and determine if the app can indeed improve blood glucose control and help decrease the burden of living with type 1 diabetes.

Figure 1. Usability testing errors per cycle representing tasks completed during each cycle of usability testing. The total number of tasks varied per cycle (cycle 1: 222; cycle 2: 157; cycle 3: 224).

Figure 3. Engagement levels with the number of participants in each category displayed for each time frame.

Table 2. Carbohydrate counting and glycemic control outcomes (at baseline and follow-up).