This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mHealth and uHealth, is properly cited. The complete bibliographic information, a link to the original publication on https://mhealth.jmir.org/, as well as this copyright and license information must be included.
Behavioral eHealth and mobile health interventions have been moderately successful in increasing physical activity, although opportunities for further improvement remain to be discussed. Chatbots equipped with natural language processing can interact and engage with users and help continuously monitor physical activity by using data from wearable sensors and smartphones. However, a limited number of studies have evaluated the effectiveness of chatbot interventions on physical activity.
This study aims to investigate the feasibility, usability, and effectiveness of a machine learning–based physical activity chatbot.
A quasi-experimental design without a control group was used, with outcomes evaluated at baseline and 6 weeks. Participants wore a Fitbit Flex 1 (Fitbit LLC) and connected to the chatbot via the Messenger app. The chatbot provided daily updates on the physical activity level for self-monitoring, sent daily motivational messages in relation to goal achievement, and automatically adjusted the daily goals based on physical activity levels in the last 7 days. When requested by the participants, the chatbot also provided sources of information on the benefits of physical activity, sent general motivational messages, and checked participants’ activity history (ie, the step counts or minutes that were achieved on any day). Information about usability and acceptability was self-reported. The main outcomes were daily step counts recorded by the Fitbit and self-reported physical activity.
Among 116 participants, 95 (81.9%) were female, 85 (73.3%) were in a relationship, 101 (87.1%) were White, and 82 (70.7%) were full-time workers. Their average age was 49.1 (SD 9.3) years with an average BMI of 32.5 (SD 8.0) kg/m2. Most participants (93/113, 82.3%) experienced technical issues, which were due to an unexpected change in Facebook policy. Most of the participants scored the usability of the chatbot (101/113, 89.4%) and the Fitbit (99/113, 87.6%) as at least “OK.” About one-third (40/113, 35.4%) would continue to use the chatbot in the future, and 53.1% (60/113) agreed that the chatbot helped them become more active. On average, 6.7 (SD 7.0) messages/week were sent to the chatbot and 5.1 (SD 7.4) min/day were spent using the chatbot. At follow-up, participants recorded more steps (increase of 627, 95% CI 219-1035 steps/day) and total physical activity (increase of 154.2 min/week; 3.58 times higher at follow-up; 95% CI 2.28-5.63). Participants were also more likely to meet the physical activity guidelines (odds ratio 6.37, 95% CI 3.31-12.27) at follow-up.
The machine learning–based physical activity chatbot was able to significantly increase participants’ physical activity and was moderately accepted by the participants. However, the Facebook policy change undermined the chatbot functionality and indicated the need to use independent platforms for chatbot deployment to ensure successful delivery of this type of intervention.
It has been established that physical activity reduces the risk of mortality and many health conditions such as cardiovascular diseases, type 2 diabetes, and cancer [
With the advancement of mobile technology, people can access the internet almost everywhere and at any time. It was estimated in 2019 that 4.48 billion people were active internet users, 4.07 billion were unique mobile internet users, and 3.66 billion were active mobile social media users [
The use of chatbots is a potential innovative avenue for achieving higher levels of engagement. A chatbot or conversational agent is a computer program that can interact with users [
Recent reviews indicate that health behavior change interventions using chatbots have mostly focused on mental health [
Given the lack of studies on the effectiveness of physical activity chatbots, the aim of this study is to investigate the feasibility, usability, and effectiveness of an interactive machine learning–based physical activity chatbot that uses natural language processing and adaptive goal setting.
A quasi-experimental design without a control group was used, with outcomes evaluated at 2 time points: baseline and 6 weeks after participants started to use the chatbot. Prospective participants were recruited from a list of people who had previously used the 10,000 Steps program [
The chatbot was hosted on the Messenger app, which is owned by Facebook. An unexpected Facebook policy change blocked the chatbot from sending new messages to participants who had not responded to the previous message within 24 hours, and we were forced to stop the study at that point. As recruitment was rolling, 48 participants had already completed the study when the policy change was implemented. Those who were still engaged in the study at that time were sent the follow-up survey immediately, resulting in a shorter intervention period.
Invitation emails were sent to 13,670 email addresses registered in the 10,000 Steps program database between September and November 2020 (
Participant flowchart.
This study was approved by the Human Research Ethics Committee of the Central Queensland University (application #0000022181). This study was retrospectively registered on the Australian New Zealand Clinical Trials Registry (ACTRN12621000345886).
Participants who agreed to participate, provided written consent, and completed the baseline survey were mailed a package including a Fitbit Flex 1 activity tracker (with instructions on how to use it), a participant information sheet, and instructions on how to download the Fitbit app on their smartphone and how to create a Fitbit account. Follow-up phone calls were conducted to ensure that participants received the package and were able to install the Fitbit app and use the Fitbit device.
Participants wore their Fitbit for 7 days to collect their baseline physical activity data before connecting to the chatbot. To connect to the chatbot, participants were instructed to download and open the Messenger app on their smartphones and complete the secure verification process (only study participants were able to connect with the chatbot). Once verified, the participants started to receive daily messages and were able to interact with the chatbot. Participants were asked to engage with the chatbot (intervention) for a period of 6 weeks.
Follow-up surveys were sent to the participants via email. Four reminders (a combination of text messages, emails, and phone calls), sent 3 days apart, asked participants to complete the follow-up survey. A research assistant was available during the intervention period to assist participants with any technical issues they encountered.
The chatbot, named
The intervention was designed using the COM-B model. The COM-B model forms the core of the Behavior Change Wheel, a behavioral system focusing on 3 components: capability, opportunity, and motivation [
Proactive actions include the following: (1) Providing an update on participants’ physical activity level achieved the previous day and informing them of the goal they needed to achieve on the current day. This message was sent early in the morning at the time selected by each participant. (2) Sending 1 or 2 additional messages later in the day, either to encourage participants to achieve their daily goal or to tell them that they were doing great if they had already achieved the goal when the message was sent. The number and timing of these messages were selected by each participant. (3) Automatically adjusting the daily activity goals based on the average physical activity level achieved during the 7 previous days. The type of goal (step counts or minutes) and the amount per day (eg, 8000 steps/day or 35 min/day) that the participant wanted to achieve by the end of the study were also chosen by each participant. The goal was automatically increased by 500 steps/day or 5 minutes of moderate-vigorous physical activity/day if the participant, on average, met their current goal over the last 7 days [
Reactive actions, which occurred when the participants sent a request for information to the chatbot, include (1) providing sources of information on the benefits of physical activity, (2) sending general motivational messages to encourage participants to become more active, and (3) checking participants’ activity history (ie, the step counts or minutes that were achieved on any day) as requested. Examples of these messages are shown in
Message examples: (A) introduction, (B) request on step counts, (C) message upon reaching the goal, (D) message encouraging the participant to try reaching the goal.
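The adaptive goal-setting rule described under proactive actions can be sketched as follows. This is a minimal illustration, not the study's implementation. Only the 7-day averaging window and the increments (500 steps/day or 5 min/day) come from the text; the function name, the cap at the participant's chosen end-of-study target, and leaving the goal unchanged when the 7-day average falls short are assumptions.

```python
from statistics import mean

STEP_INCREMENT = 500    # steps/day, as stated in the text
MINUTE_INCREMENT = 5    # min/day of moderate-vigorous activity, as stated in the text


def next_daily_goal(current_goal, last_7_days, target_goal, increment):
    """Return the daily goal to use after reviewing the previous 7 days.

    Assumptions (not confirmed by the source): the goal never exceeds the
    participant's chosen end-of-study target, and it stays unchanged when
    the 7-day average is below the current goal.
    """
    if len(last_7_days) != 7:
        raise ValueError("expected exactly 7 daily values")
    if mean(last_7_days) >= current_goal:
        return min(current_goal + increment, target_goal)
    return current_goal


# Hypothetical week of step counts for one participant.
week = [6200, 5800, 6500, 6100, 5900, 6400, 6300]
print(next_daily_goal(6000, week, target_goal=8000, increment=STEP_INCREMENT))
```

The same function covers minute-based goals by passing MINUTE_INCREMENT and a target in minutes per day.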
Demographic characteristics were self-reported at baseline. Age, height, weight, years of schooling, and average daily work time (hours) were used as continuous variables; categorical variables included gender (male or female), marital status (not in a relationship or in a relationship), ethnicity (White or other), living area (major city, regional, or remote area), work status (full-time or other), and annual household income (≥Aus $130,000 [US $94,900], Aus $78,000 to <Aus $130,000 [US $56,940-$94,900], or <Aus $78,000 [US $56,940]). Weight was also self-reported at follow-up. BMI was calculated as weight (kg) divided by the square of height (m) and was analyzed as a secondary outcome.
Physical activity was objectively measured using the Fitbit Flex 1. Although this device records both step counts and physical activity minutes, only step counts were used in the analysis. This is because the Fitbit only recorded the minutes if a user was active for at least 10 minutes, whereas all steps were counted regardless of whether they occurred during bouts of activity (≥10 minutes) or not.
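As a brief illustration of the BMI calculation, the sketch below divides weight in kilograms by the square of height in meters. The function name is hypothetical, and height is taken in centimeters only because that is the unit reported in the baseline table. The example applies the formula to the sample means for weight and height, which gives a value near, but not necessarily equal to, the mean of individual BMIs.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Sample means from the baseline table: 91.3 kg and 167.4 cm.
print(round(bmi(91.3, 167.4), 1))
```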
Self-reported physical activity was assessed at baseline (before receiving a Fitbit) and follow-up using the Active Australia Survey [
Usability and acceptability were assessed at follow-up using the System Usability Scale (SUS) [
A post hoc power calculation was conducted for Fitbit step counts using the following parameters: difference in means, SDs, and correlation between step counts at the 2 time points. The post hoc power for this study was 81.3%.
Fitbit data were cleaned and processed using Python v3.7 (Python Software Foundation). As step counts of <1000 indicate that the Fitbit was not worn all day [
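A power calculation of this kind, driven by the difference in means, the two SDs, and the correlation between time points, can be sketched with a normal approximation for paired data. This is not the authors' procedure, which the text does not specify. The function name is hypothetical, the correlation of 0.5 and the number of pairs are placeholders because neither is reported, and the result should not be expected to reproduce 81.3%.

```python
from math import sqrt
from statistics import NormalDist


def paired_power(mean_diff, sd_baseline, sd_followup, correlation, n_pairs, alpha=0.05):
    """Approximate two-sided power for a paired comparison of means.

    The SD of the within-person difference follows from the two SDs and
    their correlation; power is then taken from the standard normal
    distribution (an approximation to the paired t test).
    """
    variance_diff = (sd_baseline ** 2 + sd_followup ** 2
                     - 2 * correlation * sd_baseline * sd_followup)
    if variance_diff <= 0:
        raise ValueError("difference variance must be positive")
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = abs(mean_diff) * sqrt(n_pairs) / sqrt(variance_diff)
    # Probability of rejecting in either tail.
    return standard_normal.cdf(shift - z_crit) + standard_normal.cdf(-shift - z_crit)


# Means and SDs are taken from the outcomes table; correlation=0.5 and
# n_pairs=102 are assumed values used only for illustration.
example = paired_power(mean_diff=633, sd_baseline=2391, sd_followup=2326,
                       correlation=0.5, n_pairs=102)
print(f"{example:.3f}")
```

A higher correlation between baseline and follow-up shrinks the SD of the difference and therefore raises the power for the same mean change.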
SAS v9.4 (SAS Institute) was used for the analysis. Baseline characteristics were compared among those participating <4 weeks, 4 to <6 weeks, and ≥6 weeks using Fisher exact tests for categorical variables and Welch analysis of variance for continuous variables, except for daily work time and total physical activity minutes, which were tested using Kruskal–Wallis tests. As a robustness check, the analysis was performed separately for 2 samples, a full sample and a subsample (excluding those using the chatbot <4 weeks). This ensures that the results reflect the effectiveness of the intervention for those with sufficient exposure to the chatbot.
Generalized linear mixed models were used to identify changes in the outcomes. Normal distribution and identity link were used for BMI and Fitbit step counts. As total physical activity minutes were highly skewed, PROC TRANSREG was used to conduct the Box-Cox transformation analysis, and as a result, a fourth root transformation was applied. Generalized linear mixed models with normal distribution and log link were used for the transformed total physical activity minutes. Estimates were converted back into ratios for interpretative purposes. Empirical estimators were used to obtain the robust SEs. Binary distribution and logit link were used for the outcome of meeting physical activity guidelines. For each outcome, 2 models were run to generate crude estimates and estimates adjusted for sample characteristics including age, gender, marital status, years of schooling, ethnicity, household income, living area, work status, and daily work time. Differences in BMI, step counts, and total physical activity minutes between the follow-up and baseline were reported with a 95% CI. Odds ratios (ORs) and 95% CIs were reported for meeting the physical activity guidelines. All
Characteristics at baseline by participation duration (N=116).
Characteristics | All (N=116) | <4 weeks (n=17) | 4-<6 weeks (n=51) | At least 6 weeks (n=48) | P value
Gender, n (%) | | | | | .49
Male | 21 (18.1) | 2 (11.8) | 12 (23.5) | 7 (14.6) |
Female | 95 (81.9) | 15 (88.2) | 39 (76.5) | 41 (85.4) |
Marital status, n (%) | | | | | .17
Not in a relationship | 31 (26.7) | 7 (41.2) | 15 (29.4) | 9 (18.8) |
In a relationship | 85 (73.3) | 10 (58.8) | 36 (70.6) | 39 (81.3) |
Ethnicity, n (%) | | | | | .27
White | 101 (87.1) | 13 (76.5) | 44 (86.3) | 44 (91.7) |
Others | 15 (12.9) | 4 (23.5) | 7 (13.7) | 4 (8.3) |
Living area, n (%) | | | | | .72
Major city | 56 (48.3) | 9 (52.9) | 26 (51) | 21 (43.8) |
Regional or remote areas | 60 (51.7) | 8 (47.1) | 25 (49) | 27 (56.3) |
Work status, n (%) | | | | | .53
Full-time | 82 (70.7) | 14 (82.4) | 34 (66.7) | 34 (70.8) |
Others | 34 (29.3) | 3 (17.7) | 17 (33.3) | 14 (29.2) |
Annual household income, Aus $ (US $), n (%) | | | | | .85
≥130,000 (≥94,900) | 35 (30.2) | 6 (35.3) | 14 (27.5) | 15 (31.3) |
78,000 to <130,000 (56,940-94,900) | 39 (33.6) | 5 (29.4) | 20 (39.2) | 14 (29.2) |
<78,000 (<56,940) | 42 (36.2) | 6 (35.3) | 17 (33.3) | 19 (39.6) |
Average age (years), mean (SD) | 49.1 (9.3) | 48.9 (11.0) | 50.3 (9.0) | 48.1 (9.0) | .48
Average height (cm), mean (SD) | 167.4 (8.9) | 163.7 (11.6) | 169.4 (9.2) | 166.7 (7.0) | .11
Average weight (kg), mean (SD) | 91.3 (24.7) | 94.3 (18.1) | 91.3 (27.8) | 90.3 (23.5) | .77
Average BMI (kg/m2), mean (SD) | 32.5 (8.0) | 35.3 (6.7) | 31.7 (8.6) | 32.4 (7.7) | .19
Average years of schooling, mean (SD) | 15.8 (3.5) | 15.5 (3.8) | 15.6 (3.4) | 16.0 (3.6) | .78
Average daily work time (h/day), mean (SD) | 8.0 (2.0)a | 8.0 (1.9) | 7.8 (1.8)b | 8.0 (2.2)c | .97d
Average step counts/day, mean (SD) | 5933 (2391)e | 5761 (2076)f | 6466 (2800)g | 5428 (1895)h | .12
Average total physical activity (min/week), mean (SD) | 86.5 (137.5) | 72.4 (58.0) | 91.4 (143.8) | 86.3 (151.8) | .80i
Meeting physical activity guidelines, n (%) | | | | | .12
No | 100 (86.2) | 16 (94.1) | 40 (78.4) | 44 (91.7) |
Yes | 16 (13.8) | 1 (5.9) | 11 (21.6) | 4 (8.3) |
an=106.
bn=45.
cn=44.
dFisher Exact tests or Welch analysis of variance was used unless indicated otherwise.
en=108.
fn=14.
gn=48.
hn=46.
iKruskal–Wallis test was used.
Usability and acceptability of the chatbot and Fitbit.
Item | Value
Chatbot usability (n=113), n (%) |
Good | 12 (10.6)
OK | 89 (78.8)
Poor | 12 (10.6)
|
Strongly agree or agree | 49 (43.4)
Neutral | 36 (31.9)
Strongly disagree or disagree | 28 (24.7)
Would continue to use the chatbot in the future (n=113), n (%) |
Strongly agree or agree | 40 (35.4)
Neutral | 32 (28.3)
Strongly disagree or disagree | 41 (36.3)
Chatbot helped the participant become more active (n=113), n (%) |
Strongly agree or agree | 60 (53.1)
Neutral | 27 (23.9)
Strongly disagree or disagree | 26 (23)
|
Not at all useful or a little useful | 52 (46)
Somewhat useful | 27 (23.9)
Quite useful or very useful | 34 (30.1)
|
Not at all useful or a little useful | 60 (53.1)
Somewhat useful | 24 (21.2)
Quite useful or very useful | 29 (25.7)
|
Not at all useful or a little useful | 59 (52.2)
Somewhat useful | 26 (23)
Quite useful or very useful | 28 (24.8)
|
Not at all useful or a little useful | 63 (55.7)
Somewhat useful | 20 (17.7)
Quite useful or very useful | 30 (26.6)
|
Not at all useful or a little useful | 47 (41.6)
Somewhat useful | 26 (23)
Quite useful or very useful | 40 (35.4)
|
Always | 71 (62.8)
Most of the time | 35 (31)
Sometimes or rarely | 7 (6.2)
|
Several times a day | 30 (26.6)
Once a day | 28 (24.8)
Less than once a day | 19 (48.6)
Average messages/week sent to the chatbot (n=113), mean (SD) | 6.7 (7.0)
Average time/day spent with the chatbot (minutes; n=113), mean (SD) | 5.1 (7.4)
|
Very much | 26 (23)
Average | 42 (37.2)
A little or not at all | 45 (39.8)
|
Always or most of the time | 49 (43.3)
Sometimes | 34 (30.1)
Rarely or never | 30 (26.5)
Experienced technical issues (n=113), n (%) |
Yes | 93 (82.3)
No | 20 (17.7)
|
Yes | 95 (84.1)
No | 18 (15.9)
Fitbit usability (n=112), n (%) |
Good | 22 (19.6)
OK | 77 (68.8)
Poor | 13 (11.6)
Average weeks of wearing the Fitbit (n=112), mean (SD) | 5.4 (1.1)
Average day/week of wearing the Fitbit (n=112), mean (SD) | 6.7 (0.9)
Average h/day of wearing the Fitbit (n=112), mean (SD) | 19.5 (5.5)
|
<1/day | 19 (17)
Once a day | 27 (24.1)
At least twice a day | 66 (58.9)
The average usability for Fitbit was 64.0 (SD 11.1) with majority scoring the Fitbit usability as
Change in mean daily step over time.
Differences in the outcomes between follow-up and baseline.
Outcome | Baseline: participants, n | Baseline: value | Follow-up: participants, n | Follow-up: value | Crude estimate (95% CI) | Adjusted estimate (95% CI)a
Full sample | | | | | |
BMI (kg/m2), mean (SD) | 116 | 32.5 (8.0) | 116 | 32.4 (8.0) | −0.08 (−0.34 to 0.17) | −0.13 (−0.37 to 0.11)
Step counts/day, mean (SD) | 108 | 5933 (2391) | 102 | 6570 (2326) | 633b (242 to 1024) | 627b (219 to 1035)
Total physical activity (min/week), mean (SD)c | 116 | 86.5 (137.5) | 116 | 240.7 (233.6) | 4.04d (2.59 to 6.29) | 3.58d (2.28 to 5.63)
Meeting physical activity guidelines, n (%)e | | | | | |
No | 116 | 100 (86.2) | 116 | 54 (46.6) | 1.0 | 1.0
Yes | 116 | 16 (13.8) | 116 | 62 (53.5) | 7.18d (3.89 to 13.24) | 6.37d (3.31 to 12.27)
Subsample (excluding those using the chatbot <4 weeks) | | | | | |
BMI (kg/m2), mean (SD) | 99 | 32.0 (8.1) | 99 | 31.9 (8.2) | −0.08 (−0.37 to 0.21) | −0.13 (−0.4 to 0.14)
Step counts/day, mean (SD) | 94 | 5958 (2444) | 89 | 6530 (2297) | 576b (153 to 998) | 564f (120 to 1009)
Total physical activity (min/week), mean (SD)c | 99 | 88.9 (147.0) | 99 | 265.5 (240.5) | 4.69d (2.92 to 7.55) | 4.17d (2.55 to 6.80)
Meeting physical activity guidelines, n (%)e | | | | | |
No | 99 | 84 (84.9) | 99 | 43 (43.4) | 1.0 | 1.0
Yes | 99 | 15 (15.1) | 99 | 56 (56.6) | 7.29d (3.77 to 14.12) | 6.41d (3.14 to 13.09)
aAdjusted for age, gender, marital status, years of schooling, ethnicity, household income, living area, work status, and work duration.
b
cEstimates were converted back to ratios.
d
eEstimates are odds ratios.
f
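As a simple arithmetic check on the odds ratios for meeting the physical activity guidelines, the sketch below computes an unadjusted odds ratio directly from the counts in the table (follow-up versus baseline). The function name is hypothetical. This 2x2 calculation ignores the repeated-measures structure handled by the mixed models, so it is only a sanity check on the crude point estimates and says nothing about the CIs or the adjusted estimates.

```python
def odds_ratio(yes_followup, no_followup, yes_baseline, no_baseline):
    """Unadjusted odds ratio of meeting guidelines at follow-up vs baseline."""
    return (yes_followup / no_followup) / (yes_baseline / no_baseline)


# Counts taken from the outcomes table.
full_sample = odds_ratio(62, 54, 16, 100)
subsample = odds_ratio(56, 43, 15, 84)
print(round(full_sample, 2), round(subsample, 2))  # prints: 7.18 7.29
```

Both values agree with the crude estimates reported in the table (7.18 and 7.29).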
This study examined the feasibility, usability, and effectiveness of a physical activity chatbot with natural language processing capability and adaptive goal setting delivered via the Facebook Messenger app. Significant improvements in both the step count and self-reported physical activity were observed. These findings are consistent with those from another Australian study examining a combined diet and physical activity chatbot using natural language processing [
The findings showed that the participants liked the chatbot, with some even asking to continue using it after they completed the 6-week trial. Nevertheless, usability for both the chatbot and the Fitbit was rated as
Previous studies have also shown higher usability of Fitbit use [
The results also showed that BMI did not significantly improve at follow-up. This finding is not surprising, as our study did not target weight loss and therefore, no direct activity related to weight loss or weight maintenance was delivered. This is different from the other Australian chatbot-based physical activity interventions, which showed a significant decrease in weight at week 12 [
This study has several strengths: (1) both objective and subjective measures of physical activity were used to obtain accurate and complementary data on the effectiveness of the intervention [
The machine learning–based physical activity chatbot was able to significantly increase participants’ physical activity and was moderately accepted by the participants. However, a Facebook policy change undermined the chatbot functionality and indicated the need to use independent platforms for chatbot deployment so that this type of intervention could be successfully delivered.
Future studies with stronger designs, such as randomized controlled trials, in which the effect of the activity trackers can be isolated, are needed to confirm these findings. Research is also required to determine whether chatbot-based interventions could be effective for broader populations. Furthermore, technology to develop and evaluate more comprehensive chatbot interventions already exists. In addition to natural language processing, Fitbit integration, and adaptive goal setting, it is possible to use deep reinforcement learning with feedback loops and integrate more real-time data sources (eg, GPS and weather data) to enable chatbots to personally tailor and continuously adapt cues to action so that the timing, frequency, context, and content are optimally suited for each participant. Such comprehensive physical activity chatbots should be developed and evaluated in future studies.
mHealth: mobile health
OR: odds ratio
SUS: System Usability Scale
This study was supported by a Future Leader Fellowship (ID 100427) awarded to CV by the National Heart Foundation of Australia, and a postdoctoral fellowship and a commencement grant (ID RSH/5409) awarded to QGT by Central Queensland University.
None declared.