Accepted for/Published in: Online Journal of Public Health Informatics

  • Mehrab B, Ian W H, Kimmo K, Chenglin H, Cory C, Elizabeth S C W, Callisto B, Alexandra C A, Elizabeth A Y, Majid S
  • Identifying Substance Use and High-Risk Sexual Behavior Among Sexual and Gender Minority Youth by Using Mobile Phone Data: Development and Validation Study
  • Online Journal of Public Health Informatics
  • DOI: 10.2196/11848
  • PMID: 30303485
  • PMCID: 6352016

Identifying Substance Use and High-Risk Sexual Behavior Among Sexual and Gender Minority Youth by Using Mobile Phone Data: Development and Validation Study

Abstract

Background

Sexual and gender minority (SGM) individuals are at higher risk for substance use and sexually transmitted infections (STIs) than their non-SGM peers. Collecting mobile phone usage data passively may open new opportunities for personalizing interventions, as behavioral risks could be identified without user input.

Objective

Our objectives were to determine (1) whether passively sensed mobile phone data can be used to identify substance use and sexual risk behaviors for STI and HIV transmission among young SGM who have sex with men, (2) which outcomes can be predicted with a high level of accuracy, and (3) which passive data sources are most predictive of these outcomes.

Methods

We developed a mobile phone app to collect participants’ messaging, location, and app use data and trained a machine learning model to predict risk behaviors for STI and HIV transmission. We used scikit-learn to train logistic regression and gradient boosting classification models with a simple linear model specification to predict participants’ substance use and sexual behaviors (i.e., condomless anal sex, number of sexual partners, and methamphetamine use); the predictions were validated against self-report questionnaires. F1 scores were used to quantify the prediction accuracy of the model when using different data sources, alone and in combination. Differences in text, location, app use, and Linguistic Inquiry and Word Count (LIWC) domains by outcome were investigated using independent t-tests, with associations considered significant at p<0.05.
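The workflow described above (two scikit-learn classifiers scored by F1 across combinations of data sources) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the feature groups and their sizes are hypothetical, and the use of 5-fold stratified cross-validation and feature scaling is an assumption, since the abstract does not state how the models were evaluated.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 82  # matches the reported sample size; everything else here is synthetic

# Hypothetical feature groups standing in for the three passive data sources.
text = rng.normal(size=(n, 5))
location = rng.normal(size=(n, 2))
app_use = rng.normal(size=(n, 3))

# Synthetic binary outcome, loosely driven by one text and one app-use feature.
y = (text[:, 0] + 0.5 * app_use[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

feature_sets = {
    "text": text,
    "text+app": np.hstack([text, app_use]),
    "text+app+location": np.hstack([text, app_use, location]),
}

# Factories so that each evaluation starts from a fresh, unfitted model.
models = {
    "logistic_regression": lambda: make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "gradient_boosting": lambda: GradientBoostingClassifier(random_state=0),
}

# Assumed evaluation scheme: out-of-fold predictions from 5-fold stratified CV.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for set_name, X in feature_sets.items():
    for model_name, make_model in models.items():
        predictions = cross_val_predict(make_model(), X, y, cv=cv)
        scores[(set_name, model_name)] = f1_score(y, predictions)

for key, value in sorted(scores.items()):
    print(key, round(value, 2))
```

Scoring every feature set with the same folds keeps the comparison between data sources on equal footing, which is the kind of comparison the F1 scores are used for here.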

Results

Among participants (n=82) who identified as SGM, were sexually active, and reported recent substance use, our model was highly predictive of methamphetamine use and of having 6 or more sexual partners (F1 scores as high as 0.83 and 0.69, respectively). The model was less predictive of condomless anal sex (highest F1 score 0.38). Overall, text-based features were the most predictive, but app use and location data improved predictive accuracy, particularly for detecting 6 or more sexual partners. Methamphetamine use was significantly associated with dating app use (p=0.01) and use of sex-related words (p=0.002). Having 6 or more sexual partners was associated with dating app use (p=0.02), use of sex-related words (p=0.001), and traveling farther from home, on average (p=0.03), compared with having fewer sexual partners. Methamphetamine users were more likely to use social (p=0.002) and affect words (p=0.003) and less likely to use drive-related words (p=0.02). Participants with 6 or more sexual partners were more likely to use social, affect, and cognitive process-related words (p=0.003 and 0.004 respectively).
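The group differences reported above come from independent t-tests with a significance threshold of p<0.05. A minimal sketch of one such comparison, using SciPy, is shown below. The values are synthetic, the group sizes and the feature (share of sex-related words per participant) are hypothetical, and the equal-variance form of the test is an assumption, since the abstract does not say which variant was used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic per-participant feature values for two outcome groups
# (hypothetical: share of sex-related words in each participant's messages).
group_with_outcome = rng.normal(loc=0.03, scale=0.01, size=30)
group_without_outcome = rng.normal(loc=0.02, scale=0.01, size=52)

# Independent two-sample t-test; SciPy assumes equal variances by default.
result = stats.ttest_ind(group_with_outcome, group_without_outcome)
t_stat, p_value = result.statistic, result.pvalue

# Significance threshold taken from the Methods paragraph.
significant = p_value < 0.05
print(f"t={t_stat:.2f}, p={p_value:.4f}, significant={significant}")
```

Passing `equal_var=False` to `stats.ttest_ind` would run Welch's test instead, which may be the safer choice when group sizes and variances differ.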

Conclusions

Our results show that passively collected mobile phone data may be useful in detecting sexual risk behaviors. Expanding data collection may improve the results further, as certain behaviors, such as injection drug use, were quite rare in the study sample. These models may be used to personalize STI and HIV prevention as well as substance use harm reduction interventions.

International Registered Report

RR2-10.2196/58448
