@Article{info:doi/10.2196/16018, author="Wang, Zhi Chun and Zhang, Shi Ping and Yuen, Pong Chi and Chan, Kam Wa and Chan, Yi Yi and Cheung, Chun Hoi and Chow, Chi Ho and Chua, Ka Kit and Hu, Jun and Hu, Zhichao and Lao, Beini and Leung, Chun Chuen and Li, Hong and Zhong, Linda and Liu, Xusheng and Liu, Yulong and Liu, Zhenjie and Lun, Xin and Mo, Wei and Siu, Sheung Yuen and Xiong, Zhoujian and Yeung, Wing Fai and Zhang, Run Yun and Zhang, Xuebin", title="Intra-Rater and Inter-Rater Reliability of Tongue Coating Diagnosis in Traditional Chinese Medicine Using Smartphones: Quasi-Delphi Study", journal="JMIR Mhealth Uhealth", year="2020", month="Jul", day="9", volume="8", number="7", pages="e16018", keywords="mobile health; smartphone; traditional Chinese medicine; telemedicine; tongue image; machine learning; oral disease; Gwet AC2; COVID-19", abstract="Background: There is a growing trend in the use of mobile health (mHealth) technologies in traditional Chinese medicine (TCM) and telemedicine, especially during the coronavirus disease (COVID-19) outbreak. Tongue diagnosis is an important component of TCM, but also plays a role in Western medicine, for example in dermatology. However, the procedure of obtaining tongue images has not been standardized and the reliability of tongue diagnosis by smartphone tongue images has yet to be evaluated. Objective: The first objective of this study was to develop an operating classification scheme for tongue coating diagnosis. The second and main objective of this study was to determine the intra-rater and inter-rater reliability of tongue coating diagnosis using the operating classification scheme. Methods: An operating classification scheme for tongue coating was developed using a stepwise approach and a quasi-Delphi method. First, tongue images (n=2023) were analyzed by 2 groups of assessors to develop the operating classification scheme for tongue coating diagnosis. 
Based on clinicians' (n=17) own interpretations as well as their use of the operating classification scheme, the results of tongue diagnosis on a representative tongue image set (n=24) were compared. After gathering consensus for the operating classification scheme, the clinicians were instructed to use the scheme to assess tongue features of their patients under direct visual inspection. At the same time, the clinicians took tongue images of the patients with smartphones and assessed tongue features observed in the smartphone image using the same classification scheme. The intra-rater agreements of these two assessments were calculated to determine which features of tongue coating were better retained by the image. Using the finalized operating classification scheme, clinicians in the study group assessed representative tongue images (n=24) that they had taken, and the intra-rater and inter-rater reliability of their assessments was evaluated. Results: Intra-rater agreement between direct subject inspection and tongue image inspection was good to very good (Cohen $\kappa$ range 0.69-1.0). Additionally, when comparing the assessment of tongue images on different days, intra-rater reliability was good to very good ($\kappa$ range 0.7-1.0), except for the color of the tongue body ($\kappa$=0.22) and slippery tongue fur ($\kappa$=0.1). Inter-rater reliability was moderate for tongue coating (Gwet AC2 range 0.49-0.55), and fair for color and other features of the tongue body (Gwet AC2=0.34). Conclusions: Taken together, our study has shown that tongue images collected via smartphone contain some reliable features, including tongue coating, that can be used in mHealth analysis. Our findings thus support the use of smartphones in telemedicine for detecting changes in tongue coating.", issn="2291-5222", doi="10.2196/16018", url="https://mhealth.jmir.org/2020/7/e16018", url="https://doi.org/10.2196/16018", url="http://www.ncbi.nlm.nih.gov/pubmed/32459647" }