JMIR Publications

JMIR mHealth and uHealth



Published on 06.11.17 in Vol 5, No 11 (2017): November


    Original Paper

    Obstructive Sleep Apnea in Women: Study of Speech and Craniofacial Characteristics

    1Signal Processing Applications Group, Signal, Systems and Radiocommunications Department, Universidad Politécnica de Madrid, Madrid, Spain

    2Audio, Data Intelligence and Speech Group, Universidad Autónoma de Madrid, Madrid, Spain

    3Respiratory Department, Sleep Unit, Hospital Quirón Salud de Málaga, Málaga, Spain

    Corresponding Author:

    Fernando Espinoza-Cuadros, PhD

    Signal Processing Applications Group

    Signal, Systems and Radiocommunications Department

    Universidad Politécnica de Madrid

    Avenida Complutense, 30

    Madrid, 28040


    Phone: 34 915495700 ext 212



    Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by frequent cessation of breathing lasting 10 seconds or longer. The diagnosis of OSA is performed through an expensive procedure, which requires an overnight stay at the hospital. This has led to several proposals based on the analysis of patients’ facial images and speech recordings as an attempt to develop simpler and cheaper methods to diagnose OSA.

    Objective: The objective of this study was to analyze possible relationships between OSA and speech and facial features in a female population, to examine whether these relationships are affected by the specific clinical characteristics of the OSA population, and, more specifically, to explore how the connection between OSA and speech and facial features is affected by gender.

    Methods: All the subjects are Spanish subjects suspected to suffer from OSA and referred to a sleep disorders unit. Voice recordings and photographs were collected in a supervised but not highly controlled way, trying to test a scenario close to a realistic clinical practice scenario where OSA is assessed using an app running on a mobile device. Furthermore, clinical variables such as weight, height, age, and cervical perimeter, which are usually reported as predictors of OSA, were also gathered.
    Acoustic analysis focused on sustained vowels. Facial analysis consisted of a set of local craniofacial features related to OSA, which were extracted from images after detecting facial landmarks using active appearance models. To study the possible connection of OSA with speech and craniofacial features, correlations among the apnea-hypopnea index (AHI), clinical variables, and acoustic and facial measurements were analyzed.

    Results: The results obtained for female population indicate mainly weak correlations (r values between .20 and .39). Correlations between AHI, clinical variables, and speech features show the prevalence of formant frequencies over bandwidths, with F2/i/ being the most appropriate formant frequency for OSA prediction in women. Results obtained for male population indicate mainly very weak correlations (r values between .01 and .19). In this case, bandwidths prevail over formant frequencies. Correlations between AHI, clinical variables, and craniofacial measurements are very weak.

    Conclusions: In accordance with previous studies, some clinical variables are found to be good predictors of OSA. Besides, strong correlations are found between AHI and some clinical variables with speech and facial features. Regarding speech features, the results show the prevalence of formant frequency F2/i/ over the rest of the features as an OSA predictive feature for the female population. Although the correlation reported is weak, this study aims to find some traces that could explain the possible connection between OSA and speech in women. In the case of craniofacial measurements, the results show that some features that can be used for predicting OSA in male patients are not suitable for testing the female population.

    JMIR Mhealth Uhealth 2017;5(11):e169




    Sleep disorders are receiving increased attention as a cause of daytime sleepiness, impaired work, and traffic accidents and are associated with hypertension, heart failure, arrhythmia, and diabetes. The most common form of sleep-disordered breathing is the obstructive sleep apnea (OSA) syndrome, and it is characterized by an obstruction of the upper airway (UA) during sleep at the level of the pharynx, yielding partial (hypopnea) or total (apnea) breathing cessation episodes longer than 10 s at a time [1].

    The gold standard for the diagnosis of OSA is a full overnight polysomnography (PSG) test [2] performed in an attended laboratory setting. PSG monitors electrophysiologic variables to score sleep stages and detect arousals and cardiorespiratory variables to detect complete (apnea) or near-complete (hypopnea) cessation of airflow. The OSA severity is determined based on the number of apnea and hypopnea episodes per hour of sleep or apnea-hypopnea index (AHI; mild defined as an AHI of 5-15, moderate as 15-30, and severe as ≥30).

    However, PSG is expensive and time-consuming, and, furthermore, the recordings are performed in an unfamiliar environment for the patient. Therefore, faster, noninvasive, and less costly alternatives have been proposed for early OSA detection and severity assessment, such as unattended domiciliary sleep studies.

    Although overweight and an excess of regional adipose tissue are considered major risk factors for OSA, other interacting elements are also involved in OSA pathogenesis, such as craniofacial abnormalities and an altered UA structure. These have been addressed in several studies, from early work based on the analysis of magnetic resonance imaging [3] to photometry on digital photographs of the head [4,5]. Among OSA phenotype-related characteristics are dental occlusion, a longer distance between the hyoid bone and the mandibular plane as described by Lowe and coworkers [6], and relaxed pharyngeal soft tissues and a large tongue base as described by Schwab and coworkers [3], which generally cause a longer and more collapsible UA. Consequently, abnormal or particular speech in OSA patients may also be expected from the altered structure or function of their UA.

    Accordingly, several approaches to speech-based OSA detection have been developed, from early perceptual acoustic analysis [7,8] to the most recent proposals for using automatic speech-processing techniques in OSA detection [9]. However, most of the previously mentioned publications have focused only on male subjects. To the best of our knowledge, there are no similar studies that concentrate on female OSA patients, and very few publications discuss this issue [10,11].

    Consequently, the main purpose of this paper was to study the potential connection between AHI and speech and facial features, focusing on a female population. Furthermore, we have also considered that it might be interesting to compare our results on male versus female patients. In that way, we can observe how the connection between OSA and speech and facial features can be affected by gender.

    For an easy interpretation of our results, similar to [12], acoustic analysis is performed by evaluating formant frequencies and bandwidths on sustained phonations of vowel sounds. Facial features are extracted by identifying a set of relevant landmarks on subjects’ images, following also a rather simple procedure similar to the one we presented in [9]. Statistical analysis using correlation coefficients is employed to evaluate the connection between speech and facial features with AHI. To gain a better understanding of this connection, we have used statistical contrasts (Mann-Whitney U tests) among OSA severity groups.


    Subjects and Recording Procedure

    Patients were provided by the Hospital Quirón Salud de Málaga (Spain). The subjects referred for PSG had previously reported symptoms of OSA, such as excessive daytime sleepiness, snoring, choking during sleep, or somnolent driving, during a preliminary interview with a pulmonologist. By means of this interview, the subjects’ clinical history was obtained, and an exhaustive physical examination focusing on sleep-related symptoms, associated conditions, comorbidities, and anthropometric measures was conducted and the data were collected. Subjects’ weight and height were recorded while they were wearing light clothes. Body mass index (BMI) was calculated as the ratio of body weight (in kg) to the square of height (in m2). Cervical perimeter (in cm) was also measured at the level of the cricothyroid membrane. Most of the subjects are from Andalusia (southern Spain). The majority of subjects were white, with the exception of 1 Chinese subject. Exclusion criteria were a non-Andalusian dialect, a known history of syndromal craniofacial abnormalities, previous craniofacial surgery, ethnicity, excessive facial hair that significantly obscured facial landmarks, and photograph capture errors (eg, inclination, bad position).

    The diagnosis for each patient was confirmed by specialized medical staff through a standard overnight PSG test, obtaining the AHI on the basis of the number of apnea and hypopnea episodes. According to subjects’ AHI, we defined three groups of OSA severity: a low AHI (<10) indicates a healthy subject, an AHI between 10 and 30 indicates a mild OSA patient, and an AHI above 30 is associated with severe OSA. These thresholds were defined to obtain a balanced number of samples for our statistical contrast analysis. Figure 1 illustrates the data collection process.
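
    As an illustration only, the three-way grouping described above can be sketched in a few lines of Python. The function name `osa_group` and the handling of the exact boundary values (an AHI of exactly 10 or 30 is placed in the mild group) are assumptions made for this sketch; the text does not state how boundary values were assigned.

```python
def osa_group(ahi: float) -> str:
    """Map an AHI value to the three study groups (hypothetical helper).

    Thresholds follow the text: below 10 is control, 10 to 30 is mild,
    above 30 is severe. Placing exactly 10 and exactly 30 in "mild" is
    an assumption of this sketch.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 10:
        return "control"
    if ahi <= 30:
        return "mild"
    return "severe"


if __name__ == "__main__":
    for value in (4.2, 17.0, 45.5):
        print(value, osa_group(value))
```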

    Figure 1. Flowchart of recording data for apnea database. AHI: apnea-hypopnea index; OSA: obstructive sleep apnea.

    Before the PSG test, all patients were taken to a separate room with adequate acoustic condition and the recording equipment for collecting speech and photographic data, after obtaining patients’ consent. Speech and photographic data are explained as follows:

    • Acoustic data: Sustained phonations of each Spanish vowel /a/, /e/, /i/, /o/, and /u/ were recorded from every subject in an upright or seated position and at a comfortable speech level in a quiet room. The recording equipment was a standard laptop computer equipped with an SP500 Plantronics headset microphone. Speech was recorded at a sampling frequency of 50 kHz and encoded in 16 bits. Afterwards, it was downsampled to 16 kHz before processing.
    • Photographic data: Frontal and profile digital photographs of the head were obtained before the speech recordings, in the same ordinary hospital room and without any particular illumination conditions. In contrast to the studies by Lee and coworkers [4,5], no special actions were taken beyond a simple control of patients’ front and profile photographs and some instructions to guarantee that the neck area was visible in the profile image. No calibration allowing the conversion from pixel measurements to metric dimensions (eg, measuring the distance from the camera) was performed, and manual identification of facial landmarks by palpation was also avoided. A standard Logitech QuickCam Pro 5000 webcam was used to collect images with a size of 640 × 480 pixels and a color depth of 24 bits.

    It is important to point out that the recording protocol was approved by the Institutional Review Committee of the Hospital Quirón Salud de Málaga and performed strictly following the ethical consideration of the medical center. The participants were notified about the research and their signed agreement was obtained.

    After applying exclusion criteria, a total of 383 subjects (129 women and 254 men) were included in our study. The female population comprised 64 subjects in the OSA group (AHI>10) and 65 in the control group (AHI≤10). The male population comprised 168 subjects in the OSA group (AHI>10) and 86 in the control group (AHI≤10). Descriptive statistics of the subjects under study are summarized in Table 1.

    Acoustic Features

    We focused on formant central frequencies and bandwidths because evidence of the influence of sleep apnea on them has been previously reported by Robb and coworkers [13]. Formants represent resonances of the vocal tract and depend on the UA properties, including its compliance, shape, and dimensions. Hence, they may embed information on specific physiological characteristics of OSA patients, although results may vary from one sound to another [14]. As mentioned previously, in this contribution, we focused on sustained phonations, which is the common approach for pathologic voices, and apnea may essentially be regarded as one.

    Despite these elementary considerations, measuring formant frequencies can be extremely difficult, as the measurement is highly influenced by multiple factors, including the method of analysis that is chosen and the analysis settings. Moreover, higher resonances are much more difficult to determine than lower ones because of natural energy losses. Our evaluation of acoustic measurements has shown that, for formants F4 and above, no reliable information could be extracted, and therefore, we restricted our analysis to the first 3 formants. To extract a consistent set of measures of formants’ central frequencies and bandwidths, we followed a specific protocol. First, we computed the values for the first 3 formant central frequencies and bandwidths using 2 freely available software packages: Praat Version 6.0.30 (Praat software, Amsterdam) [15] and the Snack Toolkit Version 2.2.8 (Snack Sound Toolkit, Sweden) [16]. Formant frequencies and bandwidths were estimated every 5 ms using 25-ms-long analysis windows. Their values were finally obtained by averaging along the most stable regions of the sustained phonations of each vowel, selecting a steady-state segment of 800 ms where the standard deviation of formant contours was the lowest, excluding initial and ending silences in each utterance.
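
    The steady-state selection described above can be sketched as a sliding-window search. This is a minimal illustration, not the authors' implementation: the function name `steadiest_segment` is hypothetical, silences are assumed to have been trimmed already, and combining the formants by summing their per-window standard deviations is an assumption, since the text does not say how the contours were combined. With one estimate every 5 ms, an 800-ms segment spans 160 frames.

```python
from statistics import mean, pstdev

FRAME_STEP_MS = 5    # one formant estimate every 5 ms (from the text)
SEGMENT_MS = 800     # length of the steady-state segment (from the text)


def steadiest_segment(contours, frame_step_ms=FRAME_STEP_MS, segment_ms=SEGMENT_MS):
    """Return (start_frame, mean value per contour) for the most stable window.

    contours: list of equal-length lists, one per formant, in Hz per frame.
    Stability is scored as the sum of per-contour standard deviations,
    which is an assumption of this sketch.
    """
    window = segment_ms // frame_step_ms  # 160 frames for 800 ms at 5 ms
    n_frames = len(contours[0])
    if any(len(c) != n_frames for c in contours):
        raise ValueError("all contours must have the same number of frames")
    if n_frames < window:
        raise ValueError("phonation is shorter than the requested segment")

    best_start, best_cost = 0, float("inf")
    for start in range(n_frames - window + 1):
        cost = sum(pstdev(c[start:start + window]) for c in contours)
        if cost < best_cost:
            best_start, best_cost = start, cost

    means = [mean(c[best_start:best_start + window]) for c in contours]
    return best_start, means
```

    The averaged values returned for the selected window play the role of the per-vowel formant measures used in the correlation analysis.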

    Table 1. Descriptive statistics on Spanish female and male subjects.

    To guarantee a reliable estimation, we measured the absolute differences between the estimated values obtained from Praat and Snack for each formant F1-F3. We then manually reviewed those cases for which differences exceeded 70 Hz for F1 and F2 and 150 Hz for F3. These thresholds match the level of accuracy in the reference study by Robb and coworkers [13] and seem consistent with values seen in studies that compare results from Praat with those from Snack [17]. In most cases for which deviations exceeded the prespecified thresholds, one of the two computed values (the one from either Snack or Praat) was found to be incorrect (most often because a formant was skipped). In these cases, the erroneously estimated value was removed, and the value provided by the other software was retained. In some other cases, both Snack and Praat failed to provide precise results. In those cases, values for formant central frequencies and bandwidths had to be manually selected using spectrograms and linear predictive coding (LPC) analysis. The decision on the number of poles for an optimal fitting of the LPC envelope was based on general knowledge about the formant structure of each vowel. Values for formants’ central frequencies were obtained as the maxima of the LPC spectral envelope, whereas their associated bandwidths were computed by measuring the frequency region around each formant’s central frequency within which the spectral envelope amplitude stays within 3 dB of the maximum.
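
    The cross-check between the two tools amounts to a per-formant threshold test. The sketch below assumes that each tool's estimates are available as a dictionary keyed by formant name; the function name `needs_manual_review` and this data layout are hypothetical, whereas the 70 Hz and 150 Hz thresholds come from the text.

```python
# Review thresholds in Hz, taken from the text: 70 Hz for F1 and F2, 150 Hz for F3.
REVIEW_THRESHOLD_HZ = {"F1": 70.0, "F2": 70.0, "F3": 150.0}


def needs_manual_review(praat_hz, snack_hz):
    """Return the formants whose two estimates differ by more than the threshold.

    praat_hz, snack_hz: dicts such as {"F1": 310.0, "F2": 2250.0, "F3": 2900.0}
    (a hypothetical layout chosen for this sketch).
    """
    flagged = []
    for formant, limit in REVIEW_THRESHOLD_HZ.items():
        if abs(praat_hz[formant] - snack_hz[formant]) > limit:
            flagged.append(formant)
    return flagged


if __name__ == "__main__":
    praat = {"F1": 310.0, "F2": 2250.0, "F3": 2900.0}
    snack = {"F1": 320.0, "F2": 2400.0, "F3": 3000.0}
    print(needs_manual_review(praat, snack))
```

    Any formant returned by this check would then go through the manual review described above.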

    Facial Features

    Facial features were similar to those studied by Lee and coworkers [4,5], including local measurements (ie, areas, distances, angles) extracted from landmarks on photographs. The major differences between our approach and that of Lee and coworkers [4,5] are the use of supervised automatic image processing and the definition of more robust craniofacial measurements adapted to our less controlled photography capture process.

    Manual annotation of all images can be tedious, and, even when done by skilled personnel, it is prone to errors because of subjectivity. Consequently, we decided to use a widely used automatic landmarking method, first introduced by Cootes and coworkers [18], based on active appearance model (AAM). On the basis of a priori knowledge of landmark positions, AAM combines a statistical model, which represents the variation of shape and texture of the object, with a gradient-descent fitting algorithm. As depicted in Figure 2, in AAMs for frontal and profile photographs, we used a grid of 52 landmarks taken from a general face identification system and a set of 24 landmarks including specific marks for the neck area, respectively.

    During the training stage, frontal and profile AAMs were built from a set of manually annotated photographs using the aam_tools Version 3.0 (aam_tools software, Manchester) [19].

    Figure 2. Landmarks on frontal and profile views.

    During the fitting stage, starting from a reasonable landmark initialization, the AAM algorithm iteratively corrects the appearance parameters by minimizing the squared error in representing the texture of the target face. Although the AAM performs well at representing shape and appearance variations of an object, the model is sensitive to the face’s position. In this study, this effect is increased because photographs were not taken following a highly controlled procedure (illumination conditions, control of distance from the camera, and control of frontal and profile positions). Hence, a human-supervised stage was found necessary to review and, where needed, correct large deviations in the automatically generated landmarks.

    Once landmarks were generated, we proceeded to extract a set of local features, similar to those studied by Lee and coworkers [4,5] but adapted to our less controlled photographic process. These measurements are described in the following sections:

    • Cervicomental contour ratio. One of the anatomical risk factors for OSA is fat deposition on the anterior neck [20]. This risk factor is captured by a measurement proposed by Lee and coworkers [4,5], the cervicomental angle, which is formed by the horizontal plane of the submental region and the vertical plane of the neck. Fat deposition on the anterior neck causes an increase of this angle. However, considering our limited photography capture process, it is extremely difficult to detect points such as the cervical point, thyroid, cricoid, neck plane, or sternal notch involved in the cervicomental region. Consequently, we defined an alternative measurement that is more robust to both our image capture and automatic landmarking processes. This measurement was defined using a contour in the cervicomental region traced by 6 landmarks placed equidistantly (ie, landmarks 11, 12, and 20-23 in Figure 3), which were annotated with high reliability following our semiautomatic AAM method. The relative measurement of fat deposition on the anterior neck was then calculated as the ratio of the cervicomental-related area within the rectangular region (ie, yellow solid line defined by landmarks 11, 12, and 20-23, and the bottom right vertex landmark V of the rectangle as depicted in Figure 3) to the area of the rectangular region (ie, black dashed line defined by bottom left landmark 23 and upper right landmark 11 as depicted in Figure 3). This results in an uncalibrated measurement with a value that decreases as the fat deposition on the anterior neck increases.
    • Face-width ratio. Lee and coworkers studied the relationship between surface facial dimensions and UA structure in subjects with OSA by means of analysis of magnetic resonance images [21]. Significant positive correlations were detected between surface facial dimensions and UA structures, in particular midface width and interocular width. On the basis of these results, we used these 2 facial dimensions to define a face-width uncalibrated measurement as the midface width to interocular width ratio. The corresponding landmarks and measurements are depicted in Figure 4.
    • Tragion-ramus-stomion angle. Lowe and coworkers [6] reported that patients with OSA had retracted mandibles, which is related to the inclination of the occlusal plane and the angle between the relative position of the maxilla to mandible. On the basis of [6], we proposed an uncalibrated measure (ie, an angle) intended to capture, to some extent, the characteristic mandible position or mandibular retraction in OSA individuals. To define this angle, we selected a set of landmarks that not only are related to the posterior displacement of the mandible but also could be accurately detected by our automatic landmarking process on the photographs without need of prior marking. The proposed measurement, as depicted in Figure 5, is the angle between the line ramus-stomion (landmarks 16 and 6) and the ramus-tragion (landmarks 16 and 18).
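
    The three measurements above reduce to elementary plane geometry on landmark coordinates. The following sketch shows one possible way to compute them from (x, y) pixel positions. All function names are hypothetical, and the input conventions (the contour ordered from landmark 11 to landmark 23, with the vertex V closing the polygon) are assumptions of this illustration rather than details given in the text.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def face_width_ratio(midface_left, midface_right, eye_left, eye_right):
    """Midface width divided by interocular width (unitless, uncalibrated)."""
    return distance(midface_left, midface_right) / distance(eye_left, eye_right)


def angle_deg(vertex, p, q):
    """Angle in degrees at `vertex` between the lines vertex-p and vertex-q.

    For the tragion-ramus-stomion angle: vertex = ramus, p = stomion, q = tragion.
    """
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def cervicomental_contour_ratio(contour, corner_v):
    """Area enclosed by the contour plus vertex V, over the rectangle area.

    contour: landmarks ordered from the upper right one (11) to the bottom
    left one (23); these two also define the rectangle (assumed ordering).
    """
    polygon = list(contour) + [corner_v]
    (x_a, y_a), (x_b, y_b) = contour[0], contour[-1]
    rectangle_area = abs(x_b - x_a) * abs(y_b - y_a)
    return polygon_area(polygon) / rectangle_area
```

    Because each value is a ratio or an angle, none of them depends on the pixel-to-metric scale, which is consistent with the uncalibrated capture process described earlier.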
    Figure 3. Measurements used for the cervicomental contour ratio.
    Figure 4. Measurements used for the face-width ratio.
    Figure 5. Tragion-ramus-stomion angle.

    Statistical Analysis

    To describe our results, we classified the strength of the Spearman correlation coefficient as described by Fowler and coworkers [22]: values between .01 and .19 are regarded as very weak, .20 and .39 as weak, .40 and .69 as modest, .70 and .89 as strong, and .90 and .99 as very strong. Values are reported hereafter as mean (SD) and range.
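
    For readers who want to reproduce this kind of labeling, the sketch below computes a Spearman coefficient with the standard library only (Pearson correlation of average ranks) and maps it to the Fowler categories listed above. The function names are hypothetical, rounding the coefficient to two decimals before classification is an assumption, and values outside the listed ranges (0 or 1 in absolute value) are returned as "unclassified" because the scale as quoted does not cover them.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def fowler_strength(r):
    """Label |r| using the ranges quoted in the text."""
    a = round(abs(r), 2)  # rounding to two decimals is an assumption
    if 0.01 <= a <= 0.19:
        return "very weak"
    if 0.20 <= a <= 0.39:
        return "weak"
    if 0.40 <= a <= 0.69:
        return "modest"
    if 0.70 <= a <= 0.89:
        return "strong"
    if 0.90 <= a <= 0.99:
        return "very strong"
    return "unclassified"
```

    Under this labeling, the correlation of r=−.26 reported later between F2/i/ and AHI falls in the weak range.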

    Mann-Whitney U test (Wilcoxon rank-sum test) was used to assess significant differences between control and OSA groups because data were not normally distributed.
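
    To make the contrast concrete, the following sketch computes the Mann-Whitney U statistic by pairwise comparison and a two-sided P value from the normal approximation. It is an illustration under simplifying assumptions (no tie correction of the variance and no continuity correction), so its P values can differ slightly from those of a statistics package; the function name `mann_whitney_u` is hypothetical.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided P) using the normal approximation.

    U is the smaller of the two U statistics. Ties count as half a win.
    The variance has no tie correction and no continuity correction is
    applied; both are simplifications of this sketch.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must be non-empty")

    # Count, over all pairs, how often a value from group_a exceeds one from group_b.
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    u = min(u_a, n_a * n_b - u_a)

    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value
```

    In the setting of this study, `group_a` and `group_b` would hold the values of one feature (for example, a formant frequency) for the control and OSA groups, respectively.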

    We conducted our statistical analysis using the Statistics and Machine Learning Toolbox of MATLAB.


    Due to the possible effect of clinical variables on correlation between AHI and speech and craniofacial characteristics, we first analyzed the correlation between clinical variables, speech features, craniofacial features, and AHI. Moreover, in order to observe how the connection between OSA and speech can be affected by gender, we also compared correlations between both genders.

    Clinical Variables Analysis

    Table 2 presents the Spearman correlation coefficient between clinical variables and AHI for both genders.

    As can be seen in Table 2, the strongest correlation found for the female population was between age and AHI. The correlations of cervical perimeter and BMI with AHI are also significant but weak, as is the correlation with height, which is negative. In contrast, in the male population, the second strongest correlation found was between weight and AHI, although it is weak on the Fowler scale.

    In a comparison by gender, the strongest correlation with AHI is different for each gender: age in the case of women (r=.52, P=.001) and cervical perimeter in the case of men (r=.42, P=.001). That is, generally, for both genders, AHI presents significant correlations with age and parameters strongly related to obesity, such as weight, BMI, and cervical perimeter, which are known as risk factors for OSA.

    Table 2. Spearman correlations between clinical variables and apnea-hypopnea index (AHI) on the female (n=129) and male (n=254) population (for clarity, nonsignificant correlation values are omitted).
    Table 3. Descriptive statistics of the formant frequencies and bandwidth of vowels on the female population (n=129).

    Acoustic Features Analysis

    Table 3 presents the mean, standard deviation, and the range of formant frequencies and bandwidth for vowels /a/, /e/, /i/, /o/, and /u/ for the female population.

    Because of the association between the blockage of the UA and OSA, abnormal or particular speech may be expected in subjects with OSA due to the altered structure of their UA. Likewise, the association between clinical variables (ie, height and weight) and speech [12] is known; thus, indirect association might be expected between speech and OSA. Accordingly, correlations between formant frequencies, bandwidths, and clinical variables are presented in Table 4.

    Focusing on formant frequencies, Table 4 shows that the highest, though weak, correlations are found with AHI, age, and cervical perimeter. Surprisingly, none of these formants are correlated with weight, BMI, or height. Moreover, results show that there are 3 formants (F1/a/, F2/e/, and F2/i/), which present weak negative correlation with AHI (r=−.26, P=.001; r=−.24, P=.01; r=−.26, P=.001; respectively). It should be noted that F2/i/ is the only formant correlated with AHI but not correlated with other clinical variables. Likewise, most of the significant correlations with formants were for age, with up to 8 formants. It is known that human voice changes with age [23], which leads us to think that age may cause indirect influence on a relationship between formant frequencies and AHI.

    When considering the results for bandwidths in Table 4, only very weak correlations appear: weight negatively correlated with BW1/a/, height with BW3/o/, and age with BW2/a/ and BW2/e/, but no significant correlation was obtained between bandwidths and AHI.

    To analyze the influence of gender, the correlation results in Table 4 were compared with those of a male population, published in our previous study [12] (Table 5). Those results include most of the male subjects of the population used in this paper. Given that the difference is very small, we have preferred to use the already published tables with 241 subjects instead of publishing a slightly different one with 254 male subjects.

    Table 4. Statistically significant Spearman correlation between formant frequencies, bandwidths, and clinical variables on the female (n=129) population (for clarity, nonsignificant correlation values are omitted).

    According to Table 5, contrary to the results for the female population, only bandwidths present correlation with AHI, namely BW2/a/ (r=.13, P=.05) and BW3/e/ (r=−.17, P=.01), whereas formants do not. The overall results for speech features show that negative correlation coefficients are common between formants, bandwidths, and age. Furthermore, those values are generally small (weak on the Fowler scale) in both genders.

    This finding on the female population showed that 2 of the 3 formant frequencies correlated with AHI also have significant correlation with age (F1/a/ and F2/e/), which leads us to think that age may cause indirect influence on a relationship between formant frequencies and AHI. Similarly, in male population, BW3/e/ is also correlated with weight and BMI, which may indicate an indirect correlation with AHI.

    To analyze in detail the influence of each clinical variable on correlation between speech features and OSA, a general review is provided for both genders.

    First, we can see that for both male and female populations, most of the significant correlations between acoustic features and clinical variables are linked to age. This is in agreement with several studies on age-related acoustic characteristics, in which different speech features have been reported to correlate with age and have been linked to changes in the anatomy and physiology of the speech production system [23]. Some specific studies have reported age-related changes to formants, particularly in the production of vowels. According to these studies, a negative correlation between formants and age, as is also found in our study, can be expected. This lowering of vowel formants with age can presumably be a by-product of the lowering of the vocal folds over the life span in an adult subject, which results in a longer vocal tract [24,25], together with a trend toward vowel centralization in older subjects [17,26]. In some cases, these changes have been found to occur only on some particular vowels [25,26]. It should be noted that all the mentioned studies on this issue were performed for both genders.

    Table 5. Statistically significant Spearman correlation between formant frequencies and clinical variables on the male (n=241) population (for clarity, nonsignificant correlation values are omitted).

    Considering weight and height, no significant correlations were found for female subjects. These results do not agree with those reported by González [27], where weak and modest correlations with weight and height were found: F2/e/ and height (r=−.51), and F2/e/ and weight (r=−.50). According to González [27], it seems that the most informative parameters for female height and weight were the second and the third formants from the /a/, /e/, and /i/ vowels. In the case of male population, there are no similarities with that study either. However, unlike women, there are several speech variables with significant correlations with height and weight (see Table 5). In the research by González, stronger correlations were reported for male subjects, mainly between F2/e/ and height (r=−.57) and F4/o/ and weight (r=−.48), whereas in the OSA male population in [12] the higher correlation coefficient values were obtained between F3/e/, F2/i/, and height (r=−.21, P=.001, both), and between BW1/a/ and age (r=−.21, P=.001). Likewise, in case of BMI, no significant correlation with formants was found. One may expect formants’ bandwidths to be larger for OSA patients as an increase in both velar and pharyngeal compliance could result in increased sound damping within the vocal tract [13]. However, only one significant negative correlation was detected between BMI and BW1/a/ (r=−.19, P=.03) for female patients. A similar situation was found for male patients (BW3/e/ with r=−.13, P=.05 and BW2/o/ with r=−.14, P=.03). Despite these clear differences in our studies, both point toward a similar direction: formants seem to be weak predictors of body size in both women and men. Just as in our previous discussion regarding age, it is possible to hypothesize that these significant though weak correlations with height or weight may interfere with specific acoustic characteristics related to OSA.

    Finally, cervical perimeter is another feature that is commonly used in discriminating between healthy subjects and OSA patients. More specific than BMI, neck circumference can describe how excessive weight may increase tissue bulk in the neck, which will also increase the dynamic loading of the airway, thus contributing to the pathogenesis of OSA [28]. In the female OSA population under study and similar to what we have found for the other body size measurements, only few significant and weak correlations appeared between cervical perimeter and speech: with F1/a/ and F3/a/ (r=−.24, P=.01 and r=−.21, P=.02, respectively), and F2/o/ (r=−.20, P=.02). Analogous results were found for male subjects (see Table 5). Several previous studies have similarly failed to find modest relationships between voice acoustics and body size effects measured through BMI [28], body mass composition [29] or weight, and neck circumference [30].

    Craniofacial Features Analysis

    In this section, descriptive statistics on the female (n=129) and male (n=254) subjects under study are shown, together with a correlation analysis between craniofacial features and OSA through the AHI. The craniofacial analysis comprises the 3 craniofacial measurements extracted from the landmarks previously annotated on patient photographs. As with the acoustic features, differences by gender were also analyzed. Table 6 presents the mean, SD, and the range of the craniofacial measurements. In Table 7, correlations between craniofacial measurements and clinical variables are presented for both genders.

    In the case of the female population, as described in Table 7, all 3 craniofacial measurements present a significant but weak correlation with AHI. Cervicomental contour ratio is also modestly correlated with BMI, weight, and cervical perimeter. As regards the face-width ratio, there is a weak negative correlation with height and a positive correlation with BMI.

    In the case of the male population, all 3 craniofacial measurements also present a correlation with AHI. Furthermore, the strongest correlations are modest, negative, and correspond to BMI, cervical perimeter, and weight, in both men and women.

    Both genders show significant correlations between all 3 craniofacial measurements and AHI: positive in the case of face-width ratio (r=.18, P=.04 for women; r=.23, P=.001 for men), and negative for cervicomental contour ratio (r=−.23, P=.01 for women; r=−.37, P=.001 for men) and TRG angle (r=−.19, P=.03 for women; r=−.12, P=.05 for men).
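    The correlations above are Spearman coefficients (see the caption of Table 7), that is, Pearson correlations computed on ranks. The following is a minimal standard-library sketch of that computation, not the authors' analysis code. The function names and the sample values for `ahi` and `cervicomental_ratio` are hypothetical and were chosen only to illustrate a negative association.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical values for six subjects (not taken from the study).
ahi = [4.0, 12.5, 30.1, 8.2, 55.0, 21.3]
cervicomental_ratio = [1.10, 0.95, 0.90, 1.05, 0.70, 0.80]

rho = spearman_rho(ahi, cervicomental_ratio)
print(f"Spearman rho = {rho:.3f}")
```

    The sketch returns only the coefficient. A P value, as reported in the text, would require an additional significance test, which a statistics package would normally provide.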

    Table 6. Descriptive analysis of the craniofacial measurements on Spanish female and male subjects.
    Table 7. Statistically significant Spearman correlation between craniofacial measurements and clinical variables on the female (n=129) and male (n=254) population (for clarity, nonsignificant correlation values are omitted).

    In general, the male population presents stronger correlations. Indeed, cervicomental contour ratio has the strongest correlation with AHI in both groups. However, as pointed out before, this craniofacial measurement also has a modest correlation with BMI, weight, and cervical perimeter. Hence, an underlying connection between AHI and cervicomental contour ratio through these clinical variables may exist.
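    One common way to probe such an underlying connection is a first-order partial correlation, which estimates the association between two variables after removing the linear contribution of a third. This technique is not used in the paper, and the sketch below is only an illustration. It applies the standard formula to rank correlations as an approximation. The AHI versus BMI coefficient (`r_ahi_bmi`) is a hypothetical placeholder because that value is not given in this section; the other two coefficients are the ones reported for women.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z.

    Standard first-order formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


r_ahi_ccr = -0.23  # AHI vs cervicomental contour ratio, women (reported)
r_ccr_bmi = -0.66  # cervicomental contour ratio vs BMI, women (reported)
r_ahi_bmi = 0.30   # AHI vs BMI: HYPOTHETICAL value, for illustration only

adjusted = partial_correlation(r_ahi_ccr, r_ahi_bmi, r_ccr_bmi)
print(f"AHI vs cervicomental ratio, controlling for BMI: {adjusted:.3f}")
```

    With these inputs the adjusted coefficient shrinks toward zero, which is the pattern one would expect if BMI accounted for most of the observed association. The actual outcome depends entirely on the true AHI versus BMI correlation.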

    Similar to what was considered for the acoustic feature analysis, we now analyze the influence of each clinical variable on the correlation between craniofacial features and OSA for both genders.

    In the case of age, despite the changes in the facial skeleton that occur with aging, only one significant negative correlation between age and craniofacial measurements was found for each gender: a very weak correlation with cervicomental contour ratio (r=−.17, P=.01) in the case of male patients and a weak one with TRG angle (r=−.24, P=.001) in the case of female patients.

    As for height, there is only one significant weak correlation, with face-width ratio (r=−.21, P=.02) in female subjects, and no significant correlation was found in male subjects. The literature is divided on this point: some studies have reported a strong relationship between craniofacial parameters and stature [31], whereas others have not [32,33].

    Considering now BMI, weight, and cervical perimeter, in female subjects (Table 7) the most relevant correlations are those of BMI (r=−.66, P=.001), weight (r=−.65, P=.001), and cervical perimeter (r=−.58, P=.001) with cervicomental contour ratio. In male subjects, the highest significant correlations also relate cervicomental contour ratio to the same clinical parameters (BMI: r=−.59, P=.001; weight: r=−.49, P=.001; and cervical perimeter: r=−.59, P=.001). These results point to cervicomental contour ratio, which is related to neck and under-the-chin fat deposition, as the facial measurement most likely to reflect a possible risk factor for OSA.

    Statistical Contrasts Among OSA Severity Groups

    In the previous sections, we have studied the correlation between the full AHI range and the set of speech and craniofacial features. In this section, we analyze whether or not these features can discriminate between two female populations: a control group, defined by AHI<10, and an OSA group, defined by AHI≥10. Similar analyses for male populations were presented by Robb and coworkers in [13] and by ourselves in [12].

    Statistical contrasts between the control and OSA groups using the Mann-Whitney U test are presented in Table 8. The results in Table 8 show that most of the discriminative speech features reported by Robb are not detected. Only a significant difference in F2/i/ is present, whereas a few novel differences arise for F2/e/, F3/i/, and BW2/e/.
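    The contrast described above can be sketched as follows: subjects are split at AHI=10, and a feature is compared between the two groups with a two-sided Mann-Whitney U test. This is a minimal standard-library illustration that uses the normal approximation with tie correction and no continuity correction. It is not the authors' analysis code, and the `subjects` list of (AHI, F2/i/ in Hz) pairs is hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample a, approximate two-sided P value).
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(a) + list(b)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        t = j - i + 1                      # size of this group of ties
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0                     # all observations identical
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2))      # two-sided tail of the normal


# Hypothetical (AHI, F2/i/ in Hz) pairs; not data from the study.
subjects = [(3.0, 2450), (7.5, 2400), (9.9, 2380), (10.0, 2300),
            (25.0, 2250), (40.2, 2200), (61.0, 2150)]

control = [f2 for ahi, f2 in subjects if ahi < 10]   # AHI < 10
osa = [f2 for ahi, f2 in subjects if ahi >= 10]      # AHI >= 10

u, p = mann_whitney_u(control, osa)
print(f"U = {u:.1f}, approximate P = {p:.4f}")
```

    With groups this small the normal approximation is crude, and an exact test would be preferable. The sketch is meant only to show the grouping rule and the rank-based comparison.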

    Table 8. Contrast among control and obstructive sleep apnea (OSA) severity groups on the female population (N=129).

    We have reported results for a similar contrast between control and OSA groups for men in [12] (Table 9). This allows us to compare results for female and male populations (Tables 8 and 9, respectively) and see that only F3/i/ appears in both populations. It is also interesting to note that for males, the remaining significant differences appear only in bandwidths: BW1/o/, BW3/o/, BW2/a/, and BW3/e/.

    If we now analyze the statistical differences between control and OSA groups for the clinical variables (also shown in Tables 8 and 9), we can see that only weight in females and height in males present no statistical differences. Consequently, the relationship between speech and AHI is likely to include indirect influences mediated through the rest of the clinical variables.

    A similar statistical contrast between control and OSA groups was made for craniofacial features. Results showed no significant differences between groups for the female population (see Table 8). Results for our male population are presented in Table 10. These results show significant statistical differences in cervicomental contour ratio and face-width ratio. This suggests that the studied facial measurements are more suitable for estimating the AHI in male subjects.

    Matched Groups

    As discussed before, our results indicate that there can be an indirect relationship between AHI and both speech and craniofacial features mediated through the clinical variables (age, weight, height, BMI, and cervical perimeter). To evaluate this indirect effect, statistical contrasts are again presented for control and OSA groups but now selected to exhibit no statistical differences among the clinical variables. Thus, the objective was to test whether or not statistical differences previously observed in Tables 8 and 9 (with unmatched values in clinical variables) remain in a matched condition (ie, when there are no statistical differences in clinical variables among control and OSA groups).

    Results on matched groups for female population are presented in Table 11, which correspond to control and OSA groups including subjects in the age range of 41 to 55 years and BMI≥25 so that no statistical differences in clinical variables appear.
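    The selection of matched groups can be sketched as a simple filter followed by the same AHI split used earlier. The snippet below is a hedged illustration, not the authors' procedure: the `Subject` record, the helper names, and the sample `cohort` are hypothetical, while the age range (41 to 55 years), the BMI threshold (≥25), and the AHI cutoff (10) are the ones stated in the text for the female population.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Subject:
    age: float  # years
    bmi: float  # kg/m^2
    ahi: float  # apnea-hypopnea index, events per hour


def matched_subset(subjects, age_range=(41, 55), min_bmi=25.0):
    """Keep only subjects inside the age range (inclusive) and with BMI >= min_bmi."""
    low, high = age_range
    return [s for s in subjects if low <= s.age <= high and s.bmi >= min_bmi]


def split_by_ahi(subjects, threshold=10.0):
    """Return (control, osa): AHI below the threshold vs AHI at or above it."""
    control = [s for s in subjects if s.ahi < threshold]
    osa = [s for s in subjects if s.ahi >= threshold]
    return control, osa


# Hypothetical records; not data from the study.
cohort = [
    Subject(age=38, bmi=31.0, ahi=22.0),  # excluded: younger than 41
    Subject(age=44, bmi=27.5, ahi=6.0),
    Subject(age=50, bmi=23.0, ahi=4.0),   # excluded: BMI below 25
    Subject(age=47, bmi=29.0, ahi=35.0),
    Subject(age=53, bmi=28.0, ahi=8.5),
    Subject(age=55, bmi=30.5, ahi=14.0),
    Subject(age=62, bmi=33.0, ahi=48.0),  # excluded: older than 55
]

subset = matched_subset(cohort)
control, osa = split_by_ahi(subset)

for name, group in (("control", control), ("OSA", osa)):
    ages = [s.age for s in group]
    bmis = [s.bmi for s in group]
    print(f"{name}: n={len(group)}, median age={median(ages)}, median BMI={median(bmis)}")
```

    Filtering alone does not guarantee matching. The clinical variables of the two resulting groups would still need to be contrasted (for example, with the Mann-Whitney U test) to confirm that no significant differences remain.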

    Table 9. Contrast among control and obstructive sleep apnea (OSA) severity groups on a male (N=241) population.
    Table 10. Contrast among control and obstructive sleep apnea (OSA) groups on the male (N=254) population.
    Table 11. Contrast between control and obstructive sleep apnea groups on a subset without differences either on age (41-55 years) or on body mass index (≥25) on the female population.
    Table 12. Contrast between control and obstructive sleep apnea (OSA) groups on a subset without differences either on age (≤46) or on body mass index (≤30) on the male population.

    As Table 11 illustrates, once the possible effect of age and BMI is minimized, only the significant difference in F2/i/ remains, whereas the differences for F2/e/, F3/i/, and BW2/e/ disappear. This result is consistent with the correlations in Table 4, where F2/i/ is correlated only with AHI, whereas F2/e/ and BW2/e/ are also correlated with some clinical variables. A comparison of Table 8 with Table 11 also shows that, as expected, there are no significant differences in craniofacial measurements in either table.

    Matched results for the male population, selecting individuals with age ≤46 and BMI in the range of 25 to 30, are presented in Table 12. Results in this table indicate that only the significant difference in cervicomental contour ratio remains. This suggests that neck fat deposition is a possible risk factor for OSA in the male population, as pointed out before, because the highest significant correlations were related to this craniofacial feature (see Table 7).


    Discussion

    Principal Findings

    The results of this investigation indicate that acoustic and facial measurements in a female population have weaker correlation with AHI than with clinical variables. Significant correlations for female individuals (mainly weak correlations) are somewhat stronger than those for male subjects (mainly very weak).

    In the studied female population, formant frequencies seem to prevail over bandwidths. Specifically, F2/i/ is the speech variable that proved to be a good predictor of OSA syndrome, as it is the only acoustic measurement that remains significant when contrasting OSA and non-OSA individuals in both the unmatched (Table 8) and matched (Table 11) conditions with respect to clinical variables. Regarding craniofacial parameters, the results indicate that the particular facial features we have studied are not suitable for distinguishing between OSA and non-OSA female subjects.

    In the case of the male population, bandwidths seem to prevail over formant frequencies in their correlation with AHI. BW2/a/ and BW3/e/ are the only ones that remain after the same contrast analysis using groups matched in clinical variables (Table 13). Considering craniofacial measurements, cervicomental contour ratio is the variable that is still present after the contrast analysis using matched groups (see Table 12). This outcome suggests that craniofacial measurements are more appropriate for differentiating OSA-affected male patients.


    Limitations

    We are aware that our research has several limitations. The first is that the results presented in this study are limited to Spanish subjects, most of whom are speakers of a single Spanish dialect, Andalusian. Consequently, a cross-language comparison should be made. Another limitation is that measuring formant frequencies and bandwidths is technically problematic and always achieves limited precision. To obtain results with higher accuracy and reliability, future studies will need to examine the possible impact of different factors during the data collection process, such as the patient's position or the time of day, as acoustic differences may be expected.

    With regard to craniofacial features, we have only explored uncalibrated craniofacial measurements because we restricted our study to simulating an OSA assessment app running on a mobile device. Our research may also be limited by the precision of the measurements, particularly in the case of the craniofacial measurements.


    Conclusions

    An important outcome of our investigation is that clinical variables may have an underlying impact on the correlations between voice features and OSA. Thus, future research should consider new speech analysis techniques capable of properly compensating for unwanted variability due to clinical variables. In the case of craniofacial measurements, the results suggest that the features used in this study are more suitable for male patients than for female patients. Therefore, searching for specific features that are better suited to female subjects would be a worthwhile way to improve OSA assessment techniques in women.

    Moreover, besides the known OSA risk factors, there are other disorders that can cause OSA, such as hypothyroidism [34] and acromegaly [35], which can alter craniofacial appearance. Comparing craniofacial features across groups that include such patients could skew the data. Therefore, future studies should also consider these OSA-related conditions as exclusion criteria to avoid false discoveries.

    Table 13. Contrast between control and obstructive sleep apnea (OSA) groups on a subset without differences either on age or on height on the male population.


    Acknowledgments

    The activities presented in this paper were funded by the Spanish Ministry of Economy and Competitiveness and the European Union (FEDER) as part of the TEC2015-68172-C2-2-P (Deep & Subspace Speech Learning, DSSL) project. The authors also thank Sonia Martinez Díaz for her effort in collecting the OSA database that is used in this study.

    Conflicts of Interest

    None declared.


    References

    1. Lam JC, Sharma SK, Lam B. Obstructive sleep apnoea: definitions, epidemiology & natural history. Indian J Med Res 2010 Feb;131:165-170.
    2. [No authors listed]. AARC-APT (American Association of Respiratory Care-Association of Polysomnography Technologists) clinical practice guideline. Polysomnography. Respir Care 1995 Dec;40(12):1336-1343.
    3. Schwab RJ, Pasirstein M, Pierson R, Mackley A, Hachadoorian R, Arens R, et al. Identification of upper airway anatomic risk factors for obstructive sleep apnea with volumetric magnetic resonance imaging. Am J Respir Crit Care Med 2003 Sep 1;168(5):522-530.
    4. Lee RW, Chan AS, Grunstein RR, Cistulli PA. Craniofacial phenotyping in obstructive sleep apnea--a novel quantitative photographic approach. Sleep 2009 Jan;32(1):37-45.
    5. Lee RW, Petocz P, Prvan T, Chan AS, Grunstein RR, Cistulli PA. Prediction of obstructive sleep apnea with craniofacial photographic analysis. Sleep 2009 Jan;32(1):46-52.
    6. Lowe AA, Fleetham JA, Adachi S, Ryan CF. Cephalometric and computed tomographic predictors of obstructive sleep apnea severity. Am J Orthod Dentofacial Orthop 1995 Jun;107(6):589-595.
    7. Fox AW, Monoson PK, Morgan CD. Speech dysfunction of obstructive sleep apnea. A discriminant analysis of its descriptors. Chest 1989 Sep;96(3):589-595.
    8. Monoson PK, Fox AW. Preliminary observation of speech disorder in obstructive and mixed sleep apnea. Chest 1987 Oct;92(4):670-675.
    9. Espinoza-Cuadros F, Fernández-Pozo R, Toledano DT, Alcázar-Ramírez JD, López-Gonzalo E, Hernández-Gómez LA. Speech signal and facial image processing for obstructive sleep apnea assessment. Comput Math Methods Med 2015;2015:489761.
    10. Goldshtein E, Tarasiuk A, Zigel Y. Automatic detection of obstructive sleep apnea using speech signals. IEEE Trans Biomed Eng 2011 May;58(5):1373-1382.
    11. Solé-Casals J, Munteanu C, Martín OC, Barbé F, Queipo C, Amilibia J, et al. Detection of severe obstructive sleep apnea through voice analysis. Appl Soft Comput 2014;23:346-354.
    12. Montero-Benavides A, Blanco-Murillo JL, Fernández-Pozo R, Espinoza-Cuadros F, Torre-Toledano D, Alcázar-Ramírez JD, et al. Formant frequencies and bandwidths in relation to clinical variables in an obstructive sleep apnea population. J Voice 2016 Jan;30(1):21-29.
    13. Robb MP, Yates J, Morgan EJ. Vocal tract resonance characteristics of adults with obstructive sleep apnea. Acta Otolaryngol 1997 Sep;117(5):760-763.
    14. Blanco-Murillo JL, Fernández-Pozo R, López-Gonzálo E, Hernández-Gómez LA. Exploring differences between phonetic classes in sleep apnoea syndrome patients using automatic speech processing techniques. The Phonetician 2008;97(1):35-54.
    15. Boersma P, Weenink D. Praat: doing phonetics by computer. Amsterdam: University of Amsterdam. [accessed 2017-10-26]
    16. Sjölander K. The snack sound toolkit. Sweden: KTH Royal Institute of Technology. [accessed 2017-10-26]
    17. Liss JM, Weismer G, Rosenbek JC. Selected acoustic characteristics of speech production in very old males. J Gerontol 1990 Mar;45(2):P35-P45.
    18. Cootes TF, Edwards GJ, Taylor CJ. Active appearance models. IEEE Trans Pattern Anal Mach Intell 2001 Jun;23(6):681-685.
    19. Cootes T. AAM tools. Manchester: The University of Manchester. [accessed 2017-10-26]
    20. Mortimore IL, Marshall I, Wraith PK, Sellar RJ, Douglas NJ. Neck and total body fat deposition in nonobese and obese patients with sleep apnea compared with that in control subjects. Am J Respir Crit Care Med 1998 Jan;157(1):280-283.
    21. Lee RW, Sutherland K, Chan AS, Zeng B, Grunstein RR, Darendeliler MA, et al. Relationship between surface facial dimensions and upper airway structures in obstructive sleep apnea. Sleep 2010 Sep;33(9):1249-1254.
    22. Fowler J, Cohen L, Jarvis P. Practical Statistics for Field Biology. New York: John Wiley & Sons; 1998.
    23. Torre 3rd P, Barlow JA. Age-related changes in acoustic characteristics of adult speech. J Commun Disord 2009;42(5):324-333.
    24. Benjamin BJ. Speech production of normally aging adults. Semin Speech Lang 1997 May;18(2):135-141.
    25. Xue SA, Hao GJ. Changes in the human vocal tract due to aging and the acoustic correlates of speech production: a pilot study. J Speech Lang Hear Res 2003 Jun;46(3):689-701.
    26. Rastatter MP, McGuire RA, Kalinowski J, Stuart A. Formant frequency characteristics of elderly speakers in contextual speech. Folia Phoniatr Logop 1997;49(1):1-8.
    27. González J. Formant frequencies and body size of speaker: a weak relationship in adult humans. J Phon 2004 Apr;32(2):277-287.
    28. Lee BJ, Ku B, Jang JS, Kim JY. A novel method for classifying body mass index on the basis of speech signals for future clinical applications: a pilot study. Evid Based Complement Alternat Med 2013;2013:150265.
    29. Hamdan AL, Al Barazi R, Khneizer G, Turfe Z, Sinno S, Ashkar J, et al. Formant frequency in relation to body mass composition. J Voice 2013 Sep;27(5):567-571.
    30. Rendall D, Kollias S, Ney C, Lloyd P. Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: the role of vocalizer body size and voice-acoustic allometry. J Acoust Soc Am 2005 Feb;117(2):944-955.
    31. Anibor E, Eboh DE, Etetafia MO. A study of craniofacial parameters and total body height. Pelagia Res Libr Adv Appl Sci Res 2011;2(6):400-405.
    32. Pelin C, Zağyapan R, Yazici C, Kürkçüoğlu A. Body height estimation from head and face dimensions: a different method. J Forensic Sci 2010 Sep;55(5):1326-1330.
    33. Rexhepi AM, Brestovci B. Prediction of stature according to three head measurements. Int J Morphol 2015 Sep;33(3):1151-1155.
    34. Rajagopal KR, Abbrecht PH, Derderian SS, Pickett C, Hofeldt F, Tellis CJ, et al. Obstructive sleep apnea in hypothyroidism. Ann Intern Med 1984 Oct 1;101(4):491-494.
    35. Mickelson SA, Senior BA, Rosenthal LD, Rock JP, Friduss ME. Obstructive sleep apnea syndrome and acromegaly. Otolaryngol Head Neck Surg 1994;111(1):25-30.


    Abbreviations

    AAM: active appearance model
    AHI: apnea-hypopnea index
    BMI: body mass index
    BW: bandwidth
    F: formant
    LPC: linear predictive coding
    OSA: obstructive sleep apnea
    PSG: polysomnography
    TRG: tragion-ramus-stomion
    UA: upper airway

    Edited by C Dias; submitted 19.06.17; peer-reviewed by C Rabec, H Mishra, N Ocal; comments to author 25.07.17; revised version received 14.09.17; accepted 21.09.17; published 06.11.17

    ©Marina Tyan, Fernando Espinoza-Cuadros, Rubén Fernández Pozo, Doroteo Toledano, Eduardo Lopez Gonzalo, Jose Daniel Alcazar Ramirez, Luis Alfonso Hernandez Gomez. Originally published in JMIR Mhealth and Uhealth, 06.11.2017.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.