Arabic has a relatively large number of speech sounds whose primary or secondary articulation lies in the pharynx. Among these sounds are the pharyngeal class, including /ħ ʕ/ < ح ع> and the emphatic or pharyngealized class, including (in Standard Arabic) /sˁ dˁ tˁ ðˁ/ < ظ ط ض ص >, which stand in phonemic contrast to the plain class /s d t ð/ < س د ت ذ >. Some examples of the plain/emphatic contrast include the following minimal pairs: /nasaba/ ‘imputed’ vs. /nasˁaba/ ‘erected’; /tin/ ‘fig’ vs. /tˁin/ ‘clay’; and /darb/ ‘path’ vs. /dˁarb/ ‘hitting’. An emphatic tap has been posited as a marginal phoneme in at least one dialect (Watson 2002). In addition, a variety of other consonants may be realized as emphatic allophones: [bˁ lˁ mˁ] (Watson 2002). Symbolic transcriptions tend to suggest uniformity in the production of a speech sound, which may be misleading. According to Ladefoged (1993, p. 280), “As soon as [phonetic] data is segmented or described in any way … phonological considerations are bound to be present.” For both the emphatics and pharyngeals, a symbolic transcription seems too phonologically reductive to capture the degree of phonetic variation attested in the literature. Both emphatics and pharyngeals still require significant study, particularly in terms of whole-vocal-tract imaging, to better understand their articulatory and consequent acoustic characteristics across speakers, dialects, and speech styles. Because the sounds are relatively well-studied in Arabic, it will be beneficial to study the sounds as they are realized in other Semitic and Caucasian languages, as well (Maddieson 2009).
By standard convention of the International Phonetic Association (IPA) (Thelwall and Sa’adeddin 1999), the emphatics are transcribed with a superscript voiced pharyngeal approximant or reverse glottal stop, denoting a secondary articulation in the pharynx. This is so, despite the fact that some researchers find these sounds are not articulated with a pharyngeal constriction at all. In other cases, the emphatics have non-primary constrictions at other places of articulation (e.g., labial) that may be just as significant to the acoustic outcome as the pharyngeal constriction. Because some of the literature on the emphatics fails to show conclusively that they are in fact pharyngealized, we will use non-IPA symbols (e.g., /ṣ ḍ/) for the emphatics except when referring explicitly to documented, phonetic pharyngealization.
The pharyngeals are traditionally transcribed with symbols that may not capture even coarse-grained phonetic parameters, like manner of articulation. Voiced pharyngeal approximants and voiced pharyngeal fricatives both map to the IPA symbol [ʕ], despite the fact that there is some contention in the literature as to whether this sound is a fricative, an approximant, or even a stop (the IPA currently lacks a symbol for a pharyngeal stop). In addition, reports vary widely as to the articulatory nature of the pharyngeals. It appears that both the pharyngeals and emphatics are marked by a high degree of phonetic variability across speakers, dialects, and speech styles. This complexity is increased, unfortunately, by some reports that adhere to conventional transcriptions of the sounds while providing little conclusive articulatory or acoustic data.
The posterior nature of the pharyngeal and emphatic sounds is perhaps one factor that has made their study a prize among phoneticians. Because pharyngeal constrictions are hidden in the relatively inaccessible, posterior region of the vocal tract, it has not long been possible to directly observe their articulatory character. Before the advent of whole-vocal-tract imaging techniques (and after the virtual abandonment of ionizing radiation for non-clinical research), acoustic studies allowed their articulation to be inferred, with more or less precision. The unique challenges of using spectral information to deduce the articulatory configuration of the vocal tract will be discussed in some detail later.
Pharyngeal consonants are somewhat rare, occurring in 19 of 451 (4.21%) languages in the UPSID-PC database (Maddieson and Precoda 1991). This includes mostly Caucasian, Cushitic, and some other Semitic languages. Pharyngealized consonants appear to be even less common (around 1.77% in UPSID-PC). By contrast, velarized consonants are attested in 2.88% of the sampled languages and labialized consonants in 18.63%. As in Arabic, pharyngealized sounds tend to occur in languages in which pharyngeal consonants are also posited. However, if one highlights the fact that the primary constriction for the low vowel /a/ occurs in the pharynx, then it is easy to assert that most human languages possess at least one ‘pharyngeal’ vowel. In this sense, pharyngeal constrictions in human speech are hardly exotic. (Vowels described as ‘pharyngealized’ appear to be quite rare, occurring in only 1.33% of the languages in UPSID-PC.)
This chapter will discuss our present knowledge of the acoustic and articulatory characteristics of the pharyngeal and emphatic classes in Arabic. It will conclude with a discussion of future directions in the study of these speech sounds, including recent work by our research group on real-time magnetic resonance imaging (rt-MRI) of pharyngeals and emphatics in running speech. These innovative techniques allow us for the first time to visualize the pharynx from multiple angles during speech. They have considerable potential for further illuminating the articulatory variability of Arabic consonants and may help overcome the serious challenges in adequately describing these sounds. The articulatory realization of these consonants has implications for dialectology, sound change, models of articulatory control, acoustic-articulatory mapping, and language pedagogy.
Medieval Arab and Persian grammarians were the first to describe the distinctive vocal tract configurations of the pharyngeal and emphatic sounds of Arabic. As early as the eleventh century, in his Risālah on the points of articulation, Ibn Sinā wrote, “[The pharyngeal /ʕ/] is deeper in the throat, in the place where the air involved in vomiting is located” (trans. Semaan 1963). Regarding the articulatory differences between /s/ and /ṣ/, Ibn Sinā noted that the surface area of the tongue increased in length and breadth during the emphatic and that the surface of the tongue was somewhat hollowed out, as well (trans. Semaan 1963). Earlier still, eighth-century grammarians like Al-Khalil and Sibawayh proposed articulatory characteristics to differentiate the ‘plain’ and ‘emphatic’ sounds (e.g., /s/ and /ṣ/). Some were experimentally falsifiable, like iṭbāq ‘tongue spreading’ and istiʿlāʾ ‘elevation of the tongue dorsum’; others were impressionistic, like tafkhīm ‘heaviness’ or ‘thickness’ (Lehn 1963). More than twelve hundred years later, we can finally quantify the extent to which the tongue spreads and rises during the production of these sounds. It is still unclear, however, how to quantify impressions like ‘heaviness’ and ‘thickness’, which may be based on audition, kinesthesia, or both. Indeed, the conflation of auditory, acoustic, articulatory, and kinesthetic observations of these sounds is a considerable challenge in deciphering the many reports on their nature, now spanning centuries of linguistic inquiry.
The use of X-ray technology for phonetic research, beginning in the early twentieth century, led to the generation of more falsifiable hypotheses about the pharyngeals and emphatics. For example, Lehn (1963) associates the emphatic consonants in Cairo Arabic with pharyngealization, uvularization, and velarization. The term ‘u-resonance’ (a low F1 and a low F2, consistent with both labialization and velarization) is also associated with the vowels adjoining these sounds. Besides ionizing radiation-based techniques (radiography, fluoroscopy, and their dynamic counterparts), a great deal of articulatory work has investigated the vocal tract through the means of fiberoscopy (Al-Tamimi and Heselwood 2011, among others). The application of MRI to the study of these sounds is still less than a decade old and considerable advances are expected in the very near future.
The pharynx is “a cone-shaped musculotendinous tube extending from the base of the skull to the level of the sixth cervical vertebra” (Zemlin 1997, p. 273). The role of the pharynx in speech production is still poorly understood. While the size and shape of the pharynx evidently modifies the distribution of spectral energy produced at the laryngeal source (Chiba and Kajiyama 1941; Fant 1960), it is less evident how the pharynx assumes the shapes necessary to modify that energy. The pharynx is not a dynamic structure, with little ability to dilate through contraction of its own muscular walls, except at the very top, near the base of the skull. Thus, the changes that we observe in the pharynx typically result from the moving tongue, soft palate, and larynx (Zemlin 1997, p. 274). The role of the epiglottis as a speech articulator is still controversial; it is implicated in debates regarding the pharyngeal consonants, in particular (Esling 1999; Laufer and Condax 1979, 1981; Laufer and Baer 1988).
The pharyngeal tube occupies 40–55% of the length of the human vocal tract but it is traditionally associated with only one to three places of articulation (pharyngeal, epiglottal, and perhaps glottal, depending on inclusion or exclusion of the larynx in what is regarded as the pharyngeal tube; Lammert et al. 2011). The oral cavity, including the alveolar ridge and hard palate, is associated with four to five places of articulation even though it accounts for at most 40% of vocal tract length (Lammert et al. 2011). To provide an equitable spatial resolution in terms of place of articulation, the pharynx should be populated with at least two more places of articulation. Despite the fact that the tongue is apparently less versatile in the pharynx than in the oral cavity, the length of the pharynx still suggests high degrees of articulatory freedom: many points of articulation in the vocal tract may rightly be considered ‘pharyngeal’. Thus, the size of the pharynx itself is an obstacle in precisely identifying the place of articulation of a ‘pharyngeal’ sound. Moreover, the terminological strictures of the IPA seem to have imposed limits on descriptive accounts. This may help explain why, particularly in articulatory terms, reports of both the emphatic and pharyngeal classes differ widely.
Figure 3.1 Midsagittal illustration of the vocal tract, taken from a static MRI during the production of /s/ (shape and position of teeth, in gray, are estimated). Region a = hypopharynx; b = mediopharynx; c = hyperpharynx. Structure 1 = epiglottis; 2 = uvula; 3 = tongue body; 4 = tongue tip/blade; 5 = alveolar ridge; 6 = upper lip; 7 = lower lip.
Due to its considerable length, it seems reasonable to divide the pharynx into three components (see Figure 3.1). From bottom to top, these are: the hypopharynx or laryngopharynx, including the region just above the vocal folds; the mediopharynx, bounded at the epiglottis; and the hyperpharynx or nasopharynx, i.e., the region immediately behind (and when the soft palate is lowered, above) the velopharyngeal port. When the place of articulation ‘pharyngeal’ is invoked, it could reasonably refer to any one of these three components. If the traditional places of articulation are used, the ‘epiglottal’ place might be construed as synonymous with ‘mediopharyngeal’; the ‘uvular’ place may be interchangeable, to some extent, with ‘hyperpharyngeal’, and the laryngeal place with ‘hypopharyngeal’. Critically, however, each of these places should be considered to account for several centimeters of the vocal tract, rather than the much more spatially compressed articulatory places of the oral cavity. If each of these terms were invoked in this way, then the cover term ‘pharyngeal’ would perhaps best be understood as the counterpart to the term ‘oral’, which does not suggest a single place of articulation, either. Nevertheless, the perhaps inadequate term ‘pharyngeal’ is quite common in the literature to date and it may only be in research from the recent past and moving forward that more specific terms are used consistently when dealing with articulatory aspects of this region of the vocal tract.
The geometry of the vocal tract shapes the distribution of acoustic energy generated inside the tract (the larynx is frequently the source of sound during speech). According to Perturbation Theory (Chiba and Kajiyama 1941), there are several points in the pharynx where a constriction will have maximal influence on the lowest formants of this energy (F1, F2, and F3). For a typical male vocal tract, a constriction in the larynx, or perhaps the hypopharynx, should raise all formants, including F1 (with respect to the formants of the ‘neutral’ vowel /ə/). Likewise, a constriction in the upper-mediopharynx (above the epiglottis) should lower the second formant (F2) with respect to the F2 of /ə/. Finally, a constriction in the hyperpharynx should raise the third formant (F3) while a constriction in the lower-mediopharynx (at the epiglottis or just below) should lower F3 (both with respect to the F3 of /ə/). Thus, if a pharyngeal consonant is produced in the hypopharynx, the articulation is expected to raise F1 of /ə/; if it is produced nearby in the lower-mediopharynx, it should lower F3 of /ə/. In most situations, there are multiple spectral cues to a place of articulation, as the predictions of Perturbation Theory are gradient. In other words, the closer a constriction comes to a prescribed point in the vocal tract, the greater the change in the associated formant frequency. In addition, when the affected vowel has a quality other than /ə/, care must be taken to consider the consequences of a constriction on a particular vowel quality.
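The predictions above can be reproduced with a small numerical sketch. It is only an illustration under simplifying assumptions that are not stated in the chapter: the vocal tract is treated as a uniform, lossless tube closed at the glottis and open at the lips, 17 cm long, with a speed of sound of 35,000 cm/s. The sensitivity function (kinetic minus potential energy density of each resonance) is the usual textbook formulation of perturbation theory, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air
TRACT_LENGTH_CM = 17.0         # assumed length of a typical male vocal tract


def neutral_formant(n, length_cm=TRACT_LENGTH_CM, c=SPEED_OF_SOUND_CM_S):
    """Resonance n (1-based) of a uniform tube closed at the glottis, open at the lips."""
    return (2 * n - 1) * c / (4.0 * length_cm)


def sensitivity(n, x_cm, length_cm=TRACT_LENGTH_CM):
    """Kinetic minus potential energy density of resonance n at x_cm above the glottis.

    The value is normalized to the range [-1, 1].
    Positive: a small constriction at this point lowers Fn.
    Negative: a small constriction at this point raises Fn.
    """
    k = (2 * n - 1) * math.pi / (2.0 * length_cm)
    return math.sin(k * x_cm) ** 2 - math.cos(k * x_cm) ** 2


def predicted_shift(n, x_cm, length_cm=TRACT_LENGTH_CM):
    """Direction of the shift in Fn caused by a small constriction at x_cm."""
    s = sensitivity(n, x_cm, length_cm)
    if abs(s) < 1e-9:
        return "none"
    return "lower" if s > 0 else "raise"


if __name__ == "__main__":
    # Formants of the neutral vowel for the assumed tube.
    print([round(neutral_formant(n)) for n in (1, 2, 3)])
    # Illustrative constriction points, in cm above the glottis.
    for x in (0.0, 3.4, 17.0 / 3.0, 6.8):
        print(round(x, 2), [predicted_shift(n, x) for n in (1, 2, 3)])
```

In this idealized tube, a constriction at the glottis raises all three formants, the strongest F2 lowering falls at one third of the tube length, and the strongest F3 lowering and raising fall at one fifth and two fifths of the tube length, respectively. Because the sensitivity varies smoothly with position, the sketch also shows why the predictions are gradient rather than categorical. A real vocal tract is neither uniform nor lossless, so these positions are approximate at best.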
The distinction between stop consonants is perhaps most directly signaled by the formant structure of the vocalic transitions that surround these sounds (burst characteristics, along with the durational and spectral properties of the turbulent airflow that occurs before the onset of voicing are also of significant consequence). Take, for example, the contrast between the VCV sequences /əḍə/ and /ədə/. In the milliseconds before and after the stop consonant, the vowels will differ in their formant frequencies in ways that are more or less predictable based on the shape of the vocal tract during those periods of time.
Emphatic stops are generally associated with higher F1 and lower F2 than plain stops (Al-Tamimi and Heselwood 2011; Hassan and Esling 2011; Zawaydeh 1998; among others). This is consistent with a constriction in the larynx (higher F1) and the upper-mediopharynx (lower F2). According to Zawaydeh (1998), the effect is asymmetrical: it is more evident on the vowel preceding the emphatic. This suggests that the secondary articulation of emphatics is associated with anticipatory rather than perseverative coarticulation. The effect can be widespread (up to two syllables away), suggesting a slow and gradual change in the vocal tract configuration (Zawaydeh and de Jong 2011). Hassan (1981) observed longer vowels preceding emphatic fricatives and proposed that this increased duration reflects the increased time necessary for the tongue to assume both a primary and secondary articulatory setting. Khattab et al. (2006) found significantly shorter voice onset times for emphatic versus plain stops.
As mentioned earlier, the class of emphatic consonants includes the voiceless alveolar and voiced interdental fricatives (the latter is realized in some varieties, like Cairene Egyptian Arabic, as a voiced alveolar /ẓ/; Watson 2002). In addition to vocalic transitions, the turbulent acoustic energy associated with a fricative also holds cues pertaining to its place of articulation. Relatively little has been discovered in the spectra of fricatives that suggests clear articulatory differences between, e.g., /s/ and /ṣ/ (Abu-Al-Makarem 2005; Al-Khairy 2005). The parameters of fricative spectra are known for extreme variability due to the presence of random noise in the signal generated by turbulent airflow (Jesus and Shadle 2002). Treating spectral energy as a Gaussian distribution, Al-Khairy (2005) and Abu-Al-Makarem (2005) noted significant differences in the center of gravity and skewness of plain versus emphatic fricatives. A clear map between emphatic fricative acoustics and their articulation has yet to emerge, given that a secondary articulation posterior to a primary constriction (e.g., a pharyngeal constriction secondary to an alveolar constriction) is not predicted to have a significant impact on fricative acoustics. The spectral signature of fricatives is argued to depend mostly on the length of the resonating cavity in front of the tightest constriction (Stevens 1998). Research on this topic is ongoing, with suggestions that the emphatic fricatives have a more retracted (primary) place of articulation than their plain congeners (Hermes 2014). This account has the advantage of adhering to traditional models of fricative acoustics, in which a secondary articulation behind the primary constriction is practically irrelevant.
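For readers unfamiliar with spectral moments, the following minimal sketch shows how the center of gravity and skewness of a fricative spectrum can be computed once the power spectrum is normalized and treated as a distribution over frequency. It is a generic illustration, not the procedure of the studies cited above, whose analysis settings are not given here. The function names are hypothetical and the toy spectra are invented purely to exercise the code; they are not measurements of /s/ or /ṣ/.

```python
import math


def spectral_moments(freqs_hz, power):
    """Center of gravity, standard deviation, and skewness of a power spectrum.

    The spectrum is normalized to sum to 1 so that it can be treated as a
    probability distribution over frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    weights = [v / total for v in power]
    cog = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - cog) ** 2 * w for f, w in zip(freqs_hz, weights))
    sd = math.sqrt(variance)
    skew = sum((f - cog) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    return cog, sd, skew


def toy_spectrum(peak_hz, width_hz, step_hz=100, max_hz=11000):
    """Synthetic single-peak spectrum (illustrative only, not measured data)."""
    freqs = list(range(0, max_hz + step_hz, step_hz))
    power = [math.exp(-0.5 * ((f - peak_hz) / width_hz) ** 2) for f in freqs]
    return freqs, power


if __name__ == "__main__":
    for peak in (6000, 4500):  # arbitrary peak frequencies
        cog, sd, skew = spectral_moments(*toy_spectrum(peak, 800))
        print(peak, round(cog), round(sd), round(skew, 3))
```

A lower center of gravity indicates that energy is concentrated at lower frequencies, and positive skewness indicates a spectrum whose energy tails off towards higher frequencies. In practice the power spectrum would come from a windowed Fourier transform of the frication noise, usually averaged over several windows to reduce the variability noted above.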
The many-to-one mapping between articulation and acoustics has long been recognized as a problem in the acoustic theory of speech production (Fant 1960) and has been noted, in particular, for the study of secondary articulation in Arabic emphatics (Khattab et al. 2006). Simply put, there are many articulatory configurations that can result in similar, if not identical, acoustic outcomes. Labialization and velarization, for example, are both known to lower F2. Thus, if all one knows is that a sound manifests a lowered F2, should this effect be attributed to constriction at the lips, at the velum, or both? Unfortunately, the problem is largely intractable without articulatory evidence (even computational modeling of the sounds requires some articulatory input, which should at a certain stage be corroborated instrumentally). This state of affairs is somewhat more serious with the pharyngeal and emphatic consonants of Arabic for at least two reasons, both of which inhibit research at its most basic, intuition-forming stage. First, the relevant articulatory gestures occur in a part of the vocal tract hidden from most forms of direct observation. Second, the pharyngeal region of the vocal tract transmits relatively little proprioceptive feedback, so even a phonetically-trained native speaker may have problems resolving the place of articulation. When non-native speakers of Arabic attempt to learn the pharyngeal and emphatic sounds, these problems often play out in the microcosm of the classroom.
Pharyngeal consonants are typified by a decrease in F2 in adjoining vocalic transitions (Al-Ani 1970; Ghazeli 1977; Heselwood 1992; Laufer and Baer 1988). F1 tends to increase near pharyngeals (Heselwood 1992). These observations are consistent with a constriction in the upper-mediopharynx (predictive of the F2 effect) and the larynx or hypopharynx (F1). The point of constriction has been estimated to lie at least 2.83 cm above the glottis in a typical male vocal tract, a position that corresponds roughly to the laryngeal aditus (the entrance to the larynx) and the surrounding aryepiglottic folds (Heselwood 2007, p. 12). El-Halees (1985) and Alwan (1989) have shown that an increase in F1 is likely to produce a categorical shift in perception. Listeners tend to judge consonants with a relatively low F1 as uvular and consonants with a relatively high F1 as pharyngeal.
As shown in section 3.2, the pharyngeal class and the emphatic class are both typified by an increase in F1 and a decrease in F2, changes that are consistent with a constriction in the larynx or hypopharynx (F1) and in the upper-mediopharynx (F2). Perplexingly, articulatory studies suggest that these two types of consonant may nevertheless manifest quite dissimilar pharyngeal configurations. The more anterior the constriction, the less likely it will raise F1. The ideal spot for raising F1 (due to a constriction) is in the hypopharynx, near the larynx. There are two optimal locations for lowering F2 by means of a constriction, one in the upper-mediopharynx (just above the epiglottis) and one at the lips. Thus, to the extent that a study suggests a more anterior constriction (e.g., one at the velum), the less likely it is to have the anticipated effect on F1. Constrictions associated with velar consonants are well-known to lower F2, even though this region is considerably anterior to the upper-mediopharynx, at least in the typical male vocal tract.
Emphasis, or the attribute(s) associated with emphatic consonants versus their plain counterparts, is a well-known and intensively studied property of Arabic speech sounds. It is implicated in coarticulatory effects that cross a variety of prosodic boundaries, including the syllable and the word (Bukshaisha 1985). As mentioned in the introduction, the articulation of emphasis has only been studied instrumentally in the last sixty years or so. A wide variety of instrumental techniques have been used, but often on different dialects with more or less comparative goals in mind. Non-invasive, holistic imaging techniques like MRI and rt-MRI have been applied only recently (Israel et al. 2012; Shar and Ingram 2011; Shosted et al. 2012).
Traditionally, plain and emphatic consonants were believed to share the same primary articulation. However, research now suggests that the primary place of articulation is in fact more posterior for emphatic consonants (Al-Tamimi and Heselwood 2011; Hermes 2014). Before this work, scholars believed the fundamental distinction between the emphatic and plain consonants was the secondary articulation. Owing perhaps to the difficulties in obtaining articulatory data in the vicinity of the velopharyngeal port and further behind it, phoneticians differ widely on the nature of this secondary articulation. For Trubetzkoy (1969) and Al-Nassir (1993), the secondary articulation is velar. For Zawaydeh and de Jong (2011) and McCarthy (1994), it is uvular. A relatively large number of recent studies suggest that the constriction is more posterior still, somewhere in the pharyngeal region (Al-Masri and Jongman 2004; Al-Tamimi et al. 2009; Israel et al. 2012; Laufer and Baer 1988). Esling (1999) has posited the most posterior constriction for emphatic consonants, i.e., in the hypopharynx. Moreover, the secondary place of articulation may vary across dialects (Al-Masri and Jongman 2004; Norlin 1987) and with vowel context (Al-Tamimi and Heselwood 2011). In short, if one wishes to justify acoustic observations of emphatics by pointing to a secondary place of articulation anywhere from the soft palate to the laryngeal entryway, there is a study that can serve that purpose.
In addition, the articulatory configuration of emphatics has been associated with larynx lowering (F1 lowering) in Iraqi (Hassan and Esling 2011) and larynx raising (F1 raising) in Jordanian (Al-Tamimi and Heselwood 2011). There is some indication that the tongue dorsum is raised and the velum is lowered (Ali and Daniloff 1972), two gestures both associated with a lowering of F2. The shape of the tongue has also come under scrutiny, beginning with Ibn Sinā’s dictum that the tongue surface during emphatic consonants is hollowed or sulcalized (trans. Semaan 1963). Ali and Daniloff (1972) corroborated this observation. Finally, Lehn (1963) has argued that emphatic consonants also manifest lip protrusion and constriction. Labialization tends to lower F2, which is consistent with acoustic observations of emphatic consonants. However, it would also lower F1, which is inconsistent with these same observations.
As mentioned earlier, acoustic evidence suggests that pharyngeal consonants are produced at the lowest margin of the hypopharynx – the entrance to the larynx – at a point situated about one-sixth the length of the vocal tract from the glottis (Heselwood 2007, p. 12). In a modeling study, Yeou and Maeda (2011) found that for pharyngeals, the area of supraglottal constriction is either equal to or greater than the area of glottal constriction. This is the opposite of the situation for a more typical fricative like /s/, in which the supraglottal constriction is smaller than the glottal constriction. Based on this result, Yeou and Maeda (2011) argue that both /ʕ/ and /ħ/ are approximants rather than fricatives. However, there is considerable variation in the production of these sounds across dialects of Arabic, and disagreements abound, suggesting that the richest source of variation may be between speakers within dialects. For example, Al-Ani (1970) and Alwan (1986) conclude that Iraqi /ʕ/ may be realized as a stop. (There is no IPA symbol for a pharyngeal stop; the sign for an epiglottal stop /ʡ/ is the closest existing possibility.) A similar claim has been made for Sudanese Arabic (Adamson 1981). Ghazeli (1977) was unable to find evidence of the pharyngeal stop with another Iraqi speaker and Butcher and Ahmad (1987) found evidence of the stop realization in final position only. Finally, Heselwood (2007) reports that there is a “tight approximant” variant of /ʕ/ in addition to the much less common stop variant.
There has been some disagreement over the years regarding the role of the epiglottis in the production of pharyngeal consonants. Laufer and Condax (1979) argued that the epiglottis retracts independently towards the posterior pharyngeal wall. The null hypothesis is that the epiglottis merely rides on the tongue root, which is the true active articulator. Laufer and Baer (1988) found that the position of the tongue root and the epiglottis indeed covary. Esling (1999) argues that it is not the epiglottis, but the aryepiglottic folds that serve as the active articulator. This may lead to the conclusion that the pharyngeals are in fact (ary-)epiglottal while altogether avoiding the debate over the independent movement of the epiglottis. One concern with this hypothesis, however, is that a muscular mechanism for constricting the aryepiglottic folds has not been reliably found. Zemlin (1997, p. 116) observes that numerous dissections revealed such muscles in less than 10% of specimens “and when found they were sparse”. One provocative – and as yet unsubstantiated – hypothesis is that these muscles are further developed in speakers of languages (like Arabic) that routinely articulate pharyngeal consonants.
An early MRI study of Arabic phonetics investigated the vertical displacement of the larynx and pharyngeal width in the mid-sagittal plane (Shar and Ingram 2011). At its most basic, MRI technology can be used to image anatomical structures in static position. When working with speech production, this means the sound must be pronounced for a period of time sufficient to capture a suitable image. Shosted et al. (2012) performed a static MRI scan of a male speaker of Jordanian Arabic, producing /ħ/ in isolation for about 20 seconds. MRI technology offers researchers a tradeoff between temporal and spatial resolution. If the speaker is able to maintain a static position for long periods of time, it is possible to obtain a 3D image with a fairly high spatial resolution (here, 2 × 2 mm with a relatively thin through-plane thickness of 0.54 mm). We were then able to measure the size of the pharyngeal cavity from the hyperpharynx to the glottis. For /ħ/ we observed a constriction (<100 mm²) about 3.5 cm above the glottis. For /ʕ/ we observed a constriction at about the same position (cf. Heselwood’s 2007 finding of an aryepiglottal constriction, somewhat lower, about 2.83 cm above the glottis). One remarkable difference between the constrictions for the two pharyngeal consonants is their length. The constriction for /ʕ/ was relatively short, extending only about 5 cm above the glottis, while the constriction for /ħ/ appeared to continue to the upper pharynx, at least 7 cm above the glottis. A 2D image of this speaker producing a static /ʕ/ is provided in Figure 3.2 (here the resolution is higher, 0.78 × 0.78 mm). As can be noted in the figure, the epiglottis is retracted, creating a relatively large space between the tongue root and the epiglottis (the epiglottic vallecula). The body of the tongue is also shunted forward with respect to its position during /ħ/.
While this configuration may result from some hyperarticulation on the part of the speaker (recall that the sounds are produced for approximately 20 seconds to create these images), the dramatic position of the epiglottis is suggestive of an independent role for this little-understood speech articulator, as argued by Laufer and Condax (1979). The position of the arytenoid cartilages is also worth noting. A tight constriction is formed there, about 1.2 mm above the glottis for /ʕ/, whereas the same constriction appears a bit looser for /ħ/.
Figure 3.2 Jordanian speaker producing a pharyngeal approximant /ʕ/ during a 20-second interval (left) and a pharyngeal fricative /ħ/ under the same conditions (right). Note the retracted position of the epiglottis, nearly touching the posterior pharyngeal wall in both cases, and the relatively longer constriction for /ħ/. Scale: one pixel = 0.78 × 0.78 mm.
Scans of the plain and emphatic /s/ and /ṣ/ are presented in Figure 3.3. The emphatic fricative is clearly pharyngealized with little evidence of hypopharyngeal constriction as in /ħ/ (cf. Figure 3.2). The mediopharyngeal constriction in the emphatic fricative is fairly extensive and the tongue dorsum is retracted towards the posterior pharyngeal wall (cf. /ħ/ in Figure 3.2, where the tongue dorsum is elevated towards the uvula). Information about both constrictions would be difficult to obtain using other instrumental approaches, but here it is evident that the pharyngeal constriction spans the length of the mediopharynx and perhaps some portion of the hyperpharynx, as well. The constriction associated with the emphatic fricative seems more reminiscent of the configuration associated with /ħ/ than of that associated with /ʕ/.
Another approach uses time-varying pixel intensity to draw out temporal patterns in the repetition of an utterance (Shosted et al. 2012). For example, the same Jordanian speaker mentioned earlier uttered a phrase containing a single pharyngeal fricative /ħ/. Using the time-aligned audio captured synchronously with the MRI data, we were able to detect an increase in pixel intensity (interpreted as movement of tissue) in the mediopharyngeal region of the speaker’s vocal tract, roughly corresponding to the epiglottis. Sampling time-varying pixel intensity is a relatively coarse-grained means of detecting changes in the position of anatomical structures. While this approach holds some promise of tracing the time-varying pharyngeal activity, steps should be taken to reduce noise in the resulting signals, either through advanced signal processing or through changes in the image acquisition.
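The idea of sampling time-varying pixel intensity can be sketched as follows. This is a minimal illustration rather than the processing used in the cited study: it assumes the image series is available as a NumPy array of shape (frames, height, width) with intensities between 0 and 1, that the region of interest is given as a Boolean mask, and that a simple moving average stands in for whatever noise reduction is eventually adopted. All function names are hypothetical.

```python
import numpy as np


def roi_intensity(frames, roi_mask):
    """Mean pixel intensity inside the region of interest, one value per frame.

    frames: array of shape (n_frames, height, width).
    roi_mask: Boolean array of shape (height, width).
    """
    frames = np.asarray(frames, dtype=float)
    roi_mask = np.asarray(roi_mask, dtype=bool)
    # Boolean indexing flattens the two image axes into one axis of ROI pixels.
    return frames[:, roi_mask].mean(axis=1)


def moving_average(signal, window=5):
    """Centered moving average with edge padding (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    signal = np.asarray(signal, dtype=float)
    padded = np.pad(signal, window // 2, mode="edge")
    kernel = np.ones(window) / window
    # 'valid' keeps the output the same length as the input after padding.
    return np.convolve(padded, kernel, mode="valid")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.uniform(0.0, 0.1, size=(50, 8, 8))  # synthetic noise only
    mask = np.zeros((8, 8), dtype=bool)
    mask[3:5, 3:5] = True
    frames[20:30, 3:5, 3:5] += 0.5  # simulated tissue entering the ROI
    print(np.round(moving_average(roi_intensity(frames, mask)), 2))
```

A rise in the smoothed trace marks the frames in which brighter tissue occupies the region of interest. Aligning that trace with the synchronous audio then indicates when, relative to the acoustic segment, the movement takes place.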
rt-MRI has only recently been used to visualize and quantify the simultaneous contributions of a variety of anatomical structures to the production of pharyngeal and emphatic consonants in Arabic (Israel et al. 2012; Shosted et al. 2012; Shosted et al. 2013). We use a quantitative method based on principal components analysis and linear discriminant analysis of pixel intensity (Carignan et al. 2015). To obtain the data, we use a sophisticated reconstruction algorithm that allows us to visualize the movements of vocal tract organs at up to 100 frames per second with a spatial resolution of 2.2 × 2.2 × 8.0 mm (Fu et al. 2015). This can be done simultaneously in multiple cross-sections of the vocal tract, though each additional cross-section reduces the effective frame rate of the resulting data. Here we present some results of an acquisition where the slices are in coronal section through the anterior oral cavity, in oblique section through the velopharynx, and in axial section through the mediopharynx and hypopharynx. The subject is a male speaker of Levantine Arabic (LA01). The speaker repeated the phrase /iktubu X sit marrat/ ‘write X six times!’ for approximately four minutes. The test word X varied between /basar/ ‘he frowned’ and /baṣar/ ‘eyesight’. The MR images were aligned with synchronous audio and the fricative portions of the test words were segmented by hand. The temporal middle of each fricative was subjected to further analysis. For each anatomical section, a region of interest was drawn to circumscribe structures of hypothetical importance in the production of the emphatic fricative. In one case, we analyzed the axial section through the hypopharynx, just above the larynx. The region of interest included the hypopharyngeal cavity and surrounding tissues comprising the aryepiglottic folds. In a grayscale digital image, the intensity of each pixel can be described with a value between 0 (black) and 1 (white).
The intensities of the n pixels in the region of interest were submitted to a principal components analysis, which represents each image as a point in an n-dimensional space and rotates that space so that the first few axes capture as much of the variance in the dataset as possible. Each image then has a score on each principal component (PC) of this rotated space. The loadings, or coefficients, of the lowest-order PCs (generally PC1 and PC2) were mapped back onto the original image (one loading for each pixel), producing a heatmap that allows us to interpret the articulatory meaning of each PC. Finally, the mean PC scores of the two sets of images (here, those imaging /s/ and those imaging /ṣ/) were compared using a t-test.
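The steps of this analysis can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation of Carignan et al. (2015): the data are synthetic, the 6 × 6 region of interest and the class sizes are invented, the linear discriminant analysis step is omitted, and Welch's unequal-variance t statistic is assumed because the type of t-test is not specified here. Each row of the matrix `X` is one image flattened to a vector of n pixel intensities.

```python
import numpy as np

def pca(X):
    """PCA of X (images x pixels) via SVD of the mean-centred data.

    Returns (scores, loadings, explained_variance_ratio):
      scores   -- images x components, position of each image on each PC
      loadings -- components x pixels, one coefficient per pixel per PC
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S
    var = S ** 2
    return scores, Vt, var / var.sum()

def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test type)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# Synthetic stand-in for real data: 40 ROI images of 6 x 6 pixels (invented).
rng = np.random.default_rng(0)
roi_shape = (6, 6)
n_per_class = 20
constriction = np.zeros(roi_shape)
constriction[2:4, 2:4] = 1.0  # pixels that brighten when tissue fills the cavity

noise = lambda: 0.05 * rng.standard_normal((n_per_class, constriction.size))
plain = 0.3 + noise()                                   # stands in for /s/
emphatic = 0.3 + noise() + 0.5 * constriction.ravel()   # stands in for emphatic /s/
X = np.vstack([plain, emphatic])
labels = np.array([0] * n_per_class + [1] * n_per_class)

scores, loadings, evr = pca(X)

# Map the PC1 loadings back onto the ROI: one value per pixel, i.e. a heatmap.
heatmap = np.abs(loadings[0]).reshape(roi_shape)

# Compare the PC1 scores of the two classes.
t_pc1 = welch_t(scores[labels == 1, 0], scores[labels == 0, 0])
```

The sign of a principal component is arbitrary, so only the magnitude of `t_pc1` and of the loadings is meaningful until the heatmap has been inspected; this is why the heatmap is needed to give each PC an articulatory interpretation. In the toy data the largest PC1 loadings fall on the four "constriction" pixels, by construction.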
Figure 3.3 Plain (left) and emphatic (right) alveolar fricatives /s ṣ/ produced by a male speaker of Jordanian Arabic, with the same resolution as in Figure 3.2.
For this speaker of Levantine Arabic, the heatmap indicated that PC1 was strongly associated with hypopharyngeal constriction in the region of the surrounding aryepiglottic folds. In other words, across all of the fricative images, much of the variation could be accounted for by the pixels immediately surrounding the hypopharyngeal cavity. Pixels on one lateral wall of the pharynx were most strongly associated with PC1. Based on our knowledge of the anatomy and the positioning of the slice, we consider these pixels to represent the aryepiglottic folds. This abstract, numerical representation of hypopharyngeal constriction (PC1) was significantly greater in /ṣ/ than in /s/. This suggests that emphasis is produced in part through constriction of the hypopharyngeal region, which could be accomplished by constricting the aryepiglottic folds. Thus, the /ṣ/ of this speaker seems to possess a hypopharyngeal constriction similar to the one produced by the Jordanian speaker during his production of /ʕ/ (Figure 3.2). This constriction is arguably deeper than the one produced by the same Jordanian speaker for /ṣ/ (Figure 3.3). Using the same method, we are also able to posit a significant difference in the position of the tongue dorsum at the mediopharyngeal slice (again, there is more constriction for /ṣ/, as expected).
It is most likely true that we hear sounds, not speech articulators, when we process a stream of speech (Ohala 1996). However, understanding the articulation of pharyngeals and emphatics in Arabic sheds light on our species’ fine-grained motor control of the vocal tract and on how that control shapes speech patterns both synchronic and diachronic. The rt-MRI technique discussed here is a powerful analytical tool that holds great promise in the description of Arabic pharyngeal and emphatic consonants. Challenges for future work include improving algorithms for de-noising acoustic data acquired while the relatively noisy MRI scanner is running, as well as increasing the spatial and temporal resolution of the resulting data. While it is still too early to demonstrate clear differences between Arabic speakers and dialects, it is likely that rt-MRI methods will have a significant impact on the study of Arabic pharyngeals and emphatics in the near future.
We owe special thanks to Bradley Sutton (Department of Bioengineering and Beckman Institute for Advanced Science and Technology, University of Illinois) for helpful discussion, data acquisition/reconstruction support, and facilitating our research collaboration. We are also grateful to Li-Hsin Ning for her help with initial post-processing of the rt-MRI images, including audio–image alignment. Nancy Dodge and Holly Tracy at the Biomedical Imaging Center of the Beckman Institute helped us acquire the MR images. Shelly Yambert kindly helped us schedule subjects in a busy facility. Audience members at Experimental Arabic Linguistics (2013) in Al Ain offered interesting and challenging perspectives which we have tried to incorporate here. Thanks to Abbas Benmamoun and Farzad Karimzad for comments on the manuscript. Finally, we thank our patient and dedicated research participants. Any errors or omissions remain our own responsibility.