Tool Module: The Human Vocal Apparatus

Unless there is a special problem, speaking our mother tongue is something we do so effortlessly and unconsciously that we are unaware not only of the extremely complex cognitive processes that underlie the act of speaking, but also of the incredibly precise mechanics involved in articulating our words correctly.

The human vocal apparatus is like two kinds of musical instruments at once: a wind instrument and a string instrument. This apparatus includes a source of wind (the lungs), components that vibrate (the vocal cords in the larynx), and a series of resonant chambers (the pharynx, the mouth, and the nasal cavities). Here is how all these components work together when you speak.

The first component of this apparatus is the lungs, which provide the necessary air and can thus be described as the “generator”. When you are speaking, your inhalations become faster and shorter and you breathe more through your mouth, whereas at rest you typically inhale through your nose. When you exhale while speaking, you increase the volume and pressure of your airstream to vibrate the vocal cords in your larynx.

The larynx consists of a set of muscles and pieces of cartilage, with varying degrees of mobility, that can be raised or lowered like a gate to protect your bronchi and lungs from food and other foreign bodies. When you swallow food, your larynx rises, while the epiglottis, a flap of cartilage at the entry to the larynx, closes down over it to block the upper airways and let the food move down your esophagus safely into your stomach.

When you speak, the air expelled from your lungs moves up through the trachea to the larynx, where it passes over the vocal cords. These cords are a matched pair of muscles and ligaments, pearly white in colour, 20 to 25 millimetres long, and coated with mucus. They constitute the second component of your vocal apparatus: the “vibrator”.

The vocal cords are attached horizontally from the thyroid cartilage (the “Adam’s apple” in men) at the front to the arytenoid cartilages at the rear. By moving these cartilages as you speak, you alter the length and position of your vocal cords. When you start to say something, the arytenoid cartilages press the vocal cords against each other, thus closing the opening between them (known as the glottis).

Under the pressure of the air being exhaled, the vocal cords separate, then close again immediately, causing the air pressure beneath the glottis to increase again. By opening and closing the glottis rapidly during phonation, the vocal cords thus release the air from the lungs in a vibrating stream. When you speak a sentence, you modify the vibration frequency of your vocal cords many times to produce the acoustic vibrations (sounds) that are the raw materials for the words themselves.

For these sounds to be transformed into words, they must then be shaped by the rest of the vocal apparatus. The first step in this process occurs in the pharyngeal cavity, where the respiratory and digestive systems meet. The pharynx and the other cavities with which it communicates (the nasal cavities, mouth, and larynx) act as a “resonator” that alters the sounds issuing from your vocal cords, amplifying some frequencies while attenuating others.
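The idea of a resonator that amplifies some frequencies while attenuating others can be illustrated with a small numerical sketch. The Python code below is only a simplified model, not a description of real vocal-tract acoustics: it treats the sound from the vocal cords as a set of harmonics of one vibration frequency and passes each harmonic through two simple digital resonators. The function names, the sample rate, and the resonance values (700 Hz and 1100 Hz, with bandwidths of 80 Hz and 90 Hz) are all assumptions chosen for illustration, not measurements.

```python
import cmath
import math

SAMPLE_RATE = 16000  # samples per second (assumed for illustration)


def resonator_gain(freq_hz, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Gain of a two-pole digital resonator at freq_hz.

    The resonator responds most strongly near center_hz; bandwidth_hz
    controls how narrow that peak is.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2 * math.pi * center_hz / sample_rate        # pole angle
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)
    denom = 1 - 2 * r * math.cos(theta) * z_inv + (r ** 2) * z_inv ** 2
    return 1 / abs(denom)


def shape_harmonics(f0_hz, resonances, n_harmonics=20):
    """Return (frequency, gain) pairs for the harmonics of a source at f0_hz.

    resonances is a list of (center_hz, bandwidth_hz) pairs; the gains of
    all resonators are multiplied, as if the sound passed through each
    cavity in turn.
    """
    shaped = []
    for k in range(1, n_harmonics + 1):
        freq = k * f0_hz
        gain = 1.0
        for center, bandwidth in resonances:
            gain *= resonator_gain(freq, center, bandwidth)
        shaped.append((freq, gain))
    return shaped


# Hypothetical resonances, round numbers chosen only for this sketch.
RESONANCES = [(700, 80), (1100, 90)]

harmonics = shape_harmonics(100, RESONANCES)
strongest_freq, strongest_gain = max(harmonics, key=lambda pair: pair[1])
print("Most amplified harmonic:", strongest_freq, "Hz")
```

In this sketch the source vibrates at 100 Hz, so its harmonics fall at 100, 200, 300 Hz and so on. The harmonics closest to the assumed resonances come out with the largest gain, while those far from any resonance are comparatively weakened. Changing the resonance values, much as you change the shape of your pharynx and mouth, changes which harmonics stand out.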

The transformation of the sounds from the larynx is then completed by the position of the soft palate, tongue, teeth, lips, and other parts of the mouth, which act as “modulators” for this sound. While the larynx produces the vibrations without which you would have no voice, it is these other parts of your vocal apparatus that make your voice so flexible and versatile. They do so in different ways. Your soft palate either blocks the passage to the upper nasal cavities or leaves it open so that the vibrating airstream can enter them. Your jaws open or close to change the size of the oral cavity. Your tongue changes shape and position to alter this cavity further. Your tongue and lips obstruct the airflow through the teeth to varying extents. Your lips also alter their shape (open, closed, pursed, stretched, and so on) to shape the sound further.

To produce the vowel sound “ee” (as in “teen”), for example, you must move your tongue toward the front of your palate, which widens the pharyngeal cavity while raising the larynx slightly. To produce the sound “ah” (as in “far”), you must lower your jaw and your tongue. To pronounce consonants, you must make various movements of the tongue and lips. For example, to pronounce an “F” or an “S”, you move your tongue and lips so as to slow the outgoing airstream. To pronounce a “B”, “P”, or “T”, you stop the airstream and then release it, with varying degrees of sharpness. To produce a “V” or a “J”, you make the airstream vibrate, and so on.

Is the human vocal apparatus essential for speech?

Scientists long believed that the main reason other primates had never succeeded in mastering human language, despite all the efforts that had been made to teach them, was that the particular anatomy of their vocal apparatus prevented them from doing so. In apes, as in human infants, the larynx is positioned very high in the neck, which would prevent it from producing all the sounds of human language. But this position does have certain advantages: for example, both apes and babies can breathe through their noses while continuing to eat.

In contrast, in adult humans, the low position of the larynx means that the pathways to the stomach and the lungs intersect, thus increasing the risks of choking. It therefore seems that the advantage that this descended larynx provides is a vocal communication system that makes this risk of choking worthwhile.

Modelling and simulation studies have shown, however, that the limited phonatory capabilities of the high-positioned larynx in primates and babies represent only a relatively minor handicap in terms of language. For that matter, the high position of the larynx in human babies does not prevent them from imitating the adult vowel sounds “ee”, “ah”, and “oo” from as early as 4 months of age, and from producing their first words 8 months later, when the larynx is still very high and the pharyngeal cavity is still very small. The reason that apes and younger babies cannot speak would therefore seem to be not that their larynx is too high, but rather that they lack the cognitive abilities needed to master language.

The descent of the larynx in the course of evolution

In Australopithecus, the larynx had not yet descended, so individuals transmitted information by means of cries and gestures. As early humans gradually adopted an erect posture, the position of the head shifted back and up, tipping back at the base of the skull, which caused the neck to emerge and the larynx to descend.

Since the base of the skull constitutes the roof of the vocal apparatus, the fossil record gives us some idea of when in evolutionary time the larynx descended. Indications of this descent have been found in skulls of Homo ergaster, from nearly 2 million years ago. A skull of Homo heidelbergensis found in Ethiopia shows that the larynx had almost reached its current position 600 000 years ago. These findings lead to the conclusion that a vocal apparatus capable of articulate language probably existed nearly half a million years before people began to speak.

It therefore seems unlikely that the human vocal apparatus was selected “for” language. It may have conferred some advantages in pre-linguistic communication, but was this a sufficient selective pressure? Some authors believe that this low position of the larynx may have afforded certain benefits with regard to breathing. Others point out that animal species besides humans (deer, for example) also have low larynxes. These authors therefore believe that this anatomical characteristic may have evolved because it lets animals make sounds that lead others to believe that they are larger than they really are.

It would therefore not be surprising if the human vocal apparatus turned out to be an exaptation: in other words, an adaptation to pressures selecting for purposes other than speech, but whose result—a descended larynx—nevertheless facilitated the articulation of words.
