32:48

The Human Voice

by Benjamin Boster

Rated: 4.8
Type: talks
Activity: Meditation
Suitable for: Everyone
Plays: 22.6k

In this episode of the "I Can't Sleep Podcast," fall asleep learning about the human voice. Even though there are many descriptive scientific terms along with anatomical references, don't get your hopes up. You're bound for a night of restful slumber. Happy sleeping!

Sleep, Anatomy, Modulation, Registers, Disorders, Resonant, Characteristics, Scientific Terms, Anatomical References, Restful Slumber, Voice Techniques, Vocal Registers, Speech Therapy, Vocal Resonance, Phonation Theories, Vocal Capacity, Vocal Techniques, Human Voices, Practices, Theories

Transcript

Welcome to the I Can't Sleep podcast,

Where I read random articles from across the web to bore you to sleep with my soothing voice.

I'm your host,

Benjamin Boster.

Today's episode is from a Wikipedia article titled The Human Voice,

And this episode goes out to a little girl in Vermont named Lily,

Who's six years old.

The human voice consists of sound made by a human being using the vocal tract,

Including talking,

Singing,

Laughing,

Crying,

Screaming,

Shouting,

Or yelling.

The human voice frequency is specifically a part of human sound production,

In which the vocal folds,

Vocal cords,

Are the primary sound source.

Other sound production mechanisms produced from the same general area of the body involve the production of unvoiced consonants,

Clicks,

Whistling,

And whispering.

Generally speaking,

The mechanism for generating the human voice can be subdivided into three parts,

The lungs,

The vocal folds within the larynx,

Voice box,

And the articulators.

The lungs,

The pump,

Must produce adequate airflow and air pressure to vibrate vocal folds.

The vocal folds,

Vocal cords,

Then vibrate to use airflow from the lungs to create audible pulses that form the laryngeal sound source.

The muscles of the larynx adjust the length and tension of the vocal folds to fine-tune pitch and tone.

The articulators,

The parts of the vocal tract above the larynx consisting of tongue,

Palate,

Cheek,

Lips,

Etc.,

Articulate and filter the sound emanating from the larynx and to some degree can interact with the laryngeal airflow to strengthen or weaken it as a sound source.

The vocal folds in combination with the articulators are capable of producing highly intricate arrays of sound.

The tone of voice may be modulated to suggest emotions such as anger,

Surprise,

Fear,

Happiness,

Or sadness.

The human voice is used to express emotion and can also reveal the age and sex of the speaker.

Singers use the human voice as an instrument for creating music.

Voice Types and the Folds Themselves.

Adult men and women typically have different sizes of vocal fold,

Reflecting the male-female differences in larynx size.

Adult male voices are usually lower pitched and have larger folds.

The male vocal folds are between 17mm and 25mm in length.

The female vocal folds are between 12.5mm and 17.5mm in length.

The folds are within the larynx.

They are attached at the back side nearest the spinal cord to the arytenoid cartilages and at the front side under the chin to the thyroid cartilage.

They have no outer edge as they blend into the side of the breathing tube while their inner edges or margins are free to vibrate.

They have a three-layer construction of an epithelium,

Vocal ligament,

Then muscle,

Vocalis muscle,

Which can shorten and bulge the folds.

They are flat triangular bands and are pearly white in color.

Above both sides of the vocal cord is the vestibular fold or false vocal cord,

Which has a small sac between its two folds.

The difference in vocal fold size between men and women means that they have differently pitched voices.

Additionally,

Genetics also causes variances amongst the same sex,

With men's and women's singing voices being categorized into types.

For example,

Among men there are bass,

Bass-baritone,

Baritone,

Baritenor,

Tenor,

And countertenor,

Ranging from E2 to C-sharp 7 and higher.

And among women contralto,

Alto,

Mezzo-soprano,

And soprano,

Ranging from F3 to C6 and higher.
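
The note names in these ranges can be turned into approximate frequencies. The sketch below is only an illustration and not part of the original article. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and the helper name note_to_hz is hypothetical.

```python
# Semitone offsets of the natural notes within one octave, counted from C.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note, a4_hz=440.0):
    """Convert a note name such as 'E2' or 'C#7' to a frequency in hertz.

    Assumes equal temperament and A4 = 440 Hz (an assumption, not a claim
    made in the transcript).
    """
    letter = note[0].upper()
    rest = note[1:]
    accidental = 0
    if rest.startswith("#"):
        accidental = 1
        rest = rest[1:]
    elif rest.startswith("b"):
        accidental = -1
        rest = rest[1:]
    octave = int(rest)
    # MIDI-style note number: C-1 is 0, so A4 is 69.
    number = 12 * (octave + 1) + NOTE_OFFSETS[letter] + accidental
    return a4_hz * 2 ** ((number - 69) / 12)


if __name__ == "__main__":
    for name in ("E2", "C#7", "F3", "C6"):
        print(name, round(note_to_hz(name), 2), "Hz")
```

Under these assumptions, the male range quoted above runs from roughly 82 Hz to roughly 2,217 Hz, and the female range from roughly 175 Hz to roughly 1,047 Hz.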

There are additional categories for operatic voices.

This is not the only source of difference between male and female voice.

Men generally speaking have a larger vocal tract,

Which essentially gives the resultant voice a lower sounding timbre.

This is mostly independent of the vocal folds themselves.

Voice Modulation in Spoken Language.

Human spoken language makes use of the ability of almost all people in a given society to dynamically modulate certain parameters of the laryngeal voice source in a consistent manner.

The most important communicative or phonetic parameters are the voice pitch,

Determined by the vibratory frequency of the vocal folds,

And the degree of separation of the vocal folds,

Referred to as vocal fold adduction,

Coming together,

Or abduction,

Separating.

The ability to vary the abduction and adduction of the vocal folds quickly has a strong genetic component,

Since vocal fold adduction has a life preserving function in keeping food from passing into the lungs,

In addition to the covering action of the epiglottis.

Consequently,

The muscles that control this action are among the fastest in the body.

Children can learn to use this action consistently during speech at an early age,

As they learn to speak the difference between utterances such as apa,

Having an abductor-adductor gesture for the p,

And aba,

Having no abductor-adductor gesture.

Surprisingly enough,

They can learn to do this well before the age of two by listening only to the voices of adults around them,

Who have voices much different from their own,

And even though the laryngeal movements causing these phonetic differentiations are deep in the throat and not visible to them.

If an abductor movement or adductor movement is strong enough,

The vibrations of the vocal folds will stop or not start.

If the gesture is abductor and is part of a speech sound,

The sound will be called voiceless.

However,

Voiceless speech sounds are sometimes better identified as containing an abductor gesture,

Even if the gesture was not strong enough to stop the vocal folds from vibrating.

This anomalous feature of voiceless speech sounds is better understood if it is realized that it is the change in the spectral qualities of the voice as abduction proceeds that is the primary acoustic attribute that the listener attends to when identifying a voiceless speech sound and not simply the presence or absence of voice periodic energy.

An adductor gesture is also identified by the change in voice spectral energy it produces.

Thus,

A speech sound having an adductor gesture may be referred to as a glottal stop even if the vocal fold vibrations do not entirely stop.

Other aspects of the voice such as variations in the regularity of vibration are also used for communication and are important for the trained voice user to master,

But are more rarely used in the formal phonetic code of a spoken language.

Physiology and Vocal Timbre.

The sound of each individual's voice is entirely unique not only because of the actual shape and size of an individual's vocal cords,

But also due to the size and shape of the rest of that person's body,

Especially the vocal tract and the manner in which the speech sounds are habitually formed and articulated.

It is this latter aspect of the sound of the voice that can be mimicked by skilled performers.

Humans have vocal folds that can loosen,

Tighten,

Or change their thickness and over which breath can be transferred at varying pressures.

The shape of chest and neck,

The position of the tongue,

And the tightness of otherwise unrelated muscles can be altered.

Any one of these actions results in a change in pitch,

Volume,

Timbre,

Or tone of the sound produced.

Sound also resonates within different parts of the body,

And an individual's size and bone structure can affect somewhat the sound produced by an individual.

Performers can also learn to project sound in certain ways so that it resonates better within their vocal tract.

This is known as vocal resonation.

Another major influence on vocal sound and production is the function of the larynx,

Which people can manipulate in different ways to produce different sounds.

These different kinds of laryngeal function are described as different kinds of vocal registers.

The primary method for singers to accomplish this is through the use of the singer's formant,

Which has been shown to be a resonance added to the normal resonances of the vocal tract above the frequency range of most instruments,

And so enables the singer's voice to carry better over musical accompaniment.

Vocal Registration.

Vocal registration refers to the system of vocal registers within the human voice.

A register in the human voice is a particular series of tones produced in the same vibratory pattern of the vocal folds and possessing the same quality.

Registers originate in laryngeal functioning.

They occur because the vocal folds are capable of producing several different vibratory patterns.

Each of these vibratory patterns appears within a particular vocal range of pitches and produces certain characteristic sounds.

The occurrence of registers has also been attributed to effects of the acoustic interaction between the vocal fold oscillation and the vocal tract.

The term register can be somewhat confusing as it encompasses several aspects of the human voice.

The term register can be used to refer to any of the following.

A particular part of the vocal range,

Such as the upper,

Middle,

Or lower registers.

A resonance area,

Such as chest voice or head voice.

A phonatory process.

A certain vocal timbre.

A region of the voice that is defined or delimited by vocal breaks.

A subset of a language used for a particular purpose or in a particular social setting.

In linguistics,

A register language is a language that combines tone and vowel phonation into a single phonological system.

Within speech pathology,

The term vocal register has three constituent elements.

A certain vibratory pattern of the vocal folds,

A certain series of pitches,

And a certain type of sound.

Speech pathologists identify four vocal registers based on the physiology of laryngeal function.

The vocal fry register,

The modal register,

The falsetto register,

And the whistle register.

This view is also adopted by many vocal pedagogists.

Vocal resonation.

Vocal resonation is the process by which the basic product of phonation is enhanced in timbre and/or intensity by the air-filled cavities through which it passes on its way to the outside air.

Various terms related to the resonation process include amplification,

Enrichment,

Enlargement,

Improvement,

Intensification,

And prolongation,

Although in strictly scientific usage,

Acoustic authorities would question most of them.

The main point to be drawn from these terms by a singer or speaker is that the end result of resonation is or should be to make a better sound.

There are seven areas that may be listed as possible vocal resonators.

In sequence from the lowest within the body to the highest,

These areas are the chest,

The tracheal tree,

The larynx itself,

The pharynx,

The oral cavity,

The nasal cavity,

And the sinuses.

Influences of the human voice.

The twelve-tone musical scale upon which a large portion of all music,

Western popular music in particular,

Is based may have its roots in the sound of the human voice during the course of evolution,

According to a study published by the New Scientist.

Analysis of recorded speech samples found peaks in acoustic energy that mirrored the distances between notes in the twelve-tone scale.

Voice disorders.

There are many disorders that affect the human voice.

These include speech impediments and growths and lesions on the vocal folds.

Talking improperly for long periods of time causes vocal loading,

Which is stress inflicted on the speech organs.

When vocal injury is done,

Often an ENT specialist may be able to help,

But the best treatment is the prevention of injuries through good vocal production.

Voice therapy is generally delivered by a speech-language pathologist.

Vocal cord nodules and polyps.

Vocal nodules are caused over time by repeated abuse of the vocal cords,

Which results in soft,

Swollen spots on each vocal cord.

These spots develop into harder,

Callus-like growths called nodules.

The longer the abuse occurs,

The larger and stiffer the nodules will become.

Most polyps are larger than nodules and may be called by other names,

Such as polypoid degeneration or Reinke's edema.

Polyps are caused by a single occurrence and may require surgical removal.

Irritation after the removal may then lead to nodules if additional irritation persists.

Speech-language therapy teaches the patient how to eliminate the irritations permanently through habit changes and vocal hygiene.

Hoarseness or breathiness that lasts for more than two weeks is a common symptom of an underlying voice disorder,

Such as nodules or polyps,

And should be investigated medically.

Phonation.

The term phonation has slightly different meanings depending on the subfield of phonetics.

Among some phoneticians,

Phonation is the process by which the vocal folds produce certain sounds through quasi-periodic vibration.

This is the definition used among those who study laryngeal anatomy and physiology and speech production in general.

Phoneticians in other subfields such as linguistic phonetics call this process voicing,

And use the term phonation to refer to any oscillatory state of any part of the larynx that modifies the airstream,

Of which voicing is just one example.

Voiceless and supraglottal phonations are included under this definition.

Voicing.

The phonatory process,

Or voicing,

Occurs when air is expelled from the lungs through the glottis,

Creating a pressure drop across the larynx.

When this drop becomes sufficiently large,

The vocal folds start to oscillate.

The minimum pressure drop required to achieve phonation is called the phonation threshold pressure,

PTP,

And for humans with normal vocal folds,

It is approximately 2 to 3 centimeters H2O.
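
To put that threshold in more familiar units, here is a minimal sketch, not from the original article, that converts centimeters of water to pascals and checks a pressure drop against a threshold. It uses the conventional value of 98.0665 pascals per centimeter of water. The function names and the default threshold of 3 are illustrative assumptions.

```python
# Conventional conversion factor: 1 cmH2O = 98.0665 Pa.
PASCALS_PER_CM_H2O = 98.0665


def cm_h2o_to_pascals(pressure_cm_h2o):
    """Convert a pressure from centimeters of water to pascals."""
    return pressure_cm_h2o * PASCALS_PER_CM_H2O


def can_start_phonation(pressure_drop_cm_h2o, threshold_cm_h2o=3.0):
    """Return True if the pressure drop reaches the assumed threshold.

    The default threshold is the upper end of the 2 to 3 cmH2O range
    quoted in the transcript; real thresholds vary from person to person.
    """
    return pressure_drop_cm_h2o >= threshold_cm_h2o


if __name__ == "__main__":
    for value in (2.0, 3.0):
        print(value, "cmH2O is about", round(cm_h2o_to_pascals(value), 1), "Pa")
```

On that conversion, 2 to 3 centimeters of water is roughly 196 to 294 pascals, a tiny fraction of atmospheric pressure.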

The motion of the vocal folds during oscillation is mostly lateral,

Though there is also some superior component.

However,

There is almost no motion along the length of the vocal folds.

The oscillation of the vocal folds serves to modulate the pressure and flow of the air through the larynx,

And this modulated airflow is the main component of the sound of most voiced phones.

The sound that the larynx produces is a harmonic series.

In other words,

It consists of a fundamental tone called the fundamental frequency,

The main acoustic cue for the percept pitch,

Accompanied by harmonic overtones,

Which are multiples of the fundamental frequency.
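
As a small illustration that is not part of the original article, the harmonic series described here can be listed directly: each overtone is a whole-number multiple of the fundamental. The function name harmonic_series and the example fundamental of 110 Hz are assumptions chosen for the sketch.

```python
def harmonic_series(fundamental_hz, count):
    """Return the first `count` harmonics, starting with the fundamental itself."""
    return [n * fundamental_hz for n in range(1, count + 1)]


if __name__ == "__main__":
    # An assumed fundamental of 110 Hz, picked only as an example.
    print(harmonic_series(110, 5))
```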

According to the source-filter theory,

The resulting sound excites the resonance chamber that is the vocal tract to produce the individual speech sounds.
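
The source-filter idea can be sketched in a few lines. This is a toy model under loudly stated assumptions, not the article's own method: the source is a set of harmonics that fall off by an assumed 12 decibels per octave, the filter is a sum of simple resonance peaks, and the formant values of 700, 1200, and 2600 Hz are hypothetical numbers picked for illustration.

```python
import math


def source_amplitudes(f0_hz, n_harmonics, rolloff_db_per_octave=-12.0):
    """Idealized source: harmonics of f0 with an assumed spectral roll-off."""
    result = []
    for k in range(1, n_harmonics + 1):
        octaves_above_f0 = math.log2(k)
        amplitude = 10 ** (rolloff_db_per_octave * octaves_above_f0 / 20)
        result.append((k * f0_hz, amplitude))
    return result


def tract_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Toy filter: one bell-shaped resonance peak per formant."""
    half_width = bandwidth_hz / 2
    return sum(
        1.0 / (1.0 + ((freq_hz - center) / half_width) ** 2)
        for center in formants_hz
    )


def voiced_spectrum(f0_hz, formants_hz, n_harmonics=30):
    """Multiply each source harmonic by the filter gain at its frequency."""
    return [
        (freq, amp * tract_gain(freq, formants_hz))
        for freq, amp in source_amplitudes(f0_hz, n_harmonics)
    ]


if __name__ == "__main__":
    # Hypothetical formants; real values depend on the speaker and the vowel.
    for freq, amp in voiced_spectrum(100, [700, 1200, 2600])[:12]:
        print(freq, "Hz", round(amp, 4))
```

In this sketch, harmonics that land near a resonance of the tract come out stronger than their neighbors, which is the sense in which the tract shapes the laryngeal sound into individual speech sounds.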

The vocal folds will not oscillate if they are not sufficiently close to one another,

Are not under sufficient tension,

Or under too much tension,

Or if the pressure drop across the larynx is not sufficiently large.

In linguistics,

A phone is called voiceless if there is no phonation during its occurrence.

In speech,

Voiceless phones are associated with vocal folds that are elongated,

Highly tensed,

And placed laterally, or abducted, when compared to vocal folds during phonation.

Fundamental frequency,

The main acoustic cue for the percept pitch,

Can be varied through a variety of means.

Large scale changes are accomplished by increasing the tension in the vocal folds through contraction of the cricothyroid muscle.

Smaller changes in tension can be effected by contraction of the thyroarytenoid muscle or changes in the relative position of the thyroid and cricoid cartilages,

As may occur when the larynx is lowered or raised,

Either volitionally or through movement of the tongue,

To which the larynx is attached via the hyoid bone.

In addition to tension changes,

Fundamental frequency is also affected by the pressure drop across the larynx,

Which is mostly affected by the pressure in the lungs and will also vary with the distance between the vocal folds.

Variation in fundamental frequency is used linguistically to produce intonation and tone.

There are currently two main theories as to how vibration of the vocal folds is initiated,

The myoelastic theory and the aerodynamic theory.

These two theories are not in contention with one another and it is quite possible that both theories are true and operating simultaneously to initiate and maintain vibration.

A third theory,

The neurochronaxic theory,

Was in considerable vogue in the 1950s but has since been largely discredited.

Myoelastic and Aerodynamic Theory.

The myoelastic theory states that when the vocal cords are brought together and breath pressure is applied to them,

The cords remain closed until the pressure beneath them,

The subglottic pressure,

Is sufficient to push them apart,

Allowing air to escape and reducing the pressure enough for the muscle tension recoil to pull the folds back together again.

The pressure builds up once again until the cords are pushed apart and the whole cycle keeps repeating itself.

The rate at which the cords open and close,

The number of cycles per second,

Determines the pitch of the phonation.
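
The link between the open-and-close cycle and pitch is just the reciprocal of the cycle length. The sketch below is an illustration only, not from the original article. The function name and the example phase durations of 4 milliseconds each are hypothetical.

```python
def cycle_frequency_hz(closed_phase_s, open_phase_s):
    """Cycles per second for one closed phase followed by one open phase."""
    period_s = closed_phase_s + open_phase_s
    if period_s <= 0:
        raise ValueError("cycle duration must be positive")
    return 1.0 / period_s


if __name__ == "__main__":
    # Hypothetical durations: 4 ms closed while pressure builds, 4 ms open.
    print(round(cycle_frequency_hz(0.004, 0.004), 1), "Hz")
```

With those assumed durations the folds complete about 125 cycles per second, so shortening either phase raises the pitch.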

The aerodynamic theory is based on the Bernoulli energy law in fluids.

The theory states that when a stream of breath is flowing through the glottis while the arytenoid cartilages are held together by the action of the interarytenoid muscles,

A push-pull effect is created on the vocal fold tissues that maintains self-sustained oscillation.

The push occurs during glottal opening when the glottis is convergent,

And the pull occurs during glottal closing when the glottis is divergent.

Such an effect causes a transfer of energy from the airflow to the vocal fold tissues,

Which overcomes losses by dissipation and sustains the oscillation.

The amount of lung pressure needed to begin phonation is defined by Titze as the oscillation threshold pressure.

During glottal closure,

The airflow is cut off until breath pressure pushes the folds apart and the flow starts up again,

Causing the cycles to repeat.

The textbook entitled Myoelastic Aerodynamic Theory of Phonation by Ingo Titze credits Janwillem van den Berg as the originator of the theory and provides detailed mathematical development of the theory.

Neurochronaxic Theory.

This theory states that the frequency of the vocal fold vibration is determined by the chronaxie of the recurrent nerve,

And not by breath pressure or muscular tension.

Advocates of this theory thought that every single vibration of the vocal folds was due to an impulse from the recurrent laryngeal nerves,

And that the acoustic center in the brain regulated the speed of vocal fold vibration.

Speech and voice scientists have long since abandoned this theory as the muscles have been shown to not be able to contract fast enough to accomplish this vibration.

In addition,

Persons with paralyzed vocal folds can produce phonation,

Which would not be possible according to this theory.

Phonation occurring in excised larynges would also not be possible according to this theory.

State of the glottis.

In linguistic phonetic treatments of phonation,

Such as those of Peter Ladefoged,

Phonation was considered to be a matter of points on a continuum of tension and closure of the vocal cords.

More intricate mechanisms were occasionally described,

But they were difficult to investigate,

And until recently the state of the glottis and phonation were considered to be nearly synonymous.

If the vocal cords are completely relaxed with the arytenoid cartilages apart for maximum airflow,

The cords do not vibrate.

This is voiceless phonation,

And is extremely common with obstruents.

If the arytenoids are pressed together for glottal closure,

The vocal cords block the airstream,

Producing stop sounds such as the glottal stop.

In between,

There is a sweet spot of maximum vibration.

Also the existence of an optimal glottal shape for ease of phonation has been shown,

At which the lung pressure required to initiate the vocal cord vibration is minimum.

This is modal voice,

And is the normal state for vowels and sonorants in all the world's languages.

However,

The aperture of the arytenoid cartilages and therefore the tension in the vocal cords is one of degree between the end points of open and closed,

And there are several intermediate situations utilized by various languages to make contrasting sounds.

For example,

Gujarati has vowels with a partially lax phonation called breathy voice or murmured voice,

While Burmese has vowels with a partially tense phonation called creaky voice or laryngealized voice.

The Jalapa dialect of Mazatec is unusual in contrasting both with modal voice in a three-way distinction.

Note that Mazatec is a tonal language,

So the glottis is making several tonal distinctions simultaneously with the phonation distinctions.

Javanese does not have modal voice in its stops,

But contrasts two other points along the phonation scale with more moderate departures from modal voice,

Called slack voice and stiff voice.

The muddy consonants in Shanghainese are slack voice.

They contrast with tenuis and aspirated consonants.

Although each language may be somewhat different,

It is convenient to classify these degrees of phonation into discrete categories.

A series of seven alveolar stops with phonations ranging from an open lax to a closed tense glottis are open glottis,

Sweet spot,

Closed glottis.

The IPA diacritics under-ring and subscript wedge,

Commonly called voiceless and voiced,

Are sometimes added to the symbol for a voiced sound to indicate more lax or open,

Slack,

And more tense or closed,

Stiff,

States of the glottis respectively.

Ironically,

Adding the voicing diacritic to the symbol for a voiced consonant indicates less modal voicing,

Not more,

Because a modally voiced sound is already fully voiced,

At its sweet spot,

And any further tension in the vocal cords dampens their vibration.

Alsatian,

Like several Germanic languages,

Has a typologically unusual phonation in its stops.

The consonants ambiguously called lenis are partially voiced.

The vocal cords are positioned as for voicing,

But do not actually vibrate.

That is,

They are technically voiceless,

But without the open glottis usually associated with voiceless stops.

They contrast with both modally voiced b, d, g and modally voiceless p, t, k in French borrowings,

As well as aspirated k word-initially.

If the arytenoid cartilages are parted to admit turbulent airflow,

The result is whisper phonation if the vocal folds are adducted,

And whispery voice phonation,

Murmur,

If the vocal folds vibrate modally.

Whisper phonation is heard in many productions of French oui,

And the voiceless vowels of many North American languages are actually whispered.

Glottal Consonants.

It has long been noted that in many languages both phonologically and historically,

The glottal consonants do not behave like other consonants.

Phonetically,

They have no manner or place of articulation other than the state of the glottis.

Some phoneticians have described these sounds as neither glottal nor consonantal,

But instead as instances of pure phonation,

At least in many European languages.

However,

In Semitic languages they do appear to be true glottal consonants.

Supraglottal phonation.

In the last few decades it has become apparent that phonation may involve the entire larynx,

With as many as six valves and muscles working either independently or together.

From the glottis upward,

These articulations are:

1. Glottal, the vocal cords, producing the distinctions described above.

2. Ventricular, the false vocal cords, partially covering and damping the glottis.

3. Arytenoid, sphincteric compression forwards and upwards.

4. Epiglotto-pharyngeal, retraction of the tongue and epiglottis, potentially closing onto the pharyngeal wall.

5. Raising or lowering of the entire larynx.

6. Narrowing of the pharynx.

Until the development of fiber optic laryngoscopy,

The full involvement of the larynx during speech production was not observable,

And the interactions among the six laryngeal articulations are still poorly understood.

However,

At least two supraglottal phonations appear to be widespread in the world's languages.

These are harsh voice,

Ventricular or pressed voice,

Which involves overall constriction of the larynx,

And faucalized voice,

Hollow or yawny voice,

Which involves overall expansion of the larynx.

© 2026 Benjamin Boster. All rights reserved. All copyright in this work remains with the original creator. No part of this material may be reproduced, distributed, or transmitted in any form or by any means, without the prior written permission of the copyright owner.