Early-bird rates still available for the next workshops! (Brussels)
The COVID-19 pandemic permanently altered the way we work and communicate. Platforms like Microsoft Teams, Zoom, and Google Meet—initially adopted as emergency solutions during lockdowns—have become indispensable tools for professional and social interactions. These platforms are no longer merely technical conveniences; they are now the largest public speaking stage in history. Every day, millions of professionals rely on them to present ideas, lead teams, and collaborate with colleagues across the globe.
In this digital space, where audio and video mediate human connection, the stakes are high. Credibility, trust, and leadership often hinge on a speaker's ability to convey confidence and authenticity. A polished presentation and thoughtful words can fall flat if the voice delivering them sounds aggressive, distorted, harsh, or artificial. Most users assume that any audio issues they encounter are caused by their internet connection or hardware. Few realize the profound role that live digital signal processing (DSP) algorithms and platform defaults play in reshaping, and often degrading, the quality of their voice. I have thoroughly discussed this here.
This problem, though universal, has an unintentional gendered dimension. Women’s voices, due to their natural acoustic properties, are more heavily affected by the aggressive processing applied by these platforms. While male voices also suffer from significant distortion, female voices often emerge from the digital ether sounding piercing, jarring, or grating—qualities that can subtly undermine credibility and professional presence. This issue is not rooted in patriarchal intent but in the technological limitations and design assumptions of some of the tools we now use daily. The result, however, is a distinct disadvantage for women in professional settings.
How Live Digital Signal Processing Alters Voices: A Universal Problem with Uneven Impacts
The audio processing tools embedded in videoconferencing platforms are designed to solve practical problems: removing background noise, evening out volume levels, and ensuring intelligibility across diverse environments. These tools rely on live DSP algorithms to manage audio in real time. While the intent is to improve clarity and make meetings more productive, the execution has unintended consequences: all voices are distorted, losing their natural timbre and nuance. I have explained here why meetings actually become less productive because of the way participants sound.
Live DSP algorithms are typically configured to aggressively compress and equalize the audio signal, which impacts both male and female voices. Male voices lose their depth and warmth, sounding flat and tinny. However, female voices suffer even more pronounced degradation due to their denser harmonic content in high-frequency ranges, which live DSP algorithms tend to overboost or distort. This results in piercing, metallic, jarring tones that can be particularly grating to listeners.
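To make the mechanism concrete, here is a minimal Python sketch; it is not a reproduction of any platform's actual pipeline. It synthesizes two crude harmonic “voices” with typical male and female fundamentals (110 Hz and 210 Hz) and applies the same fixed high-frequency boost, the kind of blanket “clarity” equalization described above. The shelf parameters and the 2–6 kHz “harshness” band are illustrative assumptions.

```python
import numpy as np

FS = 16000                 # sample rate (Hz)
t = np.arange(FS) / FS     # one second of samples

def synth_voice(f0):
    """Crude harmonic 'voice': partials at multiples of f0, rolling off as 1/n."""
    sig = np.zeros_like(t)
    n = 1
    while n * f0 < FS / 2:
        sig += np.sin(2 * np.pi * n * f0 * t) / n
        n += 1
    return sig / np.max(np.abs(sig))

def high_shelf(sig, corner_hz=3000.0, gain_db=9.0):
    """Blanket high-frequency boost: a stand-in for an 'intelligibility' EQ."""
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), 1 / FS)
    shelf_db = gain_db / (1 + (corner_hz / np.maximum(freqs, 1.0)) ** 2)
    return np.fft.irfft(spec * 10 ** (shelf_db / 20), n=len(sig))

def band_db(sig, lo, hi):
    """Signal energy (dB) in the FFT bins between lo and hi Hz."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), 1 / FS)
    return 10 * np.log10(spec[(freqs >= lo) & (freqs < hi)].sum())

for label, f0 in [("110 Hz (typical male f0)", 110.0),
                  ("210 Hz (typical female f0)", 210.0)]:
    voice = synth_voice(f0)
    total = band_db(voice, 0, FS / 2)
    before = band_db(voice, 2000, 6000) - total
    after = band_db(high_shelf(voice), 2000, 6000) - total
    print(f"{label}: 2-6 kHz energy {before:+5.1f} dB re total, "
          f"{after:+5.1f} dB after the same shelf boost")
```

Even in this toy model, the higher-pitched voice starts with roughly 3 dB more energy in the 2–6 kHz region, so an identical boost leaves it noticeably hotter exactly where listeners report harshness.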
Puberty affects male and female voices differently. The vocal cords of adult females are typically shorter and thinner, and their larynx (the organ housing the vocal cords and serving as the primary resonator) is generally smaller. Additionally, several other components of the female vocal tract, which also acts as a resonator, are smaller on average. These differences result in higher pitch and louder harmonic components in the higher frequency bands of the audible spectrum. When these components are not overboosted by poorly programmed algorithms, female voices do not sound aggressive and can even be particularly pleasant to listen to.
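A back-of-the-envelope calculation makes this resonance shift visible. Treating the vocal tract as a uniform tube, closed at the glottis and open at the lips, gives resonances at F_n = (2n − 1) · c / (4L). The sketch below uses textbook average tract lengths purely for illustration; real formants move constantly with articulation.

```python
# Idealized quarter-wave resonator model of the vocal tract:
#   F_n = (2n - 1) * c / (4 * L)
C = 350.0  # approximate speed of sound in the warm, humid vocal tract (m/s)

def resonances(tract_length_m, count=4):
    """First few resonances of a uniform tube closed at one end."""
    return [(2 * n - 1) * C / (4 * tract_length_m) for n in range(1, count + 1)]

for label, length_m in [("average adult male tract (~17 cm)", 0.17),
                        ("average adult female tract (~14.5 cm)", 0.145)]:
    print(label + ":", ", ".join(f"{f:.0f} Hz" for f in resonances(length_m)))
```

The shorter tract puts every resonance roughly 15–20% higher, concentrating vocal energy in precisely the ranges that aggressive equalization tends to manipulate.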
In professional settings, distorted voices (and voices with a flat or jarring timbre in general) undermine credibility and authority. A natural-sounding voice signals trust, confidence, and authenticity; when live DSP algorithms strip voices of their natural timbre, speakers risk appearing less trustworthy or authoritative.
Evidence of distorted and grating audio signals triggering fight-or-flight responses in listeners has been documented in the literature (this article contains numerous sources). Exposure to high-frequency distortion can cause immediate stress responses. Prolonged exposure leads to listening fatigue, further diminishing engagement with female speakers and reducing the impact of their contributions.
Women, already facing societal biases in leadership roles, bear the brunt of this technological distortion.
While live DSP, increasingly driven by AI algorithms, affects all voices, the specific characteristics of female voices exacerbate the distortion:
Frequency Sensitivity: Female voices carry denser harmonic content in the higher frequency bands. Live DSP algorithms often target and manipulate these ranges aggressively, introducing a sharpness or harshness that is absent from natural speech.
Dynamic Range Compression: Autogain features amplify the softer components of speech, such as sibilance and other high-frequency elements (a minimal sketch after this list illustrates the effect). These components are present in both male and female voices, but the naturally higher harmonic density of female voices intensifies the resulting distortion.
Cumulative Processing: Successive layers of processing, applied first by the microphone and then by the platform, compound the degradation. Female voices, with their higher spectral richness in the ranges these algorithms target, are more susceptible to overprocessing at every stage.
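As promised above, here is a deliberately naive autogain sketch in Python. It is not any platform's actual algorithm, and the frame size, target level, and signal levels are illustrative assumptions: a loud "vowel" stops halfway through the signal while low-level "sibilance" continues, and the AGC, chasing a constant level, hauls the sibilance up by roughly 20 dB.

```python
import numpy as np

FS = 16000
t = np.arange(FS) / FS  # one second of samples

# Crude speech-like test signal: a loud vowel that stops halfway through,
# plus continuous low-level sibilance (all levels are illustrative).
vowel = 0.8 * np.sin(2 * np.pi * 220 * t) * (t < 0.5)
sibilance = 0.03 * np.sin(2 * np.pi * 6000 * t)
x = vowel + sibilance

def autogain(sig, frame=1024, target_rms=0.2, max_gain=40.0):
    """Naive frame-by-frame AGC: push every frame toward the same RMS level."""
    out = sig.copy()
    for i in range(0, len(sig), frame):
        seg = out[i:i + frame]
        rms = np.sqrt(np.mean(seg ** 2)) + 1e-12
        out[i:i + frame] = seg * min(target_rms / rms, max_gain)
    return out

def rms_db(seg):
    return 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12)

y = autogain(x)
half = len(x) // 2  # the second half contains sibilance only
print(f"sibilance-only passage before AGC: {rms_db(x[half:]):6.1f} dB")
print(f"sibilance-only passage after  AGC: {rms_db(y[half:]):6.1f} dB")
```

Real autogain implementations smooth their gain changes over time, but the underlying behaviour, boosting whatever is quiet until it meets a target, is exactly what this sketch shows.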
Male voices are also heavily degraded, but the specific interactions between live DSP algorithms and female vocal characteristics produce harsher, more jarring, and more distracting artifacts for women. Children's voices can be affected even more severely than those of adult women, but children rarely take part in professional meetings, so they are not considered in this article.
Real Solutions: Turn Off Live DSP, Use a Real Microphone, and Preserve Natural Sound
The answer to this problem is not refining DSP algorithms. By their very nature, live DSP tools are bound to make errors when guessing how to “improve” voices, and they operate under the superficial assumption that good sound quality and speech intelligibility depend on the absence of background noise and an artificially “even” (and therefore flat) signal. These assumptions ignore the properties of the human auditory system but are excellent marketing tools for products that claim to help people “understand speech better.”
No voice can be improved through aggressive automated processing; preserving the natural characteristics of a voice and transmitting it without unnecessary manipulation is the best favour you can do for any speaker. Solutions include:
Disabling DSP: Platforms should allow users to completely disable DSP tools. Zoom's “Original Sound” mode is an excellent example: a thirty-second configuration lets you bypass noise suppression, autogain, and other live DSP processes that degrade the quality of your voice.
Using Better Microphones: Even the most basic cardioid USB podcasting microphone will capture your voice far more faithfully than expensive ambient microphones, Bluetooth earbuds, built-in computer microphones, or even the boom-microphone headsets often recommended or provided by IT specialists.
Educating Users: Instead of preaching the questionable narrative of background noise removal, organizations should prioritize educating employees about the importance of disabling unnecessary audio processing and investing in proper audio equipment to ensure voices are transmitted authentically. Doing so reduces workplace stress and ensures women are not put at a disadvantage.
These solutions recognize that natural sound cannot be “improved” in real time without compromising its integrity. Instead, fidelity must be preserved by transmitting voices as they are.
A practical demonstration of how disabling unnecessary processing and using the correct type of microphone improves the sound of your voice is given in the video clip below:
Videoconferencing has become the world’s largest public speaking stage, but the tools designed to facilitate communication often distort and degrade voices. While this affects everyone, female voices suffer disproportionately due to their acoustic properties and the way DSP algorithms interact with their harmonic structure. The solution is clear: stop trying to “improve” voices with live DSP and instead focus on relaying them as faithfully as possible.
Achieving sound fidelity in videoconferencing is not about sophisticated processing—it’s about turning off unnecessary processing and allowing voices to speak for themselves. In doing so, we can ensure that every voice, regardless of pitch or timbre, is heard both clearly and authentically. This isn’t just a technical fix; it’s a step toward fairness and equity in the digital workplace.
Want to ensure your voice sounds clear and natural during videoconferences? I can help. Reach out, and let’s get started.
© Andrea Caniato, December 2024