Improving the Audio Quality of Online Conference Applications

A project by Ocean Cheng, Senyuan Fan, Beau Hanson, Aaron Messina, and Jiwei Zou

Online conference applications like Zoom, Google Meet, and Discord have been ubiquitous in our lives since COVID-19. While these apps are powerful and constantly evolving, they usually prioritize video quality and features over audio. We investigated ways to improve the audio quality and listening experience of online conferences.

Audio Conference Demo

© Immersitech 2021

Here is a quick demonstration of an online conference featuring our audio algorithms, including background noise removal, dynamic compression, and stereo panning. The algorithms are toggled on and off throughout the meeting, as shown in the center of the meeting’s interface. The signal-to-noise ratio (SNR) of the given speech is also displayed in real time on the right.

Thank You to

Our quest to improve online audio conferencing would not have been possible without the help of Immersitech, a company based in Rochester, NY, that shares our commitment to excellence in online conferencing. Their website can be found here.

Background

We surveyed students and professors at the University of Rochester about the audio communication issues they deal with. The survey received 25 responses.

Chart showing responses to most prevalent issue in audio conferencing

Other takeaways (based on popular software)
- Speech transmission rating: 3.5/5
- Music transmission rating: 2.4/5

Algorithms

Noise Reduction
- The Wiener filter [Philipos C. Loizou, “Speech Enhancement: Theory and Practice”] is a classic way to reduce stationary noise in audio.
- Like a noise gate, it uses a threshold to distinguish noise-only frames from noisy speech frames. Instead of cutting out the noise frames, it uses them to estimate the Power Spectral Density (PSD) of the noise.
- Using the noise and speech PSDs, it calculates and updates a frequency-domain filter that is applied to the spectrum of the raw signal. This helps reduce the noise even while speech is present.
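
The threshold decision, noise PSD update, and gain calculation described above might be sketched for a single frame as follows. This is a minimal illustration, not the project's implementation: the function name `wiener_step`, the recursive smoothing factor, the gain floor, and the power-subtraction estimate of the speech PSD are all assumptions, and the frame is assumed to arrive already converted to a power spectrum.

```python
from typing import List, Tuple


def wiener_step(
    frame_psd: List[float],
    noise_psd: List[float],
    energy_threshold: float,
    smoothing: float = 0.9,    # assumed recursive-averaging factor
    gain_floor: float = 0.05,  # assumed minimum gain to limit artifacts
) -> Tuple[List[float], List[float]]:
    """Process one frame; return (per-bin gains, updated noise PSD)."""
    frame_energy = sum(frame_psd) / len(frame_psd)
    if frame_energy < energy_threshold:
        # Frame judged to be noise only: use it to refine the noise PSD
        # instead of discarding it as a noise gate would.
        noise_psd = [
            smoothing * n + (1.0 - smoothing) * p
            for n, p in zip(noise_psd, frame_psd)
        ]

    gains = []
    for p, n in zip(frame_psd, noise_psd):
        # Speech PSD estimated by power subtraction, clipped at zero.
        speech = max(p - n, 0.0)
        total = speech + n
        # Wiener gain: speech power over (speech + noise) power.
        gain = speech / total if total > 0.0 else 0.0
        gains.append(max(gain, gain_floor))
    return gains, noise_psd
```

Each returned gain would then multiply the corresponding bin of the frame's complex spectrum before the inverse transform. Bins where the frame power barely exceeds the noise estimate are attenuated toward the floor, while bins dominated by speech pass nearly unchanged.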

Block Diagram of the Noise Reduction Algorithm

Dynamic Range Compression
- This form of compression increases intelligibility (Reinhart & Souza 2016), and can also reduce online meeting fatigue.
- Mode-adjustable parameters: Threshold, Attack time and Release time
- Evens out volume disparities between users
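
To make the role of the Threshold, Attack and Release parameters concrete, here is a minimal feed-forward compressor sketch. It is an assumption-laden illustration rather than the project's algorithm: the function name `compress`, the `ratio` parameter, the per-sample level detector, and all default values are hypothetical.

```python
import math
from typing import List


def compress(
    samples: List[float],
    sample_rate: float,
    threshold_db: float = -20.0,  # assumed default
    ratio: float = 4.0,           # assumed; not named in the text above
    attack_ms: float = 5.0,       # assumed default
    release_ms: float = 100.0,    # assumed default
) -> List[float]:
    """Reduce the level of samples that exceed the threshold."""
    # One-pole smoothing coefficients derived from the time constants.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))

    reduction_db = 0.0  # smoothed gain reduction, in dB
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        over_db = max(level_db - threshold_db, 0.0)
        # Above the threshold, output rises only 1/ratio dB per input dB.
        target_db = over_db * (1.0 - 1.0 / ratio)
        # Attack while reduction is increasing, release while it relaxes.
        coeff = attack if target_db > reduction_db else release
        reduction_db = coeff * reduction_db + (1.0 - coeff) * target_db
        out.append(x * 10.0 ** (-reduction_db / 20.0))
    return out
```

With these assumed defaults, a quiet talker below the threshold passes through unchanged, while a loud talker is pulled down gradually over the attack time, which narrows the volume gap between participants.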

Block Diagram of the compressor algorithm

Noise Gating
- Unlike the Noise Reduction algorithm, this form of noise removal only targets noise below a given threshold. The noise gate mainly removes background noise that is present when the user is not speaking.
- Mode-adjustable parameters: Threshold, plus Attack time and Release time for gain smoothing
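
A noise gate with smoothed gain can be sketched in a few lines. The version below is illustrative only: the function name `noise_gate`, the peak envelope follower, and the default parameter values are assumptions rather than details taken from the project.

```python
import math
from typing import List


def noise_gate(
    samples: List[float],
    sample_rate: float,
    threshold: float = 0.02,   # assumed linear amplitude threshold
    attack_ms: float = 2.0,    # assumed default
    release_ms: float = 50.0,  # assumed default
) -> List[float]:
    """Mute the signal while its envelope stays below the threshold."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))

    envelope = 0.0  # decaying peak envelope of the input
    gain = 0.0      # smoothed gate gain, between 0 and 1
    out = []
    for x in samples:
        # Peak follower: jump up instantly, decay slowly, so the gate
        # does not chatter at every zero crossing of the waveform.
        envelope = max(abs(x), release * envelope)
        target = 1.0 if envelope >= threshold else 0.0
        # Open with the attack time, close with the release time.
        coeff = attack if target > gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(x * gain)
    return out
```

The gain ramps rather than switching abruptly, which is what the attack and release times are for: an instant on/off gate would produce audible clicks at the start and end of each utterance.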

Block diagram of the noise gating algorithm

Evaluation: WADA-STNR
- The amplitude distribution of speech can be approximated with a Gamma distribution, even with noise. [1]
- STNR can be estimated by examining the amplitude distribution of the corrupted speech.
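
As a rough illustration of how an amplitude distribution can reveal the noise level, the sketch below computes the statistic used in [1], the log of the mean magnitude minus the mean of the log magnitude, and checks by simulation that it falls as Gaussian noise is added to Gamma-distributed "speech". The function names, the clamp `eps`, the sample count, and the simulation itself are assumptions for illustration; the Gamma shape of 0.4 is the value we understand [1] to use. The final mapping from the statistic to a ratio in dB relies on a precomputed table in [1] and is not reproduced here.

```python
import math
import random
from typing import Sequence


def wada_statistic(samples: Sequence[float], eps: float = 1e-12) -> float:
    """Return ln(mean|z|) - mean(ln|z|) for the given waveform."""
    # Clamp magnitudes so exact zeros do not break the logarithm.
    magnitudes = [max(abs(s), eps) for s in samples]
    mean_magnitude = sum(magnitudes) / len(magnitudes)
    mean_log = sum(math.log(m) for m in magnitudes) / len(magnitudes)
    return math.log(mean_magnitude) - mean_log


def simulate_statistic(
    snr_db: float, n: int = 20000, shape: float = 0.4, seed: int = 0
) -> float:
    """Statistic for synthetic Gamma 'speech' plus Gaussian noise."""
    rng = random.Random(seed)
    # Gamma-distributed magnitudes with a random sign.
    speech = [
        rng.choice((-1.0, 1.0)) * rng.gammavariate(shape, 1.0)
        for _ in range(n)
    ]
    speech_power = sum(s * s for s in speech) / n
    # Scale the noise so that speech power / noise power matches snr_db.
    noise_std = math.sqrt(speech_power / (10.0 ** (snr_db / 10.0)))
    noisy = [s + rng.gauss(0.0, noise_std) for s in speech]
    return wada_statistic(noisy)
```

Comparing `simulate_statistic(40.0)` with `simulate_statistic(0.0)` shows the statistic shrinking as the noise grows, which is the relationship an estimator of this kind inverts to recover the ratio from a recording.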

Block diagram of the STNR evaluation algorithm

Modes

Music/Transparent Mode
- All algorithms bypassed; only bare processing, to ensure transparency
Quiet Room
- Minimal gating and compression
Cafe
- Normal gating and compression
- Noise reduction engaged
Industrial
- Aggressive gating and compression
- Noise reduction engaged
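
One way to represent these modes in code is a small lookup table of per-mode settings. The sketch below is hypothetical: the names `Strength`, `ModeConfig` and `MODES` are invented for illustration, the qualitative strength levels stand in for concrete threshold, attack and release values that are not given here, and noise reduction is assumed to be off in Quiet Room because it is not listed for that mode.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    OFF = "off"
    MINIMAL = "minimal"
    NORMAL = "normal"
    AGGRESSIVE = "aggressive"


@dataclass(frozen=True)
class ModeConfig:
    gating: Strength
    compression: Strength
    noise_reduction: bool


MODES = {
    "Music/Transparent": ModeConfig(Strength.OFF, Strength.OFF, False),
    # Assumption: noise reduction is off here since it is not listed.
    "Quiet Room": ModeConfig(Strength.MINIMAL, Strength.MINIMAL, False),
    "Cafe": ModeConfig(Strength.NORMAL, Strength.NORMAL, True),
    "Industrial": ModeConfig(Strength.AGGRESSIVE, Strength.AGGRESSIVE, True),
}
```

A processing chain could look up `MODES["Cafe"]` when the user switches environments and translate each strength level into the parameters of the corresponding algorithm.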

Audio Examples

Speech Ex. 1 (Minimal Noise), Gz: 1.78, STNR: 100

Speech Ex. 1 – Processed using Mode 1, Gz: 5.08, STNR: 100

Speech Ex. 2 (Noisy), Gz: 0.70, STNR: 11.43

Speech Ex. 2 – Processed using Mode 2, Gz: 1.73, STNR: 100

References

[1] Kim, Chanwoo, and Richard M. Stern. “Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis.” Ninth Annual Conference of the International Speech Communication Association. 2008.