A project by Ocean Cheng, Senyuan Fan, Beau Hanson, Aaron Messina, and Jiwei Zou
Online conference applications like Zoom, Google Meet, and Discord have been ubiquitous in our lives since COVID-19. While these apps are powerful and constantly evolving, they usually prioritize video quality and features over audio. We investigated ways to improve the audio quality and experience of online conferences.
Audio Conference Demo
Here is a quick demonstration of an online conference featuring our audio algorithms, including background noise removal, dynamic range compression, and stereo panning. The algorithms are toggled on and off throughout the meeting, as shown in the center of the meeting’s interface. The signal-to-noise ratio (SNR) of the speech is displayed in real time on the right.
Thank You to
Our quest to improve online audio conferencing would not have been possible without the help of Immersitech, a company based in Rochester, NY, that shares our commitment to excellence in online conferencing. Their website can be found here.
Background
We surveyed students and professors at the University of Rochester to identify the audio communication issues they deal with. The survey received 25 responses.
- Other takeaways (based on popular software)
- Speech transmission rating: 3.5/5
- Music transmission rating: 2.4/5
Algorithms
- Noise Reduction
- The Wiener filter [Philipos C. Loizou, “Speech Enhancement: Theory and Practice”] is a classic way to reduce stationary noise in audio.
- It uses a threshold to distinguish noise-only frames from noisy speech frames, much like a noise gate. Instead of cutting out the noise frames, it uses them to estimate the Power Spectral Density (PSD) of the noise.
- Using the noise and speech PSDs, it calculates and updates a frequency-domain filter that is applied to the spectrum of the raw signal. This helps reduce the noise even when speech is present.
- Dynamic Range Compression
- This form of compression increases intelligibility (Reinhart & Souza 2016), and can also reduce online meeting fatigue.
- Mode-adjustable parameters: threshold, attack time, and release time
- Evens out volume disparities between users
- Noise Gating
- This form of noise removal differs from the Noise Reduction algorithm in that it only targets noise below a given threshold. The noise gate mainly removes background noise present when the user is not speaking.
- Mode-adjustable parameters: threshold, plus attack and release times for gain smoothing
- Evaluation: WADA-STNR
- The amplitude distribution of speech can be approximated with a Gamma distribution, even with noise. [1]
- STNR can be estimated by examining the amplitude distribution of the corrupted speech.
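The Noise Reduction steps listed above (threshold-based detection of noise frames, a running noise PSD, and a frequency-domain gain) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the project's implementation: the function name wiener_denoise, the frame size, the threshold, the smoothing factor, and the gain floor are all assumed values, and the speech PSD is estimated by simple subtraction of the noise PSD.

```python
import numpy as np

def wiener_denoise(x, frame_len=512, hop=256, noise_thresh_db=-40.0,
                   noise_smooth=0.9, gain_floor=0.05):
    """Frame-wise Wiener-style noise reduction (all defaults are illustrative)."""
    x = np.asarray(x, dtype=float)
    # Periodic Hann window: overlapping copies sum to 1 when hop == frame_len // 2.
    window = np.hanning(frame_len + 1)[:-1]
    padded = np.concatenate([x, np.zeros(frame_len)])
    out = np.zeros_like(padded)
    noise_psd = None
    for start in range(0, len(x), hop):
        raw = padded[start:start + frame_len]
        spec = np.fft.rfft(raw * window)
        psd = np.abs(spec) ** 2
        level_db = 10.0 * np.log10(np.mean(raw ** 2) + 1e-12)
        if noise_psd is None:
            # Assumption: the signal starts with a noise-only frame.
            noise_psd = psd.copy()
        elif level_db < noise_thresh_db:
            # Below the threshold the frame is treated as noise and is used
            # to update the noise PSD instead of being discarded.
            noise_psd = noise_smooth * noise_psd + (1.0 - noise_smooth) * psd
        # Crude speech PSD estimate, then the Wiener gain S / (S + N) per bin.
        speech_psd = np.maximum(psd - noise_psd, 0.0)
        gain = speech_psd / (speech_psd + noise_psd + 1e-12)
        gain = np.maximum(gain, gain_floor)  # floor limits over-suppression
        out[start:start + frame_len] += np.fft.irfft(gain * spec, n=frame_len)
    return out[:len(x)]
```

Calling wiener_denoise(noisy) on a mono float array returns an array of the same length. Because the gain is computed per frequency bin, bins dominated by noise are attenuated even in frames that contain speech.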
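The evaluation idea above, estimating the ratio from the amplitude distribution of the corrupted speech, can be illustrated with a small sketch. It rests on assumptions that are not stated in the text: clean speech amplitudes are modeled as Gamma-distributed with an assumed shape of 0.4, the noise is assumed Gaussian, and the lookup table is built here by simulation rather than taken from any published table. All names (wada_statistic, build_table, estimate_snr_db) are hypothetical.

```python
import numpy as np

GAMMA_SHAPE = 0.4  # assumed shape parameter for clean-speech amplitudes
SNR_GRID_DB = np.arange(-10.0, 41.0, 1.0)  # assumed range covered by the table

def wada_statistic(z):
    """log(mean|z|) - mean(log|z|): scale-invariant, depends on distribution shape."""
    a = np.abs(np.asarray(z, dtype=float))
    a = a[a > 0]  # avoid log(0)
    return np.log(np.mean(a)) - np.mean(np.log(a))

def build_table(snr_grid_db, n=200_000, seed=0):
    """Simulate Gamma 'speech' plus Gaussian noise at each SNR and record the statistic."""
    rng = np.random.default_rng(seed)
    speech = rng.gamma(GAMMA_SHAPE, 1.0, n) * rng.choice([-1.0, 1.0], n)
    noise = rng.standard_normal(n)
    speech /= np.sqrt(np.mean(speech ** 2))  # unit power
    noise /= np.sqrt(np.mean(noise ** 2))    # unit power
    stats = [wada_statistic(speech + 10.0 ** (-snr / 20.0) * noise)
             for snr in snr_grid_db]
    # The statistic should rise with SNR; guard against tiny sampling wiggles.
    return np.maximum.accumulate(np.array(stats))

TABLE = build_table(SNR_GRID_DB)

def estimate_snr_db(z):
    """Map the statistic of a noisy signal back to an SNR via the simulated table."""
    # Values outside the table are clamped to the ends of SNR_GRID_DB.
    return float(np.interp(wada_statistic(z), TABLE, SNR_GRID_DB))
```

The estimate needs no clean reference signal, which is what makes this kind of measure usable in a live meeting. Real speech only approximately follows the assumed distribution, so results on recordings will be less accurate than on the synthetic signals used to build the table.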
Modes
- Music/Transparent Mode
- All algorithms bypassed; only bare processing remains, to ensure transparency
- Quiet Room
- Minimal gating and compression
- Cafe
- Normal gating and compression
- Noise reduction engaged
- Industrial
- Aggressive gating and compression
- Noise reduction engaged
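One way to picture how the modes above could map onto the gate and compressor parameters is a small settings table plus a per-sample gain loop. This is a sketch under stated assumptions, not the project's code: every numeric value is a placeholder, the compression ratio is an assumed extra parameter, and the names ModeSettings, MODES, and process are hypothetical. The noise_reduction flag is only recorded; no noise reduction runs in this example.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModeSettings:
    gate_threshold_db: Optional[float]  # None: gate bypassed
    comp_threshold_db: Optional[float]  # None: compressor bypassed
    comp_ratio: float = 1.0             # assumed parameter, not listed in the text
    attack_ms: float = 5.0
    release_ms: float = 100.0
    noise_reduction: bool = False       # recorded only, not applied here

# Placeholder values for illustration only.
MODES = {
    "music": ModeSettings(None, None),
    "quiet_room": ModeSettings(-60.0, -12.0, comp_ratio=2.0),
    "cafe": ModeSettings(-50.0, -18.0, comp_ratio=3.0, noise_reduction=True),
    "industrial": ModeSettings(-40.0, -24.0, comp_ratio=6.0, attack_ms=2.0,
                               noise_reduction=True),
}

def _coeff(time_ms, sample_rate):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (0.001 * time_ms * sample_rate))

def process(samples, mode, sample_rate=16000):
    """Apply the mode's noise gate and compressor to a list of float samples."""
    s = MODES[mode]
    if s.gate_threshold_db is None and s.comp_threshold_db is None:
        return list(samples)  # transparent mode: nothing is applied
    attack = _coeff(s.attack_ms, sample_rate)
    release = _coeff(s.release_ms, sample_rate)
    env, gate_gain, comp_gain_db = 0.0, 1.0, 0.0
    out = []
    for x in samples:
        # Peak envelope follower so zero crossings do not look like silence.
        env = max(abs(x), release * env)
        level_db = 20.0 * math.log10(max(env, 1e-9))
        # Gate: close below the threshold; attack opens it, release closes it.
        gate_target = 1.0
        if s.gate_threshold_db is not None and level_db < s.gate_threshold_db:
            gate_target = 0.0
        c = attack if gate_target > gate_gain else release
        gate_gain = c * gate_gain + (1.0 - c) * gate_target
        # Compressor: reduce gain above the threshold according to the ratio.
        comp_target_db = 0.0
        if s.comp_threshold_db is not None and level_db > s.comp_threshold_db:
            over_db = level_db - s.comp_threshold_db
            comp_target_db = -over_db * (1.0 - 1.0 / s.comp_ratio)
        c = attack if comp_target_db < comp_gain_db else release
        comp_gain_db = c * comp_gain_db + (1.0 - c) * comp_target_db
        out.append(x * gate_gain * 10.0 ** (comp_gain_db / 20.0))
    return out
```

With these placeholder settings, process(samples, "music") returns the input unchanged, while "industrial" applies the highest gate threshold and the strongest compression. Keeping the parameters in one table means a mode switch only changes which row is read.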