Real-time Echo Cancelation Device for Flat Panel Smart Speaker

Background

When a flat panel speaker plays music and records speech at the same time, the recording contains the music as interference. This interference degrades speech recognition, which is essential to smart speakers. To solve this problem, we propose an end-to-end echo cancelation device that removes the music interference. The device consists of an echo cancelation algorithm implemented on a Teensy USB Development Board and the hardware interface between the flat panel speaker and the Teensy.

Signal Flow

When a user speaks to the flat panel, the panel records not only the speech but also the music being played back. The piezo preamplifier amplifies this signal by the optimal gain and feeds it to the Teensy development board with audio shield, on which the echo cancelation algorithm is implemented. Once the music echo is canceled, the signal is sent to the host device, where speech recognition happens; the host device also outputs the music signal to the Teensy. Finally, a class D amplifier boosts the music signal to a level at which the driver on the flat panel can play it.

Echo Cancelation Algorithm

Estimation happens in the time domain and the frequency domain separately.

We start with two signals: the recorded signal, which is a mix of speech and music echo, and the source music signal. First, we convolve the source music signal with the impulse response of the flat panel. Next, we use cross-correlation to estimate the time offset between the convolved signal and the music contained in the recorded signal, and use that offset to align the two. We then use the least mean squares (LMS) algorithm to estimate the gain that brings the amplitude of the aligned signal as close as possible to that of the music in the recorded signal. The result approximates the music echo, and the last step is to subtract it from the recorded signal to obtain the clean speech signal.
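
The steps above can be restated compactly as a signal model. The symbols are introduced here only for illustration and do not come from the original design: y is the recorded signal, s the speech, x the source music, h the flat panel impulse response, d the delay, and g the gain. The model assumes the echo is a delayed, scaled copy of the convolved music.

```latex
\begin{aligned}
y[n] &= s[n] + g\,(h * x)[n - d] \\
\hat{d} &= \arg\max_{k} \sum_{n} y[n + k]\,(h * x)[n] \\
\hat{s}[n] &= y[n] - \hat{g}\,(h * x)[n - \hat{d}]
\end{aligned}
```

Here the second line is the cross-correlation alignment, and the gain estimate in the third line is the one produced by the LMS step.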

Implementation

Software: Audio Stream Objects[1]

Convolve: Estimate the spectral domain interference with double overlap-add.
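
A minimal Python sketch of block-wise convolution by overlap-add is given below to illustrate the idea. It is not the Teensy implementation: the function names are hypothetical, it shows plain (single) overlap-add rather than the double overlap-add named above, and it convolves each block directly in the time domain, whereas a real-time version would more likely use an FFT per block.

```python
def convolve(x, h):
    """Direct linear convolution, used here as the per-block kernel."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_size):
    """Convolve x with h one block at a time, adding the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        partial = convolve(block, h)  # block length + len(h) - 1 samples
        for k, value in enumerate(partial):
            y[start + k] += value     # the tail overlaps the next block
    return y
```

Processing in blocks matters because the audio arrives in fixed-size buffers, so the convolution output for each buffer must be combined with the tail left over from the previous one.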

Align: Estimate the time domain interference using cross correlation.
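
The alignment step can be sketched as a search for the lag that maximizes the cross-correlation between the convolved music and the recording. This is an illustrative Python sketch with hypothetical names; it assumes the echo never arrives before the source (non-negative lag) and that the search range max_lag is chosen by the caller.

```python
def estimate_delay(reference, recorded, max_lag):
    """Return the lag (in samples) at which reference best matches recorded."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Cross-correlation at this lag, skipping samples past the end.
        score = sum(
            r * recorded[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The returned lag is the number of samples by which the convolved music must be delayed before the gain estimation and subtraction steps.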

Optimal Gain: Estimate the energy loss in transit with gradient descent.
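
The gain estimation can be illustrated with a single-coefficient LMS update, which is a stochastic gradient descent on the squared error between the recording and the scaled echo. The sketch below is in Python with hypothetical names; the step size and number of passes are assumptions chosen for the example, not values from the device.

```python
def lms_gain(echo, recorded, step=0.05, epochs=50):
    """Estimate the scalar gain that best maps echo onto recorded."""
    gain = 0.0
    for _ in range(epochs):
        for x, y in zip(echo, recorded):
            error = y - gain * x       # residual after removing the scaled echo
            gain += step * error * x   # gradient step on the squared error
    return gain


def cancel_echo(echo, recorded, gain):
    """Subtract the scaled, aligned echo from the recording."""
    return [y - gain * x for x, y in zip(echo, recorded)]
```

When speech is present in the recording, it acts as noise on the error term, so the estimate settles near the least-squares gain rather than exactly on the true value.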

Hardware: System Design

Piezo Preamplifier [2] boosts the signal from a piezo sensor on the flat panel to line-level for processing by the Teensy, while keeping the signal within a safe range to protect the Teensy audio shield. A switch controlled by the Teensy applies a gain of one when music is playing and a gain of three when it is not.

Class D Amplifier receives music at line-level from the Teensy and boosts it to a higher amplitude before sending the music to the driver. The amplifier is designed for stereo, 4Ω drivers. The TPA3122D2N class D chip [3] is used, which provides mute, shutdown, and gain controls that are set using Teensy’s GPIO.

Power System receives 18VDC, 49W from a wall AC/DC adapter, selected for the class D amplifier’s power requirement. This is reduced to 9VDC through a linear voltage regulator to power the preamplifier.

Bypass option is available for both the input and the output of the system, allowing users to apply the echo cancelation algorithm to any other smart speaker system. Users may connect their own microphone and preamp at the input, or their own amplifier and driver at the output.

System Performance Verification

End-to-End Processing Latency: Time elapsed between when the signal is received by the Teensy and when it is processed and ready to be output.
Music Reduction Quality: Difference in the level of music interference between the processed and unprocessed signals.
Word Error Rate (WER): Percentage of words that are not correctly recognized by the automatic speech recognition (ASR) system on the host device. OpenAI Whisper [4] was selected as the ASR system.
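
Two of the metrics above can be sketched in Python as follows. Both functions are illustrative and use hypothetical names. The music reduction is expressed here as a power ratio in decibels, which is an assumed way of measuring the "difference in level". The WER uses the common definition of word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and returns a fraction that can be multiplied by 100 to give a percentage.

```python
import math


def reduction_db(music_before, music_after):
    """Music interference reduction in dB from mean power before and after."""
    p_before = sum(v * v for v in music_before) / len(music_before)
    p_after = sum(v * v for v in music_after) / len(music_after)
    return 10.0 * math.log10(p_before / p_after)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, a reference of "turn the music down" recognized as "turn music down" has one deleted word out of four, giving a WER of 0.25.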

Result

References

[1] “Teensy audio library,” PJRC. [Online]. Available: https://www.pjrc.com/teensy/td_libs_Audio.html. [Accessed: 26-Apr-2023].
[2] R. Elliott, “Piezo Pickup Preamplifiers,” Elliott Sound Products, Mar-2020. [Online]. Available: https://sound-au.com/project202.htm. [Accessed: 13-Nov-2022].
[3] “15-W STEREO CLASS-D AUDIO POWER AMPLIFIER,” Texas Instruments, 2007. [Online]. Available: https://www.ti.com/lit/ds/symlink/tpa3122d2.pdf. [Accessed: 30-Jan-2023].
[4] “Introducing Whisper,” OpenAI. [Online]. Available: https://openai.com/research/whisper. [Accessed: 26-Apr-2023].