Department of Electrical and Computer Engineering Ph.D. Public Defense
Instantaneous Frequency Analysis of Reverberant Audio
Sarah Rose Smith
Supervised by Professor Mark Bocko
Thursday, June 20, 2019
10 a.m.
Computer Studies Building Room 601
This thesis investigates the interaction of modulated audio signals, including those found in speech and music, with acoustic spaces. Sounds that are generated by the regular oscillation of physical bodies, such as the vocal cords or a musical instrument, typically display a spectrum containing a fundamental frequency and a set of harmonically related overtones. However, the constituent frequency components are commonly modulated in both amplitude and frequency. These modulations, referred to as vibrato in a musical context, contain significant information about how the sound was generated and its interaction with the acoustic space. Instantaneous frequency tracking, therefore, constitutes an important step in many audio signal processing algorithms. For example, frequency modulations within a note can be used to group sinusoidal components from the same source within a polyphonic texture. However, the extracted frequency tracks are generally interpreted as a parameter of the source signal, and the impact of the acoustic environment is rarely considered. This thesis provides a detailed model for the effects of the acoustic space on the instantaneous frequency trajectories of a recorded sound and explores the coherence properties of the resulting class of signals.
First, a set of experiments is presented documenting the instantaneous frequency deviations that characterize reverberant audio. These results are presented in relation to both room reverberation and, in the case of musical vibrato, the internal resonant modes of the musical instrument. Next, a predictive model is developed to isolate the effects of different portions of a room response on instantaneous frequency tracking. Deterministic models are presented that predict the frequency deviations resulting from a set of discrete reflections or a sum of resonant modes, and the relationship between statistical properties of the late reverberation and instantaneous frequency tracking is demonstrated through simulations. Each of these models is compared with the observed deviations from recorded impulse responses.
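As a rough illustration of the discrete-reflection case described above, the following sketch shows how a single delayed copy of a vibrato tone can perturb the tracked instantaneous frequency. It is not the model developed in the thesis. The tone parameters (440 Hz carrier, 6 Hz vibrato rate, 10 Hz depth), the reflection (5 ms delay, gain 0.5), and all function names are hypothetical values chosen for the example, and the signal is built directly in complex (analytic) form so that no Hilbert transform is needed.

```python
import cmath
import math


def phase(t, f0=440.0, rate=6.0, depth=10.0):
    """Phase of a tone whose frequency is f0 + depth * sin(2*pi*rate*t).

    All default values are assumptions made for this illustration.
    """
    return 2 * math.pi * f0 * t - (depth / rate) * math.cos(2 * math.pi * rate * t)


def instantaneous_frequency(signal, fs):
    """Estimate frequency in Hz from the phase advance between adjacent samples."""
    return [
        cmath.phase(b * a.conjugate()) * fs / (2 * math.pi)
        for a, b in zip(signal, signal[1:])
    ]


def direct_plus_reflection(fs=8000, duration=0.5, delay=0.005, gain=0.5):
    """Track the frequency of a dry tone and of the tone plus one reflection.

    The reflection is modeled as an attenuated copy delayed by `delay` seconds
    (hypothetical values); gain < 1 keeps the sum from ever vanishing.
    """
    times = [i / fs for i in range(int(fs * duration))]
    dry = [cmath.exp(1j * phase(t)) for t in times]
    wet = [
        cmath.exp(1j * phase(t)) + gain * cmath.exp(1j * phase(t - delay))
        for t in times
    ]
    return instantaneous_frequency(dry, fs), instantaneous_frequency(wet, fs)


dry_if, wet_if = direct_plus_reflection()
# Largest difference between the tracked frequency with and without the reflection.
max_dev = max(abs(w - d) for w, d in zip(wet_if, dry_if))
print(f"largest deviation caused by the reflection: {max_dev:.3f} Hz")
```

In this toy setting the dry track follows the vibrato exactly, while the track of the summed signal departs from it by an amount that depends on the delay, the gain, and the relative phase of the two copies.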
These models are supplemented with a statistical analysis of instrumental vibrato that can inform a parametric model of the observed frequency deviations in both anechoic and reverberant recordings. In this context, a metric for vibrato stability is proposed that can be used to differentiate individual players or instruments. Finally, these models are used to develop an algorithm for shifting the pitch of a reverberant signal while preserving the natural reverberation, and consideration is given to the use of instantaneous frequency as a metric in acoustic characterization problems.