Course Description
Computer audition is the study of how to design computational systems that can analyze and process auditory scenes. Problems in this field include source separation (splitting audio mixtures into individual source tracks), pitch estimation (estimating the pitches played by each instrument), streaming (finding which sounds belong to a single event/source), source localization (finding where a sound comes from), and source identification (labeling a sound source).
This is a graduate-level course (cross-listed for senior undergraduates) covering current research in the field. The class starts with a brief review of signal processing techniques, then introduces auditory models, audio features, and audio modeling methods. It then turns to recent advances in active research topics, including multi-pitch analysis, source separation, source localization, and audiovisual analysis.
In the first half of the semester, students will complete six programming assignments in Python that cover the basics. Students are also required to read ten recently published papers in the field and write a review of each. In the second half of the semester, students will go through the complete process of carrying out a small-scale research project: selecting a topic, reading related papers, proposing and implementing ideas, presenting results, writing a report, and peer-reviewing other students' reports.
Course Information
Credits: 4
Lectures: Tuesdays and Thursdays, 12:30-1:45 PM
Classroom: CSB 601
Prerequisites: ECE 246/446, ECE 272/472, or an equivalent signal processing course, plus MATLAB/Python programming experience. Knowledge of machine learning techniques such as Markov models, support vector machines, and neural networks is helpful but not required.
Textbook: No textbook is required. We will read a number of research papers in the field. The following texts are for reference and have been placed on reserve at the UR library. Excerpts from them will be provided to students.
- Meinard Mueller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, 2015.
- Albert S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, 1990.
- DeLiang Wang and Guy J. Brown, editors, Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. IEEE Press / Wiley-Interscience, 2006.
- Anssi Klapuri and Manuel Davy, editors, Signal Processing Methods for Music Transcription. Springer, 2006.
Instructor: Zhiyao Duan
Office: CSB 720
Phone: 585-275-5302
Email: zhiyao.duan (at) rochester.edu
Office hours: Tuesdays, 2-3 PM, in CSB 720. Additional office hours by appointment.
TAs and Office Hours:
Huiran Yu, <hyu56 (at) ur.rochester.edu>, Mondays and Wednesdays at 2-3 PM in CSB 504.