Keynote Speakers

We are pleased to welcome four distinguished keynote speakers at WASPAA 2013:





From Auditory Masking to Binary Classification: Machine Learning for Speech Separation


DeLiang Wang, The Ohio State University, USA



Speech separation, or the cocktail party problem, is a widely acknowledged challenge. Part of the challenge stems from confusion about what the computational goal should be. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have instead suggested the ideal time-frequency binary mask as a main goal for computational auditory scene analysis. This leads to a new formulation of speech separation that classifies time-frequency units into two classes: those dominated by the target speech and the rest. In supervised learning, a paramount issue is generalization to conditions unseen during training. I describe novel methods to deal with the generalization issue in which support vector machines (SVMs) are used to estimate the ideal binary mask. One method employs distribution fitting to adapt to unseen signal-to-noise ratios and iterative voice activity detection to adapt to unseen noises. Another method learns more linearly separable features using deep neural networks (DNNs) and then couples the DNN with a linear SVM for training on a variety of noisy conditions. Systematic evaluations show high-quality separation in new acoustic environments.
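
For readers unfamiliar with the concept, here is a minimal Python sketch of how an ideal binary mask is constructed. It assumes oracle access to the separate target and noise signals (available only when mixtures are created synthetically, e.g., to generate training labels), uses an STFT rather than the auditory filterbank common in this line of work, and the 0 dB local criterion is an illustrative choice.

    # Minimal sketch of the ideal time-frequency binary mask (IBM).
    # Oracle setting: target and noise are known before mixing.
    import numpy as np
    from scipy.signal import stft

    def ideal_binary_mask(target, noise, fs=16000, lc_db=0.0):
        # Label a T-F unit 1 if the target exceeds the noise by lc_db.
        _, _, T = stft(target, fs=fs, nperseg=512)
        _, _, N = stft(noise, fs=fs, nperseg=512)
        snr_db = 20.0 * np.log10((np.abs(T) + 1e-12) / (np.abs(N) + 1e-12))
        return (snr_db > lc_db).astype(np.uint8)

In the supervised setting described above, a classifier such as an SVM or DNN is trained to predict these unit labels from features of the noisy mixture alone.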

Download presentation

Biography

DeLiang Wang received the B.S. and M.S. degrees from Peking (Beijing) University and the Ph.D. degree in 1991 from the University of Southern California, all in computer science. Since 1991, he has been with the Department of Computer Science & Engineering and the Center for Cognitive and Brain Sciences at The Ohio State University, where he is a Professor. He was a visiting scholar in the Department of Psychology at Harvard University from 1998 to 1999, and at Oticon A/S in Denmark from 2006 to 2007. Wang’s research interests include machine perception and neurodynamics. He received the Office of Naval Research Young Investigator Award in 1996, the 2005 Outstanding Paper Award from IEEE Transactions on Neural Networks, and the 2008 Helmholtz Award from the International Neural Network Society. He is an IEEE Fellow and currently serves as Co-Editor-in-Chief of Neural Networks.




Speech Enhancement for Hearing Instruments: Enabling Communication in Adverse Conditions


Rainer Martin, Ruhr-Universität Bochum, Germany



Hearing instruments are frequently used in notoriously difficult acoustic scenarios. Even for normal-hearing people, ambient noise, reverberation, and echoes often degrade the communication experience; the impact of these factors becomes significantly more prominent when participants suffer from hearing loss. Nevertheless, hearing instruments must enable effortless communication even in these adverse conditions.

In this talk I will discuss challenges that are encountered in acoustic signal processing for hearing instruments. While many algorithms are motivated by the quest for a cocktail party processor and by the high-level paradigms of auditory scene analysis, a careful design of statistical models and processing schemes is necessary to achieve the required performance in real-world applications. Rather strict requirements result from the size of the device, the power budget, and the admissible processing latency.

Starting with low-latency spectral analysis and synthesis systems for speech and music signals, I will then highlight statistical estimation and smoothing techniques for the enhancement of noisy speech. The talk emphasizes the necessity of finding a good balance between temporal and spectral resolution, processing latency, and statistical estimation errors. It concludes with single- and multi-channel speech enhancement examples and an outlook on opportunities that reside in the use of comprehensive speech processing models and distributed resources.
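
As a rough illustration of the estimation and smoothing techniques mentioned above, here is a minimal Python sketch of single-channel enhancement: it recursively smooths the noisy periodogram, tracks a noise floor with a sliding minimum (loosely in the spirit of minimum-statistics noise estimation), and applies a Wiener-type spectral gain. The smoothing constant, window length, and gain floor are illustrative assumptions, not tuned values.

    # Minimal single-channel enhancement sketch: smoothed periodogram,
    # sliding-minimum noise floor, and a floored Wiener-type gain.
    import numpy as np

    def enhance(Y, alpha=0.85, win=100, g_min=0.1):
        # Y: complex STFT of noisy speech, shape (freq_bins, frames).
        P = np.empty(Y.shape)                 # smoothed periodogram
        G = np.empty(Y.shape)                 # spectral gain
        p = np.abs(Y[:, 0]) ** 2
        for t in range(Y.shape[1]):
            p = alpha * p + (1 - alpha) * np.abs(Y[:, t]) ** 2
            P[:, t] = p
            noise = P[:, max(0, t - win + 1):t + 1].min(axis=1)
            snr = np.maximum(P[:, t] / (noise + 1e-12) - 1.0, 0.0)
            G[:, t] = np.maximum(snr / (snr + 1.0), g_min)
        return G * Y                          # enhanced STFT

The sliding-minimum window directly embodies the trade-off named above: a longer window gives a more stable noise estimate but adapts more slowly, and a longer transform improves spectral resolution at the price of latency.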

Download presentation

Biography

Rainer Martin received the M.S.E.E. degree from Georgia Institute of Technology, Atlanta, in 1989 and the Dipl.-Ing. and Dr.-Ing. degrees from RWTH Aachen University, Aachen, Germany, in 1988 and 1996, respectively. From 1996 to 2002, he was a Senior Research Engineer with the Institute of Communication Systems and Data Processing, RWTH Aachen University. From April 1998 to March 1999, he was on leave to the AT&T Speech and Image Processing Services Research Lab, Florham Park, NJ. From April 2002 until October 2003, he was a Professor of Digital Signal Processing at the Technische Universität Braunschweig, Braunschweig, Germany.

Since October 2003, he has been a Professor of Information Technology and Communication Acoustics at Ruhr-Universität Bochum, Bochum, Germany, and from October 2007 to September 2009 he was Dean of the Electrical Engineering and Information Sciences Department. His research interests are signal processing for voice communication systems, hearing instruments, and human–machine interfaces. He is coauthor, with P. Vary, of Digital Speech Transmission – Enhancement, Coding and Error Concealment (Wiley, 2006) and coeditor, with U. Heute and C. Antweiler, of Advances in Digital Speech Transmission (Wiley, 2008). Dr. Martin served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing and is a member of the Speech and Language Processing Technical Committee of the IEEE Signal Processing Society.




Advanced Speech-Audio Processing in Mobile Phones and Hearing Aids: Synergies and Distinctions


Peter Vary, RWTH Aachen University, Germany



Mobile phones and modern hearing aids employ advanced digital signal processing techniques as well as coding algorithms.

From a functional point of view, digital hearing devices and mobile phones are approaching each other. In both types of devices, similar or even partly identical algorithms can be found, such as echo, reverberation, and feedback control, noise reduction, intelligibility enhancement, artificial bandwidth extension, and binaural processing with two or more microphones.
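
As a toy example of one algorithm from this list, the following Python sketch cancels echo with a normalized LMS (NLMS) adaptive filter. The filter length, step size, and regularization constant are illustrative assumptions; practical devices add double-talk detection and residual echo suppression on top of this core.

    # Minimal NLMS sketch for acoustic echo cancellation.
    import numpy as np

    def nlms_echo_cancel(far_end, mic, L=256, mu=0.5, eps=1e-6):
        # Adaptively estimate the echo path; return the echo-reduced signal.
        w = np.zeros(L)                  # echo-path estimate
        x_buf = np.zeros(L)              # most recent far-end samples
        e = np.zeros(len(mic))           # output (error signal)
        for n in range(len(mic)):
            x_buf = np.roll(x_buf, 1)
            x_buf[0] = far_end[n]
            y_hat = w @ x_buf            # estimated echo at the microphone
            e[n] = mic[n] - y_hat
            w += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
        return e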

Today's hearing aids include digital audio receivers and transmitters, not only for communication and entertainment but also for binaural directional processing. State-of-the-art mobile phones offer new speech-audio compression schemes for the emerging HD-telephony services, and they are equipped with two (or more) microphones for the purpose of speech enhancement. Thus, it is not too big a step to realize hearing aid features as apps on smartphones. Further evolution might lead us to binaural mobile telephony, providing ambient and spatial information – a preferred solution for audio conferencing, for example.

Despite these similarities, the signal conditions and the processing constraints are quite different, e.g., with respect to the coherence of signals, the complexity of algorithms, coding-noise shaping for binaural processing, power consumption, and latency. Synergies and distinctions of the corresponding signal processing and coding algorithms will be discussed, and design constraints and solutions will be presented by example.

Download presentation

Biography

Peter Vary received the Dipl.-Ing. degree in electrical engineering from the Technical University of Darmstadt, Germany, in 1972 and the Dr.-Ing. degree from the University of Erlangen-Nuernberg, Germany, in 1978.

In 1980, he joined Philips Communication Industries (PKI), Nuremberg, Germany, where he was in charge of the Digital Signal Processing Group, which was involved in the GSM standardization process. Since 1988, he has been a Professor at RWTH Aachen University, Aachen, Germany, and head of the Institute of Communication Systems and Data Processing.

His main research interests are digital wireless communications, including speech coding, joint source-channel coding, error concealment, and speech enhancement for mobile phones and digital hearing aids.

Peter Vary is a Fellow of the IEEE.





About this Non-Negative Business


Paris Smaragdis, University of Illinois, USA




The foundations of signal processing are firmly set in least squares, an approach that has served us very well for years (and still does). With the increasing presence of machine learning and sophisticated statistics in audio processing, we are slowly seeing that not everything has to be based on Gaussians anymore. One recently popular approach along these lines is non-negative modeling, especially in problems that involve complex audio mixtures. In this keynote I’ll talk about how these models came to be, what they can do, why they have been so successful, and I’ll ponder what the future holds as new developments continue to arrive.
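
As a minimal illustration of what "non-negative modeling" means in practice, the following Python sketch factors a magnitude spectrogram with the classic multiplicative updates for non-negative matrix factorization (NMF) under a Euclidean cost. The rank and iteration count are illustrative assumptions, and audio work often uses a KL-divergence cost instead.

    # Minimal NMF sketch: V ~= W @ H with all entries non-negative.
    import numpy as np

    def nmf(V, rank=20, n_iter=200, eps=1e-9):
        F, T = V.shape
        rng = np.random.default_rng(0)
        W = rng.random((F, rank)) + eps   # spectral basis vectors
        H = rng.random((rank, T)) + eps   # per-frame activations
        for _ in range(n_iter):           # Lee-Seung multiplicative updates
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ (H @ H.T) + eps)
        return W, H

Applied to a mixture spectrogram, the columns of W tend to pick out recurring spectral patterns (notes, phones, noise textures) and H indicates when each one is active, which is what makes the model attractive for complex audio mixtures.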

Download presentation

Biography

Paris Smaragdis is an assistant professor in the Computer Science and Electrical and Computer Engineering departments at the University of Illinois at Urbana-Champaign. He completed his graduate and postdoctoral studies at MIT, where he conducted research on computational perception and audio processing. Prior to joining the University of Illinois, he was a senior research scientist at Adobe Systems and a research scientist at Mitsubishi Electric Research Labs, during which time he was selected by the MIT Technology Review as one of the top 35 young innovators of 2006. He is a senior member of the IEEE, chair of the Machine Learning for Signal Processing technical committee, chair of the Latent Variable Analysis steering committee, and a member of the Audio and Acoustic Signal Processing technical committee. His research interests lie at the intersection of machine learning and signal processing.
