Tutorial 12: Emotion and Affect in Next-Generation Speech Analysis

Presented by

Björn Schuller

Abstract

Human-machine and human-robot dialog in the next generation will be dominated by natural speech in the sense of full spontaneity, and will thus be driven by emotion. Systems will be expected not only to cope with affect during speech recognition itself, but also to detect emotional and related patterns, as well as non-linguistic vocalizations such as laughter and further social signals, in order to react appropriately. Such analysis clearly must be independent of the speaker and must handle all speech that “comes in” rather than only pre-selected and pre-segmented prototypical cases. In addition, as in any speech processing task, noise, coding and blind speaker separation artifacts, and transmission errors need to be dealt with. To provide appropriate back-channeling and a socially competent reaction that fits the speaker’s emotional state in time, on-line and incremental processing will be among further concerns. Once affective speech processing is applied in real life, novel issues such as standards, confidences, distributed analysis, speaker adaptation, and emotional profiling arise alongside appropriate interaction and system design. In this respect, this tutorial aims to give in-depth insight into the analysis of natural, spontaneous, and thus emotional speech by providing a broad overview, hands-on experience with recent tools, and a view of the black holes that remain for future research endeavors.

The main audience is likewise a broad group of potentially interested experts in dialogue system engineering, speech recognition, natural language understanding, or general speech processing: emotion touches all of these fields and may be expected to become an integral factor in future speech user interfaces and in multimedia retrieval and surveillance systems. However, experts working directly in the field are also addressed, owing to the novelty of many of the issues discussed.

Speaker Biography

Björn Schuller received his diploma and his doctoral degree in electrical engineering and information technology, for his work in Automatic Speech and Emotion Recognition, from TUM (Munich University of Technology), one of Germany's first three Excellence Universities. He is currently a senior researcher there, leading the work group on Intelligent Speech and Music Processing, and a lecturer in Pattern Recognition and Speech Processing. He is a member of the IEEE, ACM, and ISCA, and has authored and co-authored more than 120 publications in books, journals, and peer-reviewed conference proceedings in the field of audiovisual signal processing and machine learning. He is best known for his work advancing Speech Processing and Affective Computing. He is a member of the steering committee of the IEEE's upcoming Transactions on Affective Computing and serves as associate editor and reviewer for the major scientific journals in the field, including the IEEE Transactions on Audio, Speech, and Language Processing, the IEEE Transactions on Multimedia, and the IEEE Signal Processing Letters; the Elsevier journals Computer Speech and Language, Speech Communication, and Signal Processing; and the EURASIP Journals on Advances in Signal Processing (JASP) and on Audio, Speech, and Music Processing (JASMP). He also serves as invited speaker, session organizer and chairman, and program committee member at numerous international conferences. His project steering board activities and involvement in current and past research projects in the field include SEMAINE, which deals with Sensitive Artificial Listeners and is funded by the European Community's Seventh Framework Program; the HUMAINE CEICES initiative on Speech and Emotion; and projects dealing with real-life applications of emotion, funded by companies such as BMW, Continental, Daimler, Siemens, Toyota, and VDO.
His advisory board activities include membership as an invited expert in the W3C Emotion Incubator and Emotion Markup Language Groups for the specification of EmotionML, an emotion markup language, and his election to the Executive Committee of the HUMAINE Association for affective computing, where he leads the Special Interest Group on Emotion Recognition from Speech. Finally, he is the initiator of the first challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, where he also gives a tutorial on emotion recognition, and a co-author of the first open source emotion recognition engine, “openEAR” (www.openaudio.eu), to be introduced at IEEE Affective Computing and Intelligent Interaction 2010.