ICASSP 2010 - 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing - March 14 - 19, 2010 - Dallas, Texas, USA

Show and Tell Demonstrations

Show and Tell demonstrations will take place Tuesday through Thursday in the Exhibit Hall and Poster area.

Tuesday, March 16, 2010, 10:30 - 12:30

Booth 104

University of Trento

Low-cost Human-Machine Interfaces for Unobtrusive Assisted Living and Rehabilitation Tools

Nicola Conci, Alfredo Armanini, Mattia Daldoss, and Francesco G.B. De Natale

We present two low-budget human-machine interfaces specifically designed to enable device-less interaction with a computer for people suffering from physical or cognitive impairments, as requested by medical operators in the field. The first prototype is a tool for ALS patients that enables T9-like writing and implements simple entertainment applications through eye tracking. The second demo is a tool for hand and arm rehabilitation based on hand recognition and tracking. It provides the user with a number of exercises and stores the whole training session so that medical staff can evaluate rehabilitation progress, including remotely. Both applications are built under strong cost constraints, requiring only a general-purpose PC and a low-cost webcam.

Both interfaces will be demonstrated and made available for testing on two separate laptops. Demonstration videos of the interfaces are available here:
http://www.disi.unitn.it/~conci/ICASSP2010_demo_Files.html

Booth 204

Institute for Telecommunication Sciences

ADPCM speech coding (G.726 and G.722) works better backwards than forwards

Stephen Voran

The standardized ADPCM speech coders (narrowband G.726 and wideband G.722) have provided years of distinguished service, continue to be useful, and are still specified in emerging services (e.g., DECT CAT-iq).

We show that time-reversing speech before ADPCM encoding and again time-reversing it after ADPCM decoding increases measured segmental SNR values and objective speech quality estimates compared to conventional use of ADPCM. These increases average about 1 dB and 0.1 MOS units respectively. The demonstration provides full details on these measured improvements.
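
For readers who want to reproduce the measurement, the sketch below (Python) wraps a codec in the time-reversal procedure and computes segmental SNR. The standard-library IMA ADPCM codec stands in for G.726/G.722 — an assumption for illustration, not the codec used in the demo — and speech_16bit_mono is a placeholder for your own int16 recording. The ~1 dB figure above was measured on speech, so results on other signals will differ.

    import audioop  # stdlib IMA ADPCM (removed in Python 3.13); stand-in for G.726/G.722
    import numpy as np

    def adpcm_roundtrip(pcm_bytes):
        # 16-bit mono PCM -> IMA ADPCM -> 16-bit PCM
        enc, _ = audioop.lin2adpcm(pcm_bytes, 2, None)
        dec, _ = audioop.adpcm2lin(enc, 2, None)
        return dec

    def seg_snr(ref, test, frame=160):
        # mean per-frame SNR in dB
        n = min(len(ref), len(test)) // frame * frame
        r = ref[:n].astype(float).reshape(-1, frame)
        t = test[:n].astype(float).reshape(-1, frame)
        sig = (r ** 2).sum(axis=1) + 1e-10
        err = ((r - t) ** 2).sum(axis=1) + 1e-10
        return float(np.mean(10 * np.log10(sig / err)))

    x = np.asarray(speech_16bit_mono, dtype=np.int16)   # hypothetical input
    y_fwd = np.frombuffer(adpcm_roundtrip(x.tobytes()), np.int16)
    x_rev = x[::-1].copy()                              # reverse before encoding...
    y_rev = np.frombuffer(adpcm_roundtrip(x_rev.tobytes()), np.int16)[::-1]  # ...and after decoding
    print("conventional :", seg_snr(x, y_fwd), "dB")
    print("time-reversed:", seg_snr(x, y_rev), "dB")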

But does this time-reversal procedure produce better sounding speech? The demonstration allows participants to perform A vs. B listening and decide for themselves. Previous demo participants have remarked that the time-reversal procedure results in “smoother” sound and “less coding noise.”

The demo explains the root cause of these results and identifies practical situations (ones in which causality is upheld) where these results might be advantageously applied. We also offer related but broader results on time-reversal and the prediction of music and speech samples.

Booth 304

University of Wisconsin – Madison, Boise State University, University of Wyoming

winDSK for the Texas Instruments L-138 DSK

Michael Morrow, Thad Welch, Cameron Wright

winDSK was originally created to support stand-alone real-time DSP demonstrations for the Texas Instruments (TI) C31 DSK. A follow-on product, winDSK6, enhanced winDSK significantly and offered updated demonstrations for a number of TI DSKs within the C6xxx family. TI is poised to transition to a new DSP platform for education, the OMAP-L138. This processor contains both a traditional DSP core and an ARM core. This significant hardware enhancement will be demonstrated by this updated version of winDSK6.

Booth 404

Applied Technology & Research, Starkey Labs, Inc.

MEMS accelerometer, cross-correlation based occlusion event detection in an assistive listening device

Matthew J. Green & Thomas Burns, Ph.D.

A MEMS accelerometer serves as a sensor to detect the unnatural coloration of a person’s own voice caused by occlusion of the ear canal, known as the occlusion effect. A cross-correlation algorithm compares captured vibration signatures of events known to cause the occlusion effect with real-time data from the user’s ear canal in order to quickly detect the occlusion effect. Fast detection allows for sound field equalization to be applied in the ear canal before the occlusion effect is perceived.
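
The detection step described above can be sketched in a few lines. The following Python fragment — a minimal sketch with placeholder parameter values, not Starkey's implementation — slides a stored vibration signature over a live accelerometer buffer and flags an occlusion event when the normalized correlation peak crosses a threshold:

    import numpy as np

    def detect_occlusion(live, template, threshold=0.7):
        # normalize both signals (globally; a per-window normalization
        # would be more robust but is omitted for brevity)
        t = (template - template.mean()) / (template.std() + 1e-12)
        x = (live - live.mean()) / (live.std() + 1e-12)
        r = np.correlate(x, t, mode="valid") / len(t)
        lag = int(np.argmax(r))
        return r[lag] >= threshold, lag      # (event detected?, where)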

Tuesday, March 16, 2010, 13:30 - 15:30

Booth 104

Texas Instruments

Traffic Sign Recognition on a DSP: An Example of Embedded Vision

Dr. Branislav Kisacanin

Traffic sign recognition is one of several emerging capabilities of advanced driver assistance and safety systems. In this demonstration, images captured by a 30 fps color camera are fed to an embedded processor and analyzed to detect circular traffic signs. When traffic signs are detected, they are further analyzed and classified within the set of European speed-limit signs.
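
The abstract does not say which detector runs on the DSP; one standard way to find circular sign candidates in a frame is a Hough circle transform, sketched below in Python/OpenCV as a desktop approximation (parameter values are illustrative, not TI's):

    import cv2

    def circular_sign_candidates(bgr_frame):
        gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.medianBlur(gray, 5)               # suppress speckle
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.5,
                                   minDist=40, param1=120, param2=60,
                                   minRadius=10, maxRadius=120)
        # each row is (x, y, radius); crops around these candidates
        # would feed the speed-limit classifier stage
        return [] if circles is None else circles[0]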

Booth 204

INRIA Rennes, INRIA Grenoble

Searching with expectations: 10 million images indexed on my laptop

Hervé Jégou, Matthijs Douze, Cordelia Schmid

This demonstrator is a large-scale image search system that runs on a laptop. It allows content-based image retrieval (CBIR) for a query image in a database comprising 10 million images. The system is based on local descriptors, which ensure invariance to rotation, scaling, cropping, etc. The indexing method is an improvement of the method proposed in the ICASSP paper “Searching with expectations”, which represents an image by a fingerprint of a few dozen bytes and performs the query in less than 1 second.
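
A fingerprint of a few dozen bytes makes even exhaustive search tractable: with binary codes, similarity is a Hamming distance, computed as the popcount of an XOR. The generic Python sketch below illustrates the scale argument only — it is not the paper's indexing method, and the database here is random:

    import numpy as np

    rng = np.random.default_rng(0)
    db = rng.integers(0, 256, size=(1_000_000, 32), dtype=np.uint8)
    query = db[123456]                        # one 32-byte (256-bit) code

    # 256-entry popcount table; Hamming distance = popcount(XOR)
    pop = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None],
                        axis=1).sum(1).astype(np.uint8)
    dist = pop[np.bitwise_xor(db, query)].sum(axis=1)
    print("nearest:", int(dist.argmin()), "at distance", int(dist.min()))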

Booth 304

AT&T Research

Mobile Question Answering

Mazin Gilbert, Srinivas Bangalore, Taniya Mishra

The explosion of multimedia content, the proliferation of mobile devices, and the availability of greater bandwidth and computing power are fueling a new and exciting field of mobile multimodal interfaces. This demonstration will highlight AT&T’s technology and innovations in mobile question answering, including very large vocabulary speech recognition, natural language understanding and search. Whether using text or voice input, users can obtain precise answers to short questions and get responses instantly and automatically from a dataset of over 400 million questions/answers. A client/server-based iPhone demonstration will be shown where processing is performed in the AT&T cloud (joint partnership with ChaCha).

Booth 404

RWTH Aachen University

Matlab and the GNU Debugger: A Powerful Duo for Implementing DSP Algorithms?

Bernd Geiser & Stefan Kraemer, Jan Weinstock, Florian Heese, Marco Jeub, Thomas Esch, Rainer Leupers, Peter Vary

Today’s software debugging tools are generally tailored to the needs of programmers and software developers. However, the needs of DSP algorithm designers (data analysis, post-processing, visualization and verification) are only poorly supported. A frequently applied makeshift solution is the manual instrumentation of the source code (e.g. C) for data logging purposes, followed by external post-processing.

As a more efficient alternative, we propose to extend traditional software debugging tools (e.g. the GNU Project Debugger, GDB) with powerful algorithm analysis capabilities, as for instance provided by Matlab. This idea has been implemented for the integrated development environment (IDE) Eclipse and for GDB. The enhanced debuggers provide, among other features: breakpoints with data transfer to/from Matlab, non-intrusive data sampling, support for user-defined processing scripts, and direct Matlab interaction. Even external hardware targets are supported via the GDB remote interface. The efficacy of the new tools is demonstrated in an example debugging session based on representative DSP algorithms.

URL: www.ind.rwth-aachen.de/~dspdebugging
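
To give a flavor of the idea (this is not the authors' tool), GDB's built-in Python scripting can already do non-intrusive data sampling: the sketch below logs a C array to a text file on every hit of a source line and lets the program run on; Matlab can then read the file with load('samples.txt'). The source location, variable and file names are hypothetical.

    # gdb_sample.py -- inside gdb:  source gdb_sample.py
    import gdb

    class SamplingBreakpoint(gdb.Breakpoint):
        # logs n elements of a C array at each hit, then continues
        def __init__(self, location, array, n, logfile):
            super().__init__(location)
            self.array, self.n, self.logfile = array, n, logfile

        def stop(self):
            vals = [float(gdb.parse_and_eval("%s[%d]" % (self.array, i)))
                    for i in range(self.n)]
            with open(self.logfile, "a") as f:
                f.write(" ".join("%g" % v for v in vals) + "\n")
            return False          # never halt: non-intrusive sampling

    SamplingBreakpoint("filter.c:42", "frame_buf", 256, "samples.txt")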

Tuesday, March 16, 2010, 16:00 - 18:00

Booth 104

Purdue University

Light-Weight Image Processing on Cellular Phones and PDAs

Mireille Boutin and Edward Delp

We will demonstrate five different image processing applications implemented on a cellular phone or PDA:

1) An automatic text area segmentation algorithm implemented on a Nokia N800 Internet tablet.

2) An iPhone version of “The Rosetta Phone”, a system for the automatic translation of signs in foreign languages with a focus on Arabic scripts. (https://redpill.ecn.purdue.edu/~rp/)

3) An iPod-based system for the automatic translation and interpretation of Spanish language menus.

4) An automatic tracking system implemented on the Nokia N800 Internet tablet.

5) An image-based diet assessment tool implemented on the iPhone. (http://www.tadaproject.org/)

All underlying algorithms have very low complexity, so they can run on the device’s processor in real time. In particular, the systems do not require a network connection. Participants will be invited to try out the different systems and devices. Desktop simulations will also be presented, in addition to short movies showcasing the application of the devices in different scenarios.

Booth 204

University of Texas Dallas

A low-cost, low-power DSP/FPGA/DRP-based Software-Defined Radio for Emergency Responders

Kamran Kiasaleh

Cost and power consumption remain key issues in realizing practical software-defined radios. In this demonstration, we will show the flexibility of today’s DSP and FPGA platforms in realizing a true software-defined radio, capable of operating as a GSM cellular phone and as an emergency P25 radio. In order to lower the cost of implementation and the overall power consumption, the RF front end is realized using a digital radio processor (DRP), which reduces the overall cost and power consumption of the transceiver by an order of magnitude compared to conventional SDR solutions. A transmitter/receiver pair with handsets will be demonstrated. The transmitter and receiver use FPGA/DSP and DRP boards.

Booth 304

Microsoft Research

Personal 3D Audio System with Loudspeakers

Cha Zhang and Dinei Florencio

A three-dimensional audio system renders sound images around a listener using either headphones or loudspeakers. Loudspeaker-based audio spatialization is the more challenging problem because the sound from each speaker is heard by both ears. Traditional 3D audio systems therefore often have a limited sweet spot within which the user perceives the 3D effect correctly. In this demo, we combine a 3D-model-based face tracker with dynamic binaural synthesis and dynamic crosstalk cancellation to build a true personal 3D audio system. The webcam-based 3D face tracker provides accurate head position and orientation information to the binaural audio system, which uses the information to adaptively synthesize the target audio to be played by the loudspeakers. The system runs in real time on a dual-core 3 GHz machine and serves the listener with realistic 3D auditory experiences.
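
At the heart of such a system, the crosstalk canceller inverts the 2x2 matrix of speaker-to-ear transfer functions in each frequency bin, so that each ear receives only its intended binaural channel. A minimal numpy sketch of this one stage follows; the demo's tracker-driven, time-varying version is considerably more involved, and the transfer functions here are assumed given:

    import numpy as np

    def crosstalk_canceller(H, beta=0.005):
        # H: (nbins, 2, 2), H[k, ear, speaker] per FFT bin.
        # Regularized inverse  C = (H^H H + beta I)^-1 H^H,  so H @ C ~ I.
        Hh = np.conj(np.transpose(H, (0, 2, 1)))
        A = Hh @ H + beta * np.eye(2)[None, :, :]
        return np.linalg.inv(A) @ Hh

    # per bin k, the speaker feeds are X[k] = C[k] @ B[k] for the
    # binaural (left, right) spectra B produced by the synthesis stage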

Wednesday, March 17, 2010, 10:00 - 12:00

Booth 104

Li Creative Technologies, Inc.

Microphone Arrays for Conference Phone and Robust ASR

Qi (Peter) Li, Ph.D. & Manli Zhu, Bozhao Tan, Ted Wada, Zhongkai Zhang, Josh Hajicek, Uday Jain

Two prototypes of microphone arrays from our recent research are presented. The first is a circular array for a new conference phone. The hardware consists of eight microphones, a DSP processor, and a USB interface. The real-time software implements our new algorithms for adaptive beamforming, sound source localization, and adaptive noise reduction. The second is a small USB array consisting of four microphones, intended for robust speech recognition. As interactive demos, visitors can compare speech recorded with the arrays and with traditional microphones, and hear a significant difference in voice quality and speech recognition accuracy.
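
The simplest member of the beamforming family such systems build on is delay-and-sum: time-align the microphone signals for a chosen look direction and average. A minimal sketch (integer-sample delays; a real system would use fractional delays and adaptive weights, and this is not Li Creative's algorithm):

    import numpy as np

    def delay_and_sum(mics, fs, mic_pos, look_dir, c=343.0):
        # mics: (M, N) signals, mic_pos: (M, 3) in meters,
        # look_dir: unit vector pointing toward the source
        proj = mic_pos @ look_dir
        lateness = (proj.max() - proj) / c           # seconds behind earliest mic
        shifts = np.round(lateness * fs).astype(int)
        n = mics.shape[1] - shifts.max()
        out = np.zeros(n)
        for x, s in zip(mics, shifts):
            out += x[s:s + n]                        # advance the late channels
        return out / len(mics)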

Booth 204

Arizona State University

The interactive Java-DSP Tool and its applications in research and education

Andreas Spanias, Shalin Mehta, Homin Kwon, Karthikeyan Natesan Ramamurthy, Jayaraman J. Thiagarajan, Harish Krishnamoorthy, Narayan Kovvali, Mahesh Banavar

Java-DSP (J-DSP) is a universally and freely accessible online simulation tool that can be used for education and research. In this show and tell session, we will demonstrate several new functions developed on J-DSP to support a) a new interactive environment for DSP education and b) a real-time hardware interface with sensor motes.

The education application revolves around a new interactive learning software interface that has been developed to support an integrated learning framework. This integrated learning paradigm includes DSP simulations synchronized with a web-based DSP quiz, lecture videos, lecture notes, and animated signal processing demonstrations. This software interface organizes student learning using multiple learning environments linked together through quiz questions that guide the study of DSP fundamentals. This tool can be easily customized by instructors and provides features for analyzing and assessing student performance.

The research demonstration involves a real-time hardware interface of Java-DSP with a sensor network. The sensor network consists of sensor motes that can be controlled with Java-DSP. J-DSP can obtain and fuse measurements from five different sensors on the Crossbow sensor platform. Furthermore, J-DSP can instruct different sensors to acquire and process data samples, which are then fed to programmable J-DSP functions for real-time signal and event analysis. A graphical user interface will be demonstrated to highlight several features of this J-DSP / sensor network interface.

Booth 304

Technion – Israel Institute of Technology

Sub-Nyquist Sampling of Wideband Signals

Moshe Mishali & Yonina Eldar

We present a sub-Nyquist analog-to-digital converter for wideband inputs. Our circuit realizes the recently proposed modulated wideband converter, which is a flexible platform for sampling signals according to their actual bandwidth occupation. The theoretical work enables, for example, a sub-Nyquist wideband receiver that has no prior information on the transmitter carrier positions. Our design supports input signals with a 2 GHz Nyquist rate and 120 MHz spectrum occupancy, with arbitrary transmission frequencies. The sampling rate is as low as 280 MHz. To the best of our knowledge, this is the first reported wideband hardware for sub-Nyquist conversion. Our Show & Tell demonstration includes a board-level realization of the sub-Nyquist sampler. Our hardware connects to a surrounding environment, contributed by National Instruments, which provides the wideband inputs and processes the samples according to our algorithms. The demonstration system proves that sub-Nyquist sampling can be achieved without prior knowledge of the transmission frequencies and at real-time processing rates.
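
One channel of the modulated wideband converter can be simulated in a few lines: mix the input with a periodic ±1 chipping sequence (which aliases every occupied band down to baseband), low-pass filter, and sample at the low rate. The Python sketch below is a simulation-level illustration only; the numbers are ours, not the board's:

    import numpy as np
    from scipy.signal import firwin, lfilter

    fs = 2.0e9                                # simulated "analog" rate
    n = 1 << 16
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * 777e6 * t)         # carrier position unknown a priori

    M, decim = 97, 128                        # chip-sequence period, decimation
    rng = np.random.default_rng(1)
    p = np.tile(rng.choice([-1.0, 1.0], M), n // M + 1)[:n]

    h = firwin(301, (fs / decim / 2) / (fs / 2))    # anti-alias low-pass
    y = lfilter(h, 1.0, x * p)[::decim]       # one channel's low-rate samples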

Wednesday, March 17, 2010, 13:30 - 15:30

Booth 104

Fondazione Bruno Kessler - irst, UNITN, Trento, Italy

Real-time Blind and Semi-Blind Source Separation for speech enhancement and recognition

Francesco Nesta

This demonstration shows a new real-time prototype of Blind and Semi-Blind Source Separation (SBSS) for both speech enhancement and automatic speech recognition. Audio signals are played by four loudspeakers. The first two loudspeakers play two sentences which represent the source of interest and an interfering noise; the two given sources are blind, since they are not known by the SBSS system. The other two loudspeakers play a stereo TV audio signal which is known by the system. The SBSS system uses the signals recorded by two microphones and the original TV signal to jointly remove the corresponding stereo echoes and separate the two blind sources.

The user can evaluate the effect of the separation process by means of stereo headphones and a switch that enables or disables the SBSS system. Furthermore, the impact of the source separation on a real-time speech recognition task is also shown.

Booth 204

Georgia Institute of Technology

Complete system for implementing audio signal processing on a reconfigurable-analog chip

Craig Schlottmann & Paul Hasler

We will demonstrate our system for performing analog signal processing on our reconfigurable platform, the RASP Field Programmable Analog Array (FPAA). We have developed this FPAA IC and the complete infrastructure to design ASP systems and program the chip. The user experience will include specifying analog filters using our Simulink interface and custom ASP blocks, compiling these systems and programming our FPAA, then testing the system on the real analog hardware. The user will be able to play an audio signal into the chip’s sound ports and then play the filtered signals on external speakers. Platforms of this type have educational applications, and they also introduce engineers to the benefits of ASP.

Booth 304

Queensland University of Technology

Enhancing Interactive Mobile Video Experience

Salahuddin Azad, Dian Tjondronegoro, Tony Wang, and Wei Song

Despite the superior coding efficiency of the H.264/MPEG-4 AVC encoder, the perceptual quality of compressed videos can be unsatisfactory at the bandwidth currently available on 3G mobile/wireless networks. Region-of-interest (ROI) based video coding can improve perceptual quality by selectively retaining quality in the areas to which viewers are most likely to pay attention. Our demonstration will showcase how our simple but robust technique identifies the regions of interest in news videos using the motion of the salient objects in those videos. It will also showcase how the quality improvement in the regions of interest can mask the lower quality in the regions outside the ROI. Users can interactively change the target bitrate and see the effect on perceptual quality.

Booth 404

University of Texas Dallas

Real-Time Implementation of Speech Signal Processing Strategies on PDA Platform for Cochlear Implant Studies

Vanishree Gopalakrishna, Nasser Kehtarnavaz and Philipos C. Loizou

Cochlear implants are surgically implanted prosthetic devices used to provide a sensation of hearing to profoundly deaf people. Approximately 188,000 people around the world had been fitted with cochlear implants as of 2009. Interactive real-time implementations of various speech signal processing strategies, including the wavelet-packet-transform strategy presented at this conference, have been achieved on the PDA platform using a combination of C and LabVIEW programming. Our strategies require less computation and deliver more accurate outcomes than conventional ones. A live demo on both PC and PDA platforms will be shown.

Wednesday, March 17, 2010, 16:00 - 18:00

Booth 104

LENA Foundation & Center of Robust Speech Systems, University of Texas Dallas

LENA: The Language ENvironment Analysis System for Children 0-4 Years Old

Dongxin Xu, Jill Gilkerson, Jeffrey Richards, John Hansen

Child development from age 0 to 4 is a significant emerging topic for parents/caregivers, clinicians and researchers. LENA is an innovative tool that enables data collection and analysis in children’s natural home environment in an easy, unobtrusive way by applying speech signal processing and pattern recognition technologies to day-long audio recordings. It provides information about the quality of children’s daily language environments and their development status. LENA represents an important breakthrough in this area, providing a unique tool for parenting, clinical evaluation and research purposes. This demonstration will interactively show how the system works, highlighting several successful applications, including automatic childhood autism detection and the study of TV’s negative impact on child development. The demonstration illustrates a truly innovative transition of basic research advancements in speech and language technology to address critical needs of the population.

URL: http://www.lenafoundation.org/

Booth 204

Georgia Institute of Technology

Stereo Acoustic Echo Cancellation Based on Independent Component Analysis and Integrated Residual Echo Enhancement

Ted S. Wada & Biing-Hwang Juang

This work is an extension of our WASPAA 2009 publication “Acoustic Echo Cancellation Based on Independent Component Analysis and Integrated Residual Echo Enhancement.” We will show through a real-time demonstration that a proper combination of batch-wise adaptation, regularization, and so-called residual echo enhancement allows stereo acoustic echo cancellation (SAEC) to be performed effectively in the frequency domain with an adaptive algorithm based on least-mean-square (LMS) filtering when both ambient noise and speech are present at the near end. The technique, which is closely related to ICA-based semi-blind source separation (SBSS), not only allows continuous adaptation in very noisy situations but also enables the recovery of echo reduction performance otherwise lost to the non-uniqueness problem, which retards the convergence of LMS-based adaptive filters.
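
For reference, the LMS family at the core of the system can be written compactly. Below is a bare single-channel, time-domain NLMS echo canceller in Python — the textbook baseline the demo improves on, not the demo itself, which runs in the frequency domain, in stereo, with batch-wise adaptation and regularization:

    import numpy as np

    def nlms_aec(far, mic, order=512, mu=0.5, eps=1e-6):
        # far: loudspeaker (far-end) signal, mic: microphone signal
        w = np.zeros(order)                     # echo-path estimate
        e = np.zeros(len(mic))
        for n in range(order, len(mic)):
            x = far[n - order:n][::-1]          # newest sample first
            e[n] = mic[n] - w @ x               # residual after echo removal
            w += mu * e[n] * x / (x @ x + eps)  # normalized LMS update
        return e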

Booth 304

Texas Instruments

Medical Ultrasound Signal Processing Demonstration on the OMAP3530

Uday Gurnani & Rama Pailoor

The emergence of portable and handheld ultrasound machines is driving the need for high image quality combined with power efficiency and performance. This demonstration implements ultrasound B-mode, color-flow, and scan-conversion processing on TI’s OMAP3530 EVM. The envelope detection, compression, ensemble aggregation, wall filter, flow estimation and scan conversion algorithms required for ultrasound processing were researched, coded, and optimized to run on the C64x+ DSP core embedded in the OMAP SoC. An innovative open-source Linux software framework showcases how to utilize the processing power of the OMAP SoC: the ARM MPU runs a user interface, moves an RF-demodulated data set of a beating heart to the DSP for ultrasound processing, and then routes the scan-converted frames to the on-board display.
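
The first two steps of the B-mode chain named above, envelope detection and log compression, fit in a few lines. The sketch below applies a Hilbert transform to beamformed RF lines; it is a generic illustration, not TI's optimized C64x+ code:

    import numpy as np
    from scipy.signal import hilbert

    def bmode_image(rf, dyn_range_db=60.0):
        # rf: (nlines, nsamples) beamformed RF data
        env = np.abs(hilbert(rf, axis=1))        # envelope detection
        env /= env.max()
        img = 20.0 * np.log10(env + 1e-12)       # log compression
        return np.clip(img, -dyn_range_db, 0.0) + dyn_range_db   # 0..60 dB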

Booth 404

University of Strathclyde and Xilinx

DSP for FPGA Education using Xilinx FPGAs

Prof R.W. Stewart, Dr Louise Crockett, Dr Parimal Patel

This demonstration will feature the “XUP DSP Primer”, a university course on DSP for FPGAs based around the Xilinx System Generator design tool and the Xilinx Virtex-5 XUP board. The DSP Primer features a complete set of lecture notes and materials, as well as a comprehensive laboratory manual with more than 200 exercises using real software and hardware. In this “show and tell”, we will demonstrate the flow from the online lecture materials, to the laboratory manual, to the software design, and finally to a real-time implementation on the Virtex-5 FPGA board demonstrating a QAM-based communications link. The materials have been used effectively in many universities and are available free to university professors and students.

URL: http://www.xilinx.com/xup

Thursday, March 18, 2010, 10:00 - 12:00

Booth 104

University of Illinois

FPGA Prototyping of Stochastic Sensor Network-on-Chip

Eric Kim, Sriram Naranayan, Naresh Shanbhag, Douglas Jones

Decentralized, communication-inspired systems such as stochastic sensor network-on-chip (SSNOC) systems are emerging as promising low-power and robust frameworks for nanoscale processes. SSNOC views computation as a distributed estimation problem, and robust estimation theory is employed to achieve robustness and energy efficiency in the presence of nonidealities such as process and voltage variations as well as noise and soft errors. In a PN code acquisition application, SSNOC-based architectures show a two-order-of-magnitude improvement in robustness to process-variation errors and 36% power savings under voltage-overscaling (VOS) errors compared to conventional architectures. These results are shown on an FPGA board with different error pdfs chosen to simulate both process-variation and VOS errors.

Setup: An FPGA demo of a PN code acquisition system using conventional correlation methods and SSNOC-based methods will be shown, with a laptop for visual presentation of the data.
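
Conventional PN code acquisition, the reference point in this demo, is a correlation peak search: correlate the received block against the local code and take the lag of the maximum as the code phase. A toy Python version using FFT-based circular correlation:

    import numpy as np

    def pn_acquire(rx, code):
        R = np.fft.fft(rx) * np.conj(np.fft.fft(code, len(rx)))
        corr = np.abs(np.fft.ifft(R))         # circular cross-correlation
        return int(np.argmax(corr))           # estimated code phase

    rng = np.random.default_rng(0)
    code = rng.choice([-1.0, 1.0], 1023)      # +/-1 spreading code
    rx = np.roll(code, 37) + 0.5 * rng.standard_normal(1023)
    print(pn_acquire(rx, code))               # -> 37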

Booth 204

Texas Instruments

TOF (Time-of-Flight) Based 3D Depth Measurement and Its Implementation on the TI OMAP Processor

Dong-Ik Ko, Nara Won, and Dave Hutchison

Vision-based human-machine interface technology has been widely exploited over the past decade. However, 2D vision approaches have intrinsic difficulties in analyzing objects in the 3D world, even with sophisticated algorithms. The suggested system provides a breakthrough in modeling the 3D world with a dedicated 3D depth range sensor. 3D depth-range technology exploits a natural property of light, namely its finite speed, together with light-pulse modulation; this is called TOF (Time of Flight). A light pulse is transmitted by a sender unit, and the target distance is measured by determining the turn-around time the pulse needs to travel from the sender to the target and back to the receiver. Our system captures a 3D depth cloud image and allows fine-grained user control of the three-dimensional rotation of a cube object.
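
The distance computation behind a TOF sensor is a one-liner — half the round-trip time multiplied by the speed of light:

    C = 299_792_458.0                         # speed of light, m/s

    def tof_distance(round_trip_s):
        return C * round_trip_s / 2.0         # pulse travels out and back

    print(tof_distance(13.34e-9))             # a 13.34 ns round trip ~ 2.0 m

In practice the turn-around time is far too short to time directly per pixel, which is why it is recovered from the modulated light pulse (e.g. from the phase of the demodulated signal); that hardware is what the dedicated sensor contributes.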

Booth 304

University of Texas Dallas

Hybrid Programming for FPGA Implementation of Signal Processing Algorithms

Nasser Kehtarnavaz, Issa Panahi, and Sidharth Mahotra

Implementation of signal processing algorithms on FPGAs has been steadily growing in consumer electronics products. The primary reason for choosing FPGAs over DSP processors in such products is their MIPS requirements. This show and tell provides demos of laboratory experiments developed at the University of Texas at Dallas exhibiting real-time implementation of signal processing algorithms on FPGA platforms. It shows how FPGA implementation of signal processing algorithms can be achieved in a time-efficient manner by combining existing VHDL textual code with the graphical programming capability of LabVIEW in a hybrid mode.

Booth 404

Université de Bretagne-Sud and UEB, CNRS LabSTICC

GAUT – A Free and Open Source High-Level Synthesis Tool for designing DSP applications on FPGA

Philippe Coussy & Cyrille Chavet

GAUT is an open-source HLS tool dedicated to DSP applications. Starting from a pure bit-accurate C/C++ function, GAUT generates a potentially pipelined architecture composed of a processing unit, a memory unit and a communication unit. The synthesis constraints are the application throughput, the clock frequency, the target technology and, optionally, the memory architecture/mapping and the I/O timing diagram. GAUT generates an IEEE P1076 VHDL file, which serves as input for commercial tools such as ISE/Foundation from Xilinx and Quartus from Altera. The synthesized architectures have been validated on a platform based on the C6x from TI and the MicroBlaze and Virtex from Xilinx.

Thursday, March 18, 2010, 13:30 - 15:30

Booth 104

Institute of Communication Acoustics (IKA), Ruhr-Universität Bochum

Real-time speech enhancement using temporal cepstrum smoothing

Timo Gerkmann & Colin Breithaupt, Dirk Mauler, Martin Krawczyk, and Rainer Martin

In this demonstration we present a low-latency, real-time framework for single-channel speech enhancement. The demonstrator allows selecting and switching between many state-of-the-art algorithms. It includes the first real-time implementation of the newly developed temporal cepstrum smoothing concept for output quality enhancement. This concept was introduced by the authors at ICASSP 2008 and has attracted considerable interest since. The demonstrator can be used to compare different spectral windows, a priori SNR estimators, and speech presence probability estimators, with and without temporal cepstrum smoothing.
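
To convey the flavor of temporal cepstrum smoothing (the constants below are illustrative, not the published ones): each frame's log-periodogram is transformed to the cepstral domain, recursively smoothed over time with a quefrency-dependent constant — light smoothing where speech structure lives (spectral envelope and pitch), heavy smoothing elsewhere, where outliers cause musical noise — and transformed back:

    import numpy as np

    def cepstral_smooth(psd, prev_ceps, a_lo=0.2, a_hi=0.9, q_keep=160):
        # psd: one frame's periodogram (rfft bins); prev_ceps: last output
        ceps = np.fft.irfft(np.log(psd + 1e-12))
        alpha = np.full(len(ceps), a_hi)      # strong smoothing by default
        alpha[:q_keep] = a_lo                 # preserve envelope and pitch
        sm = alpha * prev_ceps + (1.0 - alpha) * ceps
        # .real drops the small asymmetry the one-sided weighting introduces
        psd_sm = np.exp(np.fft.rfft(sm).real)
        return psd_sm, sm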

Booth 204

Rice University, Center for Multimedia Communication

MIMO Relays using Cooperative Partial Detection on the Rice University Wireless Platform (WARP)

Michael Wu, Yang Sun, Kiarash Amiri, Melissa Duarte, and Joseph R. Cavallaro

Cooperative communication with multi-antenna relays can significantly increase reliability and speed. However, cooperative MIMO detection imposes considerable complexity overhead on the relay if a full detect-and-forward (FDF) strategy is employed. To address this challenge, we propose a novel cooperative partial detection (CPD) strategy that partitions the detection task between the relay and the destination. CPD utilizes the inherent structure of tree-based sphere detectors and modifies the tree traversal so that instead of visiting all levels of the tree, only a subset of the levels, and thus a subset of the transmitted streams, is visited.

Link to WARP research platform at Rice: http://warp.rice.edu

Booth 304

The HEARing Cooperative Research Centre, Cochlear Limited

SNR Based Noise Cancellation in Nucleus® Cochlear Implants

Adam Hersbach & Stefan Mauger, Pam Dawson

A clinical study investigated the performance of a noise canceller algorithm in 13 recipients using the Nucleus cochlear implant with the ACE strategy. The algorithm estimates the signal-to-noise ratio (SNR) of each frequency channel and discards those channels with negative SNRs. A real-time computer based system was used to compare two versions of the noise canceller, both of which demonstrated a statistically significant improvement in speech intelligibility over the ACE strategy, in all three types of noise tested (speech weighted noise, party noise and city noise).
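
The selection rule itself is simple once per-channel signal and noise power estimates are available (the clinical system's noise estimator is the hard part and is not shown here). A minimal per-frame sketch, not Cochlear's implementation:

    import numpy as np

    def snr_channel_select(chan_pow, noise_pow):
        # chan_pow, noise_pow: per-channel power estimates for one frame
        sig_pow = np.maximum(chan_pow - noise_pow, 0.0)
        snr_db = 10.0 * np.log10(sig_pow / (noise_pow + 1e-12) + 1e-12)
        keep = snr_db > 0.0                   # discard negative-SNR channels
        return np.where(keep, chan_pow, 0.0), keep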

The demonstration will incorporate real-time processing of a microphone signal under each of the noise canceller conditions used in the clinical study, including the ACE condition. The processing produces an acoustic output that emulates cochlear-implant-encoded speech, allowing normal-hearing participants to ‘listen to a cochlear implant’ with and without the noise cancellation algorithm. Under headphones, participants will also be able to perform a blinded evaluation of the sound quality of each processing scheme, with results provided on screen.

Booth 404

WeVoice Inc.

A Real-Time MEMS Microphone Array System for Voice Communication Inside Spacesuits

Jingdong Chen & Yiteng (Arden) Huang

Collaboration and cooperation between the crewmembers in space and the mission control center on Earth are the lifeline of astronauts and space shuttles. Bright sunlight washes out LCD screens and makes it difficult for astronauts to read instructions, data, or instant messages from a portable electronic device, so clear and reliable voice communication is essential to astronaut safety and the success of every NASA flight mission. But the special design of an astronaut’s spacesuit forms an extreme acoustic environment that imposes unique challenges for capturing and transmitting speech to and from a crewmember. The in-suit acoustic environment is characterized by a highly reflective helmet surface (causing high levels of reverberation) and a spacesuit-unique noise field (the noise is generally nonstationary, inherently wideband, and possibly either directional or dispersive).

The current solution is a communication cap-based audio (CCA) system, which consists of a pair of differential microphones (a redundant design for reliability). The differential microphones need to be close to the astronaut’s mouth, leading to a number of recognized logistical issues and inconveniences, particularly during extravehicular activity (EVA) operations, which last from 4 to 8 hours. Unfortunately, these problems cannot be resolved with incremental improvements to the basic design of the CCA system. We have been sponsored by NASA to develop an integrated audio (IA) system using MEMS (micro-electro-mechanical systems) microphone arrays and FPGA (field-programmable gate array) processors. Combining state-of-the-art adaptive beamforming and multichannel noise-reduction algorithms, the developed IA system achieves performance similar to a CCA while offering astronauts inherent comfort and ease of use. This demonstration explains the IA system and shows the performance of different beamforming and noise-reduction techniques in these very challenging acoustic environments.
