Voice Interface Technology for Hands-free Function in Automobiles
( 01 Dec 2005 )
BY SAM YU, PRODUCT MARKETING, FORTEMEDIA
Talking and driving…a dangerous combination for sure, yet very popular with a generation that craves staying in touch with family and friends, no matter where they are or what they are doing. “Of all the accessories for cellphones, the ones people most want are headsets and other hands-free devices,” said research analyst Linda Barrabee, who surveyed mobile phone buying habits for Yankee Group this year. With the advent of a standard wireless interface in cellular phones (Bluetooth), the hands-free function in automotive telematics is set for explosive growth over the next several years.
Over the past decade, technological advances have contributed to improvements in voice quality and enhanced recognition rates for the hands-free cellular phone function. Since the early days of using just a plain microphone as the input device in the car cabin, industrial engineers have designed special housing for microphones, DSP engineers have developed voice processing algorithms, and several companies have ventured into the array microphone solution.
Despite these considerable developments, two needs still remain for users: “landline” quality conversations and higher voice recognition rates, especially at highway speeds. Current solutions are limited because the microphone picks up noise from all around the cabin. In such environments, a new technology is required to give consumers what they want.


Hands-free Market Background
Road safety continues to be a major concern for everyone, including government agencies, car manufacturers, and drivers themselves. Anything that draws attention away from driving will make the roads hazardous, whether it is reading the newspaper, eating a sandwich, or talking on the phone. As such, hands-free cellphone functions, which allow drivers to keep both hands on the wheel and eyes on the road, are becoming an essential feature in today’s automobiles.
Worldwide, about 25 countries have passed laws restricting drivers from using handheld cellphones, including Australia, Italy, Israel and Japan to name a few. Three U.S. states have enacted similar laws (e.g., New York), and at least 40 other states are now proposing such legislation, according to the National Conference of State Legislatures.
While consumers will need to comply with these laws, they also want to enjoy their conversations. To date, there is still a noticeable difference in voice quality between a handheld and a hands-free device. Service providers such as On-Star, for their part, continually look to improve customer satisfaction, especially for their automated voice-activated systems.
Technology Trend
Historically, hands-free telematics functions have used a single microphone as the input interface. This can be a uni-directional or an omni-directional microphone, strategically placed in the cabin (visor, steering wheel, rear-view mirror, etc.). While this setup served its purpose in picking up the talker’s voice, unfortunately, it also picked up all the surrounding noise and echo. This quickly became unbearable for the far-end user.
Industrial designers quickly realized that by using special acoustically designed microphone housings, they could block out a certain level of noise while focusing the microphone pickup on a certain location. While this helped to increase the signal-to-noise ratio (SNR), echo remained a persistent concern.
A big step in making hands-free telematics a widely acceptable feature was the deployment of acoustic echo cancellation (AEC) and noise suppression (NS) software. Running on either general DSP platforms or dedicated IC chips, these algorithms can reduce acoustic echo by 45 dB and suppress stationary noise by 10 dB. AEC and NS signal processing has significantly improved the voice quality of hands-free conversations in the car cabin.
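To put those decibel figures in perspective, the short sketch below converts a reduction in dB into the fraction of signal amplitude that remains. It uses only the standard decibel definition; the function name is illustrative and not taken from any product.

```python
def remaining_amplitude_fraction(reduction_db: float) -> float:
    """Fraction of the original amplitude left after a reduction of reduction_db.

    Uses the standard amplitude relation: ratio = 10 ** (-dB / 20).
    """
    return 10 ** (-reduction_db / 20)


# Figures quoted in the text: 45 dB of echo reduction, 10 dB of noise suppression.
echo_left = remaining_amplitude_fraction(45)
noise_left = remaining_amplitude_fraction(10)

print(f"Echo amplitude remaining:  {echo_left:.4f}")   # well under 1 percent
print(f"Noise amplitude remaining: {noise_left:.4f}")  # roughly one third
```

In other words, 45 dB of cancellation leaves well under 1 percent of the echo amplitude, while 10 dB of suppression still leaves roughly a third of the stationary noise amplitude.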
However, many of these software DSP solutions still do not deliver an experience that everyday users find comfortable. Several hands-free car kit models on the market today, including the Jabra SP500 and the Parrot Easydrive, have received mixed reviews, with complaints about distorted sounds and robotic-sounding voices.
Array Microphone and Small Array Microphone

The big leap in voice interface technology is the array microphone. By arranging multiple microphones in an array, companies such as Fortemedia, AKG, Knowles, and even Microsoft can further reduce surrounding noise, providing a more natural sounding voice.
Leveraging the information gathered by the multiple microphones about the voice and the surrounding environment, an array microphone can process the signals so that they effectively form a beam, picking up the wanted signal inside the beam and canceling noise outside it. Several hands-free car kits using the array microphone solution have already been deployed (e.g., 2006 Jaguar XK models, Mercedes Benz E-Class, LG hands-free car kit).
While it improves noise suppression, the traditional array microphone is still impractical and limited in two ways:
• Requires at least 30 mm between microphones, putting placement and space constraints on the end solution.
• Can only cancel noise on a 2-D plane. This makes it harder to pinpoint the talker, while allowing noise to leak into the beam; diffused noise, engine noise, rattling of the dashboard, and general road noise coming from above and below the pie-shaped beam will cause major problems for voice recognition applications.
A new array microphone technology, small array microphone (SAM), is the next step in the voice interface market. Requiring only 5 mm between microphones, SAM can be deployed in practically any situation or application. SAM uses a fundamentally different algorithm from the traditional array microphone to process the voice, effectively forming a 3-D cone-shaped beam. As such, any noise outside the beam, whether above or below, will be canceled out, without any leakage. The discussion that follows provides more background on the differences between these two array microphone setups.
Traditional Beam-forming
Traditional beam-forming utilizes the difference in time delay between signals received at different microphones in the array. As such, the microphones are placed further apart so the information received at each microphone is sufficiently different. The width of a broadside array beam is based on the wavelength of the signal divided by the length of the aperture. So, at low frequencies (longer wavelength), the beam will need to be wider than that of higher frequencies (shorter wavelength).
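The wavelength-over-aperture relationship can be sketched numerically. The snippet below is a rough illustration only: it assumes a speed of sound of 343 m/s (air at about 20 °C), treats the 30 mm microphone spacing mentioned earlier as the aperture of a two-microphone array, and reports the ratio itself rather than an exact beam angle, since the proportionality constant depends on the array design.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C
APERTURE_M = 0.030           # assumption: two microphones spaced 30 mm apart


def wavelength_m(frequency_hz: float) -> float:
    """Acoustic wavelength in meters for a given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


def wavelength_to_aperture_ratio(frequency_hz: float, aperture_m: float) -> float:
    """Quantity the beam width scales with: wavelength divided by aperture length."""
    return wavelength_m(frequency_hz) / aperture_m


# Edges of the telephone voice band.
ratio_low = wavelength_to_aperture_ratio(300.0, APERTURE_M)
ratio_high = wavelength_to_aperture_ratio(3300.0, APERTURE_M)

print(f"300 Hz:  wavelength {wavelength_m(300.0):.3f} m, ratio {ratio_low:.1f}")
print(f"3.3 kHz: wavelength {wavelength_m(3300.0):.3f} m, ratio {ratio_high:.1f}")
print(f"Low-band ratio is {ratio_low / ratio_high:.0f}x the high-band ratio")
```

Under these assumptions the ratio at 300 Hz is eleven times the ratio at 3.3 kHz, which is why the beam is so much harder to keep narrow at the low end of the voice band.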
Because it must process the difference in time delay and capture frequencies between 300 Hz and 3.3 kHz, the microphones in a traditional array need to be at least 30 mm apart. This brings about many limitations.
To understand why, please look at Figure 1. In this example, the two microphones are facing 0 degrees, meaning that the beam center is the y-axis. Now, let’s assume the signal source at point A is playing at the same dB level as the signal source at point B. Let’s also assume that point A and point B are the same distance from the center of the array. In this case, the signal from source A will be suppressed because the array microphone can detect that source A is outside the beam (the time delay to Mic 1 is much longer than the time delay to Mic 2).
However, the signal from source B will not be suppressed, because to the traditional array microphone, source B is effectively in the middle of the beam: the time delay to Mic 1 is exactly the same as the time delay to Mic 2. This limitation applies to every plane throughout the z-axis, as well as directly behind the array (180 degrees). Thus, the traditional array microphone can only effectively suppress noise in a 2-D manner (in our example, only noise on the xy-plane is canceled). Please see Figure 3 for the effective beam.
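The geometry behind this limitation can be checked with a few lines of arithmetic. The sketch below is not taken from the figures; the coordinates are hypothetical. It assumes two microphones 30 mm apart on the x-axis, a speed of sound of 343 m/s, source A off to the side in the xy-plane, and source B the same distance away but raised above the xy-plane directly over the beam center.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C
MIC_SPACING_M = 0.030        # 30 mm spacing from the text

# Hypothetical layout: both microphones on the x-axis, beam center along the y-axis.
MIC_1 = (-MIC_SPACING_M / 2, 0.0, 0.0)
MIC_2 = (+MIC_SPACING_M / 2, 0.0, 0.0)


def delay_difference_s(source):
    """Arrival time at Mic 1 minus arrival time at Mic 2, in seconds."""
    return (math.dist(source, MIC_1) - math.dist(source, MIC_2)) / SPEED_OF_SOUND_M_S


DISTANCE_M = 1.0                 # both sources are 1 m from the array center
OFF_AXIS = math.radians(60.0)    # both sources are 60 degrees off the y-axis

# Source A: off to the side, in the xy-plane (nearer to Mic 2).
SOURCE_A = (DISTANCE_M * math.sin(OFF_AXIS), DISTANCE_M * math.cos(OFF_AXIS), 0.0)
# Source B: same distance and off-axis angle, but above the xy-plane.
SOURCE_B = (0.0, DISTANCE_M * math.cos(OFF_AXIS), DISTANCE_M * math.sin(OFF_AXIS))

print(f"Source A delay difference: {delay_difference_s(SOURCE_A) * 1e6:.1f} microseconds")
print(f"Source B delay difference: {delay_difference_s(SOURCE_B) * 1e6:.1f} microseconds")
```

Source A produces a clear positive delay difference, so a delay-based array can place it outside the beam. Source B produces none, so it is indistinguishable from a talker sitting on the beam center even though it is just as far off-axis as source A.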
Small Array Microphone (SAM) Beam-forming
SAM beam-forming technology is unlike traditional setups: it uses one uni-directional microphone and one omni-directional microphone. Since these two microphones can be placed very close to each other (5 mm center to center), the information coming to both microphones is highly correlated (virtually the same). Consequently, the beam-forming capability relies on the intelligence of Fortemedia’s AMBIN algorithm to decipher this information.
Because the microphones of SAM can be placed virtually right next to each other, the effective beam is a 3-D cone-shaped beam. This has many advantages compared to the traditional array microphone. To understand the advantages, please refer to Figure 2. In this example, the setup is exactly the same as Figure 1, except the receiving device is a small array microphone instead of the traditional array microphone. To SAM, the signals from source A and source B are exactly the same (in this case, both outside the beam). This applies throughout the y-axis, forming a 3-D cone-shaped beam. Noise above, below, and behind the beam is effectively suppressed. Please see Figure 3 for the effective beam.
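One way to see why a co-located uni-directional and omni-directional pair treats sources A and B alike is to look at the microphone pickup patterns alone. The sketch below is not the AMBIN algorithm, whose details are not described here. It simply assumes the uni-directional microphone is an ideal cardioid pointed along the y-axis, the omni-directional microphone has unity gain in every direction, and the two sources sit at hypothetical positions 1 m away and 60 degrees off-axis.

```python
import math


def off_axis_angle_rad(source):
    """Angle between the source direction and the y-axis (the assumed mic axis)."""
    x, y, z = source
    return math.acos(y / math.sqrt(x * x + y * y + z * z))


def cardioid_gain(theta_rad: float) -> float:
    """Ideal cardioid pickup pattern: 1 on-axis, 0 directly behind."""
    return 0.5 * (1.0 + math.cos(theta_rad))


OMNI_GAIN = 1.0  # ideal omni-directional microphone: same gain in every direction

DISTANCE_M = 1.0
OFF_AXIS = math.radians(60.0)

# Hypothetical positions: A is to the side in the xy-plane, B is above it.
SOURCE_A = (DISTANCE_M * math.sin(OFF_AXIS), DISTANCE_M * math.cos(OFF_AXIS), 0.0)
SOURCE_B = (0.0, DISTANCE_M * math.cos(OFF_AXIS), DISTANCE_M * math.sin(OFF_AXIS))

for name, source in (("A", SOURCE_A), ("B", SOURCE_B)):
    uni = cardioid_gain(off_axis_angle_rad(source))
    print(f"Source {name}: uni/omni gain ratio = {uni / OMNI_GAIN:.3f}")
```

Under these assumptions both sources yield the same uni-to-omni ratio, because the cardioid response depends only on the angle from the microphone axis and not on whether the source is to the side, above, or below. That rotational symmetry is consistent with the cone-shaped beam described above.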
Value to End Users
We’ve come a long way from the early days of using just a plain microphone as the input device to the small array microphone of today. Undoubtedly, we will continue to use the phone in the car. And based on legislative developments around the world, we will need to do it using hands-free kits. So what does this all mean to us, the users?
First, let’s revisit the example of the On-Star service. With small array microphone technology, On-Star users will experience higher voice recognition rates when using the automated systems; engine noise, road noise, and rattling of the dashboard will be suppressed.
This also means that the user can successfully barge in and interrupt the automated system when necessary. During a person-to-person conversation, the far-end user will never notice the noises and rattles from the car cabin. With the small array microphone, talking in the car will be just as clear as talking in your own living room.