Context-aware devices, such as the predictive keyboards built into smartphones, have been around for some time, though in a very limited form. By learning frequently used names, locations and phrases, these keyboards seem to anticipate your next word: this is predictive text. The next generation of smart audio devices will apply the same idea at a higher level, combining user-specific data such as location and preferences with data from the device’s other sensors, including audio, to serve the user better.
Many of today’s voice-activated devices listen to commands, interpret them and execute them accordingly. However, they are not contextually aware. A smart speaker, for example, may know a list of famous artists, but if you regularly asked it to find a less popular band to listen to, it could not understand that pattern or learn your preferences. Without contextual awareness, it is difficult for a smart speaker to deliver the ideal user experience.
The next generation of always-listening devices uses machine learning to get to know the user. Contextual awareness lets a device make sense of natural sounds, the hustle and bustle of the big city, the user’s voice and more. To do this, the device uses signal processing and machine learning techniques to build up a library of ‘acoustic scenes’ and ‘acoustic events’. An acoustic scene may be a busy restaurant, the commute to or from work, or being at home watching your favorite box set. Acoustic events, by contrast, are the specific sounds heard within a scene, such as a cash register ringing, a car horn honking or a child crying.
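The article does not say which signal processing these devices use. As a rough illustration of separating acoustic events from the background of a scene, the sketch below swaps in a simple energy-based detector rather than a learned classifier: it treats the median frame level as the scene background and flags runs of frames that are several times louder. The function names, the 256-sample frame and the ratio of 4 are illustrative assumptions; a real system would extract richer features and classify both scenes and events with a trained model.

```python
import math
import statistics


def frame_rms(samples, frame_len):
    """RMS level of each complete, non-overlapping frame of `samples`."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def detect_events(samples, frame_len=256, ratio=4.0):
    """Return (first_frame, last_frame) spans louder than `ratio` times the
    median frame level, used here as a crude estimate of the scene background."""
    levels = frame_rms(samples, frame_len)
    if not levels:
        return []
    background = statistics.median(levels)
    events, start = [], None
    for i, level in enumerate(levels):
        loud = level > ratio * background
        if loud and start is None:
            start = i                      # event onset
        elif not loud and start is not None:
            events.append((start, i - 1))  # event offset
            start = None
    if start is not None:                  # event still active at the end of the clip
        events.append((start, len(levels) - 1))
    return events


if __name__ == "__main__":
    sr = 16000  # assumed sample rate in Hz
    # One second of a quiet 440 Hz tone standing in for the background scene...
    signal = [0.01 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    # ...with a loud 0.1 s burst (a stand-in for a car horn) starting at 0.5 s.
    for n in range(8000, 9600):
        signal[n] = 0.5 * math.sin(2 * math.pi * 440 * n / sr)
    print(detect_events(signal))  # [(31, 37)]: frames 31-37 span about 0.50 to 0.61 s
```

Using the median rather than the mean as the background estimate keeps a short, loud event from inflating the threshold it is compared against.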
Adding context-aware acoustic event recognition to home service robots, for example, has proved useful for monitoring older people and helping them continue to live independently. By classifying certain acoustic events as ‘alerts’, the robot can automatically call the relevant emergency services or a family member. A typical alert event could be the sound of a smoke alarm, but it can also be more subtle, such as an absence of sound in a kitchen scene.
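A minimal sketch of the alert logic described above, assuming an upstream classifier already tags each observation with a scene and an optional event label. The label set, the four-hour silence limit and the names `Observation` and `check_alerts` are hypothetical, chosen only to show how both an explicit alarm and a prolonged lack of sound in the kitchen could raise an alert.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event labels that should always raise an alert.
ALERT_EVENTS = {"smoke_alarm", "glass_break", "scream"}


@dataclass
class Observation:
    time_s: float          # timestamp in seconds
    scene: str             # scene label from an upstream classifier, e.g. "kitchen"
    event: Optional[str]   # acoustic event label, or None if nothing was detected


def check_alerts(observations, watch_scene="kitchen", silence_limit_s=4 * 3600):
    """Return (time, message) alerts for alarm events and for prolonged silence
    in `watch_scene`. Observations are assumed to arrive in time order."""
    alerts = []
    last_heard = None
    for obs in observations:
        if obs.event in ALERT_EVENTS:
            alerts.append((obs.time_s, f"alert: {obs.event}"))
        if obs.scene != watch_scene:
            continue
        if obs.event is not None or last_heard is None:
            last_heard = obs.time_s  # start or restart the silence timer
        elif obs.time_s - last_heard > silence_limit_s:
            alerts.append((obs.time_s, f"alert: no sound in {watch_scene}"))
            last_heard = obs.time_s  # restart the timer so the alert is not repeated every step
    return alerts


if __name__ == "__main__":
    log = [
        Observation(0, "kitchen", "kettle"),
        Observation(3600, "kitchen", None),
        Observation(15000, "kitchen", None),
        Observation(15100, "hallway", "smoke_alarm"),
    ]
    print(check_alerts(log))
    # [(15000, 'alert: no sound in kitchen'), (15100, 'alert: smoke_alarm')]
```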
More Helpful Voice Assistants
Another application that can benefit from contextual awareness is the voice assistant. Amazon’s Alexa, for example, uses context in its Guard feature to improve home security. When leaving the house, the user tells Alexa, ‘I’m leaving.’ Guard uses this context to switch on its alarm-listening mode. Built-in audio analytics then automatically identify critical sound events, such as smoke or carbon monoxide alarms and breaking glass. If Guard picks up the sound of breaking glass while the user is away, it sends out a Smart Alert.
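The behaviour described above can be modelled as a small state machine in which a spoken phrase sets the context and the context decides whether a detected sound is reported. The sketch below is a toy illustration, not Amazon’s implementation or API: the `AwayModeMonitor` class, the ‘I’m home’ phrase, the sound labels and the `notify` callback are all assumptions.

```python
from enum import Enum

# Hypothetical labels for the critical sounds named in the text.
CRITICAL_SOUNDS = {"smoke_alarm", "co_alarm", "glass_break"}


class Mode(Enum):
    HOME = "home"
    AWAY = "away"


class AwayModeMonitor:
    """Toy model of context-gated alarm listening (not Amazon's implementation)."""

    def __init__(self, notify):
        self.mode = Mode.HOME
        self.notify = notify  # callback that delivers an alert to the user

    def on_utterance(self, text):
        # Normalize curly apostrophes, case and trailing punctuation.
        phrase = text.replace("\u2019", "'").strip().lower().rstrip(".!")
        if phrase == "i'm leaving":
            self.mode = Mode.AWAY  # the user's words supply the context
        elif phrase == "i'm home":
            self.mode = Mode.HOME

    def on_sound_event(self, label):
        """Report critical sounds only while the user is away; return True if alerted."""
        if self.mode is Mode.AWAY and label in CRITICAL_SOUNDS:
            self.notify(f"Alert: {label.replace('_', ' ')} detected while you are away")
            return True
        return False


if __name__ == "__main__":
    monitor = AwayModeMonitor(notify=print)
    monitor.on_sound_event("glass_break")  # ignored: the user is at home
    monitor.on_utterance("I’m leaving.")
    monitor.on_sound_event("glass_break")  # prints an alert
```

Keeping the sound classifier and the context logic separate means the same event detector can serve other features, while only the gating rules change.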
A host of other applications can use audio analytics to improve safety and security. Audio analytics can also complement video surveillance systems, for example to better protect people in smart cities or students in schools and colleges.
Dealing with Data
Making sense of all the data from multiple sensors requires precise acoustic scene classification and event recognition. This processing must happen in real time and must account for the errors or biases of each sensor without constant recalibration. Sensor fusion, or sensor processing, combines a user’s data with audio data to put a voice command in context, enabling the device to respond more accurately. The use of personal data can raise privacy concerns, but because the processing is performed within the device, the risk of security vulnerabilities is reduced.
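The article does not specify a fusion method. One common textbook approach, assumed here, is naive Bayesian fusion: each sensor contributes a likelihood P(observation | scene), the sensors are treated as conditionally independent given the scene, and the normalized product gives a posterior over scenes. A noisy or less trustworthy sensor is expressed through a flatter likelihood rather than by recalibrating it. The scene names and probabilities below are made up for illustration.

```python
def fuse(prior, *likelihoods):
    """Naive Bayesian fusion of independent sensor evidence.

    prior:       {scene: P(scene)}
    likelihoods: one {scene: P(observation | scene)} dict per sensor
    Returns the normalized posterior {scene: P(scene | all observations)}.
    """
    scores = {}
    for scene, p in prior.items():
        for likelihood in likelihoods:
            p *= likelihood.get(scene, 0.0)  # a missing scene means the evidence rules it out
        scores[scene] = p
    total = sum(scores.values())
    if total == 0:
        raise ValueError("the evidence rules out every scene")
    return {scene: score / total for scene, score in scores.items()}


if __name__ == "__main__":
    prior = {"restaurant": 1 / 3, "commute": 1 / 3, "home": 1 / 3}
    audio = {"restaurant": 0.6, "commute": 0.3, "home": 0.1}     # babble-like noise
    location = {"restaurant": 0.1, "commute": 0.8, "home": 0.1}  # moving at road speed
    posterior = fuse(prior, audio, location)
    print(max(posterior, key=posterior.get), round(posterior["commute"], 2))  # commute 0.77
```

Here the audio alone favours a restaurant, but the location evidence shifts the fused estimate towards a commute, which is the kind of context a device could use when interpreting a voice command.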
A platform such as CEVA’s SenslinQ integrates all the hardware and firmware needed to aggregate sensor data automatically and create contextual awareness for smart devices. Applying filtering, signal processing and advanced algorithms to that data produces ‘context enablers’. These include activity classification, voice and sound detection, and presence and proximity detection. By centralizing the sensor-processing workload and fusing context enablers on-chip, such platforms help devices begin to understand and adapt to their surroundings.
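As an illustration of one context enabler, activity classification, the sketch below smooths the accelerometer magnitude with a single-pole low-pass filter and labels a window ‘moving’ when the smoothed signal varies by more than a threshold. This is a deliberately simple, hypothetical example and not SenslinQ’s API or algorithm; the function names, the smoothing factor of 0.2 and the 0.05 g threshold are assumptions.

```python
import math
import statistics


def low_pass(values, alpha=0.2):
    """Single-pole IIR low-pass filter (exponential smoothing)."""
    if not values:
        return []
    smoothed, y = [], values[0]
    for v in values:
        y += alpha * (v - y)
        smoothed.append(y)
    return smoothed


def classify_activity(accel_xyz, still_threshold=0.05):
    """Label a window of (x, y, z) accelerometer samples in g as 'still' or 'moving'.

    The threshold is illustrative, not calibrated for any real sensor.
    """
    if not accel_xyz:
        raise ValueError("empty window")
    magnitude = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_xyz]
    spread = statistics.pstdev(low_pass(magnitude))  # variation left after smoothing
    return "moving" if spread > still_threshold else "still"


if __name__ == "__main__":
    resting = [(0.0, 0.0, 1.0)] * 50  # gravity only
    walking = [(0.0, 0.0, 1.0 + 0.5 * math.sin(2 * math.pi * n / 20)) for n in range(100)]
    print(classify_activity(resting), classify_activity(walking))  # still moving
```

Using the magnitude rather than the individual axes makes the result independent of how the device is oriented, and the low-pass stage suppresses sensor noise that would otherwise look like motion.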
www.ceva-ip.com
Published on AudioXpress.