We here at Ceva have spoken at length about spatial audio before, including in this blog post explaining what it is, and this blog post on why head tracking is essential for any personal spatial audio solution.
As a quick refresher, ‘Spatial Audio’ is a broad term that we use to describe an array of audio playback technologies where the primary focus is to enable us to listen to and experience sound as we do in the real world: in three dimensions. Compared to the standard stereo configuration (headphones, earbuds, two-speaker setups, etc.), where we hear sound from two main sources, left and right, spatial audio configurations are meant to immerse us in sound from all directions.
What makes a given audio product system ‘Spatial’?
One issue to point out here is that there remains a lack of clarity over which playback systems constitute ‘spatial audio’. Within the industry, the consensus among experts seems to be that any playback system that enables us to hear sound in three dimensions – left/right, front/back, and up/down (angle, distance, and elevation), constitutes ‘spatial audio’.
For example, a 7.1 surround sound speaker setup includes seven speakers placed around you in the horizontal plane, plus one subwoofer. While this configuration does enable us to experience sound from different angles and distances, it does not give us any sense of elevation. A 7.1.4 setup builds on this by also including four speakers placed at a height, above you.
Below is a graphic of a 7.1.4 speaker configuration, outlining how if you were sitting on the sofa, you would be able to hear sound both around you and above you.
The various speakers we have are:
1- Front Left | 2- Front Right | 3- Center | 4- Subwoofer | 5- Side Surround Left | 6- Side Surround Right | 7- Rear Surround Left | 8- Rear Surround Right | 9- Front Top Left | 10- Front Top Right | 11- Rear Top Left | 12- Rear Top Right
7.1.4 is the standard that most mixing engineers are mixing film, TV and immersive music to. With spatial audio over consumer True Wireless Stereo (TWS) earbuds and headphones, such as our Ceva RealSpace® solution, we are trying to recreate this level of immersion – the feeling of sitting on a couch and listening to sound from speakers placed around and above you.
Dolby Atmos and IAMF (a new open-source standard led by Samsung and Google) take all these channels and add objects to the mix, where you can place a given sound in the form of an audio object somewhere specific in 3D space around the listener.
Between 5.1.2, 5.1.4, 7.1.2, 7.1.4 speaker systems, Dolby Atmos, and Head Tracked Binaural over headphones/earphones, the flexibility today over how one can experience spatial audio offers multiple possibilities and opportunities for the listener, at a multitude of price points.
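The layout names above follow a simple pattern: speakers around the listener, then subwoofers, then height speakers. The total channel count can therefore be read straight off the name. Here is a minimal Python sketch of that arithmetic, assuming this naming convention holds; the `Layout` type and `parse_layout` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Layout(NamedTuple):
    ear_level: int   # speakers around the listener in the horizontal plane
    subwoofers: int  # subwoofer channels
    height: int      # speakers placed above the listener

    @property
    def total_channels(self) -> int:
        return self.ear_level + self.subwoofers + self.height


def parse_layout(name: str) -> Layout:
    """Parse a layout name such as '7.1' or '7.1.4' into its speaker counts."""
    parts = [int(p) for p in name.split(".")]
    if len(parts) == 2:
        parts.append(0)  # no height speakers, e.g. plain 7.1
    if len(parts) != 3:
        raise ValueError(f"unexpected layout name: {name!r}")
    return Layout(*parts)


for name in ("5.1.2", "5.1.4", "7.1", "7.1.2", "7.1.4"):
    layout = parse_layout(name)
    print(name, "->", layout.total_channels, "channels")
```

For 7.1.4 this gives the twelve channels listed earlier: seven around the listener, one subwoofer, and four overhead.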
Evaluating Spatial Audio is challenging
This diversity in spatial audio configurations does, however, mean that there is a wide variety in the types of spatial audio content people consume and how they consume it. Spatial audio content, be it music, movies, shows, video games, or Virtual Reality, can exist in a smorgasbord of file formats and channel configurations, requiring differing codecs, encoders and decoders, software, and hardware.
This variance and the lack of a unified standard make evaluating a spatial audio solution trickier than it ought to be. Much depends on the content itself, and how it is meant to be consumed. Comparing the experience of watching a movie on a 7.1.4 Surround Sound speaker array with listening to Head-Tracked Binaural music over AirPods would be akin to comparing apples to oranges.
There is, however, an even more glaring issue when trying to measure and evaluate a spatial audio solution:
Spatial Audio is a subjective psychoacoustic phenomenon, and differs greatly from person to person, based on their physiology. Everyone’s ears are unique to them, and not only do they help us hear, but they also play an integral role in shaping how we perceive sound. The shapes of not just the outer ear, but also our heads and torsos – parameters unique to each person – play a massive role in how the sound that enters our ears is colored. Thus, our hearing systems and unique individual listening experiences are finely tuned by our own specific anatomy.
Small variations in any of these parameters, or their relationships, can have a noticeable and dramatic effect on the accuracy and realism of a listening experience.
Accordingly, there is a lack of an established benchmark, metric, or methodology within the industry for comparing spatial audio solutions.
The one thing that most people are looking for when initially engaging with something marketed as a spatial audio experience is some type of ‘wow factor’. It is worth keeping in mind, however, that there are multiple ways in which the brain can be fooled into thinking that something “sounds better”. Loudness is one example. Experienced audio engineers often level match two signals when trying to determine whether their signal processing has had a net positive effect on the audio. Even these trained experts with years of experience are aware of how easy it is to think that something sounds better simply because it is louder.
Similarly, it would not be a stretch to state that most people find bass-boosted music more enjoyable. Most of us like feeling that rumble in our chest which comes with strong, well-mixed bass. If you are trying to evaluate a spatial audio experience, however, it is important to make sure that you do not think the experience sounds better simply because the sound was made louder, or the bass was boosted.
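To make the idea of level matching concrete, here is a minimal Python sketch that scales one signal so its RMS level equals that of another before the two are compared. It is an illustration only: RMS is a crude stand-in for perceived loudness (perceptual loudness measures exist for that purpose), the signals are synthetic, and the helper names `rms` and `level_match` are hypothetical.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_match(reference: Sequence[float], candidate: Sequence[float]) -> List[float]:
    """Scale `candidate` so its RMS level equals that of `reference`."""
    candidate_rms = rms(candidate)
    if candidate_rms == 0.0:
        return list(candidate)  # silence cannot be scaled to match
    gain = rms(reference) / candidate_rms
    return [s * gain for s in candidate]


# Synthetic example: the "processed" signal is simply the original made louder.
original = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
processed = [2.0 * s for s in original]

matched = level_match(original, processed)
gain_db = 20 * math.log10(rms(processed) / rms(original))
print(f"processed was {gain_db:.1f} dB louder before matching")
print(f"RMS after matching: {rms(matched):.4f} vs reference {rms(original):.4f}")
```

Once the levels are equal, any remaining preference between the two versions is less likely to be a loudness effect.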
With these factors in mind, we set out to try and determine the key parameters that encapsulate the experience of listening to spatial audio. We wanted to try and establish where the ‘wow factor’ comes from.
(Source: Ceva)
Subjective vs Objective Evaluation
Our findings led us to another complicated conclusion. The very nature of spatial audio is inherently subjective to each person’s unique perception and physiology. Thus, the parameters that sum up the experience of listening to spatial audio can also be subjective, objective, or both.
How far outside one’s head a given sound feels when wearing headphones or TWS earbuds depends entirely on how the listener perceives it. On the other hand, the maximum number of source channels that can be rendered, or how much head tracking latency there is, are both objective, measurable parameters. They depend on the system in question, and its hardware.
There must be a way to evaluate and compare products systematically, laying bare the strengths and weaknesses of each. If you are a new user interested in spatial audio, and you were to use Google to search ‘How do I know that a spatial audio experience is good?’, it is going to be hard for you to find a definitive answer; especially one that is relevant to the exact target configuration and platform that you are interested in.
We are trying to provide a guide for how anyone can evaluate a spatial audio solution, and its experience. Our intention is to develop an industry-first framework to evaluate spatial audio in a structured and repeatable way.
Focusing on the category of Android phones and their True Wireless Stereo (TWS) Bluetooth earbuds, we conducted a competitive analysis, where the products and their spatial audio systems were analyzed and evaluated, and then compared to one another.
In this case, the product system is trying to create a virtual room over Head Tracked Stereo Bluetooth earbuds. The intent is to immerse you in the sound, such that it feels like it is emanating from virtual speakers all around you.
We broke the spatial audio experience in this specific form factor down to eight parameters across two broad categories, Spatialization and Head Tracking, and evaluated each product against them in a systematic and repeatable way. We believe that these eight parameters can also be applied to most other spatial audio use cases. Let us look at all eight below.
Spatialization Parameters
Degree of Externalization
How far outside the head is the sound perceived? A few inches around your head, or a few feet? This is a subjective parameter.
Room Character & Presets
When spatialized, what does the virtual room sound and feel like? Does it feel like a small and not very lively room, such as a study? Or does it feel like a large and reflective room, such as a cathedral? Are there multiple room presets available? This is a subjective parameter, with the presets being an objective component.
Maximum Number of Channels Rendered
How many distinct virtual speakers can be rendered as source locations? For a 7.1.4 audio file, for example, can all twelve channels of audio be rendered accurately at the appropriate virtual location? This is an objective parameter.
Mono and Stereo Rendering
If the content is simple mono (1 channel) or stereo (2 channel), such as stereo music streamed from Spotify, can the system spatialize it? If so, the content should go from feeling like it is within your left and right ears, to feeling like it’s coming from two speakers in front of you. Some systems only spatialize multichannel content, leaving mono and stereo unaltered. This is an objective parameter.
Artifacts
Are there any specific abnormalities or unexpected behaviors noticed in the virtual space around you? An example of this would be if the sound appears to be distorted when coming from a specific angle or location. This is a combination parameter, which can be both subjective and objective; perceived and measured.
Head Tracking Parameters
Latency
How long does it take for a change in head position to be reflected by a change in sound timbre or location? Raw head tracking latency is an objective parameter, but on systems that use head tracking prediction, it can be a subjective parameter too.
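As an illustration of why raw latency counts as measurable, here is a minimal Python sketch that estimates the delay between a logged head movement and the system's response by finding the shift that minimizes the mean squared difference between the two signals. This is not a description of Ceva's measurement method. The data is synthetic, and the assumption that the response can be logged as a signal comparable to head yaw, along with the names `estimate_lag` and `SAMPLE_RATE_HZ`, is purely for illustration.

```python
import math
from typing import Sequence


def estimate_lag(head_yaw: Sequence[float], response: Sequence[float], max_lag: int) -> int:
    """Return the shift (in samples) at which `response` best matches `head_yaw`."""
    def mean_squared_error(lag: int) -> float:
        # Compare head_yaw[i] with response[i + lag] over the overlapping part.
        n = len(head_yaw) - lag
        return sum((head_yaw[i] - response[i + lag]) ** 2 for i in range(n)) / n

    return min(range(max_lag + 1), key=mean_squared_error)


SAMPLE_RATE_HZ = 1000      # assumed logging rate for both signals
TRUE_DELAY_SAMPLES = 40    # synthetic delay baked into the test data

# Synthetic head movement: a slow left-right sweep, in degrees.
head_yaw = [30.0 * math.sin(2 * math.pi * 0.5 * n / SAMPLE_RATE_HZ) for n in range(4000)]
# Synthetic system response: the same movement, delayed.
response = [0.0] * TRUE_DELAY_SAMPLES + head_yaw[:-TRUE_DELAY_SAMPLES]

lag = estimate_lag(head_yaw, response, max_lag=200)
print(f"estimated latency: {1000 * lag / SAMPLE_RATE_HZ:.0f} ms")
```

A system that predicts head movement would not behave like a pure delay, which is one reason perceived latency can differ from the raw figure.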
Degrees of Freedom
Does the system track the movement of the head over multiple axes of rotation (yaw, pitch, roll)? This is an objective parameter.
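To show what tracking even a single axis involves, here is a minimal Python sketch of yaw-only compensation: when the head turns, each room-fixed virtual speaker must be rendered at a new angle relative to the head. The sign convention (positive angles to the listener's right) and the function names are assumptions for illustration. Handling pitch and roll as well would require a full 3D rotation, which this sketch does not attempt.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(speaker_azimuth: float, head_yaw: float) -> float:
    """Azimuth of a room-fixed virtual speaker as seen from the rotated head.

    Angles are in degrees; positive means to the listener's right (assumed convention).
    """
    return wrap_degrees(speaker_azimuth - head_yaw)


# Virtual front-left and front-right speakers at -30 and +30 degrees.
# The listener turns 30 degrees to the left to face the front-left speaker.
head_yaw = -30.0
for name, azimuth in (("Front Left", -30.0), ("Front Right", 30.0)):
    print(name, "is rendered at", relative_azimuth(azimuth, head_yaw), "degrees")
```

With the head turned to face the front-left speaker, that speaker is rendered straight ahead and the front-right speaker moves further to the right, which is what keeps the virtual room anchored in place.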
Artifacts
Are there any specific abnormalities or unexpected behaviors noticed in the tracking of head movements? For example, while moving your head to look straight up the sound also starts to drift to one side, or it suddenly jumps from near one ear to near the other. This is also a combination parameter, which can be subjective (perceived), and objective (measured).
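The eight parameters above can be collected into a small checklist structure, which makes it easier to see at a glance which ones call for listening tests and which can be measured. The following Python sketch is one possible encoding. The `Kind` and `Parameter` names are hypothetical, and classifying "Room Character & Presets" and "Latency" as mixed is our reading of the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    SUBJECTIVE = "subjective"
    OBJECTIVE = "objective"
    BOTH = "both"


@dataclass(frozen=True)
class Parameter:
    category: str
    name: str
    kind: Kind


PARAMETERS = [
    Parameter("Spatialization", "Degree of Externalization", Kind.SUBJECTIVE),
    # Room character is perceived; the list of available presets can be checked.
    Parameter("Spatialization", "Room Character & Presets", Kind.BOTH),
    Parameter("Spatialization", "Maximum Number of Channels Rendered", Kind.OBJECTIVE),
    Parameter("Spatialization", "Mono and Stereo Rendering", Kind.OBJECTIVE),
    Parameter("Spatialization", "Artifacts", Kind.BOTH),
    # Raw latency is measurable; prediction makes perceived latency subjective.
    Parameter("Head Tracking", "Latency", Kind.BOTH),
    Parameter("Head Tracking", "Degrees of Freedom", Kind.OBJECTIVE),
    Parameter("Head Tracking", "Artifacts", Kind.BOTH),
]


def names_by_kind(kind: Kind) -> List[str]:
    """List the parameters of a given kind as 'Category: Name' strings."""
    return [f"{p.category}: {p.name}" for p in PARAMETERS if p.kind is kind]


for kind in Kind:
    print(kind.value, "->", names_by_kind(kind))
```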
Spatial audio is a complex, psychoacoustic effect, making it difficult to evaluate and compare solutions. As you can see, multiple factors go into the performance of a spatial audio system, some objective, some subjective, and some a combination of both. Ceva firmly believes, however, that a structured approach can bring clarity and consistency to the problem. Now that we have broken down the key things to pay attention to in any given spatial audio solution, hopefully, you have a better understanding of how one can evaluate spatial audio, and the challenges involved.
Next time, in part two, we will take a deep dive into the specific ways in which we evaluated and measured these eight parameters in a structured, repeatable way.
If you want to learn more or evaluate RealSpace spatial audio for your products, contact us.