Evaluating Spatial Audio – Part 3 – Creating a Repeatable Test System

This is Part 3 of our deep dive into ‘Evaluating Spatial Audio’.

In Evaluating Spatial Audio – Part 1 – Criteria and Challenges, we discussed what constitutes a ‘spatial audio’ product system, the key challenges involved, objective and subjective means of evaluation, and some of the key spatialization parameters to pay attention to. Part 2 of our series focused on creating and curating content for evaluating spatial audio, specifically content for testing purposes.

There seems to be a lot of confusion surrounding spatial audio, its scope, and how to effectively quantify what a ‘good’ spatial audio experience should be. Perhaps as a result, there appears to be no industry-standard framework for evaluating the spatial audio experience. Here at Ceva, we set out to develop such a framework: one that evaluates spatial audio in a systematic and repeatable way and gives anyone a guide for gauging the efficacy of a spatial audio solution.

We started by limiting ourselves to the form factor of Android phones and True Wireless Stereo (TWS) earbuds, and gathered a few different options to compare. Having established the parameters we wanted to test, we then conducted a competitive analysis of those options. In this blog post, we go over our methodology, the content we used, and how to listen for differences in each of the parameters under consideration.

The Competitive Analysis

Armed with the content we curated in Part 2, we began our competitive analysis of the various platforms by listing a series of questions about the spatial audio experience. In trying to cover every aspect of that experience, we ended up compiling a set of parameters that gave us the coverage we needed to evaluate these product systems systematically and repeatably.

Because the spatial audio experience depends heavily on each listener’s physiology and perception, the parameters we tested could be subjective (perceptual), objective (measured), or both.

We turned these questions, and the parameters they produced, into a template that could be repeated for each product. We then spent a healthy chunk of time with each product, listening carefully and critically to answer every question and making notes in the template as we went.
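A template like this can be kept as structured data, so that every product starts from an identical, blank copy. The sketch below is a minimal illustration, not our actual template: the names `Kind`, `Parameter`, `ProductEvaluation`, and `PARAMETERS` are hypothetical, the parameter list is taken from the sections of this post, and the subjective/objective label on each entry is our own reading of the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    SUBJECTIVE = "subjective"  # perceptual
    OBJECTIVE = "objective"    # measured or directly verifiable
    BOTH = "both"              # a "combo" parameter


@dataclass
class Parameter:
    category: str
    name: str
    kind: Kind
    notes: str = ""            # filled in during the listening session


# Hypothetical template rows: (category, parameter, kind).
PARAMETERS = [
    ("Spatialization", "Degree of externalization", Kind.SUBJECTIVE),
    ("Spatialization", "Room character", Kind.SUBJECTIVE),
    ("Spatialization", "Room presets", Kind.OBJECTIVE),
    ("Spatialization", "Maximum number of channels rendered", Kind.OBJECTIVE),
    ("Spatialization", "Mono and stereo rendering", Kind.OBJECTIVE),
    ("Spatialization", "Artifacts", Kind.BOTH),
    ("Head Tracking", "Latency", Kind.BOTH),
    ("Head Tracking", "Degrees of freedom", Kind.OBJECTIVE),
    ("Head Tracking", "Artifacts", Kind.BOTH),
]


@dataclass
class ProductEvaluation:
    product: str
    parameters: list = field(default_factory=list)

    @classmethod
    def blank(cls, product: str) -> "ProductEvaluation":
        # New Parameter objects per product, so notes never leak between runs.
        return cls(product, [Parameter(c, n, k) for c, n, k in PARAMETERS])

    def by_category(self, category: str) -> list:
        return [p for p in self.parameters if p.category == category]

    def unanswered(self) -> list:
        # Anything without notes still needs a listening pass.
        return [p for p in self.parameters if not p.notes.strip()]
```

Calling `ProductEvaluation.blank(...)` once per product keeps every session on the same question list, which is what makes the comparison repeatable.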

Please keep in mind that ear fatigue is real. Sustained listening to noise and loud sounds can cause you to lose perspective, and the most valuable impressions often come in the first few minutes. We recommend taking frequent breaks while performing this type of exercise.

Based on all the characteristics and parameters we listened for, we ended up with two broad categories: Spatialization and Head Tracking. These two categories captured the spatial audio experience well, and every subjective, objective, and combo parameter fell into one of them. Let us take a deeper look at each.

A man holding a smartphone and wearing earbuds.

(Source: Ceva)

Spatialization

This category covers all parameters related to the perceived spatial effect of the sound.

Degree of Externalization

The main question here, of course, is how far outside the head the audio now sounds. One useful way to listen for this is to play one of the pink noise clips with the noise isolated in, say, the Front Left channel of a 5.1/7.1/7.1.4 clip, or the Left channel of a stereo clip.

When you toggle spatial audio off, you should hear the sound collapse to inside your left ear. Close your eyes and listen to where the sound moves when you turn the effect back on. This tells you where the virtual Left/Front Left speaker sits in the virtual room the sound is being spatialized to. This is fairly subjective, of course, but you can usually get some sense of how far the sound externalizes.
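If you need to generate such a clip yourself, the sketch below writes pink noise into a single channel of a multichannel 16-bit WAV file using only Python's standard library. It is a minimal sketch under assumptions: the pink noise comes from the Voss-McCartney method (summing white-noise sources that are redrawn at halving rates), which is one common approximation and not necessarily how our clips were made; the function names are hypothetical; and channel index 0 is assumed to be Front Left, as in the usual WAV channel order (FL, FR, C, LFE, ...). Python's `wave` module writes a plain PCM header without a channel mask, so confirm that your player maps channels the way you expect.

```python
import io
import random
import struct
import wave


def pink_noise(n, rows=16, seed=0):
    """Voss-McCartney approximation of pink noise, in [-1, 1]."""
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    total = sum(values)
    out = []
    for i in range(1, n + 1):
        # The lowest set bit of i picks the row to redraw: row 0 changes
        # every 2nd sample, row 1 every 4th, row k every 2**(k + 1).
        k = (i & -i).bit_length() - 1
        if k < rows:
            total -= values[k]
            values[k] = rng.uniform(-1.0, 1.0)
            total += values[k]
        out.append(total / rows)
    return out


def write_isolated_clip(fileobj, n_channels, active, seconds=5.0,
                        rate=48000, gain=0.5):
    """Write 16-bit PCM with pink noise in channel `active`, silence elsewhere."""
    frames = bytearray()
    for s in pink_noise(round(seconds * rate)):
        frame = [0] * n_channels
        frame[active] = int(max(-1.0, min(1.0, s * gain)) * 32767)
        frames += struct.pack("<%dh" % n_channels, *frame)
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(n_channels)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(rate)
        w.writeframes(bytes(frames))


def isolated_clip_bytes(n_channels, active, **kwargs):
    """Convenience wrapper that returns the whole WAV file as bytes."""
    buf = io.BytesIO()
    write_isolated_clip(buf, n_channels, active, **kwargs)
    return buf.getvalue()
```

For example, passing a file opened with `open("pink_front_left_5_1.wav", "wb")` to `write_isolated_clip(f, n_channels=6, active=0)` would produce a 5.1 clip with noise only in the Front Left channel, under the channel-order assumption above.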

Across the product systems we tested, we heard the sound externalize anywhere from about an inch outside the head, to six inches away, to at most about a foot.

It is also worth checking whether the degree of externalization is consistent in all directions. In most products, it varies with direction: the front and rear left speakers of a 7.1 setup may feel further out than the side left, which may in turn feel further out than the Center channel immediately in front of you. It is worth remembering that even in nature, moving your head is a crucial part of accurately localizing the sounds around you. Accordingly, head tracking makes a massive difference in spatial audio solutions, as even small tracked head movements can help us resolve issues such as front-back confusion and localize the sound accurately.

Room Character/Presets

Referring back to the earlier example of clapping in a variety of rooms, this parameter concerns the character of the virtual room and what its reverb sounds like. Room character is a subjective parameter. To hear it best, use a transient sound such as a kick drum, snare drum, or clap sample. A short hi-hat rhythm loop can also be very useful: cymbals have a very noise-like timbre with a lot of high-frequency information, which can reveal a lot about timbre and tone, while also being mostly fast and transient.

Use a player that supports a ‘loop one/repeat one’ mode, which replays the selected file each time it finishes. Listen carefully to how the transient decays. This should give you a good sense of what kind of room it FEELS like. If the decay persists for a while, the sound will feel as though it is playing in a more reflective, lively room. If the early reflections slap back very sharply, immediately after the transient, the space is likely to feel like a smaller room, whereas in larger rooms the early reflections breathe a bit more.
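A recorded kick, snare, or clap sample is the better choice, but a synthetic stand-in can be handy for quick checks. The sketch below (hypothetical function names, standard library only) builds one loop period: a 40 ms burst of white noise with an exponential decay, followed by silence so the virtual room's reverb tail can ring out before ‘repeat one’ playback restarts the file. The same signal goes to both stereo channels, so it should image in the center. The burst length, decay, and one-second period are arbitrary assumptions.

```python
import io
import math
import random
import struct
import wave


def transient_loop(rate=48000, burst_ms=40.0, decay_ms=8.0,
                   period_ms=1000.0, seed=1):
    """One loop period: a decaying noise burst, then silence."""
    rng = random.Random(seed)
    tau = decay_ms / 1000.0 * rate          # decay time constant, in samples
    n_burst = round(burst_ms / 1000.0 * rate)
    n_total = round(period_ms / 1000.0 * rate)
    burst = [rng.uniform(-1.0, 1.0) * math.exp(-i / tau)
             for i in range(n_burst)]
    return burst + [0.0] * (n_total - n_burst)


def stereo_wav_bytes(samples, rate=48000, gain=0.5):
    """16-bit stereo WAV with the same signal in L and R (a centered source)."""
    frames = bytearray()
    for s in samples:
        v = int(max(-1.0, min(1.0, s * gain)) * 32767)
        frames += struct.pack("<hh", v, v)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()
```

Writing the returned bytes to a `.wav` file and looping it gives every product and room preset an identical stimulus, so differences in the decay come from the renderer rather than the content.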

In this case, there isn’t really an “ideal” solution or room character; your experience will depend on your personal preference. It is also worth checking whether the product offers different room presets. It isn’t uncommon to have a few options to choose from, such as Studio, Hall, and Cathedral. Whether such presets exist is, of course, an objective parameter.

Maximum Number of Channels Rendered

This is fairly straightforward. If you play a 7.1.4 audio clip, which has 12 channels of audio, can all 12 channels be rendered correctly in their proper locations? If so, you should end up with 11 virtual speaker locations plus the Low Frequency Effects (LFE) channel: 7 speakers around you in the horizontal plane and four elevated above you. The LFE channel carries only low-frequency content and is not usually heard as coming from a specific direction. If not 7.1.4, can 7.1 be rendered correctly? And if not 7.1, can 5.1 be rendered correctly?
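The channel arithmetic behind these layouts, and the fallback order of the test, fit in a few lines. In this minimal sketch (hypothetical function names), the first number of a layout name counts the ear-level channels, the second the LFE channels, and the optional third the height channels; the `passed` dictionary stands in for your own listening results.

```python
def channel_counts(layout: str) -> dict:
    """Split an 'X.Y' or 'X.Y.Z' layout name into its channel groups."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layout name: {layout!r}")
    ear_level, lfe = parts[0], parts[1]
    height = parts[2] if len(parts) == 3 else 0
    return {"ear_level": ear_level, "lfe": lfe, "height": height,
            "total": ear_level + lfe + height}


def max_layout_rendered(passed: dict) -> str:
    """Walk down the test ladder and return the largest layout that passed."""
    for layout in ("7.1.4", "7.1", "5.1", "2.0"):
        if passed.get(layout):
            return layout
    return "none"


for name in ("7.1.4", "7.1", "5.1"):
    c = channel_counts(name)
    print(f"{name}: {c['total']} channels = {c['ear_level']} ear-level "
          f"+ {c['lfe']} LFE + {c['height']} height")
```

For example, a system that renders the 7.1 and 5.1 channel ID files correctly but not the 7.1.4 one would be recorded as rendering up to 7.1, or 8 channels.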

Another group of sound files we created during the content curation stage was a set of channel identification files. We recorded someone speaking the name of each channel in mono (‘This is Front Left’, ‘This is Side Surround Right’, ‘This is Front Top Right’, etc.). We edited these together so that each mono clip was routed to its corresponding channel and the clips played one after the other in sequence. The result was exported as a single 7.1.4 WAV file: a 12-channel channel ID file, 30 to 45 seconds long, naming each channel in turn. When rendered correctly on an appropriate playback system, each channel name should be heard from the corresponding virtual speaker location.

We repeated this process to create 7.1 and 5.1 channel ID files as well. This is, once again, an objective parameter. Can the playback system properly render multichannel content? If so, up to how many channels? Some systems could render 12 channels, some 8, some 6, and some could not render multichannel content at all and were limited to stereo.
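The editing step for these channel ID files can also be scripted. The sketch below is a minimal, hypothetical version using Python's standard library: `build_channel_id` places each mono clip in its own channel, one after another with a short gap, and `tone` generates placeholder clips where the real files would contain the spoken channel names. The channel orders in `LAYOUTS` are an assumption that follows the WAV speaker-mask ordering (front left/right, center, LFE, rear, side, then height); other tools use different orders, and Python's `wave` module does not write a channel mask, so verify the mapping in your own toolchain.

```python
import io
import math
import struct
import wave

# Assumed channel orders (WAV speaker-mask order); check against your tools.
LAYOUTS = {
    "5.1": ["Front Left", "Front Right", "Center", "LFE",
            "Surround Left", "Surround Right"],
    "7.1": ["Front Left", "Front Right", "Center", "LFE",
            "Rear Surround Left", "Rear Surround Right",
            "Side Surround Left", "Side Surround Right"],
}
LAYOUTS["7.1.4"] = LAYOUTS["7.1"] + ["Front Top Left", "Front Top Right",
                                     "Rear Top Left", "Rear Top Right"]


def tone(freq, seconds, rate=48000, amp=0.5):
    """Placeholder mono clip; the real file would hold a spoken channel name."""
    return [amp * math.sin(2 * math.pi * freq * i / rate)
            for i in range(round(seconds * rate))]


def build_channel_id(clips, n_channels, gap_s=0.5, rate=48000):
    """Interleaved 16-bit frames: each (channel, mono_clip) pair plays in
    its own channel, in sequence, followed by a short silence."""
    gap = [0.0] * round(gap_s * rate)
    frames = bytearray()
    for ch, mono in clips:
        for s in list(mono) + gap:
            frame = [0] * n_channels
            frame[ch] = int(max(-1.0, min(1.0, s)) * 32767)
            frames += struct.pack("<%dh" % n_channels, *frame)
    return bytes(frames)


def channel_id_wav(layout, clip_for_name, gap_s=0.5, rate=48000):
    """Return a complete WAV file (as bytes) for one layout."""
    names = LAYOUTS[layout]
    clips = [(i, clip_for_name(name)) for i, name in enumerate(names)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(len(names))
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(build_channel_id(clips, len(names), gap_s, rate))
    return buf.getvalue()
```

The hypothetical `clip_for_name` argument is where real recordings would plug in, for example a function that loads the mono recording of ‘This is Front Left’ when given the name "Front Left". The same function then produces the 7.1.4, 7.1, and 5.1 versions.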

Mono and Stereo Rendering

Not all playback systems we tested spatialize mono (one-channel) or stereo (two-channel) content. For example, music from some popular platforms is available only in stereo as of this writing. When spatialization is engaged, does the music go from sounding as if it originates within your left and right ears to sounding as if it comes from two virtual speakers outside your head, to the left and right?

This is once again an objective parameter: some of the playback systems we tested did not spatialize mono/stereo content at all, and only spatialized multichannel content (5.1 and above).

Artifacts

This is a fairly straightforward combo parameter, both subjective and objective, covering any problems, issues, artifacts, or inconsistencies.

Observed artifacts could include audio being distorted at a certain location in the sound field, or a virtual speaker sitting in the wrong location, for example the rear left channel sounding as if it is directly behind you. The best way to test for this is simply to listen to a large number of files, especially those that isolate sound in individual channels, so that you can easily focus on each sound in turn.

Head Tracking

This category covers all parameters related to head tracking and how it affects the spatial audio experience. We have previously written about how head tracking can elevate your spatial audio experience, outlining why it is vital to selling the immersion of spatial audio.

Latency

Head tracking latency is the time delay between a head movement and the point at which that movement is reflected in the timbre or localization of the sound.

Latency has a very noticeable effect on the experience. If you abruptly turn your head 90 degrees to the left while a sound is playing, and there is a noticeable lag before the audio reflects that turn, it can really take you out of the spatial audio experience. Abrupt movements such as these are a good way to get a sense of the latency.

Remember that many systems also use some amount of head tracking prediction, which can help to compensate for some latency and create a smoother experience.
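One simple form of prediction is constant-velocity extrapolation: estimate how fast the head is turning and render for where it will be a short time ahead. The sketch below applies this to yaw only, using the last two tracker readings. It illustrates the general idea and does not describe any particular product; real trackers typically smooth noisy sensor data and may use gyroscope angular rate directly. `predict_yaw` is a hypothetical name.

```python
def predict_yaw(samples, lookahead_s):
    """Constant-velocity extrapolation of head yaw.

    samples: (time_s, yaw_deg) tracker readings, oldest first, yaw in
    [-180, 180). Returns the yaw expected `lookahead_s` seconds after the
    last reading, wrapped to [-180, 180).
    """
    if len(samples) < 2:
        raise ValueError("need at least two readings")
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    # Take the short way round, so 179 -> -179 counts as +2 degrees.
    dy = (y1 - y0 + 180.0) % 360.0 - 180.0
    velocity = dy / (t1 - t0)                 # degrees per second
    predicted = y1 + velocity * lookahead_s
    return (predicted + 180.0) % 360.0 - 180.0
```

The trade-off is overshoot: when the head stops abruptly, a constant-velocity predictor briefly renders the scene as if the turn were continuing, which can itself be audible.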

There is ongoing research into how much latency is acceptable to the average listener. Opinions differ, but we find approximately 70 ms of effective latency (after prediction) to be a good target. Perception varies from person to person, of course, but above this value listeners become more likely to perceive the latency and be distracted by it.
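A quick way to see why abrupt turns expose latency: while the head is turning, the rendered scene trails it by roughly the angular speed multiplied by the latency. The sketch below assumes, purely for illustration, a 90-degree turn completed in half a second at constant speed (180 degrees per second); real head turns accelerate and decelerate, so the peak lag can be larger. `lag_degrees` is a hypothetical helper.

```python
def lag_degrees(angular_speed_deg_s: float, latency_ms: float) -> float:
    """Angle by which the rendered scene trails the head mid-turn."""
    return angular_speed_deg_s * latency_ms / 1000.0


turn_speed = 90.0 / 0.5   # assumed: a 90-degree turn in 0.5 s
for latency_ms in (30.0, 70.0, 150.0):
    print(f"{latency_ms:5.0f} ms -> "
          f"{lag_degrees(turn_speed, latency_ms):4.1f} degrees behind")
```

At the 70 ms target, this hypothetical turn leaves the scene about 12.6 degrees behind the head while it moves. With a pure delay, the error disappears about one latency period after the head stops, which is why a quick turn followed by a sudden stop makes the lag easiest to hear.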

The raw head tracking latency of a product system is, of course, an objective value, but it is tricky to measure. We developed a system based on a gimbal and a dummy head to measure it objectively. However, because of the head tracking prediction mentioned above, this parameter becomes harder to measure precisely.
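As a rough illustration of how such a rig can turn recordings into a number, the sketch below assumes (hypothetically) that the gimbal angle and some audio-derived feature, such as the level difference between the dummy head's left and right ear microphones, are logged on a shared clock. Latency is then the time between the gimbal starting to move and the audio feature starting to change. This is not a description of our system: the function names and thresholds are assumptions, the thresholds shift each detected onset slightly, and with prediction active the result reflects effective rather than raw latency.

```python
def first_change(times, values, threshold):
    """Time of the first sample that differs from the starting value by
    more than `threshold`."""
    start = values[0]
    for t, v in zip(times, values):
        if abs(v - start) > threshold:
            return t
    raise ValueError("signal never moved away from its starting value")


def onset_latency_ms(times, gimbal_deg, audio_feature,
                     motion_threshold_deg=0.5, audio_threshold=0.25):
    """Delay between the gimbal starting to turn and the audio responding.

    `audio_threshold` is in the feature's own units (e.g. dB of
    interaural level difference); both thresholds are assumptions.
    """
    t_motion = first_change(times, gimbal_deg, motion_threshold_deg)
    t_audio = first_change(times, audio_feature, audio_threshold)
    return (t_audio - t_motion) * 1000.0
```

Averaging over many repeated turns, in both directions, would make such an estimate less sensitive to noise in any single recording.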

Degrees of Freedom

How many degrees of freedom does the system track head movements over? We normally look for three degrees of freedom – yaw, pitch, and roll.


Yaw, pitch, and roll motions of the head

This is an objective parameter. Some systems only tracked head movements for yaw, while others tracked yaw, pitch, and roll. There is also the question of recentering. Even if yaw, pitch, and roll are tracked, does the system recenter for changes in yaw alone, or all three?

One good way to test for this is to play pink noise isolated in a single channel and try to turn and rotate your head in various ways. Here are some examples:
i. Keep the sound isolated in the Center channel, so that the sound is in front of you. Turn your head to the left and right, and back. Does the sound stay locked in position without moving? Also try looking up and down, and rolling your head to the left and right (while facing forward, tilt your head so that your left ear almost touches your left shoulder, then do the same on the right).

ii. Keep the sound isolated in the Side Surround Right channel (of a 7.1 or 7.1.4 configuration), so that the sound is immediately to your right. Turn your head 90 degrees to the right. Does it feel like you are now facing the sound source? If you turn your head 90 degrees to the left instead, does it feel like the sound is now behind you? If you pitch your head up and down, does the sound remain locked to your right? What happens if you roll your head to the left and right?
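The expectations in these tests follow from treating the source as fixed in the world and rotating it into the listener's head frame. The sketch below is a minimal model of that idea, not any product's renderer; the axis conventions, the yaw-then-pitch-then-roll rotation order, and the function names are our assumptions. Setting `tracked` to a subset of axes mimics a system that ignores the others, so the source follows the head on those axes.

```python
import math


def _rot_z(a):  # yaw, about the vertical axis
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rot_y(a):  # pitch, about the interaural axis
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def _rot_x(a):  # roll, about the forward axis
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def head_relative(source, yaw=0.0, pitch=0.0, roll=0.0,
                  tracked=("yaw", "pitch", "roll")):
    """Direction of a world-fixed source relative to a rotated head.

    Axes: x forward, y left, z up. Angles in degrees: positive yaw turns
    the head left, positive pitch tilts it nose-down, positive roll tips
    the right ear toward the right shoulder. Untracked axes count as zero.
    Returns (azimuth, elevation) in degrees, azimuth positive to the right.
    """
    y = math.radians(yaw) if "yaw" in tracked else 0.0
    p = math.radians(pitch) if "pitch" in tracked else 0.0
    r = math.radians(roll) if "roll" in tracked else 0.0
    head = _matmul(_matmul(_rot_z(y), _rot_y(p)), _rot_x(r))
    # Rotate the world direction by the transpose (inverse) of the head
    # orientation to express it in head coordinates.
    v = [sum(head[k][i] * source[k] for k in range(3)) for i in range(3)]
    az = math.degrees(math.atan2(-v[1], v[0]))
    el = math.degrees(math.atan2(v[2], math.hypot(v[0], v[1])))
    return az, el
```

With full tracking, turning 90 degrees to the right (yaw of -90) moves a Side Surround Right source at `(0, -1, 0)` to azimuth 0, in front; turning left moves it to azimuth 180, behind; and pitching leaves it at azimuth 90. With `tracked=("yaw",)`, a Center source stays at (0, 0) however far you pitch, meaning it moves with your head instead of staying put.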

Artifacts

Once again, this is a fairly straightforward category that is both objective and subjective.

Is the head tracking smooth? Are there any motions that cause jumps in the audio, or other discrepancies such as distortion? If you play a sound centered in front of you and do a full 360-degree turn, is the sound still dead center afterwards? Does the sound field drift in any direction over time? These are all examples of head tracking aberrations to listen for.
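If you can log the head orientation a renderer is using (for example from a development build, or from a measurement rig like the one mentioned under Latency), drift and return-to-center errors can be put into numbers. The sketch below is hypothetical: `drift_deg_per_min` fits a least-squares line to yaw readings taken while the head is held still, and `residual_after_turn` gives the wrapped angular error between a centered source's position before and after a full 360-degree turn.

```python
def drift_deg_per_min(times_s, yaw_deg):
    """Least-squares slope of yaw against time, in degrees per minute.

    Assumes the head is held still and the yaw readings are already
    unwrapped (no jumps at +/-180). A drift-free tracker gives about 0.
    """
    n = len(times_s)
    if n < 2 or n != len(yaw_deg):
        raise ValueError("need two or more paired readings")
    mean_t = sum(times_s) / n
    mean_y = sum(yaw_deg) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("readings need distinct timestamps")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, yaw_deg))
    return sxy / sxx * 60.0


def residual_after_turn(before_deg, after_deg):
    """Wrapped error, in [-180, 180), between a centered source's direction
    before and after a full 360-degree turn."""
    return (after_deg - before_deg + 180.0) % 360.0 - 180.0
```

Fitting a line rather than comparing only the first and last readings keeps a single noisy sample from dominating the drift estimate.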

Conclusion

Spatial audio is an inherently subjective experience, and we have barely scratched the surface of what it can achieve. Especially with head tracking as part of the system, spatial audio product systems can let us immerse ourselves deeply in the content we consume. They can take you from having a high-octane action scene played at you to feeling like you are in the scene yourself, or from merely watching your favorite band perform to feeling like you are on stage with them.

It is an exciting time to be pushing the boundaries with spatial audio! But as with any new technology, it is vital to help people understand not only the fundamentals of the technology itself, but also what to expect, and what to be looking (listening) for.

The concepts introduced here should serve as the foundation for a structured evaluation process that anyone can follow and implement. As the technology grows and evolves, we plan to keep adding to, improving, and optimizing this framework. We encourage the industry as a whole to join us in developing new methodologies, tests, and quantitative and qualitative metrics that better characterize the spatial audio experience.

If you are interested in learning more about spatial audio and its evaluation, or you are interested in evaluating Ceva-RealSpace spatial audio for your products, please contact us today.

Kaushik Sethunath
