Spatial Audio: on a Budget
Yamaha Australia wanted to show the average venue can get into spatial audio for the price of a conventional PA. So it built a system to prove it.
Conditions weren’t perfect: outdoors on a sweltering 40-degree day, with the slatted roof of Melbourne’s MPavilion the only respite. Nevertheless, it’s the proving ground for Yamaha’s cobbled-together object-based spatial audio system and its claim that it’s possible to build one anywhere, even in punishing conditions, for the price of a conventional PA system and with readily available parts.
“It’s the next thing,” was Simon Tait’s matter-of-fact summarisation of why he, Mick Hughes and the rest of Yamaha Australia’s pro audio division decided to try the DIY approach. “Stereo is a format that has served us well since the ’60s, but as a presentation canvas, it’s 60 f**king years old! It’s been done.”
OBJECT OF THE EXERCISE
Yamaha isn’t new to the spatial audio game. Its assisted reverb product, called AFC (Active Field Control), has been around since 1985. It’s similar to Meyer Sound’s Constellation in that it can artificially change the acoustics of a room with lots of hardware and processing grunt.
Lately, the focus for spatial audio has shifted to incorporating object-based positioning. Whether it’s flying sounds overhead or simply positioning them relative to where they are onstage, it’s become the next problem to solve.
With a few projected AFC installs on the cards, the conversation at Yamaha Australia naturally turned to how the team could implement real-time live rendering of object-based sources in those venues as well.
There are a few speaker-agnostic options, like Barco’s Iosono and Astro Spatial Audio, which still operate on a hardware rendering engine. Flux::’s SPAT Revolution requires the least overhead, as a software-specific solution. Still, the Yamaha crew wanted to go as low-cost as possible. Not as a diss, but to make the point that it’s an approachable topic for the average venue that might have previously thought it to be out of reach. “We know it can be done at the budget level a normal venue operates at. We built one, it was really simple and it cost way south of $50,000,” explained Tait. “You’re using lower wattage transducers, just more of them.
“We thought it would be an interesting project to do for the audio community, but there was a huge vacuum of information and working systems out there. We went looking around for open source equivalents and found Soundscape Renderer, then built a prototype system in the office meeting room and it worked really well. We could place guitar amps and other sources wherever we wanted. It was a revelation, because we’d heard spatial systems before that hadn’t been very convincing. They just sounded like multi-channel surround.”
With a proof of concept, the team thought they should put on a show. It is, after all, about providing a new way of experiencing music. “The artists are the ones who put food on the table, so we thought let’s put on a show with the widest variety of artists we can. Then put them in a rehearsal room for two weeks and see what they come up with.”
To make sense of the spatial system, you have to get a handle on what the end result will achieve. For this particular show, there was no overhead fandanglery, or even surround elements. The setup of MPavilion just wouldn’t allow it. Instead, the focus was on building an immersive Wavefield Synthesis rig that could give the illusion that the sound of the PA was matching up perfectly with the position of the musician.
Why not just pan it, you ask? Well, stereo panning doesn’t work like that. The image might be maintained for one specific point in the audience, but you can never maintain that perspective across all punters. Wavefield Synthesis is different: it helps your ears locate an object to that spot no matter where you’re standing.
The rub is that if everything goes according to plan, you shouldn’t notice it’s working. It should just sound incredibly natural; like the PA isn’t even working at all. That was exactly the impression I got when sitting there waiting for New Palm Court Orchestra to start. Tait was quickly sound checking the chamber ensemble, and as each instrument went from acoustic to amplified, the only difference was in level. There was no sensation of the sound coming from a speaker, let alone being farmed out to the left and right. Big success!
3 KEY VARIABLES
There are three key variables to think about when designing a Wavefield Synthesis rig: the size of the area you’re trying to cover, the number of speakers you have at your disposal, and the distance the listener is going to be from the array.
The rule of thumb is that the distance between the punter and the closest speaker should be no less than the distance between two adjacent speakers. The closer the speakers are together, the higher the frequency you can render objects at.
In a situation where the audience member might be a metre from the closest loudspeaker, you therefore need to make the interval between speakers less than a metre. In an arena, they can be spaced further apart because the audience is likewise further away.
The speaker arrays can simply be arranged in a line across a stage, or completely surrounding the audience for a fully immersive experience. In this case, Yamaha set up a relaxed L-shape with the apex at the centre of stage. Tait used the Yamaha emblem to space each speaker 600mm apart. The reason for the tight spacing was that the array was doubling as foldback for the artists, and some were right up against it.
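As a rough sanity check, the rule of thumb fits in a few lines of Python. This is a minimal sketch rather than anything from Yamaha’s toolkit: the function names are hypothetical, and the run length assumes all 24 top boxes used in this rig sit in one evenly spaced run, which the article doesn’t state.

```python
def spacing_ok(spacing_m: float, nearest_listener_m: float) -> bool:
    """Rule of thumb from the text: the nearest listener should be at least
    one speaker interval away from the closest speaker."""
    return nearest_listener_m >= spacing_m


def run_length_m(n_speakers: int, spacing_m: float) -> float:
    """Length of an evenly spaced run of speakers, first box to last box."""
    return (n_speakers - 1) * spacing_m


if __name__ == "__main__":
    # 600mm spacing with a performer or punter one metre away
    print(spacing_ok(0.6, 1.0))
    # 1.5m spacing with the same listener breaks the rule of thumb
    print(spacing_ok(1.5, 1.0))
    # Assumed single run of 24 boxes at 600mm centres
    print(round(run_length_m(24, 0.6), 1), "metres")
```

With 600mm spacing, anyone a metre or more from the array sits comfortably inside the rule; open the spacing to 1.5m and that same listener is too close.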
At that proximity, objects were being rendered accurately up to around 1.2kHz. “As you walk back, two things happen,” explained Tait, digging into the science of perception. “The ability for you to perceive frequencies higher than that increases, but the vector-based panning element becomes the localisation. At 4kHz, you’ve got all these speakers producing 4kHz, but your ears perceive the front of the speaker that hits you first with that sound because the other ones are lesser in amplitude.
“It turns out your ears are quite forgiving of that phenomenon. If you can properly render a wavefront within those critical speech frequencies, like 1kHz, then beyond that your ears perceive the wavefront at that frequency but also an amplitude-related directionality from that sound source. The associated delays are also in-phase, so your ear forgives it and locates the sound.
“Overall, with more loudspeakers, you’ll get a stronger image, but there comes a point where it kind of doesn’t matter.” What Tait is saying is that he could have used far fewer speakers if the double bass player wasn’t literally standing a metre from the array.
THE SYSTEM: LATENCY AS RELIGION
Figure 1 gives a conceptual breakdown of the system; compare it with the actual system in Figure 2.
All the analogue inputs were routed into the Yamaha CL1 via the Rio racks, while any digital signals from the artist’s DAW were tapped using Dante Virtual Soundcard.
Tait mixed on the CL1 in a similar way to a monitor console, creating 18 mixes that were then sent to a PC via Dante. They used a PC tower because the entire ‘religion’ with this system was latency. While Dante PCIe-R card drivers are available for Windows and Mac, PCIe slots aren’t a feature of Macs any longer. “DVS imposes a latency of 4ms, whereas the PCIe card allows you to get the latency down to 0.25ms,” explained Tait as to why the PCIe pathway was important. “The whole roundtrip latency had to be low because artists were performing live with it. It depends on the size of your stage — an arena stage might have the guitar amps eight metres from the drum kit, accumulating 20ms in time of flight — but here, anything above 8-10ms would have started to be noticeable. We got the entire roundtrip system latency down to 6ms in the end, and with audio stability.”
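To put those latency figures in perspective, it helps to convert between milliseconds and metres of air. The sketch below is illustrative only: the function names are hypothetical, and the speed of sound uses a common dry-air approximation, so treat the results as ballpark numbers.

```python
import math


def speed_of_sound_m_s(temp_c: float = 20.0) -> float:
    """Approximate speed of sound in dry air at the given temperature."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def time_of_flight_ms(distance_m: float, temp_c: float = 20.0) -> float:
    """Time sound takes to travel distance_m, in milliseconds."""
    return 1000.0 * distance_m / speed_of_sound_m_s(temp_c)


def equivalent_distance_m(latency_ms: float, temp_c: float = 20.0) -> float:
    """Distance sound travels in latency_ms, in metres."""
    return speed_of_sound_m_s(temp_c) * latency_ms / 1000.0


if __name__ == "__main__":
    # Guitar amp eight metres from the drum kit
    print(f"8m of air: {time_of_flight_ms(8.0):.1f}ms")
    # The same gap on a 40-degree day
    print(f"8m at 40 degrees: {time_of_flight_ms(8.0, 40.0):.1f}ms")
    # The 6ms roundtrip expressed as extra distance
    print(f"6ms of latency: {equivalent_distance_m(6.0):.2f}m")
```

At 20 degrees, eight metres works out to roughly 23ms, in the same ballpark as the 20ms Tait quotes, and a 6ms roundtrip is about the same as standing two metres further from the source.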
MORE OBJECTS, LESS SPACE
The PC was running Soundscape Renderer, open source software from GitHub. “It allowed us to automate the movements of objects over time. We synced up timecode with the performers’ kit over MIDI, then used a VST plug-in written for Soundscape Renderer inside Nuendo to read the timecode and run automation in time with it.”
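Conceptually, timecode-driven automation boils down to looking up where an object should be at a given moment. The sketch below is not the plug-in’s code: it assumes simple linear interpolation between hypothetical keyframes, and the function names and frame rate are placeholders.

```python
from bisect import bisect_right


def timecode_to_seconds(timecode: str, fps: int = 25) -> float:
    """Convert 'hh:mm:ss:ff' timecode to seconds (non-drop-frame assumed)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def position_at(keyframes, t):
    """Linearly interpolate an object's (x, y) position at time t.

    keyframes is a list of (time_s, x_m, y_m) tuples sorted by time.
    Before the first keyframe or after the last, the object holds still.
    """
    times = [frame[0] for frame in keyframes]
    if t <= times[0]:
        return keyframes[0][1:]
    if t >= times[-1]:
        return keyframes[-1][1:]
    i = bisect_right(times, t)  # index of the first keyframe later than t
    t0, x0, y0 = keyframes[i - 1]
    t1, x1, y1 = keyframes[i]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


if __name__ == "__main__":
    # Hypothetical move: stage left to stage right over four seconds
    sweep = [(0.0, -3.0, 2.0), (4.0, 3.0, 2.0)]
    now = timecode_to_seconds("00:00:02:00")
    print(position_at(sweep, now))
```

Halfway through the four-second sweep, the object sits at centre stage.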
While it’s tempting to turn every possible source into an object, Tait said it becomes pretty obvious that’s a bad idea. “The number of sound objects is limited to how cluttered the GUI gets,” he explained. “You could conceivably get 32 objects out of the CL1; 24 mixes and another eight matrixes. Early on, I loaded up every instance of a 32-channel live recording I had as objects. It becomes unmanageable. Especially if you want to move and place things.
“You’re better off busing the drums and everything that has a built-in spatial element to stereo. A guitar amp should be a single object. A larger instrument like a grand piano should be two objects at the most, spaced by a small amount. Backing vocals, stereo at most.
“The renderer we were using has an Android app, so if you want, you can bring a tablet around into the WiFi zone, and touch and move objects around, in much the same way the VST plug-in tells objects to move over TCP.”
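For the curious, moving an object over the network can be as simple as sending a short text message. The sketch below assumes the renderer’s XML-over-TCP interface, in which each request is an XML string terminated by a zero byte. Treat the message shape as an assumption to check against the Soundscape Renderer documentation; the host, port and function names are placeholders, and nothing here claims to be what the Android app or VST plug-in actually sends.

```python
import socket


def source_position_message(source_id: int, x_m: float, y_m: float) -> bytes:
    """Build a 'move this source' request (assumed message format)."""
    xml = (
        f'<request><source id="{source_id}">'
        f'<position x="{x_m:.3f}" y="{y_m:.3f}"/></source></request>'
    )
    # Assumption: messages are terminated with a binary zero
    return xml.encode("ascii") + b"\0"


def send_position(host: str, port: int, source_id: int,
                  x_m: float, y_m: float, timeout_s: float = 1.0) -> None:
    """Open a TCP connection to the renderer and send one position update."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(source_position_message(source_id, x_m, y_m))


if __name__ == "__main__":
    # Print the message only; sending needs a renderer listening on the network
    print(source_position_message(1, 1.5, -2.0))
```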
Out of Soundscape Renderer, the final step was a Yamaha MRX7-D (a 64 x 64 Dante matrix) handling master EQ, gain and limiting for overall system management. Its outputs were sent over analogue, via the same Rio racks, to the 24 Yamaha DZR10 top boxes. Tait also stacked a pair of DXS18XLF subs at the array apex to provide the bottom octave below 50Hz.
Ideally, they would have used all Dante-equipped versions of the speaker, but the Dante versions are only a few months old and Yamaha only had a dozen that weren’t already sold off. “Rather than mix analogue and Dante top boxes — the small difference in conversion might affect the rendering,” figured Tait, “it had to be either all analogue, or all Dante. So we ran into the analogue inputs of the Dante-equipped boxes.”
SPACE VS MOVEMENT
Yamaha specifically chose a variety of artists for the show — starting with the drum, bass and guitar avant-garde of Turret Truck, followed by the New Palm Court Orchestra acoustic chamber ensemble. OK EG’s electronic set and LAPKAT’s art DJ-ing opened up the mix for plenty of movement, Cookin’ on 3 Burners mixed robust live performance with pre-recorded stems, and the event was capped off by a bit of Spoonbill eccentricity.
Throughout the show, artists used the system to varying effect. Some kept it simply as a localisation tool; others went wild auto-panning objects throughout the field.
“What we found in this whole discourse is that space is more important than movement,” said Tait. “The most important achievement these types of Wavefield Synthesis rigs give us is not the ability to swing the hi-hat around at a million miles an hour. That overuse does no service to the artistic intent. The more important element is localisation. You have an amplified product a large audience can experience, but not restricted to hearing it through a left/right window. It’s a more aesthetically believable and comfortable listening experience.”
Another tip Tait had was to create a ‘Glue’ stem; basically ‘everything everywhere’. “Spoonbill [the last performer] was throwing stereo stems hard left and hard right, and it worked a treat. Balancing it all, he had his groove stems like kicks and bass running through the ‘Glue’ stem.
“You had to have some things everywhere. Otherwise you get a bass groove coming from an object. Time alignment is so important to groove. If that object is physically displaced from other things, for some audience members it’s going to be ahead, for others, it’s going to be behind. With that established, then the ornaments become the objects.”
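The ahead-for-some, behind-for-others effect is plain geometry. Here is a minimal sketch, assuming two point sources and a fixed 343m/s speed of sound; the positions and function name are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees


def arrival_offset_ms(object_xy, reference_xy, listener_xy,
                      c: float = SPEED_OF_SOUND_M_S) -> float:
    """How late the object arrives relative to the reference, in ms.

    Positive means the object lags the reference at this listener;
    negative means it arrives ahead.
    """
    extra_path_m = (math.dist(object_xy, listener_xy)
                    - math.dist(reference_xy, listener_xy))
    return 1000.0 * extra_path_m / c


if __name__ == "__main__":
    bass_object = (-5.0, 0.0)   # hypothetical: bass placed hard stage left
    drums = (0.0, 0.0)          # hypothetical: drums at centre stage
    for name, seat in [("house left", (-6.0, 4.0)), ("house right", (6.0, 4.0))]:
        offset = arrival_offset_ms(bass_object, drums, seat)
        print(f"{name}: bass is {offset:+.1f}ms relative to drums")
```

A punter at house left hears the bass several milliseconds early, while one at house right hears it late, which is exactly why groove elements are better off everywhere at once.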
TIMING ISSUES & PHANTOM SOURCES
For a system engineer, designing a spatial audio system presents a new set of challenges. The first is changing the way you think about sub timing. In short, says Tait, “there’s literally no point in timing your subs. It’s not unimportant, it’s just impossible. What are you going to time them to? You’re moving sources around in space, so the alignment with the sub becomes irrelevant.
“The answer to that is using top boxes with the lowest possible crossover point. If you can use top boxes that are good down to about 50 or 60Hz, that’s your best bet. We were crossing over those subs at 50Hz, because the DZR10s were pretty good down to there.”
Tait says that while you can do a simple frontal system, it’s preferable to have the array wrap around the audience, and even better if it remains as one unbroken ring.
Tait: “You need to render the side, just to try and complete the Wavefield Synthesis wave formation. When you break the array, either to the extreme left or right, you introduce artefacts, which is just inescapable physics. The reason is, if you want to synthesise a wave coming from a point just to the left of the end of the array, some loudspeakers within the array have to fire first, then the ones at the very end of the array fire a little bit later. If you’re standing inside the array, instead of perceiving the properly synthesised wave, your ears might localise the speaker that fires first.”
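The staggered firing Tait describes can be illustrated with a simplified delay-and-gain model of a virtual point source behind a straight run of speakers. This is a sketch under stated assumptions, not Soundscape Renderer’s algorithm: the gain taper is an illustrative choice, and the function names and geometry are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees


def line_array(n_speakers: int, spacing_m: float):
    """Speaker positions along the x axis, centred on x = 0."""
    offset = (n_speakers - 1) * spacing_m / 2
    return [(i * spacing_m - offset, 0.0) for i in range(n_speakers)]


def point_source_feeds(source_xy, speakers, c: float = SPEED_OF_SOUND_M_S):
    """Return a (delay_ms, gain) pair per speaker for a virtual point source.

    Each speaker is delayed by its extra distance from the virtual source,
    so the nearest speaker fires first with zero delay and unity gain.
    The gain falls off with the square root of distance (an assumption).
    """
    distances = [math.dist(source_xy, speaker) for speaker in speakers]
    nearest = min(distances)
    if nearest == 0.0:
        raise ValueError("virtual source must not sit exactly on a speaker")
    return [
        (1000.0 * (d - nearest) / c, math.sqrt(nearest / d))
        for d in distances
    ]


if __name__ == "__main__":
    speakers = line_array(5, 0.6)
    # Hypothetical virtual source beyond the left end, a metre behind the array
    for (x, _), (delay_ms, gain) in zip(speakers, point_source_feeds((-3.0, -1.0), speakers)):
        print(f"speaker at x={x:+.1f}m: delay {delay_ms:.2f}ms, gain {gain:.2f}")
```

Which speaker fires first depends on where the virtual source sits relative to the array. Stand close to that first-firing box and it, rather than the synthesised wavefront, can become what your ears lock onto.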
UPSIDES TO BE GAINED
On the upside, there’s generally more gain before feedback available in a system like this, explained Tait: “It becomes a totally different formula, evidenced by the fact that for Cookin’ on 3 Burners we had a full band with drum kit, guitar amps, and a screaming Leslie cabinet, and a singer who could hear herself perfectly. And that was all running through the PA coming from behind them.
“If you have a gain before feedback issue, it’s just a matter of moving the object a little bit away, so you’ve still got the same SPL reaching the audience, and the performer still gets everything they need.”
AT A VENUE NEAR YOU
Tait says the learning curve shouldn’t be a turn-off: “I’ve immersed myself in it for a few months so I’ve become familiar with it; but the amount of skill required is akin to setting up a Q-Sys, or Crestron, or big concert PA. Anyone who can do that can do this. The time has come where the cost of everything is low enough that you should be able to get someone clue-y to get the software working and get a proper spatial system.”
With readily available tools, Yamaha proved it doesn’t take a team of acoustical engineers and software programmers to build a spatial audio system, and that it doesn’t have to be bankrolled by an arena tour budget. “We want to see venues and rehearsal studios that are putting in a left/right line array (and probably shouldn’t anyway) cost up a spatial system,” said Tait. “It’s going to use many more, but less expensive, drivers and amp channels. You can do a lot more with it, and even simulate a stereo PA system… if you want to.”