PLAYSTATION VR: SOUNDTRACKING A NEW REALITY
Virtual Reality is experiencing a rebirth and Sony is right in the game. The pioneers of PlayStation VR audio give AudioTechnology tips on how to get it right.
Story: John Broomhall
Until recently, Virtual Reality has been most comfortable in the science-fiction realm. Much like anyone wearing a ye olde ‘futuristic’ VR headset, early attempts at actually creating fully immersive 3D worlds stumbled along, cut off at the knees by primitive graphics, horrific frame rates and appalling motion lag.
Slowly but surely, VR technology caught up to the future, though sky-high costs rendered a consumer offering improbable, until now. Consumers are buzzing at the prospect of in-home VR; at CES punters were lining up around the block to don some snow goggle-sized headsets for a virtual tour of VR’s capabilities. Global corporations, unperturbed by 3D’s lack of wide-scale take-up, have also been digging deep, funding research and development to make Virtual Reality a reality at retail.
No surprise that gaming giant PlayStation has been, and remains, at the forefront of this pioneering endeavour with their PlayStation VR (PSVR) system, due to hit the streets during 2016.
Until you’ve personally donned the PSVR headset and some decent headphones, VR may seem like an interesting idea and something you might like. However, as soon as you try it for yourself be prepared to get broadsided by a massive paradigm shift. You instantly get it. You can look up, down and all around you, and the settings — like being underwater in Sony’s The Deep scuba encounter — feel breathtakingly expansive. PSVR delivers on VR’s necessary promise; it immerses you in a deeply compelling world that makes you feel like you’re leaving the real one far behind.
VR, A NEW DAWN FOR AUDIO
The sound, music and dialogue components of these experiences are crucial. They can subtly draw a player in or psychologically bump them out, so members of Sony’s London Studio — where the operation, originally dubbed ‘Project Morpheus’, was nurtured — have put plenty of thought into PSVR’s application of audio. Alongside them in the UK capital are members of Sony Computer Entertainment Europe’s Creative Services Group: a collective of experts servicing the music, audio and video requirements for a plethora of videogame titles and their marketing campaigns.
It’s primarily fallen to these pioneering sound designers, music creatives and software engineers to establish the platform’s audio requirements; create the technical train tracks for PSVR audio to run on; and explore what does and doesn’t work creatively, for a new dimension of interactive entertainment.
London Studio Director, Dave Raynard, reckons VR is putting the spotlight on audio again. Having to wear a visor and headphone combo to experience VR properly means game developers have the entire audio attention of the player. “VR is really audio’s day in the sun,” said Raynard. “Audio plays a huge role in taking people to another place; it’s half the experience. One aim of VR experiences is to attract the player’s attention, to make them look around using the ‘VR-ness’ of it, and audio is a great way to do that. If you get audio wrong, it’ll be very, very noticeable.”
Fortuitously, Sony not only has a long history in audio; according to Raynard, it has also put a great deal of effort into developing binaural audio systems for PSVR.
Alastair Lindsay, Head of Music, says binaural audio is the perfect audio companion to VR: “It really helps create the illusion of a virtual 3D world. It convincingly reproduces the location of a sound: behind, ahead, above, or wherever else the sound is emitted from.”
Lindsay offered a concise explanation of binaural audio: “In short, this is achieved by taking a piece of audio and processing it such that it includes all of the key cues the brain uses to locate sounds in space.” Sony’s audio system uses head-related transfer functions (HRTF) to filter the sound emitters depending on their position in the world, taking into account things like ‘Interaural Time Difference’, i.e. the difference in the time a sound takes to reach each ear. “The small adjustments to the different sounds using this technique, as well as a number of other factors, create compelling positional audio,” said Lindsay.
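To give a sense of the scale of one of these cues, the Interaural Time Difference can be approximated with Woodworth's classic spherical-head formula, ITD ≈ (r / c)(θ + sin θ). The Python sketch below is a textbook approximation only, not Sony's HRTF processing; the head radius, the speed of sound and the function name are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C
HEAD_RADIUS_M = 0.0875      # assumption: a commonly quoted average head radius


def interaural_time_difference(azimuth_deg,
                               head_radius_m=HEAD_RADIUS_M,
                               speed_of_sound=SPEED_OF_SOUND_M_S):
    """Approximate ITD in seconds using Woodworth's spherical-head model.

    azimuth_deg: 0 is straight ahead, +90 is directly to the right.
    The model is only meant for sources in the frontal range -90..90 degrees.
    A positive result means the sound reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between -90 and 90")
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))
```

With these assumed values, a source directly to one side arrives at the far ear around two-thirds of a millisecond late, which is the order of delay the brain uses to judge direction.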
Nick Ward-Foxton, Senior Audio Programmer, explains PSVR uses real-time binaural processing to achieve the most realistic-sounding and immersive experience possible: “Headphones are a great output format for us; the best option for delivering HRTF and binaural sound. Audio developers don’t need to worry about head tracking because the output is already aligned to head orientation.”
To create the overall soundscape in the most efficient way, the team developed options for the route a sound takes through the 3D audio system. Ward-Foxton described three of the basic paths they find really useful: Discrete 3D object, Surround Bed, and Straight To Ear.
Ward-Foxton: “Discrete 3D objects are the highest fidelity. The majority of sounds will be this type, and you may need a priority system to manage the number of voices playing versus the number of 3D voices available.
“We also have a 10-channel Surround bed made up of virtual speaker positions including height. It’s useful for lower priority positional sounds and still gives you a good sense of position. If your mix is busy, then once you hit the voice limit for 3D objects the lower priority sounds can mix down into this bed until a 3D voice is available.
“Thirdly, you can send a signal directly to the headphones, a really important option for non-diegetic music and abstract ambiences. We’re also using it as a kind of LFE channel where we send a low-passed copy of a 3D sound straight to the headphones for explosions and other big sounds. This feature also gives you the ability to play back binaurally-recorded material in the game which can give great results for certain sounds — e.g. an object striking a helmet of an in-game avatar.”
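The three paths and the priority system Ward-Foxton describes can be sketched roughly as follows. This is a minimal Python illustration, not the PSVR audio API: the `Sound` class, the route names and the "higher number wins" priority convention are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sound:
    name: str
    priority: int            # assumption: higher number = more important
    positional: bool = True  # False for non-diegetic music, abstract ambiences


def route_sounds(sounds, max_3d_voices):
    """Pick a route for each sound: '3d_object', 'surround_bed' or 'straight_to_ear'."""
    routes = {}
    positional = [s for s in sounds if s.positional]
    # Highest priority first; sorted() is stable, so ties keep submission order.
    ranked = sorted(positional, key=lambda s: s.priority, reverse=True)
    for index, sound in enumerate(ranked):
        # Once the 3D voice limit is hit, lower-priority sounds fall back to the bed.
        routes[sound.name] = "3d_object" if index < max_3d_voices else "surround_bed"
    for sound in sounds:
        if not sound.positional:
            routes[sound.name] = "straight_to_ear"
    return routes
```

Re-running the routing each frame would let a sound that was mixed into the bed reclaim a 3D voice as soon as one becomes free, which is the behaviour described in the quote.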
HOW TO MIX IT
While the origins of binaural recording can be traced back more than a century, creating and mixing real-time binaural audio for VR is yet to become anything close to commonplace. The people involved in projects like PSVR are still pioneering the format. Simon Gumbleton, Sony London Technical Sound Designer, says VR experiences require full immersion: a complete suspension of disbelief on the part of the player, a state that must be maintained by not triggering the subconscious ‘reality testing’ our brains do in the background. “The reason our dreams often seem so real is because this ‘reality testing’ mechanism is effectively switched off when we sleep,” he explained. Gumbleton shared some of the hard-fought lessons the team has learnt along the way to keep players from ‘reality testing’ VR.
WHAT IS PLAYSTATION VR?
PlayStation VR combines a landscape-scanning, custom 120Hz OLED display and specifically developed optical unit with an approximate 100-degree field of view to create a 3D space in front of your eyes.
The PlayStation visor has a high-sensitivity accelerometer and gyroscope running at a very high frequency, allowing PlayStation VR to detect your head’s movement with almost no latency. When used with a PlayStation Camera, your head’s movement and the movement of the controller can be tracked and reflected in the game’s images in real time.
To experience PlayStation VR, all you need is the visor, a PlayStation 4 and the PlayStation Camera.
APPROACH TO ASSET CREATION
Simon Gumbleton: “Choose audio content that’s as dry as possible and let the system add early reflections, late reverb and decide the balance between those and the dry signal. Fully anechoic material works best through HRTF filters and dynamic reverbs, but isn’t always practical or possible so aim to find or record material with minimum room or reflections baked in. That sonic information can end up giving the player incorrect cues and make spatial localisation difficult.
“Sounds should ideally be mono sources, so that our 3D audio system has full control over adding the spatial information. Design them how you want without worrying about the HRTF process, and don’t try to correct for it with EQ. Otherwise you’ll break the spatialisation cues added by the system.
“Humans are much better at localising sounds when they move their heads and get that change in content between ears, so you might design slightly longer sounds for situations that require precise localisation.”
SG: “Channel-based content is definitely still possible in VR, but it’s important to restrict it to non-positional and non-diegetic material. Mood stuff like abstract drones works fine in 2D, but positional content that ‘sticks’ to your head when you turn is distracting.
“Keep ‘player sounds’ subtle and neutral — a subconscious reinforcement of player actions in the virtual world. If they become too noticeable you’ll create a disconnect between player and avatar. The player will be distracted, conscious they’re not making that sound.
“Too much compression on dialogue can pull it out of the world and make it feel 2D. Use less compression than normal on dialogue to keep it feeling like it’s coming from the characters, rather than a phantom centre speaker somewhere in front of your face. When recording dialogue, capture a performance that relates to how it will be played back.”
SPECIFICALLY PLACE AUDIO
SG: “Location of sound sources in the world needs to be accurate. This means using more emitters in multiple locations. For the vast majority of sounds in a scene, you can’t just emit them from the root of an object.
“Respecting head tracking is very important. Sounds should move correctly in the world. Much audio that previously might have been piped in stereo will benefit from being positioned correctly in the world. You might treat almost all the elements of an ambience as positional sources. It really helps place the player in that space; they can move around, lean in, turn their head, and what they hear makes sense in the virtual world.
“If you’re using any sort of dynamic obstruction or reverbs, you need to have a good relationship with the environment creators on the project. You also need tools which allow physics authoring that works for audio. We’ve often found that ‘accurate’ physics doesn’t always give the best results from a design perspective — you need to find what works for your audio design.”
CASE STUDY: THE DEEP
Descending through ocean layers in a cage, The Deep diving simulation comes complete with a terrifying shark encounter. Joanna Orland, Senior Sound Designer, talked about the audio vision and its implementation: “Before doing any hands-on sound design on a new project I investigate the audio identity I’d like to give it. It’s important not to create assets in isolation; every sound must fit into the audio world you’re building.
“For The Deep I listened to how films have portrayed underwater sound and found they tend to give space a similar treatment too. The Abyss, 20,000 Leagues Under the Sea, Gravity and Battlestar Galactica all recreate the feeling of being underwater or in space, not the reality.
“Sounds were somewhat muted and everything was quite soft. Object sounds were often vibration and movement rather than literal audio. All of this led me to define the audio of The Deep as:
Very ambient and motion-driven rather than realistic; you’ll hear movement of sea-life rather than overt animalistic qualities.
The deeper you get, the more muffled and also ethereal the sound will become.
The soundscape will be very ‘soft’.
Player sounds will be subtle but real (making a player sound realistic and believable facilitates an emotional attachment with the avatar, aiding immersion).
“Working alongside composer Joe Thwaites, our mission statement was: The sound will be felt, the music will be heard.
“We’re not trying to make something real per se. We want our virtual version of cage diving to have drama, evoke emotion and put people inside a ‘Hollywood movie’ version of this compelling underwater adventure. We push the sound design into hyper-real, especially when the reality can be rather underwhelming, break immersion or deflate the drama.
“For example, I didn’t want to hear constant breathing (even though that’s the reality) because your brain would filter it out after a while. We decided to filter it ourselves then bring it in for key moments to help create tension and fear.”
Once Orland knew her creative aim, she broke it down to figure out the exact approach. The sound itself (breathing through apparatus) would be realistic, but its emotionally manipulative implementation would not. The breathing would be heard at first to establish its presence, and the presence of the oxygen tank, but once the player was fully submerged, it would be subtle to the point of being nearly inaudible.
“The player would feel like they’re breathing, but wouldn’t hear it as an obtrusive sound because that might ruin the ambience of the underwater experience and create distraction rather than immersion.
“We also decided to change the volume and pace of breathing at key points. For example, breath sound would be removed before a big shark attack and brought back at a higher intensity immediately after the attack. This enhances the player’s ‘scare experience’. The objective is for the player to take on some of the emotive experience internally through the avatar’s frightened breathing. Testing revealed it worked and players’ own breathing was often affected, heightening their emotional engagement.
“To me, player/avatar sounds are the most subjective area of VR audio design — they can make or break immersion. We keep things very vague as people are used to their own body sounds, which vary person-to-person according to size, height, weight and gender.
“The great thing is there are no rules. With every sound now able to be in a binaural space, we have more room to play. That’s led to things like our FOCUS system, developed on The Deep, where some sounds only become audible when the player focuses on them intently for a while.”
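Sony has not published how its FOCUS system works, so the following Python sketch is only a guess at the general idea: a sound stays silent until the player's gaze has rested within a small cone around it for long enough, then fades in. The class name, the cone size, the dwell time and the fade time are all hypothetical.

```python
class FocusSound:
    """A sound that fades in only after the player has stared at it for a while."""

    def __init__(self, cone_deg=10.0, dwell_s=2.0, fade_s=1.0):
        self.cone_deg = cone_deg  # how close the gaze must be to count as focus
        self.dwell_s = dwell_s    # how long the gaze must be held before fading in
        self.fade_s = fade_s      # time for a full fade in or out
        self.focused_time = 0.0
        self.gain = 0.0           # linear gain, 0.0 (silent) to 1.0 (full level)

    def update(self, angle_to_sound_deg, dt):
        """Advance by dt seconds given the angle between gaze and sound."""
        if angle_to_sound_deg <= self.cone_deg:
            self.focused_time += dt
        else:
            self.focused_time = 0.0  # looking away resets the dwell timer
        target = 1.0 if self.focused_time >= self.dwell_s else 0.0
        step = dt / self.fade_s
        if self.gain < target:
            self.gain = min(target, self.gain + step)
        else:
            self.gain = max(target, self.gain - step)
        return self.gain
```

Calling `update` once per frame with the current gaze angle keeps the fade smooth; a glance in passing never reaches the dwell time, so the sound remains a reward for deliberate attention.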
MAKE AUDIO REACTIVE
SG: “It’s crucial the player feedback you provide with audio is believable, especially with object interactions. Design sounds for these interactions in a way that reacts believably to any player input. It really helps to be able to easily get player parameter values like speed, acceleration, rotation and angular velocity from any player input at any time so you can design reactive blend containers.
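As a rough illustration of a reactive blend container, the Python sketch below derives a speed value from two tracked positions and uses it to crossfade between layers of a sound. It does not reflect any particular middleware; the function names and the simple linear crossfade are assumptions.

```python
import math


def speed(previous_pos, current_pos, dt):
    """Speed in units per second from two positions sampled dt seconds apart."""
    return math.dist(previous_pos, current_pos) / dt


def blend_weights(value, breakpoints):
    """Linear crossfade weights for layers centred on ascending breakpoints.

    Returns one weight per layer; the weights always sum to 1.0.
    """
    if len(breakpoints) < 2 or any(
            b <= a for a, b in zip(breakpoints, breakpoints[1:])):
        raise ValueError("breakpoints must be strictly ascending")
    weights = [0.0] * len(breakpoints)
    if value <= breakpoints[0]:
        weights[0] = 1.0
        return weights
    if value >= breakpoints[-1]:
        weights[-1] = 1.0
        return weights
    for i in range(len(breakpoints) - 1):
        low, high = breakpoints[i], breakpoints[i + 1]
        if low <= value <= high:
            t = (value - low) / (high - low)
            weights[i] = 1.0 - t
            weights[i + 1] = t
            break
    return weights
```

For example, a hand-held object might have "slow", "medium" and "fast" movement layers at breakpoints of 0, 1 and 3 metres per second, with the weights applied as layer volumes every frame.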
“Another key aspect of placing sounds is distance modelling – 3D audio systems don’t give you this ‘out of the box’ – you need to design it. Volume and basic filtering over distance is a great starting point and still works in VR. But there are some extras that really help sell distance. For example – dynamically driving the send levels to reverbs over distance. This works for simulating proximity as well. You can exaggerate certain properties at very short distances and drastically reduce the reverb send level to emulate very close sounds.”
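A basic distance model along these lines might look like the Python sketch below. The curve shapes and every number in it (reference distance, cutoff frequencies, reverb send levels) are assumptions chosen for illustration, not values from the PSVR team.

```python
import math


def distance_model(distance_m, near_m=0.25, ref_m=1.0, far_m=50.0):
    """Return dry gain, low-pass cutoff and reverb send for a source distance."""
    # 0.0 at or inside the 'near' distance, 1.0 at or beyond the 'far' distance.
    t = (distance_m - near_m) / (far_m - near_m)
    t = max(0.0, min(1.0, t))

    # Volume: inverse-distance law (about 6 dB per doubling), flat inside ref_m.
    dry_gain_db = 0.0 - 20.0 * math.log10(max(distance_m, ref_m) / ref_m)

    # Basic filtering: slide the cutoff from 20 kHz down to 2 kHz on a log scale.
    lowpass_hz = 20000.0 * (2000.0 / 20000.0) ** t

    # Reverb send: very low for close sounds, rising as the source moves away.
    reverb_send_db = -40.0 + t * (-6.0 - (-40.0))

    return {
        "dry_gain_db": dry_gain_db,
        "lowpass_hz": lowpass_hz,
        "reverb_send_db": reverb_send_db,
    }
```

Driving the reverb send from the same distance value as the volume and filter is what lets one parameter sell both "far away" and "right next to your ear".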
SG: “With high fidelity azimuth and elevation, plus a 360-degree sound field, 3D audio systems give you more space in the mix. You need to move most of the run-time mixing controls out of the bus structure and into your sound object hierarchy. By using lots of side-chaining, meters, states and ducking to dynamically drive different sound objects, you can still create dynamic and reactive mixes.
“However, too much ducking quickly becomes obvious and can pull the player out. The trick is lots of side-chaining in small amounts. Where dialogue needs to be heard in a busy scene, having the dialogue a little louder is better than ducking other elements too aggressively.
“You also have the ability to influence the player’s focus, which can be really powerful. Thinking back to tracking player input parameters: you know exactly where the player is looking in the scene, so you can use that data to manipulate the audio focus of the scene, enhancing specific elements whilst suppressing others.
“Decide on the focus and be aware of your ability to grab the player’s attention, particularly with respect to how new elements enter the soundscape.
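One simple way to turn gaze data into audio focus is a per-object level offset based on the angle between the gaze direction and the direction to each sound. The Python sketch below assumes that approach; the cone angles and the boost and cut amounts are hypothetical, and both direction vectors are assumed to be non-zero.

```python
import math


def focus_offset_db(gaze_dir, to_object_dir, inner_deg=15.0, outer_deg=60.0,
                    boost_db=2.0, cut_db=-3.0):
    """Level offset in dB for one sound object, given two 3D direction vectors.

    Objects inside the inner cone get the full boost, objects outside the
    outer cone get the full cut, and the offset is interpolated in between.
    """
    dot = sum(g * o for g, o in zip(gaze_dir, to_object_dir))
    lengths = (math.sqrt(sum(g * g for g in gaze_dir))
               * math.sqrt(sum(o * o for o in to_object_dir)))
    cos_angle = max(-1.0, min(1.0, dot / lengths))  # guard against rounding
    angle_deg = math.degrees(math.acos(cos_angle))
    if angle_deg <= inner_deg:
        return boost_db
    if angle_deg >= outer_deg:
        return cut_db
    t = (angle_deg - inner_deg) / (outer_deg - inner_deg)
    return boost_db + t * (cut_db - boost_db)
```

Keeping the offsets to a couple of dB fits the advice about subtlety: the scene leans towards what the player is looking at without any element audibly pumping.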
“Mixing audio in VR is probably the biggest departure from our old workflows. This is primarily because 3D audio systems change the end point of our signal chain. With channels and busses, the end point is way down at the master fader, but in 3D audio systems, the end point is at each audio ‘object’.
“This has some implications for how you construct and sculpt what the player ultimately hears. Concepts like summing and buss processing don’t make sense in an object-based system. You can’t stick a multiband-compressor or EQ or tape saturation on the master fader any more — not only because you need to maintain the individual object signals — but you would also end up wrecking any ILD and HRTF filter cues.
“This means effects processing must be done at the object level which, depending on the number of objects you have, likely means an increase in real-time DSP. It also means group processing is more difficult — for example you can’t run all your vehicle sounds through a single compressor on a buss.
“By using traditional techniques such as side-chaining, states and meters, alongside a few VR specific systems we’ve built, we’ve found that even without a traditional mixer structure, we can still create immersive, dynamic and reactive mixes.”
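The "lots of side-chaining in small amounts" idea can be sketched as a ducker with a deliberately shallow maximum depth, smoothed so it never snaps. This Python example is an illustration under assumed values (threshold, slope, depth and time constant), not one of the team's systems.

```python
import math


def duck_amount_db(sidechain_db, threshold_db=-30.0, slope=0.25, max_duck_db=3.0):
    """Gain change in dB (zero or negative) for a given side-chain level.

    The duck deepens by `slope` dB per dB above the threshold and is capped
    at a small maximum so it never becomes obvious.
    """
    over_db = max(0.0, sidechain_db - threshold_db)
    return 0.0 - min(max_duck_db, over_db * slope)


def smooth_towards(current_db, target_db, dt, time_constant_s=0.15):
    """One-pole smoothing so gain changes glide rather than jump."""
    coeff = math.exp(-dt / time_constant_s)
    return target_db + (current_db - target_db) * coeff
```

In an object-based mix each sound object would run its own copy of this, keyed from a meter on whatever it should make room for, such as dialogue. Several objects each giving way by a decibel or two is far less noticeable than one bus dropping by ten.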
A FRESH PERSPECTIVE
Perhaps the most valuable thing you can do, said Gumbleton, is experiment. The entire VR spectrum is relatively new, and a lot of the things that worked for the team were found by simply trying out ideas. “You’re working on totally new experiences and systems,” reminded Gumbleton. “You’ll face new and exciting challenges in VR that no-one else has faced.”
Despite the infancy of this generation of VR, some are already predicting its universality. Raynard is one of those. He’s been around for the debut of many game technologies in the past — camera tech, motion gaming — but this is the first time he’s seen a technology’s potential extend so far beyond games, and that makes him excited about its future: “There are film people interested, medical people, business to business. That’s why I think it will become established as a ‘medium’ — just like theatre, radio, TV, film — and that’s what makes it so exciting.” That said, he’s wisely not putting a timeline on its adoption. “It could surprise us and be really quick or there might be a longer tail,” mused Raynard. “When film went from silent movies to sound, it took 10 years for all the cinemas in America to change over. This is just the beginning.”
Alastair Lindsay, Head of Music, and Joe Thwaites, Composer & Music Systems Designer, talk about the role of music in virtual reality gaming, where it should sit in the mix, and the importance of musical interactivity: “There’s a common misconception that virtual reality implies simulation — and therefore non-diegetic music isn’t appropriate. But VR experiences come in many forms; abstract worlds as well as more realistic environments. Players suspend their disbelief if the content is both audibly and visually consistent and believably presented. Most traditional game and film music approaches are still pertinent in VR experiences.
“In general, provided the player is invested in the experience we’ve found that sending the music straight to the player’s ear (where the music position persists as you move your head) can work effectively, though not necessarily in every scenario.
“Presence is the single most powerful feeling created by VR. Maintenance of presence is of great importance, as it takes the user from spectator to participant. But presence is a delicate construct easily disrupted unless all elements work well together to create a coherent experience.
“In most cases you need music to act on a subconscious level, to avoid drawing attention to itself and to support narrative. If it draws attention, there’s a risk the player’s subconscious will respond with a ‘reality check’. It’s like how you accept a dream as reality until something from outside the dream world, like an alarm, intrudes.
“We’ve found music affects presence most when it starts and stops. Getting music in and getting out are often the most jarring moments in audio continuity. Much of our prototyping has been addressing how music enters the experience without drawing attention.
“However, different levels of presence can be used to creative effect, adding a new layer of dynamics to an experience. In The Deep, a piece of music scoring the manta rays’ approach enters relatively boldly, helping the player feel they’re a spectator of a magical moment. As it fades away with the departure of the manta rays, the reality (virtual reality) of the situation comes back into focus, making the imminent danger more pronounced. The music acts as the calm before the storm.
“Many of the challenges regarding making audio and music for VR experiences actually spring from the player’s heightened environmental awareness. If music-to-visual sync points don’t hit precisely, the music draws attention to itself much more than in a traditional game. We use many of the same techniques learnt from interactive music best practice in traditional games to address this, but we’ve also been exploring alternative approaches to push the flexibility of our interactive music. This even includes combining MIDI sample playback together with streamed audio stems which allows us to balance timing precision with audio quality.
“Being able to monitor and respond to what the player’s looking at is a totally new opportunity that VR enables. It allows us to customise the music to the player’s experience. For example, the manta ray moment is only triggered when you first see a manta ray. So if you’re looking in another direction when that animation starts, the ‘magical music reveal’ will wait until you spot the creature, making the moment much more powerful and cinematic.
“Another example is just looking around in The Deep. Look up to the light and the music is subtly high-passed and a new high sparkly music layer is introduced. Look down and the music is low passed with more bass frequency layers added. It’s subtle but really effective in creating a dynamic soundscape that ‘sells’ the environment — for instance, looking down gives many people a sense of dread. Overall, the combination of 3D audio and knowing where the player is looking allows us to create totally new audio experiences that take the experience of VR to another level.”
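The look-up and look-down behaviour described for The Deep could be driven by head pitch along the lines of the Python sketch below. The parameter names, the frequency ranges and the 60-degree limit are assumptions for illustration only.

```python
def music_mix_for_pitch(pitch_deg, max_pitch_deg=60.0):
    """Map head pitch (+ is looking up, - is looking down) to music settings."""
    t = max(-1.0, min(1.0, pitch_deg / max_pitch_deg))
    up = max(0.0, t)     # 0.0 when level or looking down, 1.0 when fully up
    down = max(0.0, -t)  # 0.0 when level or looking up, 1.0 when fully down
    return {
        # Looking up: raise the high-pass cutoff and bring in a sparkly layer.
        "highpass_hz": 20.0 * (300.0 / 20.0) ** up,
        "sparkle_layer_gain": up,
        # Looking down: lower the low-pass cutoff and add bass layers.
        "lowpass_hz": 20000.0 * (2000.0 / 20000.0) ** down,
        "bass_layer_gain": down,
    }
```

In practice these values would be smoothed over time before reaching the music system, so a quick glance up or down shades the score gradually rather than switching it.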