AI, DAWs & Audio 1

In the first instalment of this series, Greg Simmons dismisses the fictional doomsday AIs of the past and prepares a path forward for AI in audio...

12 March 2026

“Scan the selected region, map the dialogue with the background image, and remove the room sound…” She slipped in her AirPods while Saganai’s processing logo, a small rotating sphere, turned pale blue before shrinking behind the pixels. The first pass wasn’t perfect, but the artefacts wouldn’t be heard under the new atmos.

“Remove the plane noise…”

Saganai’s processing logo shrank behind the pixels, and the plane noise was gone.

“Good. I need Australian bush atmos. Night time. Add the sound of a campfire in the centre at the same distance as the dialogue, and a small creek flowing left to right across the front about three metres in front of the dialogue. A cricket panned right after the first line of dialogue, another cricket panned left after the second line of dialogue. An owl hoot in the centre after the last cricket. The owl and the crickets must be species heard in Gippsland…”

Saganai’s processing logo shrank behind the pixels, and the atmos played through the AirPods. The species of the crickets and the owl appeared on screen as their sounds occurred: two bush-crickets and a Southern Boobook.

“Make the campfire more subdued, like it’s burning out, and make the creek flow slower. Put the crickets here and here [taps beneath the shrubs to the left and right of the background image]. Put the owl in the trees here [taps the screen again]. Extend the owl sound for another second beyond the region; it has to cross into the next scene…”

Saganai’s processing logo shrank behind the pixels, and the new atmos played through the AirPods.

“Great. I can work with that. Export each part as an immersive binaural track, 48k 24-bit wav. Add three second handles, load them into the region’s tracks, and label their channels.” She dragged the cursor over the next region while voice messaging downstream. “Almost done. Let me know if there are any issues against your music; I can tune the owl and crickets if necessary…”

SONGS FROM UNDER THE FLOORBOARDS

About a million years ago I worked for The Herald & Weekly Times Ltd – a large newspaper publisher that was then located on Flinders Street, Melbourne. I was apprenticed to a team of electronics technicians and industrial electricians, installing and maintaining what was the largest electronic publishing system in the Southern Hemisphere. Millions of dollars of technology – including dozens of Prime minicomputers (each a 19-inch EIA rack as high as a doorway), high-intensity bromide laser printers with pithy warnings saying ‘Do Not Look Into The Laser With Your Remaining Eye’, tape drives spinning left and right like steering wheels on skid pans, and hard drives the size of minibars using four 12-inch platters to collectively provide 80MB of storage – all installed in a temperature- and humidity-controlled computer room with smoke sensors that triggered Halon outlets to displace the room’s oxygen in seconds, extinguishing fires without extinguishers. All repairs that involved soldering (and therefore smoke) had to be done outside the room, obviously…

The Floorboards

The minicomputers were accessed by journalists, editors and layout artists via keyboards and screens wired throughout the building with coaxial cables. As an apprentice with small hierarchical and physical stature I spent many hours in overalls, headlamps and dust masks inching under the floorboards on elbows and toes like Andy Dufresne escaping Shawshank State Prison, navigating the straightest paths and widest curves possible while pulling cable guides behind me and stirring up decades of inky-black printers’ dust.

The technology we were installing was going to put a lot of people out of work; primarily the readers and the compositors. The readers proofread everything the journalists and editors typed [remember typewriters?], highlighting spelling mistakes and grammatical improvements with red scrawls that only those deeper in the printing process could understand – specifically the compositors who prepared the text and images into a ‘page’ by clamping alphanumeric character stamps (fonts) and halftone etchings (images) into a wooden frame called a galley that ultimately made the plates for the printing press. The new technology had sophisticated error-correction and grammar-checking that dramatically reduced the need for readers. It could also directly etch a halftone plate of the entire page, thereby eliminating the need for compositors and their galleys.

Thanks to the power of unions and a level of employer decency rarely seen today, each person whose job would be affected was offered a redundancy package or re-training for another job within the company – including jobs created by the new technology, where their previous experience and understanding of the work had meaningful refinement and debugging value for the software engineers. While the re-training negotiations were going on, journalists and editors lobbied together and received a pay rise for using VDTs instead of typewriters for eight hours per day – citing potential eye strain. VDTs? Visual Display Terminals. We now call them ‘screens’ and many of us willingly spend far more than eight hours per day in front of them, seven days per week. I’m using a screen to write this now, and you’re using a screen to read it now.

The Songs

As well as being an apprentice electronics technician I was mixing live gigs in my evenings, running a small 4-track recording service with a TEAC A3440 and a 6-in 4-out mixer in the back of a Corolla panel van, and building a professional recording studio in a warehouse in Cheltenham. I was mastering the skills of manually dropping in and out (a pre-emptive process), editing with razor blades, cleaning and demagnetising tape heads, setting bias and EQ, and all those other things related to working with analogue tape. I was happy to invest hours of my life learning those skills and gaining an understanding of the work because, unlike the readers and the compositors, I knew that sound engineers could never be replaced by computers.

Ever…

No one gets to vote on whether technology is going to change our lives.

— Bill Gates, 1996

Erasing The Fictional Past

The reference to the Terminator franchise above is clearly overdramatic, which is why it’s there…

When AI finally became available to the masses, in the form of ChatGPT, it got a highly polarised reaction.

At one extreme, living-in-the-future wannabes wished it into correctness despite the fact it knew nothing beyond September 2021 and they knew nothing about its ‘hallucinations’ (scroll down to ‘AI Hallucinations’).

At the other extreme, the Luddites of the silicon-revolution wished it into the cornfield, hoping it would go away while simultaneously mongering way too much fear that it would take over everything – just like drum machines took over drummers (they didn’t) and sequencers took over keyboard players (they didn’t). So what’s their problem with AI? Try this…

Until ChatGPT the general public’s experience of AI was entirely fictional, influenced by a century of mainstream sci-fi entertainment that included everything from Metropolis (1927) to The Creator (2023) and beyond. In most cases this AI-based fiction follows the arc of Mary Shelley’s ‘Frankenstein’, where the creation invariably turns on the creator. “I’m sorry Dave. I’m afraid I can’t do that.”

Our reactions to new things are often based on what we’ve experienced in the past. With regard to AI those past experiences have been entirely fictional, and yet we emphatically use these fictional experiences as references because they’re the only references we’ve got, and we’ve been spoon-fed them all of our lives – even though there is no spoon.

Embracing The Non-Fictional Present

Before delving deeper we have to understand the current state of AI, which is classified into three types: Narrow AI, General AI and Super AI.

Narrow AI, also known as Artificial Narrow Intelligence (ANI) or ‘weak AI’, is designed to perform specific tasks with high efficiency. It operates under a limited set of constraints and contexts. Examples include voice assistants such as Siri, facial recognition systems (which exist on the borderline between Machine Learning and AI), and recommendation algorithms on streaming platforms. Narrow AI excels in its designated domain but lacks the capability to perform beyond its programmed functions. All currently available AIs are forms of Narrow AI, doing a range of specific duties from performing cool generative tasks in photo editing apps to researching and writing complete essays in ChatGPT.

General AI, also known as Artificial General Intelligence (AGI) or ‘strong AI’, possesses the ability to understand, learn, and apply knowledge across a wide range of tasks, similar to human cognitive abilities. AGI can reason, solve problems, and make decisions without human intervention. While it remains a theoretical concept at present, AGI represents the goal of creating machines that can perform any intellectual task that a human can do.

Super AI is a hypothetical form of AI that surpasses human intelligence in all aspects – creativity, problem-solving, decision-making, and even emotional intelligence. Super AI could outperform the best human brains in practically every field. Discussions around Super AI often involve ethical considerations and potential risks to humanity due to its advanced capabilities.

General AI and Super AI are currently the stuff of fiction, but are also the catalysts behind the fears people have about AI because they’re the ‘all knowing, all seeing’ AIs seen in movies and TV shows. For now, at least, they are entirely fictional.

For the rest of this series we’ll focus on Narrow AI, which is the only current form of AI. There are three forms of Narrow AI that are finding their way into our creative industries and workflows: ‘Large Language Models’ (LLMs), ‘Generative’ and ‘Rule-Based’. How do they differ?

Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand, process, and generate human-like text. Built using deep learning algorithms, they are trained on vast amounts of text data – enabling them to recognise patterns, predict words, and create coherent responses across various contexts. LLMs can perform tasks like translation, summarisation, content creation (such as this paragraph), and answering questions. They continuously improve as they process more data, showcasing remarkable adaptability. Examples include models like ChatGPT, which excels in generating natural and contextually relevant language, enhancing user interactions in numerous applications.
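To make ‘recognise patterns, predict words’ a little more concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT or any real LLM works internally (those use deep neural networks trained on vastly more text); it simply counts which word most often follows another in a made-up sentence and uses that count to guess the next word. The function names and the sample text are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word in the text, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up training text, for illustration only.
corpus = "the owl hoots at night the owl sleeps by day the owl hoots again"
model = train_bigrams(corpus)

print(predict_next(model, "owl"))    # hoots
print(predict_next(model, "creek"))  # None
```

Note that the guess is based purely on what appeared most often in the training text; nothing in the code checks whether the answer is true.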

Generative AI is designed to create new content – such as text, images, music, or code – that mimics human-like creativity. Using large datasets and complex algorithms, these models learn patterns, styles and structures, enabling them to generate original outputs. Popular examples include chatbots, image generators, and generative fill in photo editing apps. Generative AI has applications across industries, enhancing content creation, design, entertainment, and problem-solving. Its potential is vast, but it also raises ethical considerations around authenticity, bias, and responsible use.

Rule-Based AI operates on predefined rules and logical statements set by humans. It uses ‘if-then’ conditions to make decisions, allowing it to process data and produce specific outputs based on those rules. It excels in structured environments where tasks are predictable, such as automated customer support or data validation. Unlike machine learning, rule-based AI doesn’t learn from data but relies entirely on programmed logic, making it straightforward yet limited in adaptability.
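The ‘if-then’ idea can be sketched in a few lines of Python. This is a hypothetical data-validation example, not taken from any real product: the function name, the allowed sample rates and the allowed bit depths are all assumptions made for illustration. Every decision comes from a rule a human typed in, and nothing is learned from data.

```python
# Hypothetical rule set for checking an audio export request.
# The allowed values below are assumptions chosen for illustration only.
ALLOWED_SAMPLE_RATES = {44100, 48000, 96000}
ALLOWED_BIT_DEPTHS = {16, 24, 32}


def validate_export(sample_rate: int, bit_depth: int, handle_seconds: float) -> list[str]:
    """Apply each if-then rule in turn; an empty list means the request passes."""
    problems = []
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")
    if bit_depth not in ALLOWED_BIT_DEPTHS:
        problems.append(f"unsupported bit depth: {bit_depth}")
    if handle_seconds < 0:
        problems.append("handles cannot be negative")
    return problems


print(validate_export(48000, 24, 3))  # []
print(validate_export(22050, 12, 3))  # two rule violations
```

The limitation described above is visible here: a request the rules never anticipated (say, a new sample rate) is rejected until a human adds it to the list.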

Some systems combine different Narrow AIs. Apple Intelligence is a good example. It uses an LLM to provide the Writing Tools option seen on the Edit menu in the Pages app, Notes app, Calendar app, Reminders app, and other apps that require text input. It uses Generative AI in the Photos app to remove unwanted objects from pics and fill in the space left behind them (i.e. generative fill) and to retouch pics (e.g. smooth out rough skin), and to create images in the Image Playground app. It uses Rule-Based AI and Machine Learning to deal with facial recognition, speech recognition, handwriting recognition, trackpad/keyboard/battery usage and other user interactions.

SESSION CALL

Of specific interest to recording musicians and composers, Logic Pro uses Apple Intelligence’s Generative AI, Rule-Based AI and Machine Learning models to create its ‘Session Players’ – currently a drummer, a bassist and a keyboardist – that are built into the app. These Narrow AI systems are trained on human session players, and create nuanced performances by analysing the song’s structure, tracking the chords, and, ultimately, creating a backing that mimics the feel, phrasing and techniques of human performers. Parameters such as instrument type, playing style and tone are user-adjustable, articulation and dynamics can be modified, and it’s all editable in MIDI – which means the resulting performance can ultimately be applied to any software instrument in Logic Pro.

But the brightest jewel inside of me, glows with pleasure at my own stupidity.

— Howard Devoto, 1980

Drum machines didn’t replace drummers, sequencers didn’t replace keyboardists, and it’s unlikely Logic Pro’s Session Players will replace human session players – at least not in terms of paid work. The majority of musicians/composers who use these AI Session Players in their finished mixes will be the same musicians/composers who could never afford or justify the cost of human session players anyway – which means that, in reality, the human session players are only losing work they never had and payments they’d never collect. In contrast, the professional composer can use Logic Pro’s Session Players to fine-tune the parts they want and then provide the sheet music for human session players to work off and perhaps embellish, for both recording and live performance. The composer doesn’t need them for the time-consuming grunt work of composing but rather for what they’re able to bring to an existing AI Session Player’s track. Real musicians thereby become the premium product; they’ll be able to charge the same total money for the studio call, but they’ll be spending less time waiting around as arrangements are debugged, more time adding their creative flair to the part, and possibly getting the job done faster for the same money. The Australian Musicians’ Union, for example, recommends rates for session musicians based on a minimum three-hour session call. If a composer uses a tool like Logic Pro’s Session Players to work out the fundamentals of a part before calling in a human session player to add their creative flair, the human session player might only need to be on the job for an hour but will still get paid for three hours in union-recognised sessions. We can expect to see changes in the rules set forth by musician unions globally as AI enters the session player environment.
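The minimum-call arithmetic above can be sketched as follows. Only the three-hour minimum comes from the text; the hourly rate is a made-up figure, and charging pro rata for time beyond the minimum is an assumption, so this should not be read as an actual union rate card.

```python
MINIMUM_CALL_HOURS = 3  # from the text: minimum three-hour session call


def session_fee(hours_worked: float, hourly_rate: float,
                minimum_hours: float = MINIMUM_CALL_HOURS) -> float:
    """Bill the hours worked, but never fewer than the minimum call."""
    billable_hours = max(hours_worked, minimum_hours)
    return billable_hours * hourly_rate


HYPOTHETICAL_RATE = 100.0  # invented hourly rate, for illustration only

print(session_fee(1, HYPOTHETICAL_RATE))    # 300.0 (one hour worked, three paid)
print(session_fee(3, HYPOTHETICAL_RATE))    # 300.0
print(session_fee(4.5, HYPOTHETICAL_RATE))  # 450.0
```

Under these assumptions the player who finishes in an hour earns the same fee as one who needs the full three, which is the point made above.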

The Session Players feature is easy for Logic Pro to offer because Apple Intelligence is ‘on device’ (as explained in my ‘Intel to Apple Silicon’ article); it doesn’t need an internet connection or any external devices because it is integrated throughout the device’s operating system, hardware and apps. However, it won’t be long before other DAW manufacturers – especially those offering cross-platform apps – figure out how to integrate AI into their apps. We’ll look at how they might do that in a later instalment of this series, but first…

THE BUG IN THE BELLY BUTTON

As sci-fi author William Gibson accurately observed, at present AI is doing too much of the creative stuff and not enough of the ‘grunt work’ – a fact that reinforces the fears of the silicon-revolution Luddites, and is largely the fault of nerds showing what they can do with AI in exchange for that reassuring “aren’t you a clever boy” pat on the head that illuminated their blue LED adolescence.

Building AI-based Session Players into a DAW (as described above) is an interesting example. On the negative side, it is as Gibson said: AI doing more of the creative stuff and less of the grunt work. On the positive side it’s an excellent and collaborative feature for recording musicians and composers.

But is it providing undeniably useful ‘clever boy’ features that nobody actually asked for?

GET ON BOARD

We’re currently at the beginning of a long-anticipated intelligent evolution where AIs are accelerating the development of better AIs. We’ve never seen an intelligent evolution before and therefore we don’t know how to react to it, but one thing is for sure: it will happen much faster than expected. As I’ve quoted many times, “Technology moves through society like a steamroller. If you’re not part of the machine, you’re part of the road.” A steamroller powered by an intelligent evolution moves incrementally faster and further every day. If you don’t want to be part of the road you’ll need to get on board the machine as soon as possible, hold on tight, and try to influence its direction by remembering the non-fictional past of those readers and compositors mentioned earlier, and applying a healthy dose of consumer-level Darwinism (i.e. don’t support products or features you don’t want). Perhaps we can steer that intelligent evolution towards a better outcome for musicians, audio engineers, video pros and others involved in content creation. The ‘clever boys’ clearly have no idea.

In his book Future Shock (1970), Alvin Toffler wrote “Tomorrow’s illiterate will not be the man who can’t read; he will be the man who has not learned how to learn. Our moral responsibility is not to stop the future, but to shape it…” I cannot think of a more appropriate time to give that quote more prominence.

AI developers should be shifting their focus away from tools that nobody asked for and towards the needs of potential users; asking what is needed, what is wanted, and what we’re willing to pay for. I’m not interested in becoming “the most dangerous person in the room” (what a ridiculous marketing slogan that was, as if AI could give me the nuclear codes). I’m also not interested in an AI girlfriend who says she can satisfy all of my desires despite being nothing more than a pocket-sized Tinkerbell trapped in a glass and aluminium sandwich. What’s she going to do? Vibrate excitedly in my pocket? No amount of rubbing is going to let that genie out of the bottle. The ‘clever boys’ are clearly incels…

[The grey-coloured text you’ve read within this article was written by AI using Apple Intelligence’s ‘Writing Tools’ found under the Edit menu of their Pages word processor app. Each section took less than 10 seconds to prompt, create and paste into this document, and required no preliminary research on my part. I did some fact-checking afterwards (including checking for hallucinations), then edited the text just as I would do with anybody else’s writing to bring it in line with AudioTechnology’s ‘voice’. In each case I was exploiting AI’s strengths (information gathering) and avoiding its weaknesses (judgements and opinions).]

Our moral responsibility is not to stop the future, but to shape it.

— Alvin Toffler, 1970

Next Instalment: AI-assisted DAWs for Mixing.

DANGEROUS PERSON

The phrase ‘Become the most dangerous person in the room’ is a ridiculous incel-level marketing slogan that often pops up on social media advertisements promoting courses to develop your AI skills. Spoken in a firm authoritative voice, it is primarily effective, I find, at a) developing my swiping skills, and b) reinforcing the idea that AI is going to be bad. Whoever thought of that slogan deserves a Roger von Oech ‘whack on the side of the head’ from the most dangerous person in the room.

AI HALLUCINATIONS

The term ‘AI hallucinations’ refers to instances where AI systems, particularly those using LLMs or generative AI, produce results that are factually incorrect, nonsensical, or entirely fabricated. These errors can occur despite the AI presenting the information with a high degree of confidence and fluency, making it potentially misleading.

Hallucinations often arise because AI models generate responses based on patterns in the data they were trained on, rather than verifying facts or understanding context. They might fill in gaps with plausible-sounding details, especially when specific information is lacking or ambiguous. This phenomenon can affect applications like text generation, image creation and translation tools.

Understanding and mitigating AI hallucinations is crucial for ensuring the reliability and safety of AI systems, particularly in sensitive fields like healthcare, legal contexts, and news dissemination.

ABOUT THE LUDDITES

The Luddites were a group of English workers in the early 19th century, primarily active between 1811 and 1816, who protested against the rapid industrialisation that was transforming traditional industries. Originating in the textile regions of Nottinghamshire, Yorkshire and Lancashire, they were known for destroying machinery – particularly stocking frames, spinning frames, and power looms – which they believed threatened their jobs and livelihoods.

Named after the possibly mythical figure Ned Ludd, who symbolised their cause, the Luddites were not inherently opposed to technology itself but resisted how it was used to undermine skilled labour. Their actions were in response to economic hardship, poor working conditions, and declining wages brought on by industrial capitalism. They operated under a quasi-military structure, often conducting night-time raids on factories.

The British government responded harshly, deploying thousands of troops to suppress the movement and passing laws that made machine-breaking punishable by death. By 1816, the Luddite uprisings had largely been quelled.

Today, the term ‘Luddite’ is often used disapprovingly to describe someone who is simply resistant to technological change. However, the historical Luddites were complex figures whose struggle highlighted the social costs of industrial progress and the need for fair labour practices amidst technological advancement.