AI Voice Assistants Hijacked Through Hidden Audio Commands

Voice assistants are getting smarter by the month. They can book meetings, search files, send emails, summarise calls and generally insert themselves into every corner of modern work life. Now, security researchers have demonstrated a new attack technique that abuses those AI systems using hidden audio commands embedded inside ordinary sound clips.

The proof-of-concept attack, called “AudioHijack,” was developed by researchers from Zhejiang University, the National University of Singapore and Nanyang Technological University.

The technique was recently presented at the IEEE Symposium on Security and Privacy in San Francisco.

How The Attack Works

The attack targets AI systems capable of processing audio and interacting with external tools, including platforms from companies such as Microsoft and Mistral AI.

Researchers describe the method as an “auditory prompt injection.” In simple terms, attackers hide malicious instructions inside audio content such as:

  • Music
  • Podcasts
  • Voice notes
  • Videos
  • Online meetings

To human listeners, the audio sounds completely normal. Maybe slightly echoey at most. Nothing alarming enough to stop another soul-draining Zoom meeting from continuing.

But the AI system interprets the hidden audio patterns as legitimate instructions.

In one demonstrated scenario, an employee joins a video call with harmless background music playing underneath a presentation. While everyone else discusses quarterly targets and pretends to care about synergy metrics, the AI transcription system secretly receives instructions telling it to:

  • Search for sensitive files
  • Download data
  • Send information externally
  • Perform web searches

The dangerous part is that no malware or direct device compromise is required. The attack manipulates the AI model itself through carefully modified audio waveforms.

Nearly Invisible To Humans

The researchers created tiny alterations in audio clips designed to mimic natural room echo and environmental sound.

Humans hear ordinary audio. The AI hears commands.
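
To get a feel for how small an echo-like change can be, here is a minimal Python sketch. It is not the researchers' method and it hides no commands: it simply adds two faint, delayed copies of a test tone and measures how quiet that addition is compared with the original. The sample rate, tone, delays and gains are all assumed values chosen for illustration.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)


def sine_wave(freq_hz, seconds, rate=SAMPLE_RATE):
    """Generate a plain test tone as a list of samples in [-1, 1]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def add_echo(signal, taps):
    """Add faint delayed copies of the signal.

    taps is a list of (delay_in_samples, gain) pairs (hypothetical values).
    """
    out = list(signal)
    for delay, gain in taps:
        for i in range(delay, len(signal)):
            out[i] += gain * signal[i - delay]
    return out


def snr_db(clean, modified):
    """Signal-to-noise ratio in dB, treating the added echo as the noise."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((m - c) ** 2 for c, m in zip(clean, modified))
    return 10 * math.log10(signal_power / noise_power)


clean = sine_wave(440, 1.0)
taps = [(400, 0.02), (800, 0.01)]  # echoes at 50 ms and 100 ms
echoed = add_echo(clean, taps)

print(f"Echo sits {snr_db(clean, echoed):.1f} dB below the original tone")
```

A higher figure means the change is smaller relative to the original audio. The real attack goes much further by shaping such changes so that a model reads them as instructions, which this sketch does not attempt.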

That distinction matters because traditional cybersecurity tools are designed to detect suspicious software, malicious files or unauthorised access attempts. A doctored audio clip is none of those, so there is nothing for them to flag.

It also echoes a long-standing weakness in AI, and in large language models (LLMs) in particular: carefully crafted prompts can trick a model into giving out information it really shouldn't.

The Success Rates Were Alarmingly High

The research team tested the technique against 13 major open-source audio AI models, including:

  • Qwen2-Audio
  • GLM-4-Voice
  • Phi-4-Multimodal
  • Voxtral-Mini
  • Kimi-Audio

Researchers also demonstrated that the attacks could transfer to commercial voice systems, including services connected to Microsoft Azure.

The attack success rates reportedly ranged from 79% to 96%, depending on the scenario.

Among the demonstrated behaviours were:

  • Downloading files from attacker-controlled sources
  • Triggering sensitive searches
  • Exfiltrating user information through email

Even more concerning, defensive measures performed poorly during testing.

Training AI models to detect suspicious prompts reduced attack success rates by only around 7%, while intent verification systems detected just 28% of attacks.
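
To put those defence figures in perspective, here is some rough arithmetic in Python. It rests on assumptions: the article does not say whether "around 7%" means percentage points or a relative cut, so both readings are shown, and it assumes the reduction applies to the same 79% to 96% range quoted above, which may not match the researchers' exact test setup.

```python
def residual_success(baseline, reduction, relative=False):
    """Attack success rate left after a defence is applied.

    relative=False treats the reduction as percentage points;
    relative=True treats it as a fraction of the baseline.
    """
    if relative:
        return baseline * (1 - reduction)
    return max(0.0, baseline - reduction)


REDUCTION = 0.07       # "around 7%" from the article
DETECTION_RATE = 0.28  # intent verification figure from the article

for baseline in (0.79, 0.96):
    points = residual_success(baseline, REDUCTION)
    rel = residual_success(baseline, REDUCTION, relative=True)
    print(f"baseline {baseline:.0%}: {points:.0%} (points) or {rel:.1%} (relative)")

# Share of attacks that intent verification failed to detect
missed = 1 - DETECTION_RATE
print(f"undetected by intent verification: {missed:.0%}")
```

Under either reading, most attacks would still get through.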

Why This Matters

Voice-enabled AI systems are rapidly becoming embedded into:

  • Smartphones
  • Enterprise collaboration platforms
  • Customer service systems
  • AI meeting assistants
  • Productivity tools

Many of these systems are now capable of taking direct actions on behalf of users, which dramatically increases the risk when they are manipulated.
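
One common way to limit that risk is to ask a human before an assistant carries out a sensitive action while untrusted audio is in play. The Python sketch below illustrates the idea only. The action names, the ToolCall type and the confirm callback are all hypothetical, and this approach is not one of the defences the researchers tested, so nothing here says how well it would hold up against AudioHijack.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names, loosely based on the behaviours described above
SENSITIVE_ACTIONS = {"search_files", "download_file", "send_email", "web_search"}


@dataclass(frozen=True)
class ToolCall:
    action: str
    argument: str


def needs_confirmation(call: ToolCall, audio_in_context: bool) -> bool:
    """Sensitive actions need sign-off whenever untrusted audio was processed."""
    return audio_in_context and call.action in SENSITIVE_ACTIONS


def execute(call: ToolCall, audio_in_context: bool,
            confirm: Callable[[ToolCall], bool]) -> str:
    """Return 'blocked' or 'executed'; a real system would run the tool here."""
    if needs_confirmation(call, audio_in_context) and not confirm(call):
        return "blocked"
    return "executed"


def deny_all(call: ToolCall) -> bool:
    """Stand-in for a user who declines every confirmation prompt."""
    return False


print(execute(ToolCall("send_email", "report.pdf"), True, deny_all))
print(execute(ToolCall("summarise_call", "weekly sync"), True, deny_all))
```

The trade-off is convenience: every confirmation prompt chips away at the hands-free automation that made the assistant appealing in the first place.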

According to the researchers, the attack is also context-agnostic. That means attackers do not need to know what the user is asking the AI assistant to do beforehand.

The hidden signal can simply wait for the right AI model to hear it.

Researchers claim the malicious audio could realistically be delivered through:

  • YouTube videos
  • Music clips
  • Podcasts
  • Voice notes
  • Online meetings
  • AI transcription systems

Which means the attack surface is effectively “anything capable of playing sound.” Excellent. Humanity really has committed to speedrunning cyberpunk.

As businesses continue integrating AI assistants into everyday workflows, attacks like AudioHijack demonstrate how quickly new threat surfaces can appear. The more actions AI systems are trusted to perform automatically, the more dangerous prompt injection techniques become, whether they arrive through text, images or now apparently background music during a Teams call.

We hope you’ve liked this blog. Stay tuned for more blogs like this. Stay safe!

© Copyright Solutions 4 IT Ltd 2026. All Rights Reserved.