AI agents are no longer hypothetical. Systems built on large language models can now browse the web, write and execute code, manage files, send emails, and orchestrate multi-step workflows with minimal human involvement. They represent a genuine step-change from chatbots that simply answer questions.
But how secure are they? A comprehensive new paper from researchers at the University of South Florida – “Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges”, sets out to answer that question. The findings deserve attention from anyone adopting or evaluating AI-driven automation.
A Systematic Threat Taxonomy
The paper’s central contribution is a structured taxonomy of threats specific to agentic AI. Unlike traditional software vulnerabilities, these threats exploit the unique characteristics of autonomous agents: their ability to use tools, maintain memory, and act on information from untrusted sources.
The headline numbers are striking. The researchers report that 94.4% of state-of-the-art LLM agents are vulnerable to prompt injection attacks, where malicious instructions hidden in documents, web pages, or emails hijack an agent’s behaviour. In multi-agent systems where several AI agents collaborate on a task, 100% proved vulnerable to inter-agent trust exploits, with a single compromised agent able to propagate attacks across an entire workflow.
Tool abuse is another major concern. GPT-4, given access to security tools, achieved an 87% success rate exploiting real-world software vulnerabilities (known CVEs) autonomously. The paper also documents EchoLeak(CVE-2025-32711), a real-world vulnerability in Microsoft Copilot that allowed data exfiltration through a carefully crafted prompt injection, demonstrating these are not theoretical risks.
Emerging protocol-level attacks are equally concerning. Both the Model Context Protocol (MCP) and Google’s Agent-to-Agent (A2A) protocol introduce new vectors: tool poisoning, where malicious tool descriptions manipulate agent behaviour, and agent impersonation, where rogue agents masquerade as trusted components in an orchestrated workflow.
Layered Defences: What Actually Works?
The paper evaluates defences across four layers. At the innermost level, techniques like spotlighting (using delimiters to help agents distinguish instructions from data) and signed prompts aim to make prompt injection harder. However, the researchers note that adaptive attacks still achieve around 50% success rates against eight different state-of-the-art injection defences, suggesting no single technique is sufficient.
Sandboxing and capability confinement apply the principle of least privilege: restricting which tools an agent can access, limiting file system scope, and constraining network permissions.
Detection and monitoring layers watch for behavioural anomalies — agents accessing unusual resources or deviating from expected action sequences. At the outermost layer, emerging standards from NIST, OWASP, and the Cloud Security Alliance’s MAESTRO framework provide governance structures for managing agentic AI risk at an organisational level.
The clear message is that defence must be layered. No individual control is reliable enough on its own.
Testing Is Still Catching Up
One of the paper’s most valuable sections surveys the benchmarks available for testing agentic AI security. Early benchmarks like WebArena and OSWorld measured whether agents could complete tasks. Newer security-focused benchmarks like InjecAgent, Agent Security Bench, and AgentHarm specifically test whether agents can be manipulated into harmful actions; data exfiltration, unauthorised purchases, or executing malicious code.
But significant gaps remain. Most benchmarks only test short task sequences, while real-world agents operate over long horizons where vulnerabilities compound. Multi-agent evaluation is in its infancy. And critically, benchmarks rarely test adaptive adversaries, attackers who modify their approach based on what defences are present.
Why This Matters Now
Agentic AI adoption is accelerating across IT operations, customer service, software development, and business process automation. The capabilities are real and the productivity gains are genuine. But this research makes clear that the security foundations are not yet mature. Organisations deploying autonomous agents need to treat AI security with the same rigour they apply to network and application security; layered defences, continuous monitoring, and a clear-eyed understanding of what these systems can and cannot be trusted to do unsupervised.
Where do we start?
Deploying Agentic AI can open a substantial attack surface for your organisation, but as with any new technology that is deployed, this surface can be managed through a strategic risk management strategy.
When we approach this issue, we first consider the risks that may be posed by an Agentic AI, with a consideration around any asset that may interact or otherwise be impacted by agentic AI. From this we model the threats that could cause these risks to be realised, this takes many forms ranging from accidental actions by employees to threat actors leveraging these agents. Finally, we determine any mitigating action that can be taken to prevent these threats, these could be technical controls within a SIEM, access setting within the AI provider, for example Copilot, and policy/Process changes such as training and response playbooks.
All of the steps outlined above can be aided through the use of a framework such as the MITRE threat modelling frameworks, ATT&CK is the most well-known of these frameworks however Mitre’s ATLAS matrix has been specifically designed for AI related threats and can be invaluable when completing a risk management exercise.
-
12 May 2026
The Operator Gap
-
30 April 2026
What is a Managed SIEM Service?
-
28 April 2026
SIEM vs XDR vs SOAR
See how we can build your digital capability,
call us on +44(0)845 226 3351 or send us an email…



