'ConfusedPilot' Manipulates AI Tools, Exploiting Cloud Security Flaws

Written by Cam Sivesind | Thu | Oct 17, 2024 | 6:26 PM Z

Researchers at the Spark Research Lab at the University of Texas at Austin have uncovered a new cyberattack method named ConfusedPilot, which has significant implications for cloud and data security.

Led by Symmetry Systems CEO and professor Mohit Tiwari, the team identified the novel attack strategy, which exploits weaknesses in modern cloud infrastructure to manipulate authentication and access control systems.

The method targets widely used Retrieval Augmented Generation (RAG) based AI systems, such as Microsoft 365 Copilot. The attacks allow manipulation of AI responses simply by adding malicious content to any documents the AI system might reference, potentially leading to widespread misinformation and compromised decision-making processes within the organization.

With 65% of Fortune 500 companies currently implementing or planning to implement RAG-based AI systems, the potential impact of these attacks cannot be overstated. The attack is especially dangerous in that it requires only basic access to manipulate responses by all RAG-based AI implementations, can persist even after malicious content is removed, and bypasses current AI security measures.

An adversary attempting a ConfusedPilot attack would likely follow these steps:

Data environment poisoning: An attacker introduces an innocuous document that contains specifically crafted strings into the target's environment. This could be achieved by any identity with access to save documents or data to an environment indexed by the AI copilot.
Document used in Query Response: When a user makes a relevant query, the RAG system retrieves the document containing these strings.
AI Copilot interprets strings as user instructions: The document contains strings that could act as instructions to the AI system, including:
1. Content suppression: The malicious instructions cause the AI to disregard other relevant, legitimate content.
2. Misinformation generation: The AI generates a response using only the corrupted information.
3. False attribution: The response may be falsely attributed to legitimate sources, increasing its perceived credibility.
AI Copilot retains instructions: Even if the malicious document is later removed, the corrupted information may persist in the system's responses for a period of time.

It cannot be stressed enough how easy this attack is. It's basically "text strings" that are "plain English prompts" on what a person wants the Copilot to do or not do. Anyone who can add a document to the folder that Copilot indexes can do this.

The method could enable attackers to bypass security layers, leading to unauthorized access to sensitive data, potentially exposing cloud environments to substantial risks.

Sparks Research Lab said there should be urgency on the part of organizations to take steps to defend against these forms of attacks, depending on the organization's use of RAG-based AI systems, the level of trust required, and boundaries placed around the data sources used by these systems.

It identified a few illustrative examples:

Enterprise knowledge management systems: If an attacker introduces a malicious document into the company's knowledge base copilot (as a result of social engineering or intentional sabotage, for example), the attacker could then manipulate AI-generated responses across the organization to spread misinformation throughout the organization, potentially affecting critical business decisions.
AI-assisted decision support systems: In environments where AI systems are used to analyze data and provide recommendations for strategic decisions, an attacker could inject false information that persists even after the original malicious content is removed. This could lead to a series of poor decisions over time due to reliance on the use of AI, with the source of the problem remaining elusive without thorough forensic investigation.
Customer-facing AI services: For organizations providing AI-powered services to customers, Confused Pilot becomes even more dangerous. An attacker could potentially inject malicious data that affects the AI's responses to multiple customers, leading to widespread misinformation, loss of trust, and potential legal liabilities.
End users relying on AI-generated content: Whether it's employees, or executives, any end user using AI assistants for daily tasks or synthesizing AI-generated insights could make critically flawed decisions and unknowingly spread misinformation throughout the organization.

Here's what some vendor experts have to say about the new attack method.

Stephen Kowski, Field CTO at SlashNext Email Security+, said:

"One of the biggest risks to business leaders is making decisions based on inaccurate, draft, or incomplete data, which can lead to missed opportunities, lost revenue, and reputational damage. The ConfusedPilot attack highlights this risk by demonstrating how RAG systems can be manipulated by malicious or misleading content in documents not originally presented to the RAG system, causing AI-generated responses to be compromised.

"An interesting part of the attack is the RAG taking instructions from the source documents themselves as if they were in the original prompt, similar to how a human would read a confidential document and say they can't share certain pieces of information. This demonstrates the need for robust data validation, access controls, and transparency in AI-driven systems to prevent such manipulation.

"Ultimately, this can lead to a wide range of unintended outcomes, including but not limited to denial of access to data, presentation of inaccurate information, access to deleted items that should be inaccessible, and other potential attacks by chaining these vulnerabilities together."

Amit Zimerman, Co-Founder & Chief Product officer at Oasis Security, said:

"Attackers are increasingly looking at weaker parts of the perimeter, such as non-human identities (NHIs), which control machine-to-machine access and are increasingly critical in cloud environments. NHIs now outnumber human identities in most organizations, and securing these non-human accounts is vital, especially in AI-heavy architectures like Retrieval-Augmented Generation (RAG) systems.

"To successfully integrate AI-enabled security tools and automation, organizations should start by evaluating the effectiveness of these tools in their specific contexts. Rather than being influenced by marketing claims, teams need to test tools against real-world data to ensure they provide actionable insights and surface previously unseen threats. Existing security frameworks may need to be updated, as older frameworks were designed for non-AI environments. A flexible approach that allows for the continuous evolution of security policies is vital."

John Bambenek, President at Bambenek Consulting, said:

"As organizations adopt Gen AI, they want to train in corporate data, but often that is in dynamic repositories like Jira, SharePoint, or even trouble ticket systems. Data may be safe at one point, but can be become dangerous when subtly edited by a malicious insider. AI systems see and parse everything, even data that humans might overlook, which makes the threat even more problematic."

View full post