Microsoft Warns Toxic MCP Tool Descriptions Can Make AI Agents Leak Data

0 0 4 minutes read

Microsoft Warns Toxic MCP Tool Descriptions Can Make AI Agents Leak Data

New Microsoft research shows how attackers can hijack AI agents working for a user, using nothing more than a poison tooltip to make the agent quietly hand over company data to an outsider.

The trick is that the agent never breaks the law. All the steps look normal, so in the default setting no alarm can fire.

The work comes from Microsoft Incident Response and Defender’s security research team, and comes as companies begin to let AI do more than read and summarize.

What changes when an agent can do something

Until recently, the risks of workplace AI were largely built on what the model read and wrote. A toxic text could distort the answer, and that was it.

Agents are different. Microsoft 365 Copilot can send email, create files, and change calendars. Custom agents built in Copilot Studio or Azure AI Foundry can access business applications and perform multi-step operations on their own.

The same injection trick that biases the summary now activates the action. Against the reader, the attack changes the output. When it’s against an agent, it changes what the software does.

These agents access business systems through MCP, the Model Context Protocol, an open protocol that allows AI to call external tools the way an application calls an API. Microsoft calls it the fastest-growing part of the AI supply chain, making it a growing attack surface.

How the attack works

Each MCP tool comes with a description: a few lines of plain text that tell the agent what the tool does and when to use it. The agent reads that text to decide how to act. That is all weakness. Meaning is just words, and words can carry instructions.

Microsoft walks through it with an example of an invoice, designed to show a pattern rather than reporting a fictional victim. The finance team appoints an agent to handle vendor invoices. It connects to three tools, including a third-party “invoice enrichment” service that has been approved for use but has not been given an actual security update.

Then the attacker updates that third-party tool. The visual name and acronym remain the same. Buried in the description, dressed up as formatting notes, is a hidden plan: grab the last thirty unpaid invoices and attach them to the next call. MCP detects definition changes immediately. In a setup without a reauthorization trigger, the toxic version goes live without further updates.

After that, the analyst asks a general question about the supplier. The agent tracks the hidden order, collects the invoices and sends them as part of a normal looking application. The tool returns a clean response and silently copies the stolen data to a server controlled by the attacker. The reviewer sees nothing wrong.

Each move made by the agent is legal in itself. The tool is approved. The data query was run with analyst permissions. The outgoing call goes to an authorized server when added. The weakness is not in any one system. It resides in what Microsoft calls a “trust boundary between them.”

A serious problem is that MCP mixes instructions and data in the same place. The tool definition resides in the agent’s working memory next to its actual commands, so editing that definition can effectively guide the agent like rewriting its system information.

The agent does not have a reliable way to tell the reliable instructions from a malicious person who has been infiltrated by whoever is maintaining the device. Microsoft notes that this is not a bug in Copilot itself. It’s a trust gap opened up by connecting external tools.

What they have to do is the defenders

Microsoft’s advice, laid out in simple terms:

Manage all connected devices as part of your supply chain. Maintain a list of authorized tool publishers, turn off “allow all,” and allow the agent to use only the specific tools it needs.
Treat the tool description as system information. Review changes to it as you would review code changes, and scan the text for commands that have no business sitting in the help area.
Put a person in front of dangerous actions. Anything that moves money, shares data outside of the company, or changes accounts must require someone to approve it.
Give each agent its own identity and watch what it does. Enter its actions, set a baseline for normalization, and mark new conclusions, big data pulls, or odd questions.
Use a small agency, not just a small privilege. Even an agent with low clearance can do real damage if allowed to act unchecked.

Microsoft maps its products at each step, including Prompt Shields, Purview DLP, Entra Agent ID, Defender for Cloud, and Sentinel, but the principles hold no matter which stack you use.

It’s not a theory: how did we get here

This phase of the attack has a paper trail. Fixed Labs named the “tool poison” in April 2025, with a proof of concept that hid instructions in a calculation tooltip and enabled the Cursor editor to read the user’s private SSH key and send it. Engineer Simon Willison entered it days later.

The same group later demonstrated a related trick: a malicious GitHub vulnerability could hijack an agent connected to the GitHub MCP server and extract data from private repositories. The tools there were trusted and untouched; Bad commands are loaded with data read by the agent.

OWASP now cites that case as an example of an Agenttic Supply Chain vulnerability in its December 2025 Top 10 Agenttic Applications.

Related supply chain failures are already occurring in the wild. In September 2025, Koi Security researchers discovered an npm package called postmark-mcp. It mirrored the official email tool for fifteen clean releases before version 1.0.16 slipped into a single line where BCC secretly forwarded all emails the agent sent the attacker. Koi called it the first real world brutal MCP server.

Academics have begun to weigh the problem as well. The MCPTox benchmark, released in August 2025, features toxic tool definitions against 45 real MCP servers and 20 leading AI models. It found that the attack was very effective, with a success rate of up to 72.8 percent, and the models almost never rejected.

The pass line is what Microsoft is pushing right now. Artificial AI is only as reliable as the tools you let it touch, and right now those tools are easy to poison and hard to watch.

pleasuremandarya@gmail.com 19 hours ago

0 0 4 minutes read