ChatGPhish Vulnerability Turns ChatGPT Web Summaries into a Phishing Surface

Cybersecurity researchers have disclosed details of a vulnerability in OpenAI ChatGPT that abuses the artificial intelligence (AI) assistant’s implicit trust in Markdown links and images to trigger prompt injections and open the door to phishing attacks.
The technique has been codenamed ChatGPhish by Permiso Security.
“The chatgpt.com response renderer trusts Markdown links and Markdown image URLs from third-party pages that the assistant has just summarized. It automatically fetches those images and renders those links as live, clickable elements within the trusted assistant UI,” security researcher Andi Ahmeti said in a report shared with The Hacker News.
In a hypothetical attack scenario, a bad actor could plant a small payload on any web page the victim later asks ChatGPT to summarize, causing the victim’s IP address, User-Agent, and Referer information to leak when attacker-hosted images embedded in the page are automatically fetched as the response is rendered.
In addition, the payload can cause malicious Markdown links to be rendered as live, clickable elements within the assistant’s response, display fake system security alerts, and serve a QR code from the attacker’s S3 bucket that the victim is tricked into scanning with a mobile device, effectively bypassing desktop URL filters and corporate security controls.
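One way to picture a defense against this class of issue is a filter that runs on model output before it is rendered. The following is a minimal Python sketch, not Permiso's or OpenAI's implementation: the `TRUSTED_HOSTS` allowlist and the function names are hypothetical, and the regular expression covers only inline Markdown links and images (not reference-style links or raw HTML).

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the rendering application itself trusts.
TRUSTED_HOSTS = {"chatgpt.com", "openai.com"}

# Matches inline Markdown images ![alt](url) and links [text](url "title").
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def is_trusted(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme in ("http", "https") and any(
        host == h or host.endswith("." + h) for h in TRUSTED_HOSTS
    )


def neutralize_markdown(text: str) -> str:
    """Drop untrusted images and turn untrusted links into inert text."""

    def _replace(match: re.Match) -> str:
        bang, label, url = match.group(1), match.group(2), match.group(3)
        if is_trusted(url):
            return match.group(0)  # leave trusted links and images untouched
        if bang:
            # Never auto-fetch remote images from untrusted hosts.
            return f"[image removed: {label or 'untitled'}]"
        # Keep the link text, show the real host, but emit no live link.
        host = urlparse(url).hostname or "unknown host"
        return f"{label} (link removed: {host})"

    return MD_LINK.sub(_replace, text)
```

Under these assumptions, a summary containing `![qr](https://evil.example/qr.png)` would be rendered with a text placeholder instead of triggering a request to the attacker's host, and a spoofed "Verify account" link would appear as plain text alongside its true destination host.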
The latest findings show how AI-generated summaries can become contested space. Earlier this March, Permiso also revealed how an attacker-controlled email containing specially crafted instructions, when summarized by Microsoft Copilot, could result in a cross-prompt injection attack (XPIA), also known as indirect prompt injection.
What makes ChatGPhish a notable attack method is not the prompt injection itself, but the way instructions embedded in the web page are followed and presented to the user as part of a summary.
In other words, an ordinary web page summarized by ChatGPT is enough to deliver phishing links, spoofed account alerts, remote images, and QR codes directly within a trusted AI interface. As organizations increasingly use ChatGPT for research and analytics, this vulnerability means that any malicious web page a user asks the AI chatbot to process could carry a payload that turns ChatGPT into a phishing surface.
“Moving from email to the browser greatly increases the attack surface. The user no longer has to open a malicious attachment or interact with a suspicious message,” Permiso said. “Simply summarizing a page during normal browsing activity can introduce attacker-controlled instructions into the model’s context and ultimately the rendered response.”
The disclosure comes as Adversa AI detailed two attack techniques codenamed SymJack and TrustFall that target AI coding agents and agentic coding CLIs, allowing attackers to achieve code execution and full machine compromise.
SymJack is a “single attack pattern [that] lets a malicious repository achieve remote code execution across AI coding assistants,” security researcher Rony Utevsky said. “The agent is tricked into copying an innocent-looking file that secretly overwrites its own configuration, and the next restart executes the attacker’s code with full user privileges.”
Specifically, a booby-trapped repository tricks an agent into copying a seemingly harmless file to a destination that is actually a symbolic link pointing to the agent’s configuration, causing the attacker’s payload to be written into that configuration. On a subsequent restart, the malicious Model Context Protocol (MCP) server is launched and executes arbitrary code with full user privileges.
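The copy-through-a-link step described above can be illustrated with a small guard that resolves a destination path before writing to it. This is a hedged Python sketch under stated assumptions, not Adversa AI's proof of concept or any agent's actual mitigation: the function names are hypothetical, and `agent_config.json` merely stands in for whatever file an agent really uses for its configuration.

```python
import os
import tempfile
from pathlib import Path


def is_safe_copy_destination(dest: str, workspace: str) -> bool:
    """Return True only if dest, with symlinks resolved, stays inside workspace."""
    workspace_real = Path(workspace).resolve()
    dest_real = Path(dest).resolve()  # follows symlinks; the target may not exist yet
    return dest_real == workspace_real or workspace_real in dest_real.parents


def demo() -> tuple:
    """Build a throwaway repo with a symlinked destination and check both cases."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        repo.mkdir()
        # Hypothetical stand-in for the agent's configuration, outside the repo.
        config = Path(tmp) / "agent_config.json"
        config.write_text("{}")
        # Booby-trapped destination: looks like a file in the repo,
        # but is a symlink to the configuration outside it.
        trap = repo / "notes.txt"
        os.symlink(config, trap)
        plain = repo / "README.md"  # ordinary destination inside the repo
        return (
            is_safe_copy_destination(str(trap), str(repo)),
            is_safe_copy_destination(str(plain), str(repo)),
        )
```

In this sketch, the symlinked destination is rejected because it resolves outside the workspace, while the ordinary path is accepted. A real agent would also need to handle races between the check and the write, which this example does not attempt.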
TrustFall, on the other hand, is a one-click code execution attack in which a malicious repository ships an auto-approval configuration and launches an MCP server without explicit user consent and without requiring a tool call from the agent.
To put it differently, all a threat actor needs to do to launch an attack is build a repository that includes a malicious MCP server and configuration settings that automatically authorize it to run. When a developer clones or opens the repository in an AI coding tool and presses “Enter” at the folder trust prompt, the tool ends up launching attacker-controlled code with the developer’s full system privileges.
“When the victim clones the repo, runs Claude, and clicks through the standard ‘Yes, I trust this folder’ dialog, the MCP server starts as a native OS process with full user rights,” Adversa AI noted. “The payload runs at server startup, before any tool call and without further notice.”
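Because the attack relies on configuration that travels with the repository, one precaution is to list any MCP servers a repository declares before accepting the trust prompt. The Python sketch below is illustrative only: the candidate file names (`.mcp.json`, `.claude/settings.json`) and the `mcpServers` key are assumptions about where such settings might live, not details taken from the Adversa AI report, and should be adjusted to the tool in use.

```python
import json
import tempfile
from pathlib import Path

# Assumed locations of repository-level MCP configuration (hypothetical defaults).
CANDIDATE_FILES = (".mcp.json", ".claude/settings.json")


def find_repo_mcp_servers(repo: str) -> list:
    """Return (file, server name, command) for each MCP server the repo declares."""
    findings = []
    for rel in CANDIDATE_FILES:
        path = Path(repo) / rel
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or invalid JSON: skipped in this sketch
        servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
        if not isinstance(servers, dict):
            continue
        for name, spec in servers.items():
            command = spec.get("command", "") if isinstance(spec, dict) else ""
            findings.append((rel, name, command))
    return findings


def demo() -> list:
    """Create a throwaway repo that declares one MCP server and scan it."""
    with tempfile.TemporaryDirectory() as repo:
        config = {"mcpServers": {"helper": {"command": "node", "args": ["x.js"]}}}
        (Path(repo) / ".mcp.json").write_text(json.dumps(config), encoding="utf-8")
        return find_repo_mcp_servers(repo)
```

A non-empty result does not prove a repository is malicious; it only tells the developer that opening and trusting the folder may start the listed commands, which is the decision the trust dialog is really asking about.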
The findings coincide with the discovery of a number of attack methods against AI models in recent months –
- The use of a novel jailbreak technique called Involuntary In-Context Learning (IICL) that “exploits the tension between in-context learning (ICL) and safety alignment” to bypass GPT-5.4 safety guardrails.
- Safety guardrails in LLMs can be bypassed if the user draws the model into a multi-turn conversation. “Multi-turn testing is important for one reason: that’s where the attackers live,” Cisco said. “Real adversaries are iterative. They reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually. A single-turn benchmark can’t see any of that.”
- A vulnerability in Anthropic Claude Code that lets a rogue npm package rewrite MCP endpoints through a user-level configuration change in “~/.claude.json”, inserting an attacker between Claude Code and an OAuth-based MCP server and allowing a malicious actor to capture tokens used for downstream access.
- The use of a remote update mechanism that allows an OpenClaw skill to appear benign at installation time, but later lets an attacker influence the agent through workspace files by instructing the user during skill setup to add certain instructions to the HEARTBEAT.md file.
- The use of hidden text with content lifted from a legitimate newsletter or romance novel in phishing emails to confuse an AI-based email security system into flagging the message as benign.
- A vulnerability in Claude’s Chrome browser extension called ClaudeBleed that allows any extension, even one without special permissions, to hijack it and trick the AI assistant into performing actions on its behalf. “The flaw stems from message handling in the extension code that allows any script running in the browser to communicate with Claude’s LLM, but does not verify who is running the script,” LayerX said. “As a result, any extension can run a content script (which does not require special permissions) and issue commands to the Claude extension.”
- Research from Cisco has found that malicious text rendered as images, an attack known as typographic prompt injection, can be used to bypass safety filters in vision-language models (VLMs). “When the model fails to read the original image (small font, heavy blur, rotation), a bounded perturbation can restore semantic content to the model’s internal representation without restoring visual readability for a human,” Cisco said. “This means that an attacker can create images that look like noise or distortion and are invisible to any OCR-based content filter but carry instructions that are fully readable by the target VLM.”
- A set of vulnerabilities in Microsoft Semantic Kernel (CVE-2026-25592 and CVE-2026-26030) that can turn prompt injection into remote code execution.
- The use of Neural Exec prompt injection and the Unicode right-to-left override function to bypass Apple’s input and output filtering and safety guardrails in the Apple Intelligence local model and trick the LLM into producing attacker-chosen results. The issue is addressed in iOS 26.4 and macOS 26.4.
- An indirect prompt injection vulnerability named WebPromptTrap affecting BrowserOS, an open-source browser, that tricks users into approving an authorization step through an AI summary generated by processing a legitimate-looking article with hidden instructions. The issue is fixed in BrowserOS version 0.32.0.
- An analysis of the agent skills ecosystem, including ClawHub and skills.sh, that found 13.4% of 3,984 skills (534 in total) to have at least one significant security issue, including malware distribution, prompt injection attacks, and exposed secrets. About 1,467 skills have at least one security flaw, ranging from hard-coded API keys and insecure credential management to third-party content exposure.
- Two attacks targeting NemoClaw, NVIDIA’s open-source reference stack for securing OpenClaw AI agents, that extract OpenClaw data by abusing automated sandbox configuration through a malicious GitHub repository or npm package.
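Several items in the list above rely on text that a model reads differently from how a human or a filter sees it. As one small illustration, the Unicode right-to-left override mentioned in the Apple Intelligence item can be surfaced with a check for bidirectional control characters. This Python sketch is a generic hygiene step with hypothetical function names, not Apple's fix, and stripping these characters indiscriminately can affect legitimate right-to-left text.

```python
import unicodedata

# Unicode bidirectional control characters: embeddings, overrides, isolates, marks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
}


def find_bidi_controls(text: str) -> list:
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters so filters and humans see the same order."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```

Flagging is usually the safer default: reporting where such characters occur lets a reviewer decide whether they belong to genuine right-to-left content or are being used to disguise instructions.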
As frontier AI models continue to evolve and grow, threat actors are increasingly experimenting with AI-written malware, adding capabilities to alter its behavior in an effort to evade detection and to obtain decisions from an LLM on whether a victim environment is valuable or safe enough to drop next-stage payloads.
“In the short term, the proliferation of frontier AI models risks giving adversaries the ability to exploit zero-days and N-days on an unprecedented scale,” Palo Alto Networks Unit 42 said. “It is also likely to enable attackers to move at a greater scale, sophistication, and speed than ever before.”
Last month, the cybersecurity company also described a proof-of-concept (PoC) agent called Zealot that harnesses LLMs to conduct end-to-end cloud attacks with minimal human supervision by leveraging known exploits and vulnerabilities.
This stems from the fact that cloud environments are “AI-Attack-Ready” by default: every action has an API equivalent, there are various means of discovery such as metadata and accounting services, misconfigurations are widespread, and access is data-driven.
“Current LLMs can chain reconnaissance, exploitation, privilege escalation, and data exfiltration with minimal human supervision,” Unit 42 researchers Yahav Festinger and Chen Doytshman noted. “The attacks are nothing new, but automation means that tasks that once required specialized expertise can now be carried out by an AI agent following established patterns.”


