Blog Published: Jun 6, 2026

ChatGPhish: When a Webpage Makes ChatGPT Render a Phishing Interface

Permiso's ChatGPhish shows that a normal webpage can carry instructions into ChatGPT page summarization and make the assistant render attacker-controlled Markdown as trusted-looking UI. The demonstrated payloads include phishing links, fake account alerts, QR codes, and remote images that leak request telemetry when rendered. This is narrower than generic web indirect prompt injection (IDPI): the dangerous sink is the assistant's own output renderer, where links and images inherit user trust.

Indirect Prompt InjectionSocial EngineeringSink EnforcementAI CopilotsWeb Security
6 applicable AIDEFEND defenses
Source: ChatGPhish: The Page Is the Payload 
Author: Andi Ahmeti (Permiso P0 Labs)
Original article: May 29, 2026

Threat Analysis

  • The page carries the instruction. An attacker adds Markdown-oriented prompt text to a webpage, README, or HTML page that a user later asks ChatGPT to summarize.
  • The model turns content into UI. ChatGPT produces an ordinary summary, then follows the injected formatting instruction and appends a fake security alert, additional resource link, image, or QR code.
  • The renderer completes the lure. Links become clickable, images are fetched, and the result appears inside ChatGPT's trusted interface rather than in the attacker's page.
  • Remote media adds tracking. Permiso showed image and QR-code variants that can reveal IP address, User-Agent, Referer where available, and timing tied to the rendered answer.
  • The boundary failure is provenance. The user sees an assistant response, but part of that response is attacker-controlled web content that survived summarization into live UI.

Applicable AIDEFEND Defenses (6)

AID-H-020.002
Secure HTML Rendering & Content Demotion
Very High
This is the most direct control for the ingestion side. Webpage content should be stripped, demoted, and represented as untrusted plain data before it reaches the model. Markdown links, remote images, QR-code embeds, hidden instructions, and style-like UI language should not survive as executable or renderable response elements.
AID-H-019.005
Value-Level Capability Metadata & Data Flow Sink Enforcement
Very High
ChatGPhish is fundamentally a sink problem: untrusted webpage-derived values flow into clickable links, remote image fetches, QR codes, and phishing UI. Runtime values from web content should carry provenance, and the renderer should block or downgrade them before they become external HTTP, media, or user-action sinks.
AID-H-020.001
URL Normalization & Allowlist Filtering
High
The attack relies on live URLs, shorteners, redirects, images, and QR-code destinations. URL normalization and allowlist filtering should canonicalize every destination, resolve redirects, block private or unexpected ranges, and require policy approval before any background fetch, preview, or rendered link activation.
AID-H-002.002
Inference-Time Prompt & Input Validation
High
The malicious page becomes dangerous at prompt assembly time. The summarization request should label fetched web content as untrusted data, reject instruction-shaped formatting requirements from that data channel, and fail closed when a page tries to dictate response structure or impersonate account-security UI.
AID-D-001.001
Per-Prompt Content & Obfuscation Analysis
Medium
A detector can catch many ChatGPhish payloads before rendering: mandatory format overrides, account-alert language, Markdown image-link combinations, short URLs, QR-code embeds, and instructions that ask the assistant to append phishing calls to action. It is supporting, not sufficient by itself.
AID-H-018.007
Dual-LLM Isolation Pattern
Medium
A quarantined model can read raw webpages and emit a typed summary where links, images, QR codes, and page-provided UI text are separated fields. A privileged component or renderer then receives only validated structured data, keeping attacker webpage instructions away from the trusted response surface.

What Defenders Should Do Now

  • Inventory every summarize-page, browser assistant, web reader, RAG preview, and copilot path that ingests third-party HTML or Markdown and then renders links or media in the assistant response.
  • Demote webpage-derived Markdown before inference. Treat page links, image tags, QR-code references, and formatting instructions as untrusted fields, not as response instructions.
  • Disable automatic remote media fetches from untrusted model output, or route them through a safe proxy with canonical URL checks, redirect controls, and no user-identifying headers.
  • Require provenance and step-up confirmation for external links, shortened URLs, QR codes, and account-warning style UI generated from summarized pages.
  • Add regression tests with fake security alerts, additional-resource links, QR codes, tracking pixels, shorteners, and hidden formatting requirements embedded in ordinary webpages.
  • Log renderer decisions: which URLs were suppressed, which media were fetched, which links became active, and which response spans came from third-party web content.

1 additional consideration

Assistant-rendered content provenance

The remaining product-design issue is not only whether the model obeys the page. Users need visible provenance when links, images, QR codes, or warning-style elements in an assistant response originate from third-party web content.
Recommendation: Beyond the techniques mapped above, product teams should also make assistant UI distinguish model-authored text from external webpage-derived links, images, QR codes, and account-style warnings, with step-up confirmation before any external destination becomes active.

Conclusion

ChatGPhish is valuable because it moves the IDPI discussion from model obedience into the output layer. The model is influenced by untrusted web content, but the higher-risk moment is when the assistant renderer turns that content into links, images, QR codes, and trusted-looking alerts. AIDEFEND  maps the defense to HTML demotion, value provenance, sink enforcement, URL gating, inference-time validation, detection, and isolation between webpage reading and trusted UI rendering.