Article Published: Apr 25, 2026

Web-Based IDPI in the Wild: When Webpages Become Agent Prompt Delivery

Unit 42 reports real-world web-based indirect prompt injection (IDPI) across malicious and public webpages, including AI ad review evasion, SEO poisoning, unauthorized transactions, data destruction, denial of service, sensitive-data leakage, and system prompt leakage. The defensive lesson is that web fetch, HTML parsing, OCR, metadata extraction, model context assembly, tool authority, and outbound sinks have to be governed as one agent security boundary.

Indirect Prompt InjectionTool AuthorizationAgentic AIWeb Security

9 applicable AIDEFEND defenses

Source: Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild

Authors: Beliz Kaleli, Shehroze Farooqi, Oleksii Starov, Nabeel Mohamed

Original article: Mar 3, 2026

Threat Analysis

The web is becoming a prompt delivery surface. Attackers can hide instructions in pages that browsers, search tools, ad reviewers, crawlers, copilots, or autonomous agents later summarize or analyze.
The observed intents are no longer just novelty prompts. Unit 42's telemetry includes AI ad review evasion, SEO manipulation for phishing, forced payment flows, sensitive information leakage, system prompt leakage, data destruction, and denial of service.
Payload delivery is built for web parsers. The article documents visible plaintext, hidden CSS, zero-sized text, off-screen positioning, HTML attribute cloaking, SVG or CDATA wrapping, JavaScript runtime assembly, canvas/OCR paths, and URL-fragment tricks.
The bypass layer is mostly semantic. Samples use authority-override language, JSON or syntax injection, multilingual instructions, payload splitting, homoglyphs, Unicode bidirectional override, and nested encoding.
The risk scales with the agent's authority. The same hidden webpage instruction is low impact for a read-only summarizer, but dangerous for an agent that can approve ads, make purchases, write databases, disclose internal data, or execute commands.

Applicable AIDEFEND Defenses (9)

AID-H-020.002

Secure HTML Rendering & Content Demotion

Very High

This is the most direct web-specific control. Before webpage content reaches an LLM, the fetch layer should strip or neutralize scripts, styles, hidden DOM regions, off-screen text, attributes, SVG payloads, canvas-derived text, and other active or low-visibility content, then pass the model a demoted data view rather than raw web instructions.

AID-D-001.001

Per-Prompt Content & Obfuscation Analysis

Very High

Unit 42's taxonomy is largely a catalog of obfuscation and intent signals: Base64, HTML entities, URL encoding, nested encoding, invisible characters, homoglyphs, bidirectional override, payload splitting, and authority-override language. Per-prompt screening should decode multiple views of the same page text and score both malicious intent and concealment technique before context assembly.

AID-H-019.004

Intent-Based Dynamic Capability Scoping

Very High

Even when a hidden webpage prompt is missed, it should not be able to expand an agent's authority. A task that begins as page summarization or ad review should not suddenly gain permission to purchase products, send payments, approve scam ads, delete data, or run shell commands because the page asked for it.

AID-H-020.001

URL Normalization & Allowlist Filtering

High

Unit 42's samples include URL-fragment tricks, payment links, and rendered links that can become outbound requests. URL normalization and allowlist filtering make every fetchable destination explicit before an agent follows, previews, or sends data to it.

AID-H-019.005

Value-Level Capability Metadata & Data Flow Sink Enforcement

High

The high-impact examples rely on data or actions flowing to the wrong sink: payment URLs, external web destinations, backend databases, or prompt-leak outputs. Runtime values derived from web content should carry untrusted provenance, and high-sensitivity values should be blocked from outbound HTTP, payment, database-write, or disclosure sinks unless policy explicitly permits the transfer.

AID-H-002.002

Inference-Time Prompt & Input Validation

High

Webpage text becomes dangerous at the moment it is assembled into an inference request. The system should validate the final prompt package, label fetched web content as untrusted data, reject instruction-shaped content from data channels, and fail closed when a page tries to rewrite developer instructions or role hierarchy.

AID-H-018.007

Dual-LLM Isolation Pattern

High

The article's core problem is that models struggle to separate instructions from data in one context stream. A quarantined model can read raw webpages and produce a typed summary, while a privileged model receives only the structured result and holds tool authority. That split keeps hidden webpage instructions away from the component that can act.

AID-D-003.003

Agentic Tool Use & Action Policy Monitoring

Medium

Unauthorized transactions, data deletion, DoS commands, and ad-approval bypasses become visible when the agent tries to act. Tool-use monitoring should compare requested actions with the original user intent, page trust level, tool risk class, and recent prompt-injection signals before execution.

AID-D-003.002

Sensitive Information & Data Leakage Detection

Medium

Some samples aim to leak sensitive information or system prompts through normal model output. Output scanning should detect secrets, PII, internal instructions, and unusually structured disclosure text before the response is displayed, logged, sent to another tool, or embedded into a link or request.

What Defenders Should Do Now

Inventory every product path where an agent, crawler, browser assistant, ad reviewer, security scanner, or copilot ingests public web content and then uses an LLM to decide, summarize, rank, approve, or act.
Normalize and demote webpage content before inference. Strip scripts, CSS-hidden regions, off-screen text, data attributes, SVG text, URL fragments, decoded runtime inserts, and OCR-derived content into separately labeled fields instead of one raw prompt stream.
Add IDPI-specific detection before context assembly: decode Base64, URL encoding, HTML entities, nested encodings, Unicode controls, homoglyphs, payload splits, multilingual instructions, and authority-override phrases.
Bind every web-ingestion session to a narrow capability envelope. A summarizer should summarize; an ad reviewer should classify; neither should be able to purchase, donate, write databases, run commands, or disclose internal data because a page contains instructions.
Treat external URLs, payment links, database writes, email, file exports, and rendered Markdown links as sinks. Enforce data-flow policy at those sinks, not only at the model prompt.
Turn Unit 42's taxonomy into regression tests. Seed harmless pages with hidden CSS, attribute-cloaked text, encoded payloads, social-engineering instructions, and fake payment or deletion requests, then keep testing until the chain fails at multiple layers.

Conclusion

This report matters because it shows web-based IDPI moving from research demos into broad, messy web telemetry. The attack is a delivery problem, a parsing problem, a context-boundary problem, and an authority problem. AIDEFEND maps strongly to secure HTML demotion, obfuscation analysis, inference-time validation, capability scoping, sink enforcement, isolation, and action monitoring. The practical goal is to make public web content useful to agents without letting the web become a control channel for agents.