Web-Based IDPI in the Wild: When Webpages Become Agent Prompt Delivery
Unit 42 reports real-world web-based indirect prompt injection (IDPI) across malicious and public webpages, including AI ad review evasion, SEO poisoning, unauthorized transactions, data destruction, denial of service, sensitive-data leakage, and system prompt leakage. The defensive lesson is that web fetch, HTML parsing, OCR, metadata extraction, model context assembly, tool authority, and outbound sinks have to be governed as one agent security boundary.
Threat Analysis
- The web is becoming a prompt delivery surface. Attackers can hide instructions in pages that browsers, search tools, ad reviewers, crawlers, copilots, or autonomous agents later summarize or analyze.
- The observed intents are no longer just novelty prompts. Unit 42's telemetry includes AI ad review evasion, SEO manipulation for phishing, forced payment flows, sensitive information leakage, system prompt leakage, data destruction, and denial of service.
- Payload delivery is built for web parsers. The article documents visible plaintext, hidden CSS, zero-sized text, off-screen positioning, HTML attribute cloaking, SVG or CDATA wrapping, JavaScript runtime assembly, canvas/OCR paths, and URL-fragment tricks.
- The bypass layer is mostly semantic. Samples use authority-override language, JSON or syntax injection, multilingual instructions, payload splitting, homoglyphs, Unicode bidirectional override, and nested encoding.
- The risk scales with the agent's authority. The same hidden webpage instruction is low impact for a read-only summarizer, but dangerous for an agent that can approve ads, make purchases, write databases, disclose internal data, or execute commands.
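The delivery and bypass techniques listed above suggest that page text should be normalized and decoded before anything looks for instruction-like phrases. The Python sketch below is one way that step might look; it is not Unit 42's method. The function names `normalize`, `decode_layers`, and `scan`, the phrase list, and the depth limit are all assumptions made for illustration. NFKC folding only covers compatibility forms such as fullwidth letters; cross-script homoglyphs, multilingual payloads, and payloads split across DOM nodes would need extra handling that is not shown here.

```python
import base64
import binascii
import html
import re
import unicodedata
from urllib.parse import unquote

# Zero-width and bidirectional controls often used to hide or reorder text.
CONTROL_CHARS = re.compile(
    "[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff]"
)

# Hypothetical phrase list (assumption); a real deployment would need a
# broader, multilingual set.
OVERRIDE_PATTERNS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard (the )?(system|previous) prompt"
    r"|you are now",
    re.IGNORECASE,
)

# Runs of Base64 alphabet characters long enough to be worth decoding.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize(text: str) -> str:
    """Fold compatibility characters (NFKC) and drop invisible controls."""
    text = unicodedata.normalize("NFKC", text)
    return CONTROL_CHARS.sub("", text)


def decode_layers(text: str, max_depth: int = 3) -> list[str]:
    """Return the text plus every decoded variant found within max_depth rounds."""
    seen = [text]
    frontier = [text]
    for _ in range(max_depth):
        next_frontier = []
        for item in frontier:
            candidates = [html.unescape(item), unquote(item)]
            for token in BASE64_TOKEN.findall(item):
                try:
                    decoded = base64.b64decode(token, validate=True).decode("utf-8")
                except (binascii.Error, UnicodeDecodeError):
                    continue  # not valid Base64 text; ignore this token
                candidates.append(decoded)
            for cand in candidates:
                if cand not in seen:
                    seen.append(cand)
                    next_frontier.append(cand)
        frontier = next_frontier
    return seen


def scan(text: str) -> list[str]:
    """Return labels for suspicious traits found in untrusted page text."""
    findings = []
    if CONTROL_CHARS.search(text):
        findings.append("invisible-or-bidi-control")
    for variant in decode_layers(normalize(text)):
        if OVERRIDE_PATTERNS.search(variant):
            findings.append("authority-override")
            break
    return findings
```

A caller might run `scan` on each extracted field (visible text, hidden text, attribute values, OCR output) and attach the findings as metadata, so that a flagged field can be dropped or quarantined before context assembly. Phrase matching of this kind is easy to evade, so it is best treated as one signal among several and not as a gate on its own.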
Applicable AIDEFEND Defenses (9)
What Defenders Should Do Now
- Inventory every product path where an agent, crawler, browser assistant, ad reviewer, security scanner, or copilot ingests public web content and then uses an LLM to decide, summarize, rank, approve, or act.
- Normalize and demote webpage content before inference. Separate scripts, CSS-hidden regions, off-screen text, data attributes, SVG text, URL fragments, decoded runtime inserts, and OCR-derived content into individually labeled fields instead of passing one raw prompt stream to the model.
- Add IDPI-specific detection before context assembly: decode Base64, URL encoding, HTML entities, and nested encodings; normalize Unicode controls and homoglyphs; reassemble split payloads; and flag multilingual instructions and authority-override phrases.
- Bind every web-ingestion session to a narrow capability envelope. A summarizer should summarize; an ad reviewer should classify; neither should be able to purchase, donate, write databases, run commands, or disclose internal data because a page contains instructions.
- Treat external URLs, payment links, database writes, email, file exports, and rendered Markdown links as sinks. Enforce data-flow policy at those sinks, not only at the model prompt.
- Turn Unit 42's taxonomy into regression tests. Seed harmless pages with hidden CSS, attribute-cloaked text, encoded payloads, social-engineering instructions, and fake payment or deletion requests, then keep testing until the injection chain is reliably broken at more than one layer.
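The capability-envelope and sink-enforcement steps above can be sketched as a small policy check that runs outside the model, so the outcome does not depend on what a page says. This is a minimal Python illustration under stated assumptions: `CapabilityEnvelope`, `authorize_tool`, `authorize_sink`, `run_tool_request`, the tool names, and the example hosts are all hypothetical and do not come from AIDEFEND or Unit 42.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class CapabilityEnvelope:
    """Hypothetical per-session policy: allowed tools and outbound hosts."""
    role: str
    allowed_tools: frozenset = field(default_factory=frozenset)
    allowed_hosts: frozenset = field(default_factory=frozenset)


class PolicyViolation(Exception):
    """Raised when a proposed action falls outside the session envelope."""


def authorize_tool(envelope: CapabilityEnvelope, tool_name: str) -> None:
    # Deny by default: only tools explicitly listed for this session pass.
    if tool_name not in envelope.allowed_tools:
        raise PolicyViolation(f"{envelope.role} may not call {tool_name}")


def authorize_sink(envelope: CapabilityEnvelope, url: str) -> None:
    # Enforce at the outbound sink, independent of the prompt contents.
    host = urlparse(url).hostname or ""
    if host not in envelope.allowed_hosts:
        raise PolicyViolation(f"{envelope.role} may not send data to {host!r}")


def run_tool_request(envelope: CapabilityEnvelope, request: dict) -> str:
    """Gate a model-proposed action; return 'executed' or 'blocked'."""
    try:
        authorize_tool(envelope, request["tool"])
        if "url" in request:
            authorize_sink(envelope, request["url"])
    except PolicyViolation:
        return "blocked"
    return "executed"  # a real system would dispatch the tool here


# Example envelopes (assumed roles, tools, and hosts).
SUMMARIZER = CapabilityEnvelope(
    role="summarizer",
    allowed_tools=frozenset({"summarize"}),
)
AD_REVIEWER = CapabilityEnvelope(
    role="ad_reviewer",
    allowed_tools=frozenset({"classify"}),
    allowed_hosts=frozenset({"reports.internal.example"}),
)
```

The same gate doubles as a regression harness: feed it the action a model proposes after reading a seeded test page, and assert that a summarizer session asked to call something like `make_payment`, or an ad-review session asked to post results to an unlisted host, comes back as `"blocked"`.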
Conclusion
This report matters because it shows web-based IDPI moving from research demos into broad, messy web telemetry. The attack is a delivery problem, a parsing problem, a context-boundary problem, and an authority problem. AIDEFEND maps strongly to secure HTML demotion, obfuscation analysis, inference-time validation, capability scoping, sink enforcement, isolation, and action monitoring. The practical goal is to make public web content useful to agents without letting the web become a control channel for agents.