Paper Published: May 18, 2026

System-Level Agent Defenses: Why Indirect Prompt Injection Needs Plan and Policy Boundaries

This paper argues that indirect prompt injection is not only a prompt-filtering problem. General-purpose AI agents need explicit plan, policy, approval, execution, and feedback boundaries so untrusted emails, webpages, or tool outputs cannot silently rewrite what the agent is allowed to do. AIDEFEND already covers much of that system architecture through authority envelopes, dynamic capability scoping, policy enforcement, constrained model judges, data-flow sink enforcement, HITL control points, and agentic security benchmarking.

Indirect Prompt InjectionSystem-Level DefenseTool AuthorizationAgentic AI

9 applicable AIDEFEND defenses

Source: Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

Authors: Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh

Original article: Mar 31, 2026

Threat Analysis

The attack starts when untrusted data becomes guidance. An attacker hides instructions in an email, webpage, document, or tool output. The agent reads it during a legitimate task, and the malicious text tries to steer the next plan, policy update, or tool call.
Replanning helps utility but opens a boundary. Agents must adapt to deprecated APIs, test failures, or new evidence. The risk is that attacker-controlled feedback can influence the plan and policy, not just the model's next sentence.
LLM security judges need a constrained role. The paper allows model-based judgment for context-dependent cases, but only over narrow structured artifacts such as typed traces or proposed plan/policy diffs.
Human review has to be designed. Ambiguous cases, such as urgent email criteria or risky package installation, need explicit checkpoints, evidence, and authority rather than vague fallback language.
Static benchmarks can overstate safety. Real tests need long tasks, replanning, policy updates, parameter-level attacks, and adaptive payloads that evolve against the defense.

Applicable AIDEFEND Defenses (9)

AID-M-009.002

Authority Envelope & Action Risk Classification

Very High

This is the conceptual anchor for the paper's plan/policy split. Before an agent starts, the system should define the hard boundary around tools, data classes, environments, budgets, delegation depth, and high-risk action types. That makes the paper's core question machine-checkable: is the proposed plan or policy update still inside the approved envelope?

AID-H-019.004

Intent-Based Dynamic Capability Scoping

Very High

The paper's first position says useful agents need dynamic replanning and policy updates, but that flexibility must not let untrusted content expand the agent's authority. Intent-based dynamic capability scoping directly covers this by deriving a narrow per-request or per-session tool scope from the original task and enforcing it even if the model later tries to call something broader.

AID-H-019.002

Policy-Based Access Control

Very High

The proposed architecture depends on a policy enforcer that approves or blocks concrete actions against the current policy. Policy-based access control is the AIDEFEND technique that externalizes those decisions into checkable policy logic, so the agent runtime does not rely on natural-language intention alone.

AID-H-019.006

Continuous Authorization Verification (Anti-TOCTOU)

High

Dynamic replanning creates a time-of-check to time-of-use risk: the action that was safe under the first plan may no longer be safe after the environment changes or the policy is updated. Continuous authorization verification maps directly to the paper's need to re-check sensitive steps after context, plan, policy, or delegation state changes.

AID-H-018.007

Dual-LLM Isolation Pattern

High

The paper repeatedly warns against exposing a model-based security decision maker to raw attacker-controlled environment text. Dual-LLM isolation already captures that shape: a quarantined component reads untrusted content and produces structured output, while the privileged component receives only validated artifacts and retains tool authority.

AID-H-019.005

Value-Level Capability Metadata & Data Flow Sink Enforcement

High

The paper describes policies that govern allowed information flows, including lattice-style information-flow control. Value-level capability metadata and sink enforcement operationalize that idea by tagging runtime values with provenance and sensitivity, then blocking unsafe movement into external HTTP, email, database writes, payment flows, or other sensitive sinks.

AID-H-019.003

High-Impact Two-Channel Validator

High

The paper's plan/policy approver is a security-critical checkpoint. For high-impact actions such as code execution, infrastructure changes, payments, or memory writes, a second independent validation channel can review goal alignment, evidence, policy compliance, and blast radius before execution. This gives the architecture a concrete approval layer instead of trusting the executor's next step.

AID-M-006.001

HITL Checkpoint Design & Documentation

Medium

The paper's third position says ambiguity is unavoidable and must be designed into the system. HITL checkpoint design and documentation covers that work at the subtechnique level: define the triggers, operator roles, default-deny timeouts, and SOPs before high-impact or ambiguous agent decisions reach production.

AID-M-008

Automated Agentic Security Benchmarking

Medium

The paper's benchmark critique maps strongly to this technique. Defenders should not rely only on short, static prompt-injection tests; agentic security benchmarking should include multi-step tasks, dynamic environments, replanning, policy updates, parameter-level attacks, and adaptive payloads so release gates measure the same failure modes the paper describes.

What Defenders Should Do Now

Make the agent's plan and policy explicit artifacts. For each agent workflow, document what the plan says the agent may do, what the policy allows, who can approve changes, and which runtime component enforces the decision.
Define an authority envelope before execution: allowed tools, data classes, environments, side effects, budgets, and delegation depth. Turn that envelope into signed per-session capability scope rather than a broad static tool allowlist.
Route untrusted environmental feedback through a quarantine path. Emails, webpages, retrieved documents, and tool outputs should be transformed into typed traces, summaries, or plan/policy diffs before a privileged model or validator sees them.
Re-check authorization at every sensitive step, especially after replanning. A tool call that reads files, writes code, installs packages, sends money, exports data, updates memory, or changes infrastructure should be evaluated against the current task, policy, delegation chain, and risk class.
Add human checkpoints for ambiguous or high-impact decisions. The UI should show the proposed action, evidence, policy change, data involved, and rollback path so a reviewer can make a real decision instead of rubber-stamping a vague approval prompt.
Upgrade agent security tests. Add long-running tasks, runtime failures, parameter-level attacks, adaptive prompt-injection payloads, and policy-update attempts to regression tests, then fail releases when those scenarios bypass the system boundaries above.

Conclusion

The paper is useful because it reframes indirect prompt injection as a system architecture problem. The model is still important, but the durable defense lives around it: authority boundaries, policy engines, constrained model judges, data-flow enforcement, human checkpoints, and benchmarks that exercise dynamic agent behavior. AIDEFEND already has concrete techniques and subtechniques for most of those pieces, which makes the framework useful as a checklist for turning the paper's architecture into deployable controls.