Apple Intelligence Hijack: Prompt Injection Against an OS-Level Local LLM
RSAC researchers combined a Neural Exec adversarial input with Unicode right-to-left override to bypass Apple Intelligence's local LLM filters and internal guardrails. Apple has since shipped hardened iOS and macOS releases, with no reported in-the-wild exploitation. The broader lesson is that an OS-level local LLM needs input canonicalization, output validation, per-app capability scoping, and client-side isolation before app data and functions become model-accessible.
Threat Analysis
- The attack targets an OS-managed local model. Apple Intelligence exposes an on-device LLM through Foundation Models, so third-party apps can use a system-level model without controlling weights or runtime.
- The bypass combined model steering and filter evasion. RSAC researchers used Neural Exec-style adversarial input to push the model toward an attacker-chosen task, then used the Unicode right-to-left override (RLO, U+202E) to hide offensive text from input and output filters.
- The measured success rate was material. The researchers report that the attack succeeded on 76 of 100 random prompts (76%) before Apple's hardening.
- The practical risk comes from app context. A compromised LLM-enabled app could expose or manipulate data and functions already available to that app.
- The fix landed at the platform layer. Apple hardened the affected systems in iOS 26.4 and macOS 26.4; users should upgrade, and app teams should still reduce what local model calls can see and do.
![RLO rendering example showing the underlying string invoice_2026_[U+202E]fdp.exe and the visually misleading result that appears more like invoice_2026_exe.pdf](../../../assets/aia/rlo-rendering-example.png)
U+202E changes display order, not the stored characters. The underlying string still ends in .exe, but it can render as if it ends in .pdf.
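The display-order effect in the figure can be checked on the raw string. The following is a minimal Python sketch, not taken from the RSAC research: the helper names `reveal_bidi` and `find_bidi` are hypothetical, and the control-character set is an assumption that covers the common explicit bidirectional controls rather than every invisible code point.

```python
import unicodedata

# Explicit bidirectional controls (assumed set; extend it for your own policy).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
}


def reveal_bidi(text: str) -> str:
    """Replace each bidi control with a visible [U+XXXX] tag."""
    return "".join(
        f"[U+{ord(ch):04X}]" if ch in BIDI_CONTROLS else ch for ch in text
    )


def find_bidi(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every bidi control in the text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# The string from the figure: stored order ends in ".exe".
name = "invoice_2026_\u202efdp.exe"
print(reveal_bidi(name))       # makes the hidden control visible
print(find_bidi(name))         # position and name of the control
print(name.endswith(".exe"))   # the stored suffix, regardless of rendering
```

Any check that operates on the stored string sees `.exe`; only a human looking at rendered text sees something different, which is why filters and reviewers can disagree about the same input.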
What Defenders Should Do Now
- Upgrade managed Apple devices to iOS 26.4 and macOS 26.4 or later, and flag older versions as exposed to the pre-hardening behavior described by RSAC.
- Inventory apps that use Apple Intelligence or the Foundation Models framework, then classify what data each app can pass into local LLM calls.
- Normalize and inspect model inputs before inference. Treat bidirectional Unicode controls, hidden directionality, nested encodings, and adversarial gibberish as high-risk signals that should fail closed or require additional review.
- Validate outputs after rendering-normalization, not only as raw strings. Block responses that become unsafe after Unicode rendering or that attempt to drive app actions outside the user's original intent.
- Limit local model calls to the minimum app capabilities needed for the task. Sensitive data and mutating actions should require explicit user or policy approval before being exposed to model context or model-directed workflows.
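The input and output checks above can be sketched as a pair of gates around a model call. This is an illustrative Python sketch under stated assumptions: `screen_input`, `approximate_visual`, `validate_output`, and the blocked-term list are hypothetical names, the visual-order function only approximates RLO handling and is not a full implementation of the Unicode Bidirectional Algorithm, and nothing here reflects Apple's actual filters. It does not attempt to detect adversarial gibberish.

```python
import unicodedata
from dataclasses import dataclass

RLO, PDF = "\u202e", "\u202c"
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\u200e\u200f\u061c"
)


@dataclass
class Verdict:
    allowed: bool
    reason: str


def canonicalize(text: str) -> str:
    """NFKC-normalize so compatibility forms collapse to plain characters."""
    return unicodedata.normalize("NFKC", text)


def screen_input(text: str) -> Verdict:
    """Fail closed on bidi controls and other invisible format characters.

    Rejecting every Cf character is deliberately strict (it also rejects
    zero-width joiners used in some emoji); relax it if that is too broad.
    """
    for ch in text:
        if ch in BIDI_CONTROLS:
            return Verdict(False, f"bidi control U+{ord(ch):04X}")
        if unicodedata.category(ch) == "Cf":
            return Verdict(False, f"format character U+{ord(ch):04X}")
    return Verdict(True, "ok")


def approximate_visual(text: str) -> str:
    """Rough display order: reverse text from an RLO to the next PDF or end."""
    out, run, in_rlo = [], [], False
    for ch in text:
        if ch == RLO:
            in_rlo = True
        elif ch == PDF:
            out.extend(reversed(run))
            run.clear()
            in_rlo = False
        elif ch in BIDI_CONTROLS:
            continue  # drop other controls in this simplified model
        elif in_rlo:
            run.append(ch)
        else:
            out.append(ch)
    out.extend(reversed(run))  # an unterminated override runs to the end
    return "".join(out)


def validate_output(text: str, blocked_terms: set[str]) -> Verdict:
    """Check both the stored order and the approximate rendered order."""
    logical = "".join(ch for ch in text if ch not in BIDI_CONTROLS)
    views = {"logical": logical, "visual": approximate_visual(text)}
    for view_name, view in views.items():
        folded = canonicalize(view).casefold()
        if any(term in folded for term in blocked_terms):
            return Verdict(False, f"blocked term in {view_name} view")
    return Verdict(True, "ok")


# Usage: the term is hidden in stored order but readable once rendered.
blocked = {"forbidden"}
print(screen_input("summarize this note"))
print(screen_input("summarize \u202ethis note"))
print(validate_output("note: \u202eneddibrof", blocked))
```

Checking two views is the point of the design: a raw-string filter sees only the stored order, while the user sees the rendered order, so a response should pass both before it is displayed or allowed to drive an app action.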
One additional consideration: endpoint-level permission transparency for local LLM apps.
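Capability scoping and permission transparency can share one data structure: a per-app record of what a model call may touch, which doubles as a human-readable manifest. The sketch below is a hypothetical Python illustration, not an Apple or Foundation Models API; `Capability`, `ModelCallScope`, and the capability names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    name: str
    sensitive: bool = False  # exposes sensitive data to model context
    mutating: bool = False   # changes state on the user's behalf


@dataclass
class ModelCallScope:
    app_id: str
    granted: dict[str, Capability] = field(default_factory=dict)
    approved: set[str] = field(default_factory=set)

    def grant(self, cap: Capability) -> None:
        """Declare a capability the app may offer to model calls."""
        self.granted[cap.name] = cap

    def approve(self, name: str) -> None:
        """Record explicit user or policy approval for a granted capability."""
        if name not in self.granted:
            raise KeyError(f"{name} was never granted to {self.app_id}")
        self.approved.add(name)

    def check(self, name: str) -> bool:
        """Default deny; sensitive or mutating capabilities need approval."""
        cap = self.granted.get(name)
        if cap is None:
            return False
        if cap.sensitive or cap.mutating:
            return name in self.approved
        return True

    def manifest(self) -> list[str]:
        """Human-readable listing for permission transparency."""
        lines = []
        for cap in sorted(self.granted.values(), key=lambda c: c.name):
            flags = [f for f, on in (("sensitive", cap.sensitive),
                                     ("mutating", cap.mutating)) if on]
            state = "allowed" if self.check(cap.name) else "needs approval"
            lines.append(f"{self.app_id}: {cap.name} "
                         f"[{', '.join(flags) or 'low-risk'}] {state}")
        return lines


# Usage with hypothetical capability names.
scope = ModelCallScope("com.example.notes")
scope.grant(Capability("read_current_note"))
scope.grant(Capability("read_contacts", sensitive=True))
scope.grant(Capability("send_message", mutating=True))
scope.approve("read_contacts")

print(scope.check("read_current_note"))  # low-risk capability
print(scope.check("send_message"))       # mutating, not yet approved
print(scope.check("delete_files"))       # never granted
print("\n".join(scope.manifest()))
```

Keeping the deny decision and the manifest in the same object means the transparency view cannot drift from what is actually enforced.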
Conclusion
This case is a useful warning about on-device AI becoming a platform security boundary. Local execution reduces some cloud exposure, but it also puts the model close to app data, user files, and OS-mediated functions. AIDEFEND maps well to input canonicalization, output validation, capability scoping, client-side isolation, and data-use gates; the operational goal is to make a compromised local LLM boringly constrained, not broadly useful to the attacker.