Paper Published: Apr 26, 2026

Real Attackers Don’t Compute Gradients: Operational Threat Modeling for ML Security

This paper argues that many adversarial ML evaluations over-focus on model-level, gradient-style attacks while real attackers often bypass the whole ML system with simpler, cheaper, domain-specific tactics. Its practical lesson is not to ignore adversarial examples, but to threat-model the full pipeline: preprocessing, access, activity patterns, feature transformations, model outputs, human review, feedback loops, and attacker economics.

Adversarial ML · Evasion · Threat Modeling · Operational Security · ML Security
8 applicable AIDEFEND defenses
Source: "Real Attackers Don’t Compute Gradients": Bridging the Gap Between Adversarial ML Research and Practice 
By Giovanni Apruzzese, Hyrum S. Anderson, Savino Dambra, David Freeman, Fabio Pierazzi, Kevin Roundy · Original article: Dec 2022

Threat Analysis

  • The model is only one component. The paper separates an ML model from the larger ML system around it: input collection, preprocessing, feature extraction, decision logic, human review, logging, and delayed feedback.
  • Real evasion often bypasses several layers. The Facebook abuse-fighting case frames defense as a funnel across automation, access, activity, and application layers.
  • The phishing case studies are deliberately practical. In one commercial detector, evasive pages used blurry logos, cropping, missing brand names, unusual backgrounds, and form-image tricks. In the MLSEC competition, teams invested domain expertise and time, costs that query count alone does not capture.
  • The paper challenges cost-blind threat models. Query count alone misses human effort, domain knowledge, implementation work, operational feedback, and whether simple defensive heuristics can defuse the attack.
  • The recommendation is precise, system-level threat modeling. Instead of vague white-box/black-box terminology, defenders should state the attacker’s goal, knowledge, capabilities, strategy, and cost for each ML-system component.
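
One way to make the per-component threat model concrete is to record it as structured data. The following is a minimal sketch, not the paper's own notation: the `ThreatEntry` fields mirror the five attributes listed above, while the component names, the `cost_hours` unit, and every example value are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical component list; replace with the stages of your own ML system.
COMPONENTS = ["preprocessing", "feature_extraction", "model", "human_review"]


@dataclass(frozen=True)
class ThreatEntry:
    component: str     # which part of the ML system is targeted
    goal: str          # what the attacker wants to achieve
    knowledge: str     # what the attacker knows about this component
    capability: str    # what the attacker can read or write
    strategy: str      # how the attacker proceeds
    cost_hours: float  # assumed unit: estimated attacker effort in person-hours


def uncovered_components(components, entries):
    """Return components that have no threat entry yet, in their original order."""
    covered = {e.component for e in entries}
    return [c for c in components if c not in covered]


def cheapest_first(entries):
    """Order entries by estimated attacker cost, lowest first."""
    return sorted(entries, key=lambda e: e.cost_hours)


# Illustrative entries with placeholder cost figures (not taken from the paper).
entries = [
    ThreatEntry(
        component="model",
        goal="evade detection",
        knowledge="query access only",
        capability="submit inputs and read the verdict",
        strategy="query-based adversarial example search",
        cost_hours=40.0,
    ),
    ThreatEntry(
        component="preprocessing",
        goal="evade detection",
        knowledge="knows the detector relies on logo matching",
        capability="controls page content",
        strategy="blur or crop the brand logo",
        cost_hours=2.0,
    ),
]

print("No threat entry yet:", uncovered_components(COMPONENTS, entries))
print("Cheapest path:", cheapest_first(entries)[0].strategy)
```

Sorting by estimated cost puts the cheap, non-model bypasses at the top of the review list, and the coverage check shows which components nobody has threat-modeled yet.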

Applicable AIDEFEND Defenses (8)

AID-M-004
AI Threat Modeling & Risk Assessment
Very High
This is the paper's central defensive lesson. Teams should define attacker goals, knowledge, capabilities, strategies, and costs across the whole ML system, not only against the model weights or classifier API. That includes what the attacker can observe, which pipeline stages they can influence, what feedback they receive, and which non-model bypasses are cheaper than adversarial-example optimization.
AID-M-001.002
AI System Dependency Mapping
Very High
The paper repeatedly argues that a deployed ML system is more than a model. Dependency mapping makes that concrete by documenting data sources, preprocessing, feature stores, models, downstream decision services, human review paths, feedback loops, and external APIs so threat models cover the components real attackers can actually touch.
AID-H-002.004
Feature Pipeline Integrity & Transformation Audit
High
Several practical evasions exploit the gap between raw inputs and the features a model consumes. For phishing, small visual and layout changes can defeat detector assumptions without computing gradients. Auditing transformation logic, feature extraction, training-serving consistency, and post-transformation validation directly addresses that pipeline-level attack surface.
AID-M-005.002
Configuration Baseline Definition & Posture SLOs (Service Level Objectives)
High
The paper’s abuse-fighting funnel maps well to measurable posture baselines: automation checks, access controls, activity heuristics, application classifiers, review thresholds, and escalation behavior. Versioned baselines and SLOs let teams decide whether a new ML defense actually raises attacker cost across the system, not just model accuracy on a benchmark.
AID-M-003.002
Performance & Operational Metric Baselining
High
Operational baselines are needed to see when attackers are exploiting cheap, practical shifts: unusual input distributions, confidence-score changes, higher manual-review uncertainty, spikes in near-threshold samples, or detector-specific degradation. This supports the paper's point that production security depends on live system behavior, not only validation-set accuracy.
AID-D-002.001
Input / Output Distribution Drift Monitoring
Medium
The paper's real-world examples include inputs that are not elegant adversarial examples, but still change the distribution the detector sees. Drift monitoring can flag attacker adaptation, new phishing templates, shifted visual features, changed malware packaging, or abuse campaigns that move around the model instead of directly attacking it.
AID-D-005.002
Security Monitoring & Alerting for AI
Medium
Because practical attacks unfold across automation, accounts, activity, model decisions, and analyst feedback, defenders need security telemetry that joins those layers. Alerts should not be limited to model-confidence anomalies; they should correlate account behavior, query patterns, review queues, blocked actions, and post-decision harm.
AID-H-001
Adversarial Robustness Training
Partial
Adversarial robustness still matters when the system-level threat model shows model-level evasion is realistic and cost-effective. In this paper's framing, however, robustness training is one control inside a broader funnel, not the default answer to every ML security problem.
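
The metric-baselining and drift-monitoring entries above can be illustrated with a small score-distribution check. This is a sketch under stated assumptions, not an AIDEFEND reference implementation: it assumes detector scores lie in [0, 1], uses the population stability index (PSI) as the drift measure, and treats the 0.25 alert level (a common rule of thumb), the 0.5 decision threshold, and the sample data as placeholders to tune.

```python
import math


def bin_fractions(scores, bins=10, eps=1e-6):
    """Fraction of scores per equal-width bin on [0, 1], floored at eps."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * bins
    for s in scores:
        s = min(max(s, 0.0), 1.0)  # clamp out-of-range scores
        counts[min(int(s * bins), bins - 1)] += 1
    return [max(c / len(scores), eps) for c in counts]


def psi(baseline, live, bins=10):
    """Population stability index between baseline and live score samples."""
    b = bin_fractions(baseline, bins)
    l = bin_fractions(live, bins)
    return sum((lv - bv) * math.log(lv / bv) for bv, lv in zip(b, l))


def near_threshold_share(scores, threshold=0.5, band=0.05):
    """Share of scores within `band` of the decision threshold."""
    return sum(abs(s - threshold) <= band for s in scores) / len(scores)


# Synthetic data: a spread-out baseline and live traffic bunched near 0.5.
baseline = [i / 100 for i in range(100)]
live = [0.45 + i / 1000 for i in range(100)]

PSI_ALERT = 0.25  # assumed alert level; tune against your own history
if psi(baseline, live) > PSI_ALERT:
    print("score distribution drift: review recent inputs")
if near_threshold_share(live) > 2 * near_threshold_share(baseline):
    print("spike in near-threshold samples: possible probing or adaptation")
```

A check like this only says that the detector is seeing different inputs. Deciding whether the cause is attacker adaptation or a benign shift still needs the correlated telemetry described in the alerting entry.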

What Defenders Should Do Now

  • Draw the full ML system, not just the model. Include input sources, preprocessing, feature extraction, model calls, post-processing, policy logic, human review, feedback loops, logs, and external dependencies.
  • For each component, write the attacker’s goal, knowledge, capability, strategy, and cost. Be explicit about read versus write access, what output the attacker sees, and whether the attacker can influence training data, live input, preprocessing, or only downstream behavior.
  • Re-run your evasion analysis using practical attacker tactics: simple input transformations, domain-specific layout changes, account or access abuse, automation, activity-shape changes, and cheap feedback loops.
  • Measure both attacker and defender cost. Track not only query count, but human effort, implementation work, time-to-success, operational visibility, false-positive cost, review burden, and whether simple heuristics can defuse the attack.
  • Turn production traces and near misses into regression tests. Include uncertain samples, analyst-escalated cases, new campaign templates, and benign-looking inputs that caused the ML system to lose confidence.
  • Use adversarial robustness training where it fits, but do not let it replace system controls such as access checks, feature-pipeline validation, abuse monitoring, and human-review workflows.
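
The step about turning production traces and near misses into regression tests could look roughly like the sketch below. Everything here is hypothetical: `toy_detector` is a stand-in scorer rather than a real phishing model, and the feature names, the 0.5 threshold, and the three cases are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegressionCase:
    case_id: str
    features: dict
    expected_block: bool  # verdict a human reviewer decided was correct
    source: str           # e.g. "analyst_escalation", "new_campaign"


def run_regressions(detector: Callable[[dict], float], cases, threshold=0.5):
    """Return the ids of cases where the detector's verdict is wrong."""
    failures = []
    for case in cases:
        blocked = detector(case.features) >= threshold
        if blocked != case.expected_block:
            failures.append(case.case_id)
    return failures


def toy_detector(features: dict) -> float:
    """Stand-in scorer: rewards a brand name in the text and a logo match."""
    score = 0.4 if features.get("brand_name_in_text") else 0.0
    return score + 0.5 * features.get("logo_match", 0.0)


cases = [
    RegressionCase("clear-logo-phish",
                   {"brand_name_in_text": True, "logo_match": 0.9},
                   expected_block=True, source="new_campaign"),
    RegressionCase("blurred-logo-no-brand",
                   {"brand_name_in_text": False, "logo_match": 0.3},
                   expected_block=True, source="analyst_escalation"),
    RegressionCase("benign-login-page",
                   {"brand_name_in_text": False, "logo_match": 0.0},
                   expected_block=False, source="near_threshold"),
]

print("Failing cases:", run_regressions(toy_detector, cases))
```

Here the blurred-logo case fails, which is the point: once an analyst has confirmed a cheap evasion in production, the suite keeps it visible until some layer of the system, not necessarily the model, handles it.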

Conclusion

The paper is useful because it reframes ML security as operational security around a system, not only robustness around a model. Real attackers may compute gradients when it is cheap and useful, but they often get farther by exploiting preprocessing assumptions, access paths, activity patterns, or simple domain tricks. AIDEFEND maps this lesson to threat modeling, dependency mapping, feature-pipeline integrity, posture baselines, operational metrics, drift monitoring, and AI security alerting. The defensive aim is to make the whole ML system expensive to evade, not just one classifier harder to perturb.