Blog Published: May 26, 2026

ChromaToast: ChromaDB Pre-Auth RCE Through Malicious Hugging Face Model Loading

HiddenLayer disclosed CVE-2026-45829, a ChromaDB Python FastAPI server flaw where an unauthenticated collection-creation request can download and execute attacker-controlled Hugging Face model code before authentication runs.

The lesson is direct: model loading is code execution. Vector databases need early authentication, restricted model provenance, unsafe-loader blocking, and isolated first-use execution.

Remote Code Execution · Malicious Models · Runtime Isolation · Vector Database · RAG Security
6 applicable AIDEFEND defenses
Source: ChromaToast Served Pre-Auth 
By Esteban Tonglet (HiddenLayer) · Original article: May 18, 2026

Threat Analysis

  • The endpoint appears protected. ChromaDB marks collection creation as requiring authentication, but an unauthenticated request can still carry embedding-function configuration that points to an attacker-controlled Hugging Face model.
  • The dangerous flag is trust_remote_code. When set through request-controlled kwargs, it tells the model loader to fetch and run Python code from that model repository.
  • The ordering is the bug. The Python server instantiates the embedding function before the authentication check, so the model is downloaded and executed before the request is rejected.
  • The failed API call can still compromise the server. The response may look like an error, but the attacker-controlled code has already run inside the ChromaDB process.
  • The blast radius is the server process. Environment variables, API keys, mounted secrets, local data, and reachable internal services can all become exposed.
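
The ordering problem described above can be sketched in a few lines. This is a toy model, not ChromaDB's code: FakeLoader, authorize, and both handler functions are hypothetical names invented for illustration, and the loader only records what it was asked to load instead of fetching anything.

```python
from dataclasses import dataclass, field


class Unauthorized(Exception):
    """Raised when the caller is not allowed to create collections."""


@dataclass
class FakeLoader:
    """Hypothetical stand-in for a model loader; it only records load calls."""
    loaded: list = field(default_factory=list)

    def load(self, model_name, **kwargs):
        # In the real attack, this is the point where remote code would run.
        self.loaded.append((model_name, kwargs))
        return f"embedding-fn:{model_name}"


def authorize(token, valid_tokens):
    if token not in valid_tokens:
        raise Unauthorized("caller may not create collections")


def create_collection_vulnerable(request, loader, valid_tokens):
    # Bug pattern: the embedding function is built first...
    fn = loader.load(request["model"], **request.get("kwargs", {}))
    # ...and the caller is checked afterwards, when the side effect is done.
    authorize(request.get("token"), valid_tokens)
    return {"name": request["name"], "embedding_fn": fn}


def create_collection_auth_first(request, loader, valid_tokens):
    # Fixed ordering: reject the caller before touching the model reference.
    authorize(request.get("token"), valid_tokens)
    fn = loader.load(request["model"], **request.get("kwargs", {}))
    return {"name": request["name"], "embedding_fn": fn}
```

Both handlers reject an unauthenticated request with the same error. The difference is that the first one has already called the loader by the time it does so.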

Applicable AIDEFEND Defenses (6)

AID-H-004.002
Service & API Authentication
Very High
This is the clearest break point. ChromaDB did have an authentication check, but it ran after configuration loading and model execution. Service and API authentication must run before any request-controlled model configuration is parsed, fetched, instantiated, or executed. If the requester is not authorized to create a collection, the server should never touch the referenced model.
AID-H-003.006
Model SBOM & Provenance Attestation
Very High
The exploit depends on a public model name and client-controlled loader flags reaching runtime without a provenance decision. A model SBOM and attestation should record the approved model bytes, digest, source, format, tokenizer, loader commit, and flags such as trust_remote_code. The ChromaDB server should load only models whose provenance and policy state were approved before the request arrived.
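
As a rough illustration of a provenance decision made before load time, the sketch below compares an artifact's SHA-256 digest and the requested loader flags against a previously approved record. ModelAttestation and verify_artifact are hypothetical names, and a real attestation would carry more fields (tokenizer, loader commit, format) and a signature.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelAttestation:
    """Hypothetical approved-model record; real SBOMs carry more fields."""
    model_id: str
    sha256: str               # hex digest of the approved artifact bytes
    source: str               # where the approved bytes came from
    trust_remote_code: bool   # whether remote code was explicitly approved


def verify_artifact(artifact: bytes, requested_flags: dict,
                    record: ModelAttestation) -> bool:
    """Return True only if bytes and flags match what was approved."""
    digest = hashlib.sha256(artifact).hexdigest()
    if not hmac.compare_digest(digest, record.sha256):
        return False  # bytes differ from the approved artifact
    wants_remote_code = bool(requested_flags.get("trust_remote_code", False))
    if wants_remote_code and not record.trust_remote_code:
        return False  # request asks for more than was approved
    return True
```

The check is deliberately one-directional on flags: a request may ask for less than was approved, never more.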
AID-H-026.001
Dangerous Construct Detection & Blocking
Very High
The high-risk construct here is not a shell command typed into the API. It is a loader path that lets trust_remote_code: true and attacker-controlled kwargs flow into AutoModel.from_pretrained(). Detection should fail closed on unsafe model-loading flags, untrusted remote-code execution, unsafe serialization paths, shell callbacks, and other code-execution behavior in model artifacts before the model is instantiated.
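
One way to fail closed on unsafe loader flags is to walk the whole request configuration before anything is instantiated. The sketch below is a minimal version of that idea; the nested config shape in the usage example is an assumption for illustration and not ChromaDB's actual schema, and find_blocked and assert_safe are hypothetical helpers.

```python
# Flags that enable remote code execution in a model loader when truthy.
BLOCKED_KEYS = {"trust_remote_code"}


class UnsafeLoaderConfig(ValueError):
    """Raised when a request carries a blocked loader flag."""


def find_blocked(config, path=""):
    """Return the dotted paths of every blocked key found at any depth."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            here = f"{path}.{key}" if path else str(key)
            # Fail closed: anything other than a literal False is flagged,
            # including strings such as "true" or the integer 1.
            if key in BLOCKED_KEYS and value is not False:
                hits.append(here)
            hits.extend(find_blocked(value, here))
    elif isinstance(config, (list, tuple)):
        for index, item in enumerate(config):
            hits.extend(find_blocked(item, f"{path}[{index}]"))
    return hits


def assert_safe(config):
    """Raise before any model is instantiated if a blocked flag is present."""
    hits = find_blocked(config)
    if hits:
        raise UnsafeLoaderConfig(f"blocked loader flags at: {', '.join(hits)}")
```

A request body shaped like `{"embedding_function": {"config": {"kwargs": {"trust_remote_code": True}}}}` would be rejected with the path of the offending key. A scan like this complements an allowlist of permitted kwargs; it does not replace one.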
AID-H-003.002
CI/CD Release Gating, Model Artifact Signing & Secure Distribution
High
Production vector databases should not fetch arbitrary models directly from public Hugging Face namespaces because a client supplied a name. A release gate should promote only scanned, signed, digest-pinned, internally mirrored model artifacts into the runtime allowlist. Collection creation can then choose from approved model identifiers rather than turning each request into a live supply-chain decision.
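
A sketch of what choosing from approved identifiers could look like: the client sends a short internal name, and the server resolves it to a mirrored, pinned artifact or refuses. The identifier, mirror path, and placeholder revision and digest values below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedModel:
    mirror_path: str  # location on the internal mirror, not a public repo
    revision: str     # pinned revision promoted by the release gate
    sha256: str       # digest recorded at promotion time


# Hypothetical runtime allowlist, populated by the release gate.
APPROVED_MODELS = {
    "text-embed-small": ApprovedModel(
        mirror_path="/models/text-embed-small",
        revision="<pinned-revision>",
        sha256="<recorded-digest>",
    ),
}


def resolve_model(identifier, allowlist=APPROVED_MODELS) -> ApprovedModel:
    """Map a client-supplied identifier to an approved artifact, or refuse."""
    if not isinstance(identifier, str) or identifier not in allowlist:
        raise PermissionError(f"model {identifier!r} is not on the allowlist")
    return allowlist[identifier]
```

Because the lookup is by exact key, a public reference such as an `org/model` string never reaches the loader unless someone promoted it under that name.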
AID-I-001.002
MicroVM & Low-Level Sandboxing
High
If a system must evaluate a new or third-party embedding model, the first load should happen in a stronger isolation boundary than the long-running vector database process. A microVM or low-level sandbox with no production secrets and no shared host filesystem reduces the chance that unexpected model code can reach the ChromaDB host, stored vectors, mounted credentials, or internal services.
AID-I-001.004
Sandbox Network Egress Restrictions
High
HiddenLayer's exploit path gives the attacker code execution inside the server process. Default-deny egress for model-loading sandboxes and vector database workloads can block reverse shells, credential exfiltration, and follow-on downloads even if malicious model code starts running.
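
Default-deny egress is enforced in the network layer (firewall rules, network policy, or a proxy), not in application code. The decision logic itself is simple, though, and the sketch below shows it with Python's standard ipaddress module. The allowed network and port are hypothetical stand-ins for an internal model mirror.

```python
import ipaddress

# Hypothetical allowlist: only the internal model mirror over HTTPS.
ALLOWED_EGRESS = [
    (ipaddress.ip_network("10.20.0.0/24"), 443),
]


def egress_allowed(dest_ip: str, dest_port: int, rules=ALLOWED_EGRESS) -> bool:
    """Default deny: permit only destinations matching an explicit rule."""
    try:
        addr = ipaddress.ip_address(dest_ip)
    except ValueError:
        return False  # unparseable destinations are denied, not guessed at
    return any(addr in network and dest_port == port
               for network, port in rules)
```

Under this rule set, a reverse shell to an arbitrary external address or a follow-on download from a public host falls through to the deny path.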

What Defenders Should Do Now

  • Inventory every ChromaDB deployment, especially Python FastAPI servers with network-reachable ports. Record version, deployment path, exposed interface, authentication layer, and whether the server can reach public Hugging Face.
  • Prefer the Rust-based deployment path or a patched release when available. If the Python FastAPI server is still in use, restrict the ChromaDB port to trusted clients only and place it behind network policy, service authentication, and an API gateway or private service path.
  • In custom forks or compensating controls, move authentication ahead of any collection-configuration loading, embedding-function construction, or model download. A rejected request should not be able to instantiate an embedding model as a side effect.
  • Block request-controlled model names, kwargs, and trust_remote_code from production collection-creation paths. If users need configurable embeddings, offer an allowlist of approved internal model identifiers instead of raw public model references.
  • Route approved models through an internal registry or mirror with scanning, signatures, digest pinning, and loader-policy checks. Treat a new embedding model the way you would treat a new executable dependency.
  • Run first-use model loading in a short-lived sandbox with no production secrets, no shared home directory, and default-deny outbound network access.
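
The last item can be partly illustrated with the standard library. The sketch below runs a first-use load step in a short-lived child process with a scrubbed environment and a throwaway home directory. It is process-level hygiene only: it is not a microVM, it does not restrict the network, and the child still shares the host kernel and filesystem. SAFE_ENV_KEYS, scrubbed_env, and run_first_load are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical minimal set of variables the child is allowed to inherit.
SAFE_ENV_KEYS = ("PATH", "LANG")


def scrubbed_env(parent_env, home):
    """Copy only allowlisted variables and point HOME at a scratch dir."""
    env = {k: parent_env[k] for k in SAFE_ENV_KEYS if k in parent_env}
    env["HOME"] = home
    return env


def run_first_load(code, parent_env=None, timeout=60):
    """Run `code` in a child interpreter that sees no inherited secrets."""
    parent_env = os.environ if parent_env is None else parent_env
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            # -I: isolated mode, ignores PYTHON* variables and user site dir.
            [sys.executable, "-I", "-c", code],
            env=scrubbed_env(parent_env, scratch),
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
```

In practice the `code` argument would be the model-loading step, and the surrounding container or microVM would supply the filesystem and network boundaries that this sketch lacks.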

Conclusion

ChromaToast is a clean example of how AI infrastructure can turn configuration into execution. The vulnerable request is not asking the server to run a command; it is asking the server to create a collection with a chosen embedding model. But if that model reference can pull remote code, and the server resolves it before authentication, the collection API becomes a pre-auth RCE path. AIDEFEND maps the practical defense to early service authentication, model provenance, unsafe-loader blocking, secure model distribution, runtime isolation, and egress control.