ChromaToast: ChromaDB Pre-Auth RCE Through Malicious Hugging Face Model Loading
HiddenLayer disclosed CVE-2026-45829, a flaw in the ChromaDB Python FastAPI server where an unauthenticated collection-creation request can make the server download and execute attacker-controlled Hugging Face model code before authentication runs.
The lesson is direct: model loading is code execution. Vector databases need early authentication, restricted model provenance, unsafe-loader blocking, and isolated first-use execution.
Threat Analysis
- The endpoint appears protected. ChromaDB marks collection creation as authenticated, but an unauthenticated request can still carry embedding-function configuration that points to an attacker-controlled Hugging Face model.
- The dangerous flag is trust_remote_code. When it is set through request-controlled kwargs, it tells the model loader to fetch and run Python code from that model repository.
- The ordering is the bug. The Python server instantiates the embedding function before the authentication check, so the model is downloaded and executed before the request is rejected.
- The failed API call can still compromise the server. The response may look like an error, but the attacker-controlled code has already run inside the ChromaDB process.
- The blast radius is the server process. Environment variables, API keys, mounted secrets, local data, and reachable internal services can all become exposed.
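The ordering bug can be illustrated with a small simulation. This is not ChromaDB's code: FakeModelLoader, check_auth, and both handler functions are hypothetical stand-ins, and the loader only records which model it was asked to build instead of running anything. The point it shows is that a rejected request still triggers the load when construction runs before authentication.

```python
"""Simulation of the construction-before-authentication ordering bug.

Every name here is hypothetical; this is not ChromaDB's request handler.
"""
import hmac
from dataclasses import dataclass, field


class Unauthorized(Exception):
    """Raised when a request fails the (simulated) authentication check."""


@dataclass
class FakeModelLoader:
    """Stands in for a loader that could run repository code on construction.

    It only records which model it was asked to build, so the side effect
    is observable without executing anything.
    """
    loaded: list = field(default_factory=list)

    def build_embedding_function(self, config: dict) -> str:
        model_name = config.get("model_name")
        self.loaded.append(model_name)  # the real flaw would execute code here
        return f"embedder:{model_name}"


def check_auth(headers: dict, token: str) -> None:
    supplied = headers.get("Authorization", "")
    # Constant-time comparison of the bearer credential (bytes, so any input is safe).
    if not hmac.compare_digest(supplied.encode(), f"Bearer {token}".encode()):
        raise Unauthorized("missing or invalid credentials")


def create_collection_vulnerable(loader, headers, body, token):
    # BUG: the embedding function is built before the caller is authenticated,
    # so a request that is later rejected still triggers the model load.
    embedder = loader.build_embedding_function(body.get("embedding_function", {}))
    check_auth(headers, token)
    return {"name": body["name"], "embedder": embedder}


def create_collection_fixed(loader, headers, body, token):
    # FIX: authenticate first; unauthenticated requests never reach the loader.
    check_auth(headers, token)
    embedder = loader.build_embedding_function(body.get("embedding_function", {}))
    return {"name": body["name"], "embedder": embedder}


if __name__ == "__main__":
    body = {"name": "docs", "embedding_function": {"model_name": "attacker/repo"}}
    for handler in (create_collection_vulnerable, create_collection_fixed):
        loader = FakeModelLoader()
        try:
            handler(loader, {}, body, token="s3cret")
        except Unauthorized:
            pass
        print(handler.__name__, loader.loaded)
```

Both unauthenticated requests are rejected, but the vulnerable handler records ['attacker/repo'] while the fixed handler records []. In a real server, the equivalent of that list entry is attacker code already running inside the process.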
Applicable AIDEFEND Defenses (6)
- Restrict model provenance. The ChromaDB server should load only models whose provenance and policy state were approved before the request arrived, not arbitrary repositories that request trust_remote_code.
- Block unsafe loaders. In the vulnerable path, trust_remote_code: true and attacker-controlled kwargs flow into AutoModel.from_pretrained(). Detection should fail closed on unsafe model-loading flags, untrusted remote-code execution, unsafe serialization paths, shell callbacks, and other code-execution behavior in model artifacts before the model is instantiated.
What Defenders Should Do Now
- Inventory every ChromaDB deployment, especially Python FastAPI servers with network-reachable ports. Record version, deployment path, exposed interface, authentication layer, and whether the server can reach public Hugging Face.
- Prefer the Rust-based deployment path or a patched release when available. If the Python FastAPI server is still in use, restrict the ChromaDB port to trusted clients only and place it behind network policy, service authentication, and an API gateway or private service path.
- Move authentication before any collection configuration loading, embedding-function construction, or model download in custom forks or compensating controls. A rejected request should not be able to instantiate an embedding model as a side effect.
- Block request-controlled model names, kwargs, and trust_remote_code from production collection-creation paths. If users need configurable embeddings, offer an allowlist of approved internal model identifiers instead of raw public model references.
- Route approved models through an internal registry or mirror with scanning, signatures, digest pinning, and loader-policy checks. Treat a new embedding model the way you would treat a new executable dependency.
- Run first-use model loading in a short-lived sandbox with no production secrets, no shared home directory, and default-deny outbound network access.
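The allowlist, request-field blocking, and digest-pinning steps above can be combined into one fail-closed gate that runs before anything is loaded. This is a minimal sketch under stated assumptions: the request shape (clients send only a model_id), the ApprovedModel registry entry, and every function name are hypothetical, not ChromaDB APIs. The artifact scan treats an auto_map key in config.json (which points Transformers at custom classes shipped as repository code), bundled .py files, and pickle-based weight files as signs of load-time code execution; it is a heuristic, not a complete scanner.

```python
"""Sketch of a fail-closed gate in front of embedding-model loading.

Hypothetical throughout: the request shape ({"model_id": ...}), the registry
layout, and every function name here are illustrative, not ChromaDB APIs.
"""
import hashlib
import json
from dataclasses import dataclass
from pathlib import Path

# The only key a collection-creation request may carry for embeddings.
# Raw model names, kwargs, and trust_remote_code are rejected by omission.
ALLOWED_REQUEST_KEYS = {"model_id"}

# Pickle-based weight formats can execute code when deserialized.
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".pickle"}


class PolicyError(Exception):
    """Raised whenever the gate refuses to hand a model to the loader."""


@dataclass(frozen=True)
class ApprovedModel:
    model_dir: Path    # internal mirror path, never a public repo reference
    weights_file: str  # e.g. "model.safetensors"
    sha256: str        # digest pinned at approval time


def resolve_model(request: dict, registry: dict) -> ApprovedModel:
    """Map a request to a pre-approved registry entry or fail closed."""
    extra = set(request) - ALLOWED_REQUEST_KEYS
    if extra:
        raise PolicyError(f"request-controlled loader fields rejected: {sorted(extra)}")
    model_id = request.get("model_id")
    if not isinstance(model_id, str) or model_id not in registry:
        raise PolicyError(f"model {model_id!r} is not allowlisted")
    return registry[model_id]


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(entry: ApprovedModel) -> Path:
    """Scan the mirrored artifact and check its pinned digest before any load."""
    config_path = entry.model_dir / "config.json"
    if config_path.exists():
        config = json.loads(config_path.read_text())
        if "auto_map" in config:
            raise PolicyError("config.json declares auto_map (custom remote code)")
    for path in entry.model_dir.rglob("*"):
        if path.suffix == ".py":
            raise PolicyError(f"python source in model artifact: {path.name}")
        if path.suffix in PICKLE_SUFFIXES:
            raise PolicyError(f"pickle-based weights: {path.name}")
    weights = entry.model_dir / entry.weights_file
    if not weights.is_file() or _sha256(weights) != entry.sha256:
        raise PolicyError(f"missing weights or digest mismatch: {entry.weights_file}")
    return entry.model_dir
```

Only after verify_artifact returns would the service pass the verified local directory to its loader, for example AutoModel.from_pretrained() with trust_remote_code=False and local_files_only=True, ideally inside the short-lived sandbox described above. Rejecting every request key except model_id, rather than maintaining a denylist, keeps the gate closed against loader options nobody anticipated.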
Conclusion
ChromaToast is a clean example of how AI infrastructure can turn configuration into execution. The vulnerable request is not asking the server to run a command; it is asking the server to create a collection with a chosen embedding model. But if that model reference can pull remote code, and the server does it before authentication, the collection API becomes a pre-auth RCE path. AIDEFEND maps the practical defense to early service authentication, model provenance, unsafe loader blocking, secure model distribution, runtime isolation, and egress control.