Executive summary
A newly disclosed flaw in SGLang means a malicious GGUF model file can become an execution path, not just a poisoned model artifact. CERT/CC says CVE-2026-5760 lets attackers achieve remote code execution when a crafted model is loaded and the /v1/rerank endpoint renders an unsafe Jinja2 chat_template. For defenders, the bigger lesson is that model ingestion now belongs in the same risk bucket as untrusted plugins, packages, and templates. If your AI serving stack can pull or load outside models, this is both a patching and incident response problem.
What happened?
According to CERT/CC, the vulnerable path sits in SGLang's reranking feature. An attacker can prepare a malicious GGUF model file with a crafted tokenizer.chat_template field. When that model is loaded and a request hits /v1/rerank, SGLang renders the template with an unsandboxed jinja2.Environment(), allowing attacker-controlled Python code to execute in the context of the SGLang service.
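The root cause is a well-known Jinja2 pattern: rendering untrusted template text in a plain jinja2.Environment() lets the template walk from any literal, via Python attribute access, to arbitrary objects in the interpreter. Below is a minimal, generic illustration of that class of server-side template injection and of what a sandboxed environment changes. This is a sketch of the vulnerability class, not SGLang's actual code path.

```python
import jinja2
from jinja2.sandbox import SandboxedEnvironment, SecurityError

# A template string of the kind an attacker could ship inside model metadata:
# chained attribute access climbs from a string literal to arbitrary objects.
payload = "{{ ''.__class__.__mro__[1].__subclasses__() | length }}"

# A plain Environment evaluates the introspection chain without complaint.
unsafe_result = jinja2.Environment().from_string(payload).render()
print("unsandboxed render:", unsafe_result)  # number of loaded classes

# SandboxedEnvironment refuses unsafe attribute access at render time.
try:
    SandboxedEnvironment().from_string(payload).render()
    blocked = False
except SecurityError:
    blocked = True
print("sandbox blocked it:", blocked)
```

The payload above only counts loaded classes, but the same attribute-walking technique is how real SSTI payloads reach `os` and spawn processes, which is why CERT/CC's RCE assessment follows directly from the unsandboxed rendering.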
This matters because the exploit chain does not begin with a classic web payload alone. It begins with a model artifact that looks operationally normal in many AI workflows. In other words, a model download can become a code execution event if teams treat model provenance as an MLOps hygiene issue instead of a security boundary.
Why this is a real defender problem
The immediate impact is remote code execution on infrastructure that may already have access to sensitive prompts, application secrets, internal datasets, and adjacent GPU workloads. CERT/CC explicitly warns that successful exploitation could lead to host compromise, lateral movement, command-and-control, data theft, or denial of service.
The more strategic problem is trust collapse around model ingestion. Many teams now pull community models, test quantized variants quickly, and move promising artifacts into internal serving environments. If those environments assume a model file is "just data," then controls around review, access control, and isolation will lag behind the real risk.
Affected behavior and exposure conditions
Public details currently center on:
- SGLang reranking via /v1/rerank
- malicious GGUF models containing crafted tokenizer.chat_template metadata
- unsafe Jinja2 rendering without sandboxing
Deployments are at highest risk when they:
- load models from untrusted or weakly vetted sources
- expose affected inference endpoints to untrusted networks
- run model-serving infrastructure with broad filesystem, network, or secret access
- allow rapid model swaps without a security review gate
At the time of disclosure, CERT/CC said no response or patch had been obtained from the project maintainers during coordination. That raises the priority of compensating controls right now.
What defenders should do today
1. Treat model files as executable-risk artifacts
Do not treat GGUF or similar model formats as passive content. For this issue, model metadata is part of the attack surface. Any workflow that pulls models from public repositories or third parties should be reviewed immediately.
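As a first-pass triage aid, incoming model files can be scanned for the byte patterns that commonly appear in Jinja2 SSTI payloads before they ever reach a serving host. The sketch below is deliberately naive: it is not a GGUF parser, the marker list is illustrative rather than complete, and a clean result is not a safety guarantee.

```python
# Coarse triage: flag model files whose bytes contain common Jinja2 SSTI
# building blocks. NOT a GGUF parser; a clean scan does NOT prove safety.
SUSPICIOUS_MARKERS = [
    b"__class__", b"__globals__", b"__subclasses__",
    b"__mro__", b"__import__", b"os.popen", b"os.system",
]

def scan_model_file(path: str) -> list[str]:
    """Return the suspicious markers found anywhere in the file."""
    found = set()
    with open(path, "rb") as f:
        prev_tail = b""
        # Stream in chunks so multi-gigabyte weights don't need to fit in RAM;
        # keep a small overlap so a marker split across a chunk boundary is caught.
        while chunk := f.read(1 << 20):
            window = prev_tail + chunk
            found.update(m for m in SUSPICIOUS_MARKERS if m in window)
            prev_tail = chunk[-64:]
    return sorted(m.decode() for m in found)
```

A hit warrants manual inspection of the embedded chat_template before the file goes anywhere near an inference node.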
2. Restrict or disable vulnerable reranking paths
If SGLang is deployed in production, identify whether /v1/rerank is enabled and whether cross-encoder reranking workflows depend on externally sourced models. If the feature is not essential, disable or isolate it until safe rendering behavior is confirmed.
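For the inventory step, a quick probe can tell you whether a given deployment even serves the rerank route. The helper below is a hedged sketch: it assumes an OpenAI-style HTTP route at /v1/rerank, uses only the standard library, and only checks route presence, not exploitability.

```python
import urllib.error
import urllib.request

def rerank_route_present(base_url: str, path: str = "/v1/rerank",
                         timeout: float = 5.0) -> bool:
    """Heuristic inventory check: does this server appear to serve the route?

    A 404/501 suggests the route is absent; any other HTTP response suggests
    it is served (even a 400 for a bad body). Connection failure means nothing
    is listening. This does not test for the vulnerability itself.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except urllib.error.HTTPError as e:
        return e.code not in (404, 501)
    except urllib.error.URLError:
        return False
```

Run it across your fleet of inference endpoints to build the exposure list before deciding what to disable or firewall.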
3. Tighten model provenance and review
Move model acquisition behind an approval flow. Require vetted sources, immutable hashes, and a manual review step before new model files reach shared inference infrastructure. This is the same logic defenders already apply to containers, packages, and CI/CD dependencies.
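The hash-gate part of such an approval flow can be sketched in a few lines. The function names and the idea of a pre-approved SHA-256 allowlist are illustrative assumptions, not an existing SGLang feature.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte model weights need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def gate_model(path: str, approved_sha256: set[str]) -> None:
    """Refuse to hand a model file to the serving layer unless pre-approved.

    `approved_sha256` is assumed to come from a reviewed, immutable manifest.
    """
    actual = sha256_file(path)
    if actual not in approved_sha256:
        raise PermissionError(
            f"model {path} (sha256 {actual}) is not on the approved list"
        )
```

Calling gate_model() in the deployment pipeline, before any model-swap reaches shared infrastructure, gives you the same fail-closed behavior you expect from pinned container digests.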
4. Hunt for post-load execution clues
Review SGLang service logs, process creation telemetry, shell history, outbound connections, and artifact download traces after any recent model onboarding. Pay special attention to systems that pulled new GGUF models shortly before suspicious execution or network activity.
Detection and triage ideas
Service and application review
- requests to /v1/rerank from unusual sources
- recent loading of newly downloaded or unapproved GGUF models
- template-rendering or Jinja-related errors around reranking requests
- sudden model changes followed by service instability or unexpected process activity
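If a reverse proxy or gateway sits in front of the service, its access logs can baseline who calls the rerank route. The sketch below assumes combined-log-format lines where the client address comes first; that format is an assumption, so adapt the pattern to whatever your proxy actually emits.

```python
import re
from collections import Counter

# Assumes access-log-style lines such as:
# 10.0.0.5 - - [12/Jan/2026:10:00:00 +0000] "POST /v1/rerank HTTP/1.1" 200 512
# Adjust the pattern to your proxy or gateway's real log format.
RERANK_LINE = re.compile(r'^(\S+).*"POST /v1/rerank[ "?]')

def rerank_callers(log_lines):
    """Count /v1/rerank requests per client address for baseline comparison."""
    hits = Counter()
    for line in log_lines:
        m = RERANK_LINE.match(line)
        if m:
            hits[m.group(1)] += 1
    return hits
```

Addresses that have never called the route before, or sudden spikes right after a model swap, are the anomalies worth pulling into triage.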
Host telemetry
- Python processes spawning shells or command interpreters
- unusual outbound connections from inference servers
- new persistence mechanisms or unexpected scheduled tasks
- access to secret stores, credentials, or internal data paths shortly after model load events
Recommended response plan
Immediate, today
- identify every environment running SGLang
- determine whether /v1/rerank is exposed or used
- block untrusted model ingestion into affected environments
- isolate high-value inference hosts from unnecessary east-west access
- perform a focused compromise assessment if any new GGUF models were loaded recently
Next 24 to 72 hours
- implement a model approval pipeline with provenance checks
- add monitoring for risky process execution and network egress from AI serving nodes
- review what secrets, datasets, and internal services are reachable from inference infrastructure
- prepare a permanent fix path once upstream remediation is confirmed
Strategic take
This is the kind of flaw that forces defenders to update their mental model of AI infrastructure. The interesting part is not only that SGLang has an RCE. It is that a model file can carry the trigger for server-side code execution in a production inference path. That makes model onboarding a software supply-chain trust problem, not just a performance or quality problem.
Security teams should use this disclosure to ask a blunt question: who is allowed to introduce new model artifacts into environments that can reach sensitive data or production services? If the answer is unclear, CVE-2026-5760 is a warning shot.
What is CVE-2026-5760?
It is an SGLang vulnerability that can lead to remote code execution when a malicious GGUF model file is loaded and the vulnerable reranking path renders attacker-controlled template content.
Does this require a malicious request only?
No. The disclosed path depends on both a crafted model artifact and use of the vulnerable reranking behavior, which is why model provenance matters so much here.
Is there a confirmed fix?
CERT/CC said no patch or maintainer response had been obtained during coordination at disclosure time, so compensating controls should come first.
What should defenders do first?
Audit SGLang usage, restrict /v1/rerank, stop loading untrusted models into affected environments, and review recent model onboarding for signs of compromise.



