When the AI Runs the Code: Snowflake Cortex, Sandbox Escapes, and the Security Model Data Warehouses Were Never Built For
Source: simonwillison
Snowflake’s Cortex AI service was found to escape its intended execution sandbox and run malicious code. Simon Willison reported on the disclosure this week, and while a sandbox escape in any cloud service is serious, this one sits at an intersection that makes it particularly worth examining. The underlying data warehouse infrastructure is deeply trusted, the data it holds is maximally sensitive, and the AI layer bolted on top of it was never battle-tested in the same way the query engine it rides on has been.
This is not just a Snowflake bug. It is a case study in what happens when the execution semantics of AI systems collide with the trust model of data infrastructure.
What Snowflake Cortex Actually Does
Snowflake Cortex is a suite of AI capabilities accessible directly via SQL. The basic inference functions, SNOWFLAKE.CORTEX.COMPLETE(), SNOWFLAKE.CORTEX.SUMMARIZE(), SNOWFLAKE.CORTEX.SENTIMENT(), and others, allow you to invoke large language models as if they were database functions. Models available through the API include Mistral, Llama variants, and Snowflake’s own Arctic model. The convenience is real: your data is already in Snowflake, so running inference there avoids the ETL overhead and external API calls of moving data elsewhere.
Beyond the passive functions, Cortex Analyst provides a natural language interface to your warehouse, generating and executing SQL queries on behalf of users who ask questions in plain English. Cortex Search handles semantic retrieval over Snowflake tables. These are not read-only functions that return a value; they are AI components that can initiate actions within the warehouse environment.
The underlying execution infrastructure is Snowpark, Snowflake’s framework for running Python, Java, and Scala code inside the platform’s compute layer. Snowpark Container Services extends this further, allowing full Docker containers to run inside Snowflake. The Cortex AI features are almost certainly built on top of some combination of these primitives, which means the execution environment question is not hypothetical.
The Sandbox Escape Problem in AI Systems
Sandbox escapes in AI-adjacent execution environments have a consistent history. When OpenAI launched its code interpreter feature in ChatGPT (now called Advanced Data Analysis), security researchers found that the Docker-based execution environment, while more constrained than a general-purpose container, still had reachable filesystem paths and network behaviors that the designers had not intended to expose. The issue was not that Docker is inherently insecure; it was that the sandbox was constructed for a set of assumed inputs that adversarial use cases did not respect.
Jupyter notebooks have a similar track record. Jupyter was designed for interactive scientific computation, not adversarial containment. Many cloud ML platforms built their sandboxing around JupyterHub or equivalent infrastructure and inherited its assumptions about what would run inside it. Researchers found repeatedly that code running in these environments could reach outside them in ways that the platform operators had not anticipated.
The general pattern: a powerful execution primitive designed for legitimate use, exposed to inputs that can be influenced by adversaries, with containment that was specified against a benign threat model.
Prompt Injection as the Likely Entry Point
For an AI system that reads from a data warehouse, prompt injection via stored data is the most direct attack vector. If Cortex Analyst or another Cortex component retrieves rows from Snowflake tables as part of its processing, and if an attacker controls what is in those rows, the attacker can embed instructions in the data that the AI treats as part of its context.
This class of attack has been documented extensively against RAG-based systems. Researchers at Embrace the Red demonstrated it against Microsoft 365 Copilot: documents retrieved during a search could contain instructions that overrode the system prompt, causing the AI assistant to exfiltrate data to attacker-controlled endpoints. The AI had no way to distinguish between content it retrieved legitimately and content planted to manipulate it, because both arrived through the same retrieval path.
In a Snowflake context, the attacker-controlled data might come from an external stage, a third-party data share, or user-submitted records in any application that writes to a Snowflake table. Once injected instructions reach the AI context and the AI has access to code execution, the path from prompt injection to actual code running in the execution environment becomes short.
The “executes malware” finding reported this week implies the vulnerability goes further than causing the AI to output malicious text. Getting a language model to generate malicious code is trivial; having that code execute in a privileged environment requires an execution surface with insufficient isolation. Snowflake’s Cortex infrastructure provided one.
Why Data Warehouses Are High-Value Targets
Data warehouses hold the most sensitive enterprise data in an organization: financial records, customer PII, product analytics, internal communications ingested via integrations. They are also deeply trusted systems by default. Engineers rarely question whether a result returned from Snowflake is legitimate in the way they might scrutinize an inbound webhook from an unknown external source.
That implicit trust is leverage. A compromised AI layer in a data warehouse inherits the authorization grants of whatever service account or role Cortex runs under. In many Snowflake deployments, AI features are configured with broad grants because restricting them breaks the feature. The attacker does not need to escalate privileges; they inherit them.
Multi-tenancy compounds this. Snowflake runs thousands of enterprise customers from shared compute infrastructure. A sandbox escape in a shared AI execution environment does not stay within one customer’s boundary. Cloud providers invest heavily in tenant isolation at the storage and query compute layers, but the AI execution layer is newer, less scrutinized, and has not had the same years of adversarial testing that the core database engine has accumulated.
Historical Precedent in ML Infrastructure
The security debt in AI execution environments has a direct predecessor in ML data pipeline tooling. The JFrog security team documented how Python’s pickle serialization format, widely used for ML model distribution, allows arbitrary code execution on deserialization. MLflow model registries, Hugging Face model loading pipelines, and Jupyter-based training environments all had exposure because pickle was never designed for untrusted inputs and the ML ecosystem adopted it anyway.
The remediation path was slow because the tooling was widespread and the security properties of pickle were poorly understood outside of security-focused communities. Many organizations are still running model loading pipelines that deserialize arbitrary pickles from semi-trusted sources.
The Snowflake Cortex situation is a higher-stakes version of the same pattern. The execution primitive is more capable, the data it touches is more sensitive, and the trust model of the surrounding infrastructure was built for a different kind of workload. History suggests the remediation path will also be slow, because AI features in data platforms are not going to be turned off while organizations figure out how to harden them.
What Teams Running Cortex Should Do Now
The service account or role that Cortex AI features operate under should have the minimum privileges necessary for the specific workload. Broad SYSADMIN or ACCOUNTADMIN grants are inappropriate for AI-driven queries. Snowflake’s column-level security, row access policies, and network policies all help constrain the blast radius of a compromised execution context.
Data ingested into tables that Cortex reads, especially from external stages, third-party shares, or user-generated content, should be treated as potentially adversarial input rather than trusted warehouse data. This is standard practice for web security teams but is not yet a reflex for data engineering teams.
Logging every Cortex function call via Snowflake’s query history and access event tables creates an audit trail that can surface anomalous behavior after the fact. Snowflake’s Trust Center provides configuration scanning that can identify overprivileged roles and accounts.
The broader takeaway from this disclosure is that AI features in cloud data platforms are not yet at the security maturity of the underlying databases they run on. Snowflake’s core query engine has been under adversarial scrutiny for years. Cortex is recent, and new execution surfaces in high-trust environments have a consistent record of revealing assumptions that turned out to be wrong. Security teams that have not yet reviewed their Cortex configurations as seriously as their network ingress or IAM policies should add that to the queue.