LiteLLM and the Problem With Putting Your API Keys Behind a Proxy

A post from Simon Willison flagged a security incident involving LiteLLM that affected around 47,000 users. If you run a LiteLLM proxy, or work at a company that does, now is a good time to audit what’s deployed and what credentials it holds.

LiteLLM occupies a specific and increasingly load-bearing position in the AI toolchain. Understanding why it’s a valuable target, and what the attack surface looks like, matters beyond this particular incident.

What LiteLLM Actually Is

LiteLLM, maintained by BerriAI, started as a Python library that translates OpenAI-format API calls into the format expected by 100+ model providers. You write code against the OpenAI SDK, point it at LiteLLM, and it handles the translation to Anthropic, Cohere, Mistral, Google, Bedrock, and dozens of others. That’s the library side.

The proxy is a different beast. It’s a FastAPI server that sits in front of your LLM providers and gives you a centralized point for routing, cost tracking, rate limiting, budget management, and key rotation. Organizations run it so individual teams can get an API key that scopes to a budget and a set of allowed models, without handing out raw provider credentials.

# litellm proxy config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
  - model_name: claude-3-7-sonnet
    litellm_params:
      model: anthropic/claude-3-7-sonnet-20250219
      api_key: sk-ant-...

litellm_settings:
  master_key: sk-internal-...

That config file is exactly what makes the proxy both useful and dangerous. Every upstream provider key lives in one place, managed by one process, often running in a container with network access to internal services.

Why AI Proxies Concentrate Risk

The original promise of a proxy like this is credential isolation. Teams get scoped virtual keys; the real provider keys never leave the proxy. In practice, this creates a different concentration problem: compromise the proxy and you have all the keys, billing access, and potentially the ability to make requests that look legitimate to every downstream provider.

LiteLLM’s proxy also stores state: spend logs, user mappings, routing rules. Depending on configuration, this lands in a PostgreSQL or SQLite database. A deployment that’s been running for a year has months of conversation metadata, model usage patterns, and request logs sitting in that database.

The proxy runs with enough trust to call out to external APIs, which makes Server-Side Request Forgery a natural concern. CVE-2024-35576 documented an SSRF vulnerability in earlier LiteLLM proxy versions where crafted model names could be used to make the proxy issue requests to arbitrary URLs, including internal cloud metadata endpoints. On AWS, that means http://169.254.169.254/latest/meta-data/ and the instance credentials sitting there.

The Attack Surface

LiteLLM’s proxy exposes a lot by default. The /health endpoint is unauthenticated in many configurations. The admin UI, when enabled, provides a dashboard for managing keys and viewing spend. The OpenAI-compatible /v1/models endpoint often leaks which providers are configured.

The YAML-based configuration is parsed with PyYAML, which historically supported arbitrary object deserialization via the !!python/object constructor unless you use yaml.safe_load. Any code path that loads user-influenced YAML is a potential vector.

The proxy also integrates with a callback system for logging and alerting. Callbacks are pluggable and some integrations, when misconfigured, can be induced to send request bodies to attacker-controlled endpoints.

None of this is unique to LiteLLM. Any sufficiently featureful proxy with plugin support ends up with a wide attack surface. The issue is that LiteLLM gets deployed quickly, often by teams who are moving fast on AI features and treating the proxy as configuration plumbing rather than a security boundary.

How Deployments End Up Vulnerable

A common pattern: a developer spins up LiteLLM in Docker Compose to consolidate LLM access for their team. The master_key is set, but the container is on a network segment that’s too permissive. Or the proxy is exposed on a public port because the developer needed external webhook callbacks to work. Or the version running is six months old and hasn’t been updated since initial deployment.

# The compose setup that seemed fine at the time
docker run -d \
  -p 0.0.0.0:4000:4000 \
  -v ./config.yaml:/app/config.yaml \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml

Binding to 0.0.0.0 on a misconfigured VPC, combined with an unpatched CVE or a weak master key, is how incidents like this happen at scale. The 47,000 figure suggests systematic scanning and exploitation rather than targeted attacks, which is consistent with how automated tools probe for exposed management interfaces.

What Changed in Recent Versions

LiteLLM has moved quickly on security in response to researcher reports. Recent releases added LITELLM_MODE=PRODUCTION environment variable support that disables several debugging endpoints. The default configuration now requires authentication on more routes. Secret management via environment variable references (os.environ/MY_KEY) rather than inline values in config files has been in the documentation for a while but wasn’t enforced.

The project’s security posture is better than it was in mid-2024, but it’s still a complex codebase with a large attack surface, and the pace of feature development means new code gets added constantly.

If You Run LiteLLM

First, check whether your proxy is publicly accessible when it shouldn’t be. A quick curl http://your-proxy:4000/health from outside your expected network boundary tells you quickly.

Second, audit the version. The LiteLLM changelog is detailed about security fixes. Compare your running version against current.

Third, assume your provider keys may be compromised if you were running a vulnerable version and rotate them. OpenAI, Anthropic, and the other major providers all have key management dashboards. Rotating a key takes a few minutes; the cost of not rotating is potentially months of unauthorized usage billed to your account.

Fourth, consider what the proxy can reach. If it’s running in the same network segment as your databases or internal services, that’s a lateral movement risk even if the proxy itself is now patched. A proxy that only needs outbound internet access for LLM APIs should be network-isolated accordingly.

# Better: reference env vars, don't inline keys
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

The Broader Pattern

This incident is part of a trend worth naming. AI infrastructure tooling is being deployed rapidly by teams who are not primarily thinking about security, into environments where a compromise has compounding consequences. The proxy holds provider credentials. The database holds conversation data. The network access is broad because AI features are being bolted onto existing systems quickly.

Tools like LiteLLM, LangChain, and similar AI middleware are not inherently insecure, but they accumulate trust quickly because they’re positioned as plumbing. Plumbing that holds API keys to a dozen cloud services, has an HTTP interface, and runs in your production VPC is not just plumbing; it’s infrastructure that deserves the same scrutiny as a database or an auth service.

The 47,000 number is large enough to suggest this wasn’t just a few misconfigured deployments. It reflects how broadly and quickly this tooling has been adopted without corresponding security practices. That gap is worth closing deliberately, before the next incident is larger.