Privacy as a Training Constraint: What OpenAI's Pipeline Actually Does

OpenAI published a post walking through how ChatGPT learns from user interactions while trying to keep personal data out of model weights. The piece is light on implementation details and heavy on principles, which is the genre you get when a company is talking to regulators and journalists at the same time. That is fine, but it leaves the interesting question untouched: what does it actually take to train a frontier model on user data without baking sensitive information into the parameters, and how does the OpenAI approach compare to what other labs have published?

I run a Discord bot that logs conversations to disk, so the question of “what should I keep, for how long, and what should I scrub before it touches a model” is one I think about at hobbyist scale. The frontier-lab version of the problem has the same shape, just about six orders of magnitude larger.

What the post actually says

The OpenAI post commits to a handful of concrete things. Users can opt out of having their conversations used for training through the data controls in settings, and Team, Enterprise, and API traffic are excluded by default. Conversations that are used go through automated filters that strip personal information before training. There is human review under restricted access, retention windows are bounded, and there is a stated goal of reducing the volume of personal data in training sets over time.

The thing the post does not say is how any of that is implemented. There is no mention of differential privacy, no description of the PII detection stack, no numbers on what fraction of training data comes from user conversations versus licensed corpora and the open web. Compare this to the GDPR transparency obligations the company faces in the EU, where the Italian Garante’s 2023 ruling forced disclosures about lawful basis and data subject rights. The public-facing post is a product of those obligations, not a technical paper.

The primitives behind the marketing copy

When a lab says “we remove personal data before training,” there is a stack of techniques sitting behind that sentence. None of them are perfect, and the combination matters.

PII detection and scrubbing. The simplest layer is regex and named-entity recognition over training text. Microsoft’s Presidio is the open-source reference for this kind of pipeline, combining spaCy NER with pattern matchers for things like credit card numbers, phone numbers, and SSNs. The well-known failure mode is recall: pattern matchers miss the long tail of how people actually write sensitive information, and a model that has seen “my number is five five five, one two three four” written out longhand will happily emit it when prompted. Google published a paper in 2022 showing that even aggressive deduplication and scrubbing leave detectable memorization in large models.
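To make the recall problem concrete, here is a minimal sketch of the pattern-matching layer using only Python's standard library. The pattern set and the `scrub` helper are my own illustration, not Presidio's API and not anything OpenAI has described; a real pipeline would have far more patterns plus an NER model for names and addresses.

```python
import re

# Hypothetical, deliberately small pattern set for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scrub(text: str) -> str:
    """Replace every pattern match with a bracketed placeholder label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    # Digits in a conventional layout are caught.
    print(scrub("mail me at jane.doe@example.com or call 555-123-4567"))
    # The same phone number spelled out passes through untouched.
    print(scrub("my number is five five five, one two three four"))
```

The second call is the whole point: the spelled-out number survives unchanged, and nothing in the output tells you something was missed. Measuring recall requires a labeled held-out set, not just a count of replacements made.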

Deduplication. Kandpal et al. 2022 showed that the rate at which a model regenerates a training sequence scales superlinearly with the number of times that sequence appears in the training set, and Lee et al. 2021 found that models trained on deduplicated data emit memorized text about ten times less often. Deduplicating training data, even with cheap MinHash, dramatically reduces the rate at which models regurgitate specific strings. This is one of the most cost-effective privacy interventions available and is presumably standard at every frontier lab.
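For a sense of how cheap near-duplicate detection can be, here is a small MinHash sketch in plain Python. The function names, the word-shingle size, and the choice of salted BLAKE2b as the hash family are all my assumptions for illustration; production systems typically add locality-sensitive hashing on top so they do not have to compare every pair of documents.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(text: str, num_hashes: int = 128, k: int = 3) -> list[int]:
    """For each of num_hashes salted hash functions, keep the minimum hash
    value over all shingles. Similar shingle sets share many minima."""
    shingle_set = shingles(text, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingle_set
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

A dedup pass would then drop (or down-weight) any document whose estimated similarity to an already-kept document exceeds some threshold. The threshold is a tuning choice, and I am not aware of any lab publishing theirs for conversation data.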

Differential privacy. This is the strong-guarantee approach. DP-SGD clips each training example's gradient to a fixed norm and adds calibrated noise to the aggregated update, so that the contribution of any individual training example is bounded. Google’s federated learning work on Gboard uses differential privacy. Apple’s Private Cloud Compute is not differential privacy but an architectural alternative, doing inference in attested enclaves with no persistent logging. The reason you do not see DP applied to frontier pretraining is that the noise required at GPT-4 scale would be ruinous to capability. It is realistic for fine-tuning runs on narrow data, not for the base model.
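The mechanics of a single DP-SGD step are simple enough to sketch without a framework. The version below works on per-example gradients represented as plain lists of floats; `clip`, `dp_sgd_update`, and their parameter names are mine, and the sketch deliberately omits privacy accounting (tracking the epsilon actually spent), which is the hard part in practice.

```python
import math
import random


def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale the gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_update(
    per_example_grads: list[list[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> list[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Clipping bounds how much any one example can move the sum; the noise
    standard deviation is tied to that bound (noise_multiplier * max_norm).
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]
```

The capability cost comes from both halves: clipping throws away gradient magnitude on exactly the unusual examples, and the noise has to be large relative to a single example's contribution, which is why larger batches help and why the approach is easier to justify on small fine-tuning sets.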

Memorization audits. Carlini et al. demonstrated that you can extract training examples from production language models with surprisingly little effort. The follow-up work on quantifying memorization gave labs a methodology for auditing how much a given model has memorized, which is the measurement you actually need when you want to claim privacy improvements between model versions.
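An audit in the spirit of that work can be sketched as a prefix-continuation test: feed the model the first part of a known training sequence and check whether greedy decoding reproduces the rest verbatim. In the sketch below, `generate` is a hypothetical callable standing in for whatever model interface you have, and the default prefix and suffix lengths are illustrative choices, not values taken from any lab's process.

```python
from typing import Callable, Sequence

Tokens = Sequence[int]


def extraction_rate(
    examples: Sequence[Tokens],
    generate: Callable[[Tokens, int], Tokens],
    prefix_len: int = 50,
    suffix_len: int = 50,
) -> float:
    """Fraction of training examples whose suffix the model reproduces
    exactly when prompted with the prefix.

    `generate(prefix, n)` is assumed to return n greedily decoded tokens.
    Examples shorter than prefix_len + suffix_len are skipped.
    """
    tested = 0
    hits = 0
    for tokens in examples:
        if len(tokens) < prefix_len + suffix_len:
            continue
        prefix = tokens[:prefix_len]
        target = list(tokens[prefix_len:prefix_len + suffix_len])
        continuation = list(generate(prefix, suffix_len))[:suffix_len]
        tested += 1
        hits += continuation == target
    return hits / tested if tested else 0.0
```

Running the same function over the same sample of training sequences for two model versions gives a number you can compare directly, which is the kind of evidence a privacy claim between versions would need. Exact match is a conservative criterion; paraphrased leakage will not show up in it.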

The OpenAI post is consistent with a stack that uses scrubbing, deduplication, and post-hoc memorization testing, with DP reserved for specific subsystems. None of that is stated, so you are reading the tea leaves of what is technically feasible at their scale.

How this compares across the labs

Anthropic’s usage policy and privacy documentation state that consumer Claude.ai conversations are not used to train models by default, and API traffic is excluded unless the customer opts in. That is a stronger default than OpenAI’s, where the consumer ChatGPT default is opt-out rather than opt-in. The trade-off is data volume: OpenAI gets a much larger training corpus from user conversations because most users never touch the toggle.

Google’s approach with Gemini, per its privacy documentation, keeps human-reviewed conversations for up to three years even after a user disables activity, on the grounds that review samples are detached from accounts. This is the most aggressive retention policy of the three labs, and it has drawn the most regulatory attention.

Apple, with Apple Intelligence, takes the architectural route. The on-device models are small enough to run locally; the larger requests go to Private Cloud Compute, which is designed so that Apple itself cannot inspect the requests. This sidesteps the training-data question by mostly not having one, at the cost of a much smaller model menu.

Meta’s approach with Llama is the inverse: weights are open, training data is mostly undisclosed, and the privacy story is downstream of whatever the deployer does with the model. EU regulators have pushed back on Meta’s plans to train on European user posts, which is the same regulatory pressure OpenAI faces, arriving through a different door.

The honest trade-off

The interesting tension in the OpenAI post is not stated explicitly. Training on user conversations is genuinely valuable for capability. The hardest queries, the ones where the model is weakest, are the ones users actually ask. Refusing to learn from them, the Anthropic default, means leaving real capability gains on the table. Learning from them, the OpenAI default, means accepting that no scrubbing pipeline is perfect and that some fraction of personal data will leak into weights. There is no third option that gets you both.

The regulatory question is whether opt-out is consent. GDPR Article 6 requires a lawful basis for processing, and OpenAI relies on legitimate interest for training, which is the same argument Meta tried and was forced to pause on. The recent EDPB opinion on AI models gave labs some room here but explicitly noted that the legitimate-interest test is fact-specific. Expect this to be the pressure point for the next year.

What would I want to see in a follow-up post? A specific number for what fraction of GPT-5 training came from user conversations. The PII recall rate on a held-out test set. The memorization rate measured the way Carlini et al. measure it, compared between GPT-4 and the current model. Anything that lets you check the claim against reality. The current post is a values statement, which is necessary but not sufficient. The technical evidence is what would make it land.