Sebi’s post “Allow me to get to know you, mistakes and all” frames AI personalization from the system’s perspective: asking to learn from the user holistically, errors included. That framing is worth taking seriously, because the industry default runs in exactly the opposite direction. Most AI systems treat corrections as embarrassments to route around rather than the richest signal the user ever sends.
This is an architectural decision with consequences. Getting it right requires thinking carefully about how corrections flow through three distinct layers: in-context learning during a session, retrieval-augmented personalization across sessions, and gradient-level updates that permanently shift model behavior. Each layer has different cost structures and different failure modes.
Why Corrections Are Privileged Signal
Most user behavior is ambiguous. A long dwell time on a recommendation could mean the user liked it, or that they had the tab open while making coffee. A click confirms interest but not degree. A purchase on an e-commerce platform tells you something, but not why the user bought it or what they almost bought instead.
A correction is different. When a user says “that’s not right” or fixes an AI output, they are doing two things simultaneously: providing the correct answer and implicitly labeling the model’s prior output as wrong for this user in this context. That is a labeled training example that the user generated without being asked, motivated by their own need for accuracy rather than by a reward survey they will fill out in exchange for a coupon code.
The information density is higher than most other implicit signals. The user has specified what they wanted, and the system has a comparison point: what it said versus what the user corrected it to. That delta carries information about the user’s model of the domain, their preferences, their vocabulary, and often their unstated priors.
The reason systems do not privilege this signal is not technical. It is organizational and incentive-based. Corrections feel like product failures. They show up in dashboards as error rates and correction rates, which product teams want to minimize. A system optimized to suppress the appearance of errors will smooth over corrections rather than learn from them.
In-Context Learning: The Immediate Layer
The simplest form of mistake-aware personalization happens within a single session. Modern transformer-based systems have large context windows, and there is nothing stopping a session from accumulating the full correction history as it develops.
Consider a writing assistant. If the user says “don’t phrase it that way, I prefer more direct sentences,” the correction can be appended to the system context:
[User preference, session correction at turn 7]
User dislikes complex sentence structures with embedded relative clauses.
Prefers short, direct sentences with explicit subjects.
This is in-context learning in the strictest sense: no weights change, but the behavior changes because the generation is conditioned on this accumulated preference record. It works reasonably well within a session. The limits are predictable. The context window is finite. If the session is long and generates many corrections, early preferences can fall out of the window. More subtly, in-context preferences lack the durability of learned representations; they can be contradicted by the base model’s strong priors if the user’s stated preference runs counter to the model’s training distribution.
There is also a coherence problem. A user might correct the system at turn 3, and by turn 15, the system might be generating outputs inconsistent with that correction because the generation is primarily driven by the current turn’s context. Corrections need to be explicitly structured and weighted in the context, not just appended as conversation history, to reliably influence later outputs.
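One way to keep corrections from dissolving into conversation history is to maintain them as a structured ledger that is re-rendered into the system context every turn. The sketch below is a minimal illustration of that idea; the record fields and rendering format are assumptions for the example, not any specific product's schema.

```python
# A minimal in-session correction ledger. Corrections are stored as
# structured records and rendered into an explicit, weighted preference
# block each turn, rather than left buried in conversation history.
from dataclasses import dataclass, field

@dataclass
class Correction:
    turn: int
    disliked: str   # what the model did that the user rejected
    preferred: str  # what the user asked for instead

@dataclass
class SessionLedger:
    corrections: list[Correction] = field(default_factory=list)

    def record(self, turn: int, disliked: str, preferred: str) -> None:
        self.corrections.append(Correction(turn, disliked, preferred))

    def as_system_block(self) -> str:
        """Render all corrections as one block to prepend to the system
        prompt, so late turns are still conditioned on early corrections."""
        if not self.corrections:
            return ""
        lines = ["[User preferences from session corrections - apply to all outputs]"]
        for c in self.corrections:
            lines.append(f"- (turn {c.turn}) avoid: {c.disliked}; prefer: {c.preferred}")
        return "\n".join(lines)

ledger = SessionLedger()
ledger.record(7, "complex sentence structures with embedded relative clauses",
              "short, direct sentences with explicit subjects")
print(ledger.as_system_block())
```

Rebuilding this block each turn is what keeps a turn-3 correction influential at turn 15: the generation is always conditioned on the full, explicitly labeled preference record rather than on whatever happens to remain salient in the transcript.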
Retrieval-Augmented Personalization: The Cross-Session Layer
In-context learning does not persist. The harder problem is building a user model that accumulates across sessions without requiring full-model fine-tuning for every user.
Retrieval-augmented generation provides a practical architecture here. The idea is to maintain a per-user store of preference records and correction history that gets retrieved and prepended to context at the start of each new session. The retrieval system selects the most relevant records based on the current query or task type.
The correction store might look something like this:
{
  "user_id": "u_abc123",
  "corrections": [
    {
      "timestamp": "2026-02-14T09:23:00Z",
      "context": "writing_assistant",
      "model_output": "It is worth noting that, in the context of...",
      "user_correction": "Just say: the context shows...",
      "inferred_preference": "prefers_direct_over_hedged",
      "confidence": 0.85
    },
    {
      "timestamp": "2026-03-01T14:07:00Z",
      "context": "code_review",
      "model_output": "You might consider refactoring this function",
      "user_correction": "Just tell me what's wrong directly",
      "inferred_preference": "prefers_direct_critique",
      "confidence": 0.90
    }
  ]
}
At query time, the system retrieves the top-k corrections most relevant to the current context and includes a distilled preference summary in the system prompt. This gives the model access to a persistent preference history without requiring a context window that can hold every correction the user has ever made.
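A sketch of that query-time step, using the correction store above. The `embed` function here is a toy bag-of-words vectorizer standing in for a real embedding model, so the example is self-contained; in production you would use an actual embedding service.

```python
# Query-time retrieval over a per-user correction store: score records by
# similarity to the current task, take the top-k, and distill them into a
# preference summary for the system prompt. embed() is a stand-in for a
# real embedding model.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" for illustration only.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(store: list[dict], query: str, k: int = 3) -> list[dict]:
    q = embed(query)
    key = lambda r: cosine(q, embed(r["user_correction"] + " " + r["context"]))
    return sorted(store, key=key, reverse=True)[:k]

def preference_summary(records: list[dict]) -> str:
    prefs = {r["inferred_preference"] for r in records}
    return "[Known user preferences] " + ", ".join(sorted(prefs))

store = [
    {"context": "writing_assistant", "user_correction": "Just say: the context shows...",
     "inferred_preference": "prefers_direct_over_hedged"},
    {"context": "code_review", "user_correction": "Just tell me what's wrong directly",
     "inferred_preference": "prefers_direct_critique"},
]
top = retrieve_top_k(store, "review this code and point out problems", k=1)
print(preference_summary(top))  # -> [Known user preferences] prefers_direct_critique
```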
The tricky engineering is in the retrieval step. Naive semantic similarity retrieval over correction records can surface stale or contradictory preferences. Users change. A preference about writing style from three years ago might not reflect how the user writes today. The store needs temporal weighting and conflict resolution. When two corrections contradict each other, you need a policy: prefer the most recent one, ask the user to clarify, or represent both and let the model navigate the ambiguity.
There is also the matter of what gets stored as a correction versus what gets stored as a stated preference. Explicit statements (“I prefer you not to use jargon”) should be weighted differently from corrections inferred from the delta between model output and user edit. The inferred kind can be wrong; the user might have been correcting a factual error, not expressing a style preference. Keeping these distinct in the store matters for retrieval quality.
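Temporal weighting, source weighting, and a latest-wins conflict policy can all be folded into the retrieval score. The sketch below is illustrative: the half-life, source weights, and the `dimension` field used to detect conflicting records are all assumptions for the example, not tuned or standardized values.

```python
# Scoring corrections with temporal decay and a per-source weight, plus a
# latest-wins policy for contradictory records. The "dimension" field is a
# hypothetical key identifying which preference axis a record speaks to.
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 180.0  # stale preferences fade; illustrative default
SOURCE_WEIGHT = {
    "explicit_statement": 1.0,   # "I prefer you not to use jargon"
    "inferred_from_edit": 0.6,   # delta between output and user edit; may be wrong
}

def score(record: dict, relevance: float, now: datetime) -> float:
    age_days = (now - datetime.fromisoformat(record["timestamp"])).days
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return relevance * decay * SOURCE_WEIGHT[record["source"]]

def resolve_conflicts(records: list[dict]) -> list[dict]:
    """Keep only the most recent record per preference dimension.
    Alternative policies: ask the user, or surface both to the model."""
    latest: dict[str, dict] = {}
    for r in sorted(records, key=lambda r: r["timestamp"]):
        latest[r["dimension"]] = r
    return list(latest.values())

records = [
    {"timestamp": "2023-05-01T00:00:00+00:00", "source": "inferred_from_edit",
     "dimension": "tone", "inferred_preference": "prefers_hedged"},
    {"timestamp": "2026-03-01T00:00:00+00:00", "source": "explicit_statement",
     "dimension": "tone", "inferred_preference": "prefers_direct"},
]
resolved = resolve_conflicts(records)
print([r["inferred_preference"] for r in resolved])  # -> ['prefers_direct']
```

Keeping explicit statements and inferred corrections under distinct source labels, as above, is what lets the retrieval layer discount the inferred kind without discarding it.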
User-Level Fine-Tuning: The Permanent Layer
For high-value, high-frequency users, there is a third option: actually updating model weights on the basis of a user’s correction history. This is expensive but produces more robust personalization than retrieval alone, because the preferences are encoded in the model’s weights rather than injected as context at inference time.
RLHF (reinforcement learning from human feedback) is the mechanism behind most current model alignment work, and it can be scoped to individual users. The correction signal maps naturally onto the preference-pair format RLHF uses: the model’s original output is the rejected sample, and the user’s correction is the preferred sample. Given enough corrections, a user-specific adapter fine-tuned on those pairs learns to favor the user’s preferences in generation.
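That mapping is a straightforward transformation over the correction store. The sketch below assumes each record also logs the prompt that produced the corrected output (a field not shown in the store above), and emits the prompt/chosen/rejected shape that preference-based trainers commonly consume.

```python
# Turning correction records into preference pairs for preference-based
# fine-tuning (reward modeling or DPO-style training). The "prompt" field
# is an assumed addition to the correction store shown earlier.
def to_preference_pairs(corrections: list[dict]) -> list[dict]:
    pairs = []
    for c in corrections:
        pairs.append({
            "prompt": c["prompt"],
            "chosen": c["user_correction"],  # what the user actually wanted
            "rejected": c["model_output"],   # what the model originally produced
        })
    return pairs

corrections = [{
    "prompt": "Summarize the design doc.",
    "model_output": "It is worth noting that, in the context of...",
    "user_correction": "The doc proposes...",
}]
pairs = to_preference_pairs(corrections)
print(pairs[0]["chosen"])  # -> The doc proposes...
```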
Parameter-efficient fine-tuning approaches like LoRA make this more practical. Rather than fine-tuning the full model for each user, you train small rank-decomposition adapter matrices that are merged at inference time. A user-specific LoRA adapter trained on 500 corrections might be a few megabytes, feasible to store and serve per-user at scale.
Base model weights (frozen)
+
User LoRA adapter (trained on user_corrections)
=
User-personalized model
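The “few megabytes” claim is easy to sanity-check. Under illustrative assumptions (a 7B-class model with hidden size 4096, LoRA rank 8 on the query and value projections of 32 layers, fp16 storage), the arithmetic works out as follows:

```python
# Back-of-the-envelope size of a per-user LoRA adapter. All dimensions
# below are illustrative assumptions, not a specific model's config.
hidden = 4096                    # hidden size of the adapted projections
rank = 8                         # LoRA rank
layers = 32                      # transformer layers
adapted_matrices_per_layer = 2   # e.g. query and value projections

# Each adapted weight matrix gets two low-rank factors:
# A (rank x hidden) and B (hidden x rank).
params_per_matrix = rank * hidden + hidden * rank
total_params = params_per_matrix * adapted_matrices_per_layer * layers
size_mb = total_params * 2 / (1024 ** 2)  # fp16 = 2 bytes per parameter

print(f"{total_params:,} parameters, ~{size_mb:.0f} MB")
# -> 4,194,304 parameters, ~8 MB
```

Roughly 8 MB per user at these settings: small enough to store and hot-swap per-user at serving time, which is what makes the per-user adapter approach viable at scale.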
The failure mode here is overfitting. If the user’s corrections are inconsistent, or if the correction distribution is narrow (all style corrections, no factual corrections), the fine-tuned adapter will reflect those biases. The model will become very good at satisfying the user’s stated preferences while potentially degrading on things they did not correct, because fine-tuning on a narrow distribution tends to hurt performance on the distribution it was not trained on.
There is also the question of whether users should know this is happening. A model that has been modified by your correction history is a different product than one that responds to contextual preference signals. The former can develop idiosyncratic behaviors that are hard to diagnose and reset. Exposing the user-level fine-tuning history and giving users a way to revert or inspect it is not just an ethical consideration; it is a practical necessity for debugging when the model starts doing something unexpected.
The Design Philosophy Question
Underneath all of this is a question about what errors mean in a human-AI interaction. There are two defensible positions.
The first treats errors as failures to be minimized. Under this view, the goal of the system is to produce correct outputs on the first try, corrections represent defects, and the success metric is the correction rate going to zero. This is the view that shapes most enterprise AI tooling dashboards.
The second treats errors as the primary mechanism through which the system learns who the user is. Under this view, the first interaction is almost always going to miss something, because the system does not know the user yet. Corrections are not defects; they are the conversation through which the system and user develop a shared understanding. The goal is not to eliminate corrections but to ensure that each correction makes the next one less necessary.
These two views lead to genuinely different architectures. The first optimizes the base model heavily and provides minimal per-user personalization infrastructure. The second invests heavily in the correction pipeline: storage, retrieval, weighting, conflict resolution, feedback loops. The first view produces systems that are correct on average and frustrating for users whose needs diverge from the average. The second view produces systems that start rough and improve as the user invests in them.
For a Discord bot, this is more than a philosophical question. The users who give the most corrections are often the most engaged users, the ones who care enough to say “that’s not what I wanted.” Building a system that treats their corrections as first-class feedback rather than noise is a way of rewarding that engagement. The bot improves for them specifically. Over time, the interaction feels less like using a tool and more like working with something that has learned your particular way of thinking about problems.
The harder version of Sebi’s framing, “allow me to get to know you, mistakes and all,” is not just the AI system asking permission to learn. It is a commitment on the system’s side: that the investment the user makes in correcting outputs will be retained, respected, and used to make the next interaction better. Most current systems do not make that commitment because storing and acting on correction history is expensive and requires organizational willingness to treat corrections as product inputs rather than product failures.
That is the actual design choice worth making explicit: not whether your system can learn from corrections, but whether your team treats the correction pipeline as load-bearing infrastructure or as a future enhancement to ship later.