When the Scaffolding Becomes the Structure

A paper on arXiv has been making the rounds in developer communities this week, and the headline is blunt: AI assistance reduces persistence and hurts independent performance. People who used AI help on a task gave up sooner when problems got hard, and they performed measurably worse when the assistance was later removed. Neither finding is surprising if you have spent time in the learning science literature, but that doesn’t make it any less worth understanding.

The finding that matters most isn’t the performance gap itself. It’s the persistence gap. When you know help is available, you tolerate struggle for less time before reaching for it. That behavioral shift, compounding across months of daily use, is where the real skill erosion accumulates.

The Productive Struggle Problem

Robert Bjork coined the term “desirable difficulties” in 1994 to describe the counterintuitive finding that learning conditions that feel harder in the moment tend to produce better long-term retention and transfer. Spacing practice over time, interleaving different problem types, and testing yourself before you feel ready all slow down apparent progress while improving actual skill acquisition.

The mechanism isn’t mysterious. When retrieval is hard, your brain does more work to find the answer, and that work strengthens the memory trace. When retrieval is easy, the strength of the trace barely budges. An AI assistant, by design, makes retrieval trivially easy. You don’t struggle to remember how to configure a rate limiter or write a particular bit of regex; you ask and get the answer. The answer is usually correct. You move on. The memory trace never forms.

This is the generation effect at scale. Research going back to Slamecka and Graf in 1978 showed that people remember words far better when they generate them (given a cue such as a related word and the first letter) than when they simply read them. The same principle applies to code patterns, API shapes, and debugging strategies. If you never generate the thing yourself, you never really learn it in the durable sense.

You’ve Seen This Before

The debate around calculators in math education is instructive here. When calculators became widespread in classrooms, the concern was that students would stop learning arithmetic. The research outcome was more nuanced: calculators hurt basic fact retention when introduced before students had established foundational fluency, but had minimal negative effect, and sometimes positive effect, when used by students who already had those foundations. The tool wasn’t neutral, but context determined whether it was harmful.

GPS navigation provides a closer analog. A 2020 study by Dahmani and Bohbot, published in Scientific Reports, found that greater lifetime GPS use was associated with worse spatial memory when people had to navigate on their own. In a follow-up with a subset of participants a few years later, heavier GPS use in the intervening period was associated with a steeper decline in hippocampal-dependent spatial memory. The association tracked how much people relied on the tool, not merely whether they had ever used it.

The mechanism generalizes: passive consumption of correct outputs, even from highly reliable sources, does not build the internal model that active struggle builds.

Betsy Sparrow’s team published a related finding in Science in 2011, showing that people remember information less well when they believe they can look it up later. The mere expectation of future access changes how the brain encodes information in the present. Knowing you have Claude or Copilot open in a tab is already affecting how you engage with problems, before you’ve typed a single prompt.

Where This Actually Bites Developers

The persistence finding maps onto something I notice in how I build things. When I’m writing Discord bot logic or a small systems utility, and I hit a genuinely unclear error, my behavior is different depending on whether I have an AI assistant open. Without it, I’ll read the docs, trace the call stack, look at related issues. I build a better mental model of the system because I have no other option. With it, I describe the error and get a plausible fix in twenty seconds. Often I don’t understand why the fix works. I note that, feel slightly uneasy, and move on.

The problem isn’t the twenty seconds. It’s that each of those unresolved encounters is a skipped opportunity to encode something durable. After a thousand of those interactions, what exactly have I learned? I’ve shipped a lot of working code. That has real value. But my independent debugging ability has advanced less than it would have if I’d built those thousand features with more friction.

For developers early in their careers, this is where the stakes are highest. Expertise in software development is largely a function of accumulated mental models: how network stacks behave under pressure, what database query plans actually do, when and why certain patterns fail at scale. Building those models requires encountering problems, being confused, forming hypotheses, testing them, and sometimes being wrong. AI assistance short-circuits that loop at precisely the moment it matters most.

The Fluency Illusion

Larry Jacoby’s work on fluency illusions describes the other mechanism at play. When information processing feels easy, people routinely mistake that ease for knowledge. Reading AI-generated code that solves your problem feels like understanding. It often isn’t. The code is legible, well-structured, and correct, so your brain files it as learned material. But you haven’t solved anything. You’ve read a solution.

This is why the performance gap in the arXiv paper materializes when the assistance is removed. Participants experienced fluency throughout the assisted phase, and that fluency felt like competence. When they were tested independently, the gap between feeling competent and being competent became measurable.

A Tool Calibration Problem

None of this means AI assistance is uniformly bad or that the right response is to stop using it. That would be a bad reading of the research and also impractical advice. The correct response is to be deliberate about when you use it and how.

There’s a useful distinction between production work and learning work. When I’m building something I’ve built variants of before, and the goal is to ship correctly and quickly, AI assistance is appropriate. The skill is already encoded. Using a tool to move faster doesn’t degrade what I already know.

When I’m encountering a class of problem I haven’t solved before, or when I need to actually understand a system rather than just produce working output, AI assistance should be used much more carefully. Use it to check a hypothesis rather than to generate the hypothesis. Use it after struggling, not before. Treat the answer as a reference to verify against rather than a solution to copy.

The persistence finding from the paper suggests a specific practice: set a struggle timer before reaching for AI. Not as a puritanical exercise, but because ten to fifteen minutes of genuine independent effort, even failed effort, primes the kind of encoding that makes the AI’s answer actually stick. You’re not just getting the answer; you’re getting the answer in a context where your brain has been actively asking the question.
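For anyone who wants to make the timer concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `StruggleTimer` name, its methods, and the ten-minute default come from the range suggested above, not from the paper. It tracks only how long you have been working unaided and whether the minimum has passed.

```python
import time
from typing import Callable


class StruggleTimer:
    """Hypothetical helper: tracks unaided effort before asking an AI."""

    def __init__(
        self,
        minimum_minutes: float = 10.0,  # assumed default, not from the paper
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.minimum_seconds = minimum_minutes * 60
        self._clock = clock
        # The timer starts the moment you commit to working the problem.
        self._started_at = clock()

    def elapsed_seconds(self) -> float:
        """Seconds of independent effort so far."""
        return self._clock() - self._started_at

    def remaining_seconds(self) -> float:
        """Seconds left before the minimum is met (never negative)."""
        return max(0.0, self.minimum_seconds - self.elapsed_seconds())

    def may_ask(self) -> bool:
        """True once the minimum period of unaided effort has passed."""
        return self.elapsed_seconds() >= self.minimum_seconds
```

Create a `StruggleTimer()` when you hit the confusing error, and check `may_ask()` when you feel the pull toward the chat window. The clock is injectable only so the behavior can be checked without waiting ten minutes.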

The research also points toward the value of periodic AI-free practice. Deliberate sessions of working through problems without assistance aren’t about rejecting the tool. They’re about verifying that the tool hasn’t become load-bearing in a way you haven’t noticed. If you sit down to build something in a familiar domain and find yourself reaching for AI immediately, on every small decision, that’s diagnostic information worth having.

The paper’s finding is real, the mechanism is well understood, and the calibration is something developers can actually act on. The scaffolding is useful. Just check occasionally that you can stand without it.