The Persistence Tax: What Research Keeps Finding About AI Tools and Skill Formation

Source: lobsters

A paper circulating on arXiv presents findings that should be familiar to anyone who has followed cognitive science and human-computer interaction research over the past two decades: AI assistance reduces persistence when users hit difficult problems and leads to worse independent performance afterward. Participants who had access to AI help were less willing to work through challenges on their own, and once the AI was removed, they performed worse than those who had worked without it from the start.

This isn’t a shocking result. It is, in fact, exactly what decades of related research would predict. The more interesting question is why we keep being surprised by it, and what specifically about AI tools makes this effect worth studying again.

The Pattern Goes Back Further Than GPT

In 2011, Betsy Sparrow and colleagues published “Google Effects on Memory” in Science, showing that people who expected to have access to information via a search engine were less likely to encode that information themselves. They remembered where to find things rather than the things themselves. This was dubbed the “Google Effect” and generated the same cycle of concern and counter-argument that AI assistant research does now.

Navigation research produced stronger results. A 2017 study in Nature Communications had participants navigate a simulation of central London and found that hippocampal activity tracked the street network when they planned their own routes, but not when they followed satnav-style directions. Dahmani and Bohbot’s 2020 work in Scientific Reports found that people with greater lifetime GPS use had worse spatial memory during self-guided navigation, and that heavier GPS use over a three-year follow-up was associated with a steeper decline in that memory. Aviation regulators have documented pilot skill degradation from autopilot reliance for years; the FAA’s 2013 safety alert explicitly warned carriers that over-reliance on automated flight management was eroding manual flying ability.

The pattern across domains is consistent: when a tool reliably handles a cognitively demanding task, the neural and procedural machinery for doing that task yourself weakens from disuse.

Desirable Difficulties and Why Struggle Matters

The mechanism behind these findings is well understood in learning science under the label “desirable difficulties,” a framework developed by Robert Bjork in the 1990s and refined extensively since. The core insight is that conditions that make learning feel harder in the short term (spaced practice, interleaved problem types, retrieval practice without notes) produce better long-term retention and transfer than conditions that make performance feel smooth.

Struggling with a problem is not just an unpleasant side effect of the learning process. It is a large part of the mechanism. When you hit an impasse and have to search memory, reconsider your model, try a different approach, and eventually work through to a solution, you are doing the cognitive work that makes the knowledge stick and generalize. When an AI fills in that gap for you, the impasse resolves, but the work doesn’t happen.

This is distinct from looking something up in documentation. When you search for a function signature or read a spec, you still have to understand the answer, integrate it into your mental model, and apply it. AI completions and AI-generated solutions often bypass that integration step entirely. The code appears, it works, and you move on. The model in your head remains shallow.

Why Coding AI Makes This Worse Than Previous Tool Dependency

GPS dependency affects spatial navigation and little else. Calculator dependency affects arithmetic fluency. Both are meaningful, but neither sits at the center of a complex, hierarchical skill like software engineering.

Coding is deeply compositional. Understanding why a particular piece of code works the way it does depends on having mental models of the language’s memory model, its concurrency primitives, its standard library conventions, the performance characteristics of the data structures involved, and how all of these interact. These models are built through repeated, effortful engagement with the material. You write the thing, it breaks, you figure out why, you fix it, and your model gets more accurate.

AI coding assistants compress this loop aggressively. A junior engineer who has Copilot or Claude writing their code from natural language descriptions may produce working software without ever building the models required to reason about it independently. They pass the test suite, ship the feature, and their internal representation of what they built stays vague.

Senior engineers are not immune. The effect may be smaller because they have existing models to fall back on, but research on expertise and skill maintenance consistently shows that skills not practiced deteriorate. A principal engineer who has spent two years having AI write all their boilerplate, debug all their type errors, and draft all their architecture documents has been practicing less than they would have otherwise.

The Confound That Makes This Complicated

The counterargument, usually offered quickly in response to this kind of research, is that tool-assisted output is often better and faster, so the relevant question is not whether unaided performance declines but whether overall productivity improves. This is a legitimate point that deserves a serious answer.

The first problem is that it treats two different populations as one. For an experienced practitioner with well-developed underlying skills, AI assistance can amplify existing capability without much atrophy cost. The skills are there; the AI helps apply them faster. For someone still developing those skills, the situation is different. The assistance is replacing the practice that would have built the skills in the first place.

The second problem is that “productivity” is measured on current tasks but skills are required for future tasks, including the ones you can’t anticipate. A developer who can only work effectively with an AI copilot is in a fragile position. If the tool changes, if the context isn’t amenable to AI assistance, if they’re debugging a subtle race condition at 2 AM and need to reason carefully from first principles, the gap shows up. The persistence finding in the current paper is particularly telling here: reduced willingness to struggle through hard problems is exactly the wrong disposition for novel or poorly-specified problems, which are the ones that matter most.

The Educational Context Is the Most Urgent Part

Most of the concern about this research focuses on professional developers, but the stakes are highest in educational settings. The Prather et al. findings from computing education research showed novice programmers using AI assistants had trouble explaining their own code. This is not a peripheral concern. The ability to explain code you’ve written is a proxy for having actually understood it, and understanding is what transfers to new problems.

Teaching programming to beginners with AI assistance available from day one is roughly analogous to teaching arithmetic with calculators from the start: you might produce students who can get correct answers on familiar problem types while leaving them without number sense, estimation ability, or the capacity to recognize when an answer is implausible. The parallel isn’t perfect because arithmetic is more separable from its tools than programming is, but the directional concern is the same.

The design question for educators is not whether to ban AI tools, which is both practically difficult and likely counterproductive over a four-year curriculum, but where to place the scaffolding. Bjork’s desirable difficulties framework suggests the answer: introduce AI assistance after foundational skills are established, not before. Use it to extend what learners can do, not to bypass the effortful work of building their models.

What to Actually Do With This

For practicing developers, the research suggests some specific habits worth considering. Working through a problem yourself before reaching for AI assistance, even when the AI would be faster, preserves the practice that maintains skill. Treating AI-generated code as a draft that you understand deeply enough to rewrite from scratch, rather than as a solution you ship, keeps the integration step in the loop. Deliberately spending time on problems without AI assistance, the way athletes train for weaknesses, addresses the atrophy concern directly.

None of this requires rejecting these tools. It requires using them with an accurate model of what they cost and what they provide. The research in this paper, like the GPS and autopilot research before it, is not an argument for technological abstinence. It is an argument for understanding the terms of the trade you’re making when you hand cognition to a tool.

The persistence finding, specifically, is worth sitting with. The willingness to keep working on hard problems when progress stalls is one of the most valuable things a developer can have. If AI assistance is systematically eroding that disposition, the cost compounds over time in ways that aren’t visible in short-term productivity measurements.
