The Joshi article published on Martin Fowler’s site in November 2025 identifies a structural problem with how LLMs interact with the learning process. The argument is framed around Kolb’s experiential learning cycle: LLMs interrupt the loop between concrete experience and reflective observation, substituting output for understanding. Looking back at that piece now, a few months on, the framing is accurate, but the argument gains force when you move from mechanism to specifics.
Not all technical knowledge is equally learnable from explanation. Some knowledge transfers well through documentation, blog posts, or an LLM explaining a concept. A different category only forms through the specific experience of building something that fails at scale, tracing why it failed, and updating your mental model in response. LLMs are structurally biased to prevent exactly this second category from forming, because they generate code that avoids common failure modes without creating the conditions under which those failure modes would manifest and teach you something.
Declarative vs. Experiential Knowledge
Cognitive psychologists distinguish between declarative knowledge (knowing that something is true) and procedural or experiential knowledge (knowing how to recognize it, act on it, or anticipate it). In technical domains, this distinction maps onto a specific asymmetry: declarative knowledge is transmissible; experiential knowledge is not.
“I know that N+1 query problems occur when lazy loading generates one query per related record” is declarative knowledge. You can acquire it from any ORM tutorial. The experiential counterpart is the pattern recognition that lets you spot an N+1 query in a code review from fifteen lines of context, without a profiler and without seeing the query log, because something in the data access pattern looks familiar. That recognition comes from having deployed code that generated hundreds of identical queries under production load, waited while the slow query log filled up, and traced the problem back to how the ORM was loading relationships.
The declarative description and the experiential recognition are not the same thing, and one does not reliably produce the other.
The Failure Modes That Teach Most
Some failure modes are particularly well-suited to building experiential knowledge because they share a combination of properties: they pass tests reliably, manifest only under realistic load, and leave symptoms that are hard to trace without prior knowledge of the failure mode itself. An engineer who has never encountered them has no reason to look for them. An engineer who has debugged them recognizes the symptoms from the first sign.
Consider the N+1 query pattern:
def get_user_orders(user_ids):
    users = User.query.filter(User.id.in_(user_ids)).all()
    return [
        {'user': u.name, 'orders': [o.total for o in u.orders]}
        for u in users
    ]
This code is logically correct. It will pass unit tests that do not instrument the database. Under production load with thousands of users, it generates one query to fetch the users and one additional query per user to fetch their orders. The fix requires understanding that the ORM loads u.orders lazily on access, which means using eager loading strategies like joinedload:
from sqlalchemy.orm import joinedload

def get_user_orders(user_ids):
    users = User.query.filter(User.id.in_(user_ids)).options(
        joinedload(User.orders)
    ).all()
    return [
        {'user': u.name, 'orders': [o.total for o in u.orders]}
        for u in users
    ]
An LLM generating the first version will produce code that passes tests. An LLM generating the second version may or may not explain why. In neither case does the developer build the recognition pattern that comes from having caused the problem, investigated it under time pressure, and found the fix. That recognition is what surfaces reliably in code reviews years later.
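The recognition can also be backstopped mechanically. One hedge, sketched here with SQLAlchemy against an in-memory SQLite database (the model names mirror the example above; the seed data and counts are illustrative), is to count the queries a code path issues and assert on that count in a test:

```python
from sqlalchemy import (Column, ForeignKey, Integer, String,
                        create_engine, event)
from sqlalchemy.orm import (declarative_base, joinedload,
                            relationship, sessionmaker)

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    orders = relationship('Order')

class Order(Base):
    __tablename__ = 'orders'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    total = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def count_queries(fn):
    """Run fn and return the number of SQL statements it executed."""
    statements = []
    def record(conn, cursor, statement, params, context, executemany):
        statements.append(statement)
    event.listen(engine, 'before_cursor_execute', record)
    try:
        fn()
    finally:
        event.remove(engine, 'before_cursor_execute', record)
    return len(statements)

# Seed 50 users with one order each.
session = Session()
session.add_all(User(name=f'u{i}', orders=[Order(total=i)]) for i in range(50))
session.commit()

def fetch(eager):
    s = Session()
    query = s.query(User)
    if eager:
        query = query.options(joinedload(User.orders))
    return [[o.total for o in u.orders] for u in query.all()]

lazy_count = count_queries(lambda: fetch(eager=False))
eager_count = count_queries(lambda: fetch(eager=True))
print(lazy_count, eager_count)  # lazy grows with the user count; eager does not
```

A test that asserts a code path stays under a fixed query budget fails the moment someone reintroduces lazy loading, whether or not a reviewer spots the pattern.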
Event Loop Blocking and the Async Trap
The same structure appears in async code. Python’s asyncio and Node.js share a fundamental failure mode: synchronous CPU-bound work blocks the event loop, starving I/O-bound tasks of scheduler time. The result is latency spikes under load with nothing anomalous in the logs, because the work is completing, just not yielding.
async def process_batch(items):
    results = []
    for item in items:
        # Blocks the event loop while running.
        processed = cpu_intensive_transform(item)
        results.append(processed)
    return results
Under light load, this behaves correctly. Health check endpoints respond; incoming requests are handled. As batch sizes grow with real data or request concurrency increases, the event loop becomes congested. Requests queue. Timeouts begin appearing with no correlation to external services or database load. The diagnosis requires knowing that synchronous code in an async context blocks the loop, which is to say, knowing that async does not mean parallel.
from concurrent.futures import ProcessPoolExecutor
import asyncio

async def process_batch(items):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        results = await asyncio.gather(*[
            loop.run_in_executor(pool, cpu_intensive_transform, item)
            for item in items
        ])
    return results
The fix is documented in the asyncio event loop reference. The ability to recognize “this service has intermittent latency spikes correlated with request concurrency, not I/O” as event loop blocking comes from having been confused about exactly this failure mode long enough to rule out the more obvious candidates.
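Short of having lived through the incident, the loop itself can be made to confess. asyncio's debug mode logs a warning whenever a single callback or task step blocks the loop for longer than loop.slow_callback_duration. A minimal sketch, with a busy-wait standing in for real CPU-bound work and the warnings captured so they are visible programmatically:

```python
import asyncio
import logging
import time

# Capture asyncio's slow-callback warnings instead of letting them
# scroll past in the logs.
slow_warnings = []

class _Capture(logging.Handler):
    def emit(self, record):
        slow_warnings.append(record.getMessage())

logging.getLogger('asyncio').addHandler(_Capture())

def cpu_intensive_transform(x):
    # Stand-in for real CPU-bound work: busy-wait roughly 200 ms.
    deadline = time.monotonic() + 0.2
    while time.monotonic() < deadline:
        pass
    return x * 2

async def process_batch(items):
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # warn when one step blocks > 100 ms
    results = []
    for item in items:
        results.append(cpu_intensive_transform(item))
        await asyncio.sleep(0)  # yield so each blocking step gets measured
    return results

asyncio.run(process_batch(range(3)), debug=True)
for message in slow_warnings:
    print(message)  # "Executing <Task ...> took 0.2xx seconds" per step
```

Running a staging instance with debug mode on turns "intermittent latency spikes" into named warnings pointing at the exact blocking callback.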
Connection Pool Exhaustion and the Scale Threshold
Connection pool exhaustion is another failure mode that only manifests above a threshold of concurrency. In development and staging environments that do not simulate production concurrency, it is invisible.
def process_payment(user_id, amount):
    with db.transaction():
        user = db.query(User).filter_by(id=user_id).first()
        # Database connection held open during this external HTTP call.
        result = payment_gateway.charge(user.card_token, amount)
        if result.success:
            db.add(Transaction(user_id=user_id, amount=amount))
        return result
This code holds a database connection open across an external HTTP call. Under concurrent load, when the payment gateway's response time increases even slightly, checked-out connections accumulate until the pool is exhausted. New requests queue for a connection. Response times climb. The error that eventually appears, something like “connection pool timeout,” points to the symptom, not the cause.
The fix separates the external call from the transaction boundary, releasing the connection during the network wait. Understanding why this matters requires having watched the pool metric drop to zero during an incident and traced back through the transaction lifecycle to find where connections were being held.
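The dynamics can be reproduced without a database. The toy simulation below (pool size, timings, and timeout are arbitrary stand-ins) models five connections under forty concurrent payments, comparing a worker that holds its connection across the slow gateway call against one that releases it first and checks out a second short transaction afterward:

```python
import threading
import time
from queue import Empty, Queue

POOL_SIZE = 5

def make_pool():
    pool = Queue()
    for i in range(POOL_SIZE):
        pool.put(f'conn-{i}')
    return pool

def checkout(pool):
    try:
        return pool.get(timeout=0.5)  # pool checkout timeout
    except Empty:
        raise RuntimeError('connection pool timeout')

def slow_gateway_call():
    time.sleep(0.2)  # stand-in for external HTTP latency

def holds_connection(pool, errors):
    try:
        conn = checkout(pool)
        try:
            slow_gateway_call()  # connection held across the network wait
        finally:
            pool.put(conn)
    except RuntimeError as exc:
        errors.append(exc)

def releases_connection(pool, errors):
    try:
        conn = checkout(pool)  # short transaction: read, then release
        pool.put(conn)
        slow_gateway_call()    # network wait with no connection held
        conn = checkout(pool)  # second short transaction: record result
        pool.put(conn)
    except RuntimeError as exc:
        errors.append(exc)

def run(worker, concurrency=40):
    pool, errors = make_pool(), []
    threads = [threading.Thread(target=worker, args=(pool, errors))
               for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(errors)

print('held:', run(holds_connection), 'timeouts')
print('released:', run(releases_connection), 'timeouts')
```

The held variant times out for most of the forty workers; the releasing variant completes cleanly, because no connection is ever pinned for the duration of the network wait.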
These are the ordinary failure modes of systems that work in development and fail in production. They cluster around the gap between logically correct and correct at scale, the specific gap that unit tests do not cover and LLMs do not address.
What the Learning Loop Was Building
Joshi’s argument is that LLMs shortcut the loop between concrete experience and reflection. The more specific version of that claim is that the loop was building a failure mode taxonomy, accumulated through incidents, through hours of confused debugging, through being wrong about the cause before finding the right one.
The GitClear 2024 analysis of 150 million lines of code found that code churn (code committed and then substantially revised or removed within two weeks) increased alongside AI tool adoption. Code that works in isolation but degrades under real conditions is exactly what you would expect from generation that optimizes for passing tests, not for the failure modes that tests don’t reach.
The developers who use LLMs most effectively are the ones who have already accumulated a failure mode taxonomy. They can evaluate generated code against a mental model of how it will behave under load, not just whether it handles documented inputs correctly. That model forms from the accumulated hours of being wrong about production behavior.
The practical implication from Joshi’s piece applies here specifically. Use LLMs for the declarative parts: the syntax, the fix, the conceptual explanation. Preserve the conditions that build the experiential part. That means running under realistic load before shipping, instrumenting your database queries, profiling under concurrency before concluding you have solved a performance problem. LLMs can generate the correct version of every code example above. They cannot substitute for having needed to know why.
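Profiling under concurrency does not require heavy tooling. A minimal sketch (the handler here is a hypothetical stand-in; point it at your own code path or HTTP endpoint) drives a function from a thread pool and reports latency percentiles:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handler():
    # Hypothetical stand-in for a request handler or query path.
    time.sleep(0.01)

def latency_profile(fn, concurrency, requests=200):
    """Call fn `requests` times at the given concurrency; return (p50, p95) in ms."""
    latencies = []
    def timed():
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(requests)]
        for f in futures:
            f.result()  # propagate any worker exceptions
    latencies.sort()
    return latencies[len(latencies) // 2], latencies[int(len(latencies) * 0.95)]

for concurrency in (1, 10, 50):
    p50, p95 = latency_profile(handler, concurrency)
    print(f'concurrency={concurrency:3d}  p50={p50:.1f}ms  p95={p95:.1f}ms')
```

A p95 that climbs with concurrency while single-request latency stays flat is the signature of the failure modes above, and it only shows up when you measure under concurrency.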