The Temporal What: Side Effects and Ordering in the Age of LLM-Generated Code

Source: martinfowler

Back in January, Martin Fowler published a conversation between himself, Unmesh Joshi, and Rebecca Parsons about how LLMs reshape the abstractions in software development. The core framing, the “what/how loop,” describes the iterative process of specifying intent at one level of abstraction and filling it in with implementation at the next level down. The argument is that LLMs change where this loop sits by automating more of the “how” generation, shifting human effort toward specifying the “what” more precisely.

This framing is sound. But it systematically underweights a dimension of the “what” that is particularly hard to specify and particularly easy for LLMs to get wrong: the temporal dimension.

What We Mean When We Say “What”

When developers talk about specifying what a system should do, they usually mean something like a function signature, an interface, or a brief natural language description. “This function takes a user ID and returns their last five activity records, sorted by timestamp descending.” That is the structural what. It describes inputs, outputs, and type constraints.

The structural what is the part that type systems and interfaces capture well. A TypeScript interface tells an LLM exactly what shape the implementation must have:

interface ActivityService {
  getRecentActivity(userId: string, limit: number): Promise<Activity[]>;
  recordActivity(userId: string, type: ActivityType): Promise<void>;
}

Hand this to an LLM and it will generate a plausible implementation. Hand that implementation to the TypeScript compiler and the type checker will verify the structural contract.

But there is another dimension of the what that this interface does not capture: the temporal contract. What happens when recordActivity is called concurrently for the same user? Does getRecentActivity guarantee freshness after a write? If recordActivity fails after writing to the database but before updating the cache, what state does the system present to subsequent callers? In an event-driven system, what is the ordering guarantee between a user’s actions and the activity records they produce?

None of this is in the interface. All of it matters. And for any system with concurrent users, network calls, or persistent state, the temporal contract determines whether the system is actually correct.
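To make the gap concrete, here is a minimal sketch of two implementations that satisfy the same structural contract but differ on freshness after a write. The `ActivityLog` interface is a simplified, synchronous stand-in for the `ActivityService` above, and the class names are hypothetical, chosen only for illustration.

```typescript
// Simplified, synchronous stand-in for ActivityService (an assumption, not the real interface).
interface ActivityLog {
  record(userId: string, type: string): void;
  recent(userId: string): string[];
}

// Reads always reflect every earlier write.
class FreshLog implements ActivityLog {
  private readonly rows = new Map<string, string[]>();

  record(userId: string, type: string): void {
    const list = this.rows.get(userId) ?? [];
    list.push(type);
    this.rows.set(userId, list);
  }

  recent(userId: string): string[] {
    // Return a copy so callers cannot observe later mutations.
    return (this.rows.get(userId) ?? []).slice();
  }
}

// Same types, different temporal contract: a read can miss an earlier write.
class StaleCacheLog implements ActivityLog {
  private readonly store = new FreshLog();
  private readonly cache = new Map<string, string[]>();

  record(userId: string, type: string): void {
    this.store.record(userId, type); // the cache is not invalidated here
  }

  recent(userId: string): string[] {
    const hit = this.cache.get(userId);
    if (hit !== undefined) return hit; // may be older than the last write
    const rows = this.store.recent(userId);
    this.cache.set(userId, rows);
    return rows;
  }
}
```

Both classes type-check against `ActivityLog`. Only a statement such as "a read after a write sees that write" distinguishes them, and that statement has no place in the interface.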

The Gap That Shows Up in Practice

I run a Discord bot for a small community. When I have used AI assistance to add features, the structural code is nearly always right: commands parse correctly, database schemas match the data model, API calls use the correct endpoints. The failures appear in the temporal dimension.

One case: a feature to send a daily summary message. The structural what was clear, “query activity from the last 24 hours, format it, send it to a channel.” The generated implementation was structurally correct. What it lacked was handling for the case where the bot restarted between scheduling the summary and sending it. The scheduled job used an in-memory timer. On restart, the timer reset. The summary was never sent. The structural specification never mentioned persistence or restart behavior, so the implementation never considered it.
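One way to close this gap, sketched under assumptions: persist the time of the last successful send and compute the timer delay from it on every startup, instead of trusting an in-memory timer. `SummaryState` and `msUntilNextSummary` are hypothetical names, and the sketch assumes `lastSentAt` is loaded from durable storage such as a database row.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed to be loaded from durable storage on startup.
interface SummaryState {
  lastSentAt: number | null; // epoch milliseconds of the last successful send
}

// How long to wait before sending the next summary.
// Returns 0 when a summary is overdue, for example after a restart.
function msUntilNextSummary(state: SummaryState, now: number): number {
  if (state.lastSentAt === null) return 0; // never sent: send immediately
  return Math.max(0, state.lastSentAt + DAY_MS - now);
}
```

On startup the bot would call `msUntilNextSummary(loadedState, Date.now())` and schedule its timer with the result, so a restart delays the summary at most until the process is back up rather than dropping it.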

Another case: a welcome message for new server members. The generated implementation called the message sending API and then recorded the welcome in the database. On network failure, the message sometimes sent but the database write failed, leaving a state where the user appeared unwelcomed in the records but had already received the message. On retry, they received it twice. The structural what, “send a welcome message when a user joins,” did not capture the ordering constraint: the database write must precede or be atomic with the message send.

These are not exotic failure modes. They are standard correctness concerns for any system with side effects, concurrency, or persistence. But they live in the temporal specification, and the temporal specification is almost never written down.

Why the Temporal What Is Harder to Specify

The Fowler conversation frames the what/how separation in terms of cognitive load: abstractions work by hiding classes of “how” decisions so you can reason about the “what” without holding the “how” in mind simultaneously.

The temporal what resists this kind of abstraction for a structural reason. Temporal properties are about sequences of events and state transitions over time, which means they are inherently relational: they describe how parts of the system interact, not just what each part does in isolation. A single service interface cannot capture a temporal property that involves two services interacting under concurrent load.

Formal methods researchers have developed notation for precisely this problem. TLA+ lets you express and model-check temporal properties of distributed systems. Alloy provides relational logic for specifying structural invariants across state transitions. These tools exist because natural language descriptions of concurrent behavior are consistently ambiguous or incomplete.

(* TLA+ spec fragment: welcome message system *)
WelcomeAction(user) ==
  /\ user \notin WelcomedUsers
  /\ SendMessage(user, WelcomeText)
  /\ WelcomedUsers' = WelcomedUsers \cup {user}

The spec makes the constraint explicit: SendMessage and the set update are defined as a single atomic action. Any implementation must treat them atomically or provide compensating mechanisms. The natural language version, “send a welcome message when a user joins,” does not impose this constraint. It does not even imply it.

Most teams do not use TLA+ for their Discord bots or web APIs. The investment is substantial relative to the problem scale. But the gap that formal methods fill, the gap between what natural language can express and what a correct implementation requires, does not disappear because you are not using formal tools. It is still there, and the LLM has to infer the temporal contract from context that is almost never in the prompt.

What This Means for the Fowler Framing

The Fowler conversation’s practical implication is that as LLMs make “how” generation cheaper, investment in specifying the “what” precisely becomes the rate-limiting work. The posts from Birgitta Böckeler on harness engineering and from Rahul Garg on design-first collaboration, both on the same site, point toward making the structural what more machine-readable: typed interfaces, module boundaries, descriptive naming.

The temporal what requires the same investment, but the tools are less established in most development workflows. A few practical approaches that work without requiring formal specification:

Idempotency by convention. Design all write operations to be safe to execute multiple times with the same input. This converts a failure-ordering concern into a design principle the LLM can apply consistently without the full temporal specification.

async function recordWelcome(userId: string): Promise<void> {
  await db.welcomes.upsert({
    where: { userId },
    create: { userId, welcomedAt: new Date() },
    update: {}, // no-op if record already exists
  });
}

State-before-side-effect ordering. Establish and enforce the convention that state changes (database writes, cache updates) precede external side effects (API calls, message sends). If the side effect then fails, the system holds a record of an effect that has not happened yet, which a recovery step can detect, rather than an effect that happened but was never recorded and will be repeated on retry. Custom lint rules can enforce this mechanically, and code review checklists can cover what the rules miss.
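A minimal sketch of this ordering for the welcome flow, assuming an in-memory ledger in place of a real database table. `WelcomeLedger`, `welcome`, and the `send` callback are hypothetical names, and the code is synchronous for brevity where a real bot would await both steps.

```typescript
type WelcomeStatus = "pending" | "sent";

// Stand-in for a durable table keyed by user ID (an assumption for illustration).
class WelcomeLedger {
  private readonly status = new Map<string, WelcomeStatus>();

  // Records the intent to welcome. Returns true only for the first claim.
  claim(userId: string): boolean {
    if (this.status.has(userId)) return false;
    this.status.set(userId, "pending");
    return true;
  }

  markSent(userId: string): void {
    this.status.set(userId, "sent");
  }

  statusOf(userId: string): WelcomeStatus | undefined {
    return this.status.get(userId);
  }
}

// State change first, side effect second.
function welcome(
  ledger: WelcomeLedger,
  userId: string,
  send: (userId: string) => void
): boolean {
  if (!ledger.claim(userId)) return false; // already claimed: never send twice
  send(userId); // if this throws, the row stays "pending"
  ledger.markSent(userId);
  return true;
}
```

The trade-off is deliberate: a failed send leaves a visible "pending" row instead of a duplicate message, so this sketch gives at-most-once delivery. Whether a recovery job should retry pending rows depends on whether a late welcome is better than none.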

Explicit event logs over implicit derived state. In event-driven systems, storing the event (“user joined”) rather than the derived state (“user welcomed”) makes replay and temporal reasoning significantly easier. The event log is the temporal record; everything else is a projection of it. This is the core insight behind event sourcing, which Fowler documented on the same site more than a decade ago. LLMs can follow this pattern if you establish it as the structural convention; they cannot infer it from a prompt that says “track which users have been welcomed.”
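As a rough illustration of treating the log as the source of truth, the sketch below derives both "who has been welcomed" and "who still needs a welcome" from a list of events. The event shapes and function names are assumptions made for this example, not a prescribed schema.

```typescript
// Hypothetical event shapes for the welcome flow.
type BotEvent =
  | { kind: "user_joined"; userId: string; at: number }
  | { kind: "welcome_sent"; userId: string; at: number };

// Projection: the set of users with a recorded welcome.
function welcomedUsers(log: readonly BotEvent[]): Set<string> {
  const welcomed = new Set<string>();
  for (const e of log) {
    if (e.kind === "welcome_sent") welcomed.add(e.userId);
  }
  return welcomed;
}

// Projection: users who joined but have no recorded welcome.
// Rejoins collapse to a single entry per user.
function pendingWelcomes(log: readonly BotEvent[]): string[] {
  const welcomed = welcomedUsers(log);
  const pending = new Set<string>();
  for (const e of log) {
    if (e.kind === "user_joined" && !welcomed.has(e.userId)) pending.add(e.userId);
  }
  return Array.from(pending);
}
```

Because both functions are pure projections of the log, they can be recomputed after a restart or replayed against a recorded incident without consulting any other state.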

Property-based testing for temporal invariants. Libraries like fast-check in TypeScript or proptest in Rust let you express invariants that must hold across arbitrary sequences of operations:

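import fc from 'fast-check';

// simulateEventSequence is the system under test, assumed to replay the
// events against the bot logic and resolve with the messages it sent.
test('no user receives more than one welcome message', async () => {
  // An async predicate needs fc.asyncProperty, and the assertion must be awaited.
  await fc.assert(
    fc.asyncProperty(
      // A small ID pool makes repeated joins by the same user likely.
      fc.array(fc.record({ userId: fc.constantFrom('u1', 'u2', 'u3'), event: fc.constant('join') })),
      async (events) => {
        const messagesSent = await simulateEventSequence(events);
        // Count messages per user.
        const counts = messagesSent.reduce((acc, m) => {
          acc[m.userId] = (acc[m.userId] ?? 0) + 1;
          return acc;
        }, {} as Record<string, number>);
        return Object.values(counts).every(count => count <= 1);
      }
    )
  );
});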

This tests a temporal property over many randomly generated event sequences rather than a fixed set of scenarios. An LLM generating welcome message logic can be checked against this property without requiring the temporal spec to be in the prompt.

The Broader Point

The Fowler conversation is right that LLMs change where the what/how loop sits. The work that remains for developers shifts toward specifying the what more precisely. The posts and practices building up around harness engineering all point in this direction: make the structural what machine-readable through types, interfaces, and naming.

But the temporal what has never been part of the written specification. It has lived in the heads of developers who built the system, in the incident postmortems that documented the failure modes, in the unwritten rule that says “always write before you send.” LLMs make its absence consequential in a way that writing code by hand often obscured, because a developer writing code incrementally resolves temporal ambiguity in the act of implementation. A developer reviewing LLM-generated code has to reconstruct whether those temporal contracts were honored without having written the code themselves.

The investment the Fowler conversation implicitly calls for, specifying the what with enough precision that LLM translation is reliable, has to extend to the temporal dimension. The structural tools are mature: type systems, interfaces, module boundaries. The temporal tools are available but underused: idempotency conventions, state-before-side-effect ordering, event sourcing, property-based testing. The what/how loop does not end; it expands to include the parts of the specification that were always load-bearing but rarely written down.
