Byte Order at the Boundary: The Endian Debate Misses the Real Question

The dalmatian.life piece on endian wars and anti-portability opens with “this again” in the title, and that weariness is earned. Danny Cohen framed this as a holy war in his 1980 USC/ISI paper, borrowed Gulliver’s Travels to name the factions (the Big-Endians who crack their eggs at the large end, the Little-Endians who crack at the small end), and ended with a plea for peace. Forty-six years later, the war still has combatants, and the arguments are nearly identical.

The reason it keeps recurring is that both sides are right about different things, and the debate almost always conflates them.

How the War Started

The original split was architectural. The IBM System/360, Motorola’s 68000, and a number of other machines of the era stored multibyte integers with the most significant byte first: big-endian. DEC’s PDP-11 and VAX and Intel’s x86 line stored them least significant byte first: little-endian. The same C code produced different binary layouts for the same int on each family. The mismatch was silent, and it was very hard to debug when programs moved data across machines or read files written elsewhere.

The NUXI problem is the canonical early illustration. Early Unix porting work found that the string “UNIX” stored as two 16-bit words on a PDP-11, then read as 16-bit words on a big-endian machine, came out as “NUXI”. The bytes were all there. The order was wrong. The program did not crash or signal anything; it just silently produced garbage.
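The effect is easy to reproduce without period hardware. Here is a minimal sketch, assuming the only difference between the two machines is the order of the two bytes inside each 16-bit word. The function name is made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Swap the two bytes inside every 16-bit word of the input. This models
// 16-bit words written under one byte order and read back under the other.
// A trailing odd byte, if any, is left where it is.
std::string swap_bytes_in_words(std::string s) {
    for (std::size_t i = 0; i + 1 < s.size(); i += 2) {
        std::swap(s[i], s[i + 1]);
    }
    return s;
}
```

Feeding it "UNIX" gives back "NUXI", and applying it a second time restores the original, which is why the corruption looks so orderly.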

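Cohen’s paper did not pick a winner; it argued that agreeing on one order matters more than which order is chosen. The Internet protocols then settled the network question by mandate: multibyte header fields in IP and TCP go on the wire most significant byte first. This became “network byte order,” and the BSD sockets API, later codified by POSIX, supplied the htons(), htonl(), ntohs(), and ntohl() functions to convert between host representation and network representation. Big-endian won the protocol layer in the early 1980s, and that decision has never been revisited.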
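For readers who have not used these functions, here is a small sketch of how they sit at the wire boundary. It assumes a POSIX system that provides <arpa/inet.h>; the helper names put_net32 and get_net32 are invented for this example.

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Serialize a 32-bit value into 4 bytes in network (big-endian) order.
// htonl swaps on little-endian hosts and does nothing on big-endian ones,
// so the bytes written are the same on either kind of machine.
void put_net32(std::uint32_t host_value, unsigned char out[4]) {
    const std::uint32_t wire = htonl(host_value);
    std::memcpy(out, &wire, sizeof wire);
}

// Inverse: copy 4 wire bytes into an integer, then convert to host order.
std::uint32_t get_net32(const unsigned char in[4]) {
    std::uint32_t wire;
    std::memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

The point of the pair is that callers only ever see host-order integers, and the buffer only ever holds network-order bytes.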

The Pragmatic Position and Its Actual Cost

The modern anti-portability argument goes roughly like this: big-endian hardware is extinct in general-purpose computing. x86-64 dominates servers and desktops. ARM runs little-endian on every phone and most embedded systems. RISC-V specified little-endian as the base ISA, with big-endian as an optional extension that essentially no one implements. Writing endian-conversion code for a codebase that will never run on big-endian hardware is YAGNI overhead that makes the code harder to read.

That argument has merit for application code. A Discord bot, a web service, a game engine targeting x86 and ARM: writing portable endianness handling throughout is genuine busywork if you have no big-endian deployment target and no plans to acquire one.

The problem is that this reasoning leaks into two places where it causes real damage: serialization code and library code.

If your application reads or writes binary files, network packets, or any format defined by a spec that specifies byte order, you need explicit conversions. Not because the hardware might change. Because the format says what the bytes mean and your code has to agree with every other implementation of that format. A PNG file has its chunk lengths in big-endian. A WAV file has its sample rate in little-endian. A DNS packet has its question count in big-endian. Getting these wrong on the current platform is a bug today, not a future portability risk.
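To make that concrete, here is a sketch of pulling one field out of each of those three formats. The function names are hypothetical, the caller is assumed to have checked that the buffer is long enough, and the WAV offset assumes the common layout where the "fmt " chunk immediately follows the 12-byte RIFF header.

```cpp
#include <cstdint>

// Big-endian readers: most significant byte comes first in the buffer.
std::uint16_t read_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
std::uint32_t read_be32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Little-endian reader: least significant byte comes first.
std::uint32_t read_le32(const std::uint8_t* p) {
    return std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
           (std::uint32_t{p[2]} << 16) | (std::uint32_t{p[3]} << 24);
}

// PNG: an 8-byte signature, then the first chunk's 4-byte big-endian length.
std::uint32_t png_first_chunk_length(const std::uint8_t* file) {
    return read_be32(file + 8);
}

// WAV (assumed canonical layout): 4-byte little-endian sample rate at offset 24.
std::uint32_t wav_sample_rate(const std::uint8_t* file) {
    return read_le32(file + 24);
}

// DNS: the question count is a 2-byte big-endian field at offset 4 of the header.
std::uint16_t dns_question_count(const std::uint8_t* msg) {
    return read_be16(msg + 4);
}
```

None of these functions ask what the host byte order is. The format fixes the meaning of each byte, and the shifts express that meaning directly.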

Library code is the other landmine. Code you write that other people depend on carries your assumptions into contexts you did not design for. A parsing library that casts uint8_t * to uint32_t * and reads the result works on x86, works on ARM LE, and silently misreads data on IBM Z mainframes running Linux, on certain network ASICs, and on any future architecture that chooses big-endian for domain-specific reasons.

The Portable Idiom Is Free

The strongest argument against the anti-portability position is that the correct approach has no performance cost with modern compilers. The portable way to read a little-endian 32-bit integer from a byte buffer is:

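#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant byte first.
   The result is the same on any host, whatever its native byte order. */
uint32_t read_le32(const uint8_t *buf) {
    return  (uint32_t)buf[0]
         | ((uint32_t)buf[1] <<  8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}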

On x86-64 at -O2, current GCC and Clang both compile this to a single mov instruction loading 4 bytes. No shifts execute at runtime. The compiler recognizes the pattern and emits the direct load. On a little-endian target, the readable, portable, correct version and the raw cast version produce identical machine code.
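The same idiom works in the other direction. Below is a sketch of the store side, with function names chosen for this example. Recent GCC and Clang generally merge the four byte stores into one 32-bit store on targets where that is legal, though that is worth confirming in a disassembly for your own compiler and flags.

```cpp
#include <cstdint>

// Write a 32-bit value as 4 bytes, least significant byte first.
void write_le32(std::uint8_t* buf, std::uint32_t v) {
    buf[0] = static_cast<std::uint8_t>(v);
    buf[1] = static_cast<std::uint8_t>(v >> 8);
    buf[2] = static_cast<std::uint8_t>(v >> 16);
    buf[3] = static_cast<std::uint8_t>(v >> 24);
}

// Write a 32-bit value as 4 bytes, most significant byte first.
void write_be32(std::uint8_t* buf, std::uint32_t v) {
    buf[0] = static_cast<std::uint8_t>(v >> 24);
    buf[1] = static_cast<std::uint8_t>(v >> 16);
    buf[2] = static_cast<std::uint8_t>(v >> 8);
    buf[3] = static_cast<std::uint8_t>(v);
}
```

Writing 0x12345678 with write_le32 leaves the bytes 78 56 34 12 in the buffer; write_be32 leaves 12 34 56 78.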

The memcpy idiom for type-punning works the same way:

#include <stdint.h>
#include <string.h>

uint32_t x;
memcpy(&x, buf, sizeof(x));  // x now holds the 4 bytes in host order
// swap bytes here if the format's byte order differs from the host's

This avoids the undefined behavior that *(uint32_t *)buf technically triggers, both from the strict aliasing violation and from reading through a pointer that may not be suitably aligned. Compilers optimize it to a single load. There is no tradeoff here: the safe, portable version is both correct and fast.
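Here is one way the full idiom might look, including the swap step. This is a sketch, not a canonical implementation: the helper names are invented, and host byte order is detected by inspecting the first byte of a known value, which compilers can typically fold to a constant.

```cpp
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Reverse the byte order of a 32-bit value using shifts and masks.
inline std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Load a little-endian 32-bit value from any address, aligned or not.
inline std::uint32_t load_le32(const void* src) {
    std::uint32_t x;
    std::memcpy(&x, src, sizeof x);  // no aliasing or alignment assumptions
    return host_is_little_endian() ? x : swap32(x);
}

// Load a big-endian 32-bit value from any address, aligned or not.
inline std::uint32_t load_be32(const void* src) {
    std::uint32_t x;
    std::memcpy(&x, src, sizeof x);
    return host_is_little_endian() ? swap32(x) : x;
}
```

Because the copy goes through memcpy, the source pointer can sit at an odd offset inside a packet buffer without any special handling.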

For byte-swapping itself, C++23 standardized std::byteswap in <bit>:

#include <bit>
#include <cstdint>

std::uint32_t swapped = std::byteswap(value);  // value is a std::uint32_t

This compiles to a single bswap on x86-64 or rev on AArch64. Before C++23, compiler builtins did the same job: __builtin_bswap32 on GCC and Clang, _byteswap_ulong on MSVC.
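If neither C++23 nor a compiler builtin is available, a plain shift loop does the same job. The template below is a hypothetical fallback written for this article, not a drop-in replacement for std::byteswap, and it is restricted to unsigned integer types.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reverse the bytes of an unsigned integer of any width.
// Each iteration moves the lowest remaining byte of `value` onto the
// low end of `result`, so the first byte taken ends up most significant.
template <typename T>
constexpr T byteswap_fallback(T value) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        result = static_cast<T>((result << 8) | (value & 0xFF));
        value = static_cast<T>(value >> 8);
    }
    return result;
}
```

Whether an optimizer turns this loop into a single swap instruction depends on the compiler and version, so the builtins remain the safer choice where performance matters.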

What C++20 and C++23 Actually Settled

C++20 added std::endian in <bit>, giving the language a standard way to query host byte order at compile time:

#include <bit>

if constexpr (std::endian::native == std::endian::little) {
    // host is little-endian: little-endian data can be used as-is
} else {
    // host is not little-endian: byte-swap little-endian data before use
}

Before this, you were stuck with compiler-specific macros like __BYTE_ORDER__ on GCC and Clang or the morass of platform-detection headers. std::endian made the check portable across compilers. There is a mild irony in that: querying byte order portably needed its own piece of portability machinery. But at least there is now a single consistent way to do it.

std::byteswap in C++23 completed the picture. Between the two, modern C++ code has no excuse for either undefined-behavior casts or non-standard builtins.

The Kernel’s Approach: Making Endianness a Type Property

The Linux kernel has the cleanest solution to the whole problem: make byte order part of the type. The kernel headers define __le32, __be32, __le16, __be16, and so on as annotated integer types. sparse, the kernel’s static analysis tool, understands these annotations and flags any code that mixes endianness without an explicit conversion:

__le32 value = cpu_to_le32(some_integer); // explicit conversion
__be32 net_value = cpu_to_be32(some_integer);

This approach turns endianness from a runtime convention into a type-checked property. You cannot accidentally assign a big-endian value to a little-endian field without the analyzer catching it. The conversions expand to no-ops when the target order matches the CPU’s native order and to a byte swap when it does not.
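A user-space analogue might look like the sketch below. It is not the kernel's mechanism: the kernel types are plain integers checked by sparse, while these hypothetical Le32 and Be32 classes are distinct C++ types that store raw bytes, so the ordinary compiler does the checking.

```cpp
#include <cstdint>
#include <type_traits>

// A 32-bit value held in little-endian byte order.
class Le32 {
public:
    static Le32 from_host(std::uint32_t v) {
        Le32 r;
        for (int i = 0; i < 4; ++i) {
            r.bytes_[i] = static_cast<std::uint8_t>(v >> (8 * i));
        }
        return r;
    }
    std::uint32_t to_host() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v |= std::uint32_t{bytes_[i]} << (8 * i);
        }
        return v;
    }
    const std::uint8_t* data() const { return bytes_; }

private:
    std::uint8_t bytes_[4] = {};
};

// A 32-bit value held in big-endian byte order.
class Be32 {
public:
    static Be32 from_host(std::uint32_t v) {
        Be32 r;
        for (int i = 0; i < 4; ++i) {
            r.bytes_[i] = static_cast<std::uint8_t>(v >> (8 * (3 - i)));
        }
        return r;
    }
    std::uint32_t to_host() const {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            v |= std::uint32_t{bytes_[i]} << (8 * (3 - i));
        }
        return v;
    }
    const std::uint8_t* data() const { return bytes_; }

private:
    std::uint8_t bytes_[4] = {};
};

// Mixing the two orders, or skipping the conversion, does not compile.
static_assert(!std::is_convertible<Le32, Be32>::value, "no implicit LE to BE");
static_assert(!std::is_assignable<Le32&, Be32>::value, "no BE assigned to LE");
static_assert(!std::is_convertible<std::uint32_t, Le32>::value, "must use from_host");
```

A struct describing a file header can then declare each field as Le32 or Be32, and every read has to go through to_host() to get a usable integer.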

Nothing stops application developers from adopting the same pattern outside the kernel, but almost nobody does. It requires discipline about annotating every serialized field, which is friction that the YAGNI argument tends to win against in application codebases.

Why This Keeps Coming Back

Big-endian is not actually dead. It is niche. IBM Z mainframes are in production data centers running real workloads. Network ASICs and certain telecom silicon still run big-endian internally, because the protocols they process are big-endian and zero-copy processing is more efficient when the byte order matches. Embedded targets with unusual requirements still exist.

More importantly, the argument is not primarily about hardware availability. It is about what assumptions you are encoding into code that others depend on. A library that works by coincidence on little-endian hardware and silently breaks on big-endian is a reliability failure regardless of market share. The Linux kernel has caught endianness bugs in drivers written by people who “only targeted x86” and then had their code ported or tested on MIPS or PowerPC.

The distinction that ends the argument: anti-portability reasoning is fine for the parts of your code that process data entirely in memory and never touch a defined binary format. It is wrong for serialization, protocol parsing, file format handling, and any library that might be reused. The cost of getting it right is zero on modern compilers. The cost of getting it wrong ranges from nothing, if your code never leaves x86, to a subtle, hard-to-diagnose data corruption bug in someone else’s deployment six months from now.

Danny Cohen wanted peace. The closest thing to it is deciding once, per codebase, where the byte-order boundaries are, using the standard tools at those boundaries, and not touching the question elsewhere.