
Byte Order Portability: The Gap the Standard Library Left Open for Fifty Years

Source: lobsters

Danny Cohen coined the terms big-endian and little-endian in his 1980 paper “On Holy Wars and a Plea for Peace”, borrowing from Swift’s Gulliver’s Travels, where two factions went to war over which end of a boiled egg to crack first. The joke was that arguing about byte order was equally absurd. Forty-six years later, this piece on dalmatian.life revisits the argument under the banner of “anti-portability,” and the argument is still running.

The persistence of the debate is the interesting part. The endian question never went away because the standard library never actually solved it. Developers keep rediscovering the problem, reaching for the wrong tools, and then writing blog posts about it.

What C Gave You

The C standard has never mandated byte order. int, uint32_t, and friends are stored in whatever order the host machine uses. The standard is deliberately silent on this, and the compilers are correct to follow suit.

POSIX filled in one specific gap: htons, htonl, ntohs, and ntohl. These four functions convert between host byte order and network byte order, where network byte order is defined as big-endian per RFC 1700. Every Berkeley sockets program uses them. They work. But they only give you big-endian conversion, and the function naming is opaque enough that developers routinely misread them.
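The names decode as h for host, n for network, s for short (16-bit), and l for long (32-bit), so htonl reads as "host to network, long". A minimal sketch of how the pair is typically used, assuming a POSIX system with <arpa/inet.h>; the helper names put_u32_net and get_u32_net are made up for illustration:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value into buf in network
 * (big-endian) byte order. buf must have room for 4 bytes. */
void put_u32_net(uint8_t *buf, uint32_t host_value) {
    uint32_t wire = htonl(host_value);   /* host -> network */
    memcpy(buf, &wire, sizeof wire);
}

/* Hypothetical helper: read a 32-bit network-order value from buf
 * and return it in host order. */
uint32_t get_u32_net(const uint8_t *buf) {
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return ntohl(wire);                  /* network -> host */
}
```

After put_u32_net(buf, 0x01020304), the buffer holds the bytes 01 02 03 04 on any host, which is the whole point of the conversion.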

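For little-endian wire formats, which many modern protocols and binary file formats use because the hardware that reads them is predominantly little-endian, POSIX long offered nothing. The htobe32/be32toh/htole32/le32toh family exists on Linux via <endian.h> and on the BSDs via <sys/endian.h>, but it was not part of POSIX until the 2024 revision (Issue 8) adopted <endian.h>, so code that must build on older systems cannot rely on it. macOS exposes equivalent functionality through <libkern/OSByteOrder.h> under different names. Windows provides no host-to-endian family in its standard headers; MSVC's _byteswap_ulong and its siblings swap unconditionally, so the caller still has to know the host order.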

The result is that any code needing portable little-endian or explicit big-endian conversion outside of the sockets context has required a compatibility shim for decades:

#if defined(__linux__)
  #include <endian.h>
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
  #include <sys/endian.h>
#elif defined(__APPLE__)
  /* macOS has the same operations under OSSwap* names */
  #include <machine/endian.h>
  #include <libkern/OSByteOrder.h>
  #define htobe32(x) OSSwapHostToBigInt32(x)
  #define htole32(x) OSSwapHostToLittleInt32(x)
  #define be32toh(x) OSSwapBigToHostInt32(x)
  #define le32toh(x) OSSwapLittleToHostInt32(x)
#elif defined(_WIN32)
  /* roll your own */
#endif

Variations of this boilerplate, or hand-rolled byte-at-a-time equivalents, appear across the source trees of projects such as SQLite, OpenSSH, and libpng, and most others that handle binary data portably. It is not exceptional engineering; it is mandatory plumbing that the standard should have provided.
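The "roll your own" branch does not actually need to know the host byte order. One possible sketch lays the bytes out in the desired order and copies them into an integer; the fallback_ prefix is a hypothetical name chosen here to avoid clashing with a platform's own htobe32 and htole32:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fallback: host order -> big-endian representation. */
static inline uint32_t fallback_htobe32(uint32_t x) {
    /* Most significant byte first, independent of the host. */
    const uint8_t b[4] = {
        (uint8_t)(x >> 24), (uint8_t)(x >> 16), (uint8_t)(x >> 8), (uint8_t)x
    };
    uint32_t out;
    memcpy(&out, b, sizeof out);
    return out;
}

/* Hypothetical fallback: host order -> little-endian representation. */
static inline uint32_t fallback_htole32(uint32_t x) {
    /* Least significant byte first, independent of the host. */
    const uint8_t b[4] = {
        (uint8_t)x, (uint8_t)(x >> 8), (uint8_t)(x >> 16), (uint8_t)(x >> 24)
    };
    uint32_t out;
    memcpy(&out, b, sizeof out);
    return out;
}
```

Because the byte layout is built explicitly, the same source is correct on little-endian and big-endian hosts, and no preprocessor detection is required.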

The Type Punning Trap

Because the vocabulary for endian conversion was missing or scattered, developers fell back on whatever felt intuitive. The most common mistake is pointer casting:

/* undefined behavior in C and C++ */
uint32_t read_u32(const uint8_t *buf) {
    return *(uint32_t *)buf;
}

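This breaks two rules at once. It violates strict aliasing: a character type such as uint8_t (in practice unsigned char) may be used to inspect any object, but the reverse is not allowed, so reading a byte buffer through a uint32_t lvalue is undefined. The compiler may assume the uint32_t read cannot observe earlier byte stores into that buffer and reorder or eliminate accesses accordingly. GCC and Clang both enable this type-based alias analysis by default at -O2. The cast also ignores alignment: buf need not be 4-byte aligned, and a misaligned load traps on some architectures. The code can work in debug builds and break unpredictably in release builds, which is an especially bad class of bug.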

The union approach has a similar reputation:

/* legal in C11, undefined behavior in C++ */
union {
    uint32_t val;
    uint8_t bytes[4];
} u;
u.val = 0x01020304;
uint8_t first = u.bytes[0]; /* 0x04 on little-endian, 0x01 on big-endian */

C11 explicitly permits reading from a union member other than the one last written. C++ does not. Using unions for type punning in C++ is undefined behavior, though most compilers handle it as an extension.

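The reliably correct approach uses memcpy, which compilers recognize and lower to an ordinary load without introducing aliasing or alignment violations:

#include <stdint.h>
#include <string.h>

/* __BYTE_ORDER__ and __builtin_bswap32 are GCC/Clang extensions. */
uint32_t load_u32_le(const uint8_t *buf) {
    uint32_t val;
    memcpy(&val, buf, sizeof val);   /* host-order copy of the 4 bytes */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    val = __builtin_bswap32(val);    /* the buffer is little-endian */
#endif
    return val;
}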

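Or, without relying on compiler builtins, use explicit bit shifts. The casts to uint32_t matter: without them each byte is promoted to int, and shifting a byte with its top bit set left by 24 overflows a signed int, which is undefined. On unsigned 32-bit operands the shifts are well-defined: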

uint32_t load_u32_le(const uint8_t *buf) {
    return (uint32_t)buf[0]
         | (uint32_t)buf[1] << 8
         | (uint32_t)buf[2] << 16
         | (uint32_t)buf[3] << 24;
}

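A modern optimizing compiler typically recognizes the bit-shift pattern. On little-endian targets such as x86-64 and AArch64 it emits a single load instruction; on big-endian targets it emits a load plus a byte swap, or a byte-reversed load where the architecture has one. In optimized builds the abstraction usually costs nothing at runtime.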

What C++ Eventually Added

C++20 shipped std::endian in <bit>, which provides compile-time detection:

#include <bit>

if constexpr (std::endian::native == std::endian::little) {
    // little-endian compile-time path
} else if constexpr (std::endian::native == std::endian::big) {
    // big-endian compile-time path
}

This is detection only. It tells you what byte order the current host uses, which lets you branch, but it provides no conversion functions.

C++23 added std::byteswap in the same header:

#include <bit>
#include <cstdint>

std::uint32_t swapped = std::byteswap(std::uint32_t{0x01020304}); // 0x04030201

Combined, these give C++ a standard, portable vocabulary for endian handling for the first time. The gap between htonl, which arrived with the BSD sockets API in the early 1980s, and std::byteswap in C++23 is roughly forty years; measured from the birth of C itself it is closer to fifty. For a problem as fundamental as reading a multi-byte integer from a byte buffer, that is a long wait.

std::byteswap does not by itself replace the load pattern. You still need to combine it with a memcpy-based load or the bit-shift idiom, then conditionally swap based on std::endian::native. But at least the swap itself is now standard.

The “Just Assume Little-Endian” Argument

The anti-portability position holds that, for application software in 2026, worrying about big-endian hosts is a waste of time. x86-64 and AArch64 in little-endian configuration have captured essentially the entire desktop, server, and mobile market. SPARC, PA-RISC, and big-endian MIPS systems, which made endian portability a practical concern in the 1990s, are gone from most deployment targets.

This is largely correct as a statement about hardware. If you are shipping a web service that runs on cloud infrastructure, every instance is little-endian, and writing code that handles big-endian hosts adds complexity without adding real-world robustness.

The problem with the argument is that it conflates hardware byte order with serialization format. Even in a completely little-endian fleet, your code still reads and writes data that travels across networks or gets stored to disk. Those formats have their own byte order, determined when the format was designed, and they do not change to match your hardware.

DNS uses big-endian throughout. ELF binaries can be either endianness, and a little-endian toolchain processing big-endian ELF for a cross-compilation target must handle this correctly. SQLite stores the multi-byte integers in its database file format as big-endian, regardless of host. PNG uses big-endian for all multi-byte integers. Protocol Buffers uses little-endian for its fixed-width fixed32 and fixed64 wire types. CBOR uses big-endian. USB descriptors use little-endian. Standard MIDI Files use big-endian for their multi-byte header fields. The list is long and inconsistent.
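In practice this means a single program often needs both directions, chosen by the format and not by the host. A rough sketch, using the PNG layout (an 8-byte signature, then the IHDR chunk's 4-byte length and 4-byte type, then width and height) as the big-endian case; the function names are illustrative, and the caller is assumed to have checked that the buffer is long enough:

```c
#include <stdint.h>

/* Big-endian load: most significant byte first (PNG, DNS, SQLite). */
static uint32_t load_u32_be(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Little-endian load: least significant byte first (protobuf fixed32). */
static uint32_t load_u32_le(const uint8_t *p) {
    return (uint32_t)p[0]       | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helpers: IHDR width and height sit at byte offsets 16
 * and 20 of a PNG file. The caller must supply at least 24 bytes. */
static uint32_t png_width(const uint8_t *file)  { return load_u32_be(file + 16); }
static uint32_t png_height(const uint8_t *file) { return load_u32_be(file + 20); }
```

Neither loader consults the host byte order, so the same code reads a PNG header correctly on any machine.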

The endianness question for formats and protocols will not go away regardless of what hardware you run on. A uniform fleet settles the host question; it says nothing about the format question.

The NUXI Problem as Diagnostic

The classic demonstration of endian bugs is the NUXI problem from early UNIX portability work. When UNIX was ported from the little-endian PDP-11 to a big-endian machine, the ASCII string “UNIX”, stored as two 16-bit words, came out as “NUXI” because the two bytes within each word traded places. The name became shorthand for an entire class of latent bugs: code that works on the development machine and silently produces corrupted output on a target with different byte order.
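The effect is easy to reproduce: swapping the two bytes inside each 16-bit word of the string yields the famous result. A small sketch, with swap_bytes_in_words as a made-up helper name:

```c
#include <stddef.h>

/* Swap the two bytes inside each 16-bit word of buf.
 * A trailing odd byte, if any, is left untouched. */
void swap_bytes_in_words(char *buf, size_t len) {
    for (size_t i = 0; i + 1 < len; i += 2) {
        char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

Applied to the four characters of "UNIX", the pairs "UN" and "IX" become "NU" and "XI".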

The NUXI problem looks like a historical curiosity until you hit it in practice. The usual modern form involves a binary file format written on a little-endian machine and read on a big-endian embedded target, or a network protocol parsed with assumptions that match the author’s hardware but not the spec. The bug often lives dormant for years because the same endianness is used end-to-end throughout development and testing.

The correct defense is to make the byte order explicit at every serialization boundary. Named load and store functions (load_u32_le, store_u16_be) are more resistant to this class of bug than casting a struct pointer over a raw buffer, which encodes the host’s byte order as an invisible assumption.
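A minimal sketch of what such named helpers might look like. store_u16_be follows the naming used above; load_u16_be and store_u32_le are assumed companions added here for symmetry, and each caller must provide a buffer of sufficient size:

```c
#include <stdint.h>

/* Store a 16-bit value most significant byte first. */
void store_u16_be(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Load a 16-bit value stored most significant byte first. */
uint16_t load_u16_be(const uint8_t *p) {
    return (uint16_t)((uint16_t)p[0] << 8 | p[1]);
}

/* Store a 32-bit value least significant byte first. */
void store_u32_le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

The byte order lives in the function name, so a reviewer can compare each call site against the format specification without knowing anything about the machine the code runs on.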

Why the Argument Keeps Coming Back

The “this again” framing in the source article captures something real. The conversation happens in cycles because the hardware keeps converging on little-endian while the formats and protocols that software must read stay as mixed as ever.

Every few years, developers encounter endianness while writing network code or a binary parser. The correct solutions are not obvious. The wrong solutions look reasonable. The standard library provided no portable vocabulary for this until C++23, which means developers who learned the problem and its solutions before that have passed down their own shims and idioms, often without explaining the underlying issue. The combination produces recurring rediscovery.

The hardware monoculture amplifies this. When most development happens on x86-64 or little-endian ARM, it is easy to write code that handles byte order implicitly and gets it right by accident. That same code fails silently when it encounters a big-endian serialization format, or visibly when it is compiled for a network appliance or embedded system, where big-endian targets still appear. The bug is invisible until it is not.

Practical Guidance

For new code in 2026: use std::byteswap combined with std::endian::native for swap logic in C++23. For older C++, use platform headers behind a compatibility shim. For C, use the platform-specific headers or write explicit bit-shift load and store functions.

Never cast a pointer to read multi-byte integers from a byte buffer. Use memcpy or bit shifts. Document the byte order of every binary format field at the declaration site, not in a note buried in a README.

The endian wars are not really a debate about which byte order is superior. The interesting question is why the C and C++ standards left this problem to application code for so long, and what that gap says about how slowly foundational tooling catches up to the problems that practitioners have already solved with varying degrees of correctness on their own.
