
Why Endian-Portable Code Is Often Neither

Source: lobsters

The debate surfaces every few years: someone posts a take on endian portability, and the argument restarts from roughly the same positions. The most recent entry makes the point its subtitle suggests: this again.

The tired quality of the debate is worth examining on its own terms. Danny Cohen introduced the term “endian” to computing in his 1980 paper “On Holy Wars and a Plea for Peace”, borrowing from the Big-Endians and Little-Endians of Swift’s Gulliver’s Travels. Cohen’s plea was for everyone to agree on a single byte order, and for network protocols that order turned out to be big-endian: TCP/IP, DNS, and many other binary network formats are big-endian to this day. But the hardware went the other way entirely.

The Hardware Settled Long Ago

x86 has been little-endian since the 8086. ARM, despite being bi-endian in its architecture specification, ships in little-endian mode almost universally, including Apple Silicon. RISC-V defaults to little-endian. The architectures that anchored the big-endian side, including Motorola 68k, SPARC, and older PowerPC, have either largely disappeared from new hardware or moved to little-endian. When IBM’s POWER9 and POWER10 run mainstream Linux distributions, they run them in little-endian mode (ppc64le), a shift that began around 2014 to reduce the porting overhead on a platform that was otherwise out of step with the broader Linux ecosystem.

This means that when a developer writes “endian-portable” code today, they are writing defensively against a threat that exists on an increasingly narrow slice of hardware. That is not inherently wrong; a toolchain or codebase targeting embedded systems with genuinely mixed deployments has legitimate reasons to care. But the cost of that defensive posture is not always acknowledged, and the code it produces is not as honest as it looks.

What Portable Endian Code Actually Looks Like

The standard POSIX approach in C gives you four functions: htons, htonl, ntohs, ntohl. These convert between host byte order and network byte order (big-endian). On a big-endian host, they are no-ops. On little-endian x86, they emit byte-swap instructions.

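#include <arpa/inet.h>  /* htons, htonl */
#include <stdint.h>     /* uint8_t, uint16_t, uint32_t */
#include <string.h>     /* memcpy */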

struct packet_header {
    uint16_t type;
    uint32_t length;
};

void serialize_header(const struct packet_header *h, uint8_t *buf) {
    uint16_t type_n   = htons(h->type);
    uint32_t length_n = htonl(h->length);
    memcpy(buf,     &type_n,   2);
    memcpy(buf + 2, &length_n, 4);
}

This is correct. It also has a failure mode: the moment someone writes h->type directly into the buffer without calling htons, the code silently works on big-endian hosts and silently corrupts data on little-endian ones. The portability machinery works when you use it and fails when you forget to, and the failures are silent. That asymmetry is the core of the anti-portability argument.
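A minimal sketch of that failure mode, reusing the packet_header layout from above. The names serialize_header_buggy and buggy_matches_correct are hypothetical, introduced only for illustration; the point is that the buggy variant compiles without complaint and only misbehaves on little-endian hosts.

```c
#include <arpa/inet.h>  /* htons, htonl (POSIX) */
#include <stdint.h>
#include <string.h>

struct packet_header {
    uint16_t type;
    uint32_t length;
};

/* Correct: every field goes through htons/htonl before it is copied out. */
static void serialize_header(const struct packet_header *h, uint8_t *buf) {
    uint16_t type_n   = htons(h->type);
    uint32_t length_n = htonl(h->length);
    memcpy(buf,     &type_n,   2);
    memcpy(buf + 2, &length_n, 4);
}

/* Hypothetical buggy variant: the htons call on `type` was forgotten.
 * Nothing warns about it, and a big-endian host still emits the right bytes. */
static void serialize_header_buggy(const struct packet_header *h, uint8_t *buf) {
    uint32_t length_n = htonl(h->length);
    memcpy(buf,     &h->type,  2);   /* host byte order leaks onto the wire */
    memcpy(buf + 2, &length_n, 4);
}

/* Returns 1 if the buggy output happens to equal the correct wire bytes. */
static int buggy_matches_correct(const struct packet_header *h) {
    uint8_t good[6], bad[6];
    serialize_header(h, good);
    serialize_header_buggy(h, bad);
    return memcmp(good, bad, sizeof good) == 0;
}
```

For a header whose type is 0x1234, buggy_matches_correct returns 1 on a big-endian host and 0 on a little-endian one. A test suite that only ever runs on one kind of machine sees only one of those outcomes.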

The alternative is to declare explicitly which endianness you are targeting, write conversion code at boundary layers, and let the compiler optimize it away where it is a no-op:

static inline uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

Recent versions of GCC and Clang both recognize this pattern when optimization is enabled and emit a single 32-bit load on little-endian targets; on big-endian targets they add a byte swap, or use a byte-reversed load where the architecture has one. The code is explicit about its byte order. You can read the significance of each byte directly from the source, and the optimization is available without any magic.
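The same shift-and-mask style covers the write direction and the big-endian case. The sketch below repeats read_le32 so that it compiles on its own; write_le32 and read_be32 are illustrative names following the same convention, not functions from any particular library.

```c
#include <stdint.h>

/* Read four little-endian bytes: p[0] is the least significant. */
static inline uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store a 32-bit value as four little-endian bytes. */
static inline void write_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Same shape with the shifts reversed: p[0] is the most significant. */
static inline uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

None of these functions inspects the host byte order. The result depends only on the bytes in the buffer, which is why the same source is correct on every target.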

Rust Made an Explicit Design Decision Here

Rust’s standard library added endian-aware byte conversion directly onto integer primitives in Rust 1.32. You get methods like u32::from_le_bytes, u32::to_be_bytes, u64::from_ne_bytes (native-endian), and the full matrix of combinations. These accept and return [u8; N] fixed arrays, which means the conversion is type-checked and cannot be confused with a direct memory read.

fn deserialize_header(buf: &[u8; 6]) -> (u16, u32) {
    let msg_type = u16::from_be_bytes(buf[0..2].try_into().unwrap());
    let length   = u32::from_be_bytes(buf[2..6].try_into().unwrap());
    (msg_type, length)
}

The critical design decision is that there is no implicit conversion. Safe Rust has no “read this integer from memory” operation that silently uses the host byte order; even the native-endian variant has to be requested by name. Every boundary crossing is spelled out in the source. The byteorder crate on crates.io took the same approach before the methods landed in the standard library, and its API design appears to have influenced what was standardized. For plain integer conversions the crate is largely redundant now, which is exactly the outcome good API design should produce.

This is the anti-portability argument made concrete: rather than writing code that tries to handle both byte orders transparently, you pick one, state it explicitly in your API contracts, and do the conversion at the boundary. If you are little-endian throughout, those conversions are no-ops and the compiler eliminates them. If you ever run on big-endian hardware, the conversions appear exactly where the code says they will, not silently everywhere that was missed.

Where the Argument Has Limits

The case for genuine endian portability narrows to specific domains. Binary file formats that predate the little-endian consolidation, TIFF being a clear example, support both byte orders and encode which one is in use in the file header (“II” for Intel/little-endian, “MM” for Motorola/big-endian). Reading those formats correctly requires runtime dispatch on byte order, and that dispatch is load-bearing. Kernel-level networking code and NIC drivers have to cope with variation in hardware byte order. Emulators cross endian boundaries by design.
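As a rough illustration of that runtime dispatch, here is a sketch that parses the 8-byte header of a classic TIFF file: the two-byte order mark, the magic number 42 stored in the file’s own byte order, and the offset of the first image file directory. The type and function names are hypothetical, and BigTIFF, which uses a different magic number and a longer header, is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

enum tiff_order { TIFF_LITTLE, TIFF_BIG };

struct tiff_header {
    enum tiff_order order;      /* taken from the "II" / "MM" mark */
    uint32_t first_ifd_offset;  /* byte offset of the first IFD */
};

/* Read a 16-bit value in the byte order the file declared. */
static uint16_t get16(const uint8_t *p, enum tiff_order o) {
    return o == TIFF_LITTLE ? (uint16_t)(p[0] | (p[1] << 8))
                            : (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit value in the byte order the file declared. */
static uint32_t get32(const uint8_t *p, enum tiff_order o) {
    return o == TIFF_LITTLE
        ? ((uint32_t)p[0] | (uint32_t)p[1] << 8
           | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24)
        : ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16
           | (uint32_t)p[2] << 8 | (uint32_t)p[3]);
}

/* Returns 0 on success, -1 on a short buffer, an unknown order mark,
 * or a magic number other than 42. */
static int parse_tiff_header(const uint8_t *buf, size_t len,
                             struct tiff_header *out) {
    if (len < 8) return -1;
    if (buf[0] == 'I' && buf[1] == 'I')      out->order = TIFF_LITTLE;
    else if (buf[0] == 'M' && buf[1] == 'M') out->order = TIFF_BIG;
    else return -1;
    if (get16(buf + 2, out->order) != 42) return -1;
    out->first_ifd_offset = get32(buf + 4, out->order);
    return 0;
}
```

The byte order here is a property of the file, not of the host, so every later read has to carry the order along with it. This is the one situation where handling both orders in the same code path is unavoidable.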

Outside those domains, most code encountering endianness is either doing network I/O, where big-endian is the protocol-specified wire format and conversion is required regardless of the host, or storage serialization, where you chose a format and should be consistent about it. The POSIX htons family is the right tool for the first case. An explicit serialization strategy is the right tool for the second. Neither case benefits from implicit, deferred endian handling.

C23 Acknowledges the Problem Without Fixing It

C23 added <stdbit.h> with bit manipulation utilities and standardized the endianness detection macros __STDC_ENDIAN_BIG__, __STDC_ENDIAN_LITTLE__, and __STDC_ENDIAN_NATIVE__. Previously these existed as platform-specific extensions in <endian.h> on Linux and <sys/endian.h> on BSD systems, with different macro names and no standard guarantees. Standardizing them is useful for compile-time detection.
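A sketch of compile-time detection that prefers the C23 macros and falls back to the __BYTE_ORDER__ macros that GCC and Clang predefine. HOST_IS_LITTLE_ENDIAN and first_byte_is_low are names made up for this example, and the fallback branch assumes a GCC-compatible compiler.

```c
#include <stdint.h>
#include <string.h>

/* Pull in <stdbit.h> only when the compiler claims C23 and the header exists. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L && defined(__has_include)
#  if __has_include(<stdbit.h>)
#    include <stdbit.h>
#  endif
#endif

#if defined(__STDC_ENDIAN_NATIVE__)
   /* C23: standard macros from <stdbit.h>. */
#  define HOST_IS_LITTLE_ENDIAN (__STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__)
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
   /* Pre-C23 fallback: macros predefined by GCC and Clang. */
#  define HOST_IS_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#else
#  error "no known compile-time endianness macro on this toolchain"
#endif

/* Runtime cross-check: returns 1 if the lowest-addressed byte of a
 * 16-bit integer holds its least significant bits. */
static int first_byte_is_low(void) {
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Detection like this is mainly useful for picking a fast path or for static assertions. It does not replace explicit conversion at the boundary, and code written in the shift-and-mask style needs no detection at all.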

But the underlying problem, that the binary representation of a struct in memory is implementation-defined in C, remains unchanged. Writing a struct with fwrite and reading it back with fread on the same machine works. Sending it over a network to a different-endian machine, or reading a file written on different hardware, produces silent garbage. The standard does not define how integers map to bytes; it defines their numeric values and the behavior of arithmetic on them. The solution is to serialize field by field with explicit byte order, not to write the struct out whole and hope.
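A minimal sketch of field-by-field serialization, assuming a hypothetical on-disk record format of six bytes: a little-endian 16-bit type followed by a little-endian 32-bit length. The struct, the function names, and the format itself are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

struct record {
    uint16_t type;    /* in memory, usually followed by padding */
    uint32_t length;
};

/* On-disk size: 2 + 4 bytes, no padding. Not necessarily sizeof(struct record). */
#define RECORD_WIRE_SIZE 6

/* Encode each field explicitly as little-endian bytes. */
void encode_record(const struct record *r, uint8_t out[RECORD_WIRE_SIZE]) {
    out[0] = (uint8_t)(r->type);
    out[1] = (uint8_t)(r->type >> 8);
    out[2] = (uint8_t)(r->length);
    out[3] = (uint8_t)(r->length >> 8);
    out[4] = (uint8_t)(r->length >> 16);
    out[5] = (uint8_t)(r->length >> 24);
}

/* Decode the same layout back into the in-memory struct. */
void decode_record(const uint8_t in[RECORD_WIRE_SIZE], struct record *r) {
    r->type   = (uint16_t)(in[0] | (in[1] << 8));
    r->length = (uint32_t)in[2]       | (uint32_t)in[3] << 8
              | (uint32_t)in[4] << 16 | (uint32_t)in[5] << 24;
}

/* Write the encoded bytes, never the struct. Returns 0 on success, -1 on a short write. */
int write_record(FILE *f, const struct record *r) {
    uint8_t buf[RECORD_WIRE_SIZE];
    encode_record(r, buf);
    return fwrite(buf, 1, sizeof buf, f) == sizeof buf ? 0 : -1;
}
```

Besides fixing the byte order, this also takes padding out of the file format. On most ABIs the struct occupies eight bytes in memory, while the record on disk is always six.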

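Go’s encoding/binary package has held this position since the beginning. You pass a binary.ByteOrder, either binary.BigEndian or binary.LittleEndian, and every read or write names its byte order. binary.Read and binary.Write do accept structs of fixed-size fields, but they encode them field by field in the order you specify rather than copying memory. There is no shortcut that lets you pretend struct layout is consistent across machines. The verbosity is the point.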

What the Recurring Argument Is Really About

The reason this discussion keeps coming back is that the portability ideal is genuinely appealing. Write once, run anywhere carries obvious value, and endian portability sounds like part of that. The problem is that endian portability requires correct usage at every single boundary crossing, and the cost of missing one is a silent bug that only manifests on hardware that is increasingly rare. The anti-portability argument, read charitably, is an argument for making that discipline unnecessary: commit to an explicit byte order, write it into your API contracts, and let the compiler handle the no-op elimination on matching hardware.

You get correctness from explicit code and performance from zero-cost conversions. The hardware already made its choice. Code that acknowledges that choice directly is easier to reason about than code that pretends the choice is still open.
