
USB from the Software Side: Descriptors, Transfer Types, and the Platform Tax

Source: hackernews

The USB spec is long enough that most developers approach hardware communication either through an OS abstraction layer or by hoping the vendor ships a usable SDK. WerWolv’s introduction to userspace USB drivers makes a case that the protocol is more legible than its reputation suggests, particularly from the software side rather than the electronics side. After spending time with the spec and codebases like OpenOCD and libfreenect, the most useful starting point is the transfer type system, because that choice shapes nearly every other decision in how a driver is structured.

Descriptors as the Contract

Before any data moves, the host needs to understand what it’s talking to. USB accomplishes this through a hierarchy of descriptors: binary structures that the device returns in response to GET_DESCRIPTOR control requests during enumeration. The hierarchy is Device Descriptor → Configuration Descriptor → Interface Descriptor → Endpoint Descriptor, and each level adds constraints that a driver must respect.

The Device Descriptor’s 18 bytes contain the vendor ID (VID) and product ID (PID) the OS uses to select a driver, along with bDeviceClass (which can defer class assignment to the interface level), bMaxPacketSize0 (read before the host knows anything else, to size the first transaction), and bcdUSB encoding the USB specification version. The Configuration Descriptor adds the total descriptor length so the host can fetch the entire sub-tree in one request, the number of interfaces, and bMaxPower in units of 2 mA.
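To make the field layout concrete, here is a minimal sketch that decodes a raw 18-byte Device Descriptor by hand. The struct and function names (`dev_desc_summary`, `parse_device_descriptor`) are made up for illustration and are not part of libusb; the byte offsets follow the standard Device Descriptor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct holding a few decoded Device Descriptor fields. */
struct dev_desc_summary {
    uint16_t bcd_usb;      /* bcdUSB, e.g. 0x0200 for USB 2.0 */
    uint8_t  device_class; /* bDeviceClass; 0x00 defers to the interfaces */
    uint8_t  max_packet0;  /* bMaxPacketSize0 */
    uint16_t vid;          /* idVendor */
    uint16_t pid;          /* idProduct */
};

/* Multi-byte descriptor fields are little-endian on the wire. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not a plausible Device Descriptor. */
int parse_device_descriptor(const uint8_t *raw, size_t len,
                            struct dev_desc_summary *out)
{
    assert(raw != NULL && out != NULL);
    /* raw[0] is bLength (18), raw[1] is bDescriptorType (0x01 = DEVICE). */
    if (len < 18 || raw[0] != 18 || raw[1] != 0x01)
        return -1;
    out->bcd_usb      = le16(raw + 2);
    out->device_class = raw[4];
    out->max_packet0  = raw[7];
    out->vid          = le16(raw + 8);
    out->pid          = le16(raw + 10);
    return 0;
}
```

In real code libusb_get_device_descriptor() does this decoding for you; the sketch is only meant to show where the fields sit in the 18 bytes.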

The Endpoint Descriptor is where the transfer type lives. bmAttributes bits 1:0 encode the type: 0 for control, 1 for isochronous, 2 for bulk, 3 for interrupt. bEndpointAddress encodes both the endpoint number in bits 3:0 and direction in bit 7, where 1 means IN (device to host). wMaxPacketSize and bInterval complete the picture.

Reading these with libusb gives you a complete endpoint map before writing a single transfer:

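struct libusb_config_descriptor *config;
if (libusb_get_active_config_descriptor(dev, &config) != 0) {
    fprintf(stderr, "could not read the active configuration descriptor\n");
    return -1; /* assumes the enclosing function returns int */
}

for (int i = 0; i < config->bNumInterfaces; i++) {
    /* Only the first alternate setting of each interface is inspected here. */
    const struct libusb_interface_descriptor *iface =
        &config->interface[i].altsetting[0];
    for (int e = 0; e < iface->bNumEndpoints; e++) {
        const struct libusb_endpoint_descriptor *ep = &iface->endpoint[e];
        printf("EP %02x: type %d, maxpacket %d\n",
               ep->bEndpointAddress,
               ep->bmAttributes & 0x03, /* bits 1:0 = transfer type */
               ep->wMaxPacketSize);
    }
}
libusb_free_config_descriptor(config);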

This is the first thing worth writing when picking up an unfamiliar device. The endpoint map tells you what the firmware designer intended, and it constrains the transfer types available to you.
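The raw bEndpointAddress and bmAttributes values printed by that loop can be unpacked with a couple of masks. The following is a small sketch with no libusb dependency; `ep_info`, `decode_endpoint`, and `ep_type_name` are illustrative names, not library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transfer type as encoded in bmAttributes bits 1:0. */
enum ep_type { EP_CONTROL = 0, EP_ISOCHRONOUS = 1, EP_BULK = 2, EP_INTERRUPT = 3 };

/* Hypothetical decoded view of one endpoint. */
struct ep_info {
    uint8_t      number; /* bEndpointAddress bits 3:0 */
    int          is_in;  /* bEndpointAddress bit 7: 1 = device to host */
    enum ep_type type;
};

void decode_endpoint(uint8_t bEndpointAddress, uint8_t bmAttributes,
                     struct ep_info *out)
{
    assert(out != NULL);
    out->number = bEndpointAddress & 0x0F;
    out->is_in  = (bEndpointAddress & 0x80) != 0;
    out->type   = (enum ep_type)(bmAttributes & 0x03);
}

const char *ep_type_name(enum ep_type t)
{
    switch (t) {
    case EP_CONTROL:     return "control";
    case EP_ISOCHRONOUS: return "isochronous";
    case EP_BULK:        return "bulk";
    case EP_INTERRUPT:   return "interrupt";
    }
    return "unknown";
}
```

An address of 0x81 with attributes 0x02, for example, decodes to endpoint 1, direction IN, type bulk, which is a common layout for the data-return pipe of a debug probe.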

The Transfer Type Is the Architecture

The four transfer types are not interchangeable. Choosing the wrong one either leaves performance on the table or introduces correctness problems.

Control transfers are the most structured. Every transaction has a SETUP stage (8 bytes: bmRequestType, bRequest, wValue, wIndex, wLength), an optional DATA stage, and a STATUS stage. In practice they run on endpoint zero, the default control pipe that every device must implement. The OS uses them for enumeration, and userspace code uses them for device configuration and class-specific commands. libusb exposes them via libusb_control_transfer(), which blocks until the transaction completes or times out.
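As an illustration of the SETUP stage, the sketch below serializes the five fields into the 8 bytes that go on the wire and builds the GET_DESCRIPTOR request for the Device Descriptor. The helper names are hypothetical; the constants are the standard request values from chapter 9 of the USB 2.0 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_DIR_IN              0x80u /* bmRequestType bit 7: device to host */
#define USB_REQ_GET_DESCRIPTOR  0x06u /* standard bRequest value */
#define USB_DT_DEVICE           0x01u /* descriptor type: DEVICE */

/* Hypothetical helper: pack the SETUP fields, 16-bit values little-endian. */
void build_setup_packet(uint8_t out[8], uint8_t bmRequestType, uint8_t bRequest,
                        uint16_t wValue, uint16_t wIndex, uint16_t wLength)
{
    assert(out != NULL);
    out[0] = bmRequestType;
    out[1] = bRequest;
    out[2] = (uint8_t)(wValue & 0xFF);
    out[3] = (uint8_t)(wValue >> 8);
    out[4] = (uint8_t)(wIndex & 0xFF);
    out[5] = (uint8_t)(wIndex >> 8);
    out[6] = (uint8_t)(wLength & 0xFF);
    out[7] = (uint8_t)(wLength >> 8);
}

/* GET_DESCRIPTOR(DEVICE): wValue = type in the high byte, index in the low. */
void build_get_device_descriptor(uint8_t out[8])
{
    build_setup_packet(out, USB_DIR_IN, USB_REQ_GET_DESCRIPTOR,
                       (uint16_t)(USB_DT_DEVICE << 8), 0, 18);
}
```

With libusb you never build this buffer yourself for synchronous calls; the same five values are simply the first arguments to libusb_control_transfer().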

Bulk transfers prioritize throughput over latency. Full-speed endpoints carry up to 64 bytes per packet; high-speed goes to 512; SuperSpeed to 1024. The host schedules bulk transfers with whatever bandwidth remains after periodic transfers are serviced, so there is no latency guarantee. Bulk does guarantee delivery, though: the hardware retries until it receives an ACK. Mass storage devices, debug probes, and SDR dongles use bulk endpoints. OpenOCD, which implements drivers for dozens of JTAG and SWD adapters, does nearly all its work through bulk transfers, typically 64 bytes per packet in both directions, with transfer durations measured in single-digit milliseconds.

Interrupt transfers are not interrupt-driven at the hardware level. They are polled transfers where the host guarantees a maximum polling interval encoded in bInterval of the endpoint descriptor. Full-speed devices are polled every 1 to 255 milliseconds (low-speed devices every 10 to 255); high-speed devices can go as fast as every 125 microseconds. Keyboards, mice, game controllers, and most HID devices use interrupt endpoints. The maximum payload is 64 bytes at full speed and 1024 bytes at high speed. The guarantee here is bounded latency, not throughput.
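How bInterval maps to a polling period depends on bus speed, which is easy to get wrong. The sketch below assumes the USB 2.0 encoding for interrupt endpoints: a count of 1 ms frames at low and full speed, and an exponent (2^(bInterval-1) microframes of 125 µs) at high speed. The enum and function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical speed enum for this example (libusb has its own). */
enum usb_speed { USB_SPEED_LOW, USB_SPEED_FULL, USB_SPEED_HIGH };

/*
 * Returns the interrupt polling period in microseconds,
 * or 0 if bInterval is outside the range allowed for that speed.
 */
uint32_t interrupt_period_us(enum usb_speed speed, uint8_t bInterval)
{
    if (bInterval == 0)
        return 0;
    if (speed == USB_SPEED_HIGH) {
        /* High speed: 2^(bInterval-1) microframes, bInterval in 1..16. */
        if (bInterval > 16)
            return 0;
        return (UINT32_C(1) << (bInterval - 1)) * 125u;
    }
    /* Low and full speed: bInterval counts 1 ms frames directly.
     * (The spec restricts low-speed devices to 10..255; not enforced here.) */
    return (uint32_t)bInterval * 1000u;
}
```

A high-speed endpoint with bInterval 4 therefore polls every 1 ms, the same period as a full-speed endpoint with bInterval 1.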

Isochronous transfers reserve bandwidth in every USB frame (1 ms at full speed) or microframe (125 µs at high speed). There are no retries and no acknowledgment: dropped packets are dropped. This is the right choice for real-time continuous data streams where occasional loss is tolerable but latency variance is not. USB audio, UVC webcams, and the Kinect’s depth sensor are the canonical examples. libfreenect accesses the Kinect’s video stream via isochronous transfers running at 30 frames per second. The RTL-SDR project, by contrast, pulls raw I/Q samples from RTL2832U-based dongles at up to 3.2 million samples per second over a bulk endpoint, consistent with the SDR dongles listed under bulk above.
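The reserved bandwidth of a high-speed isochronous endpoint can be worked out from wMaxPacketSize alone. This sketch assumes the USB 2.0 encoding (bits 10:0 are the packet size, bits 12:11 the number of additional transactions per microframe) and an endpoint serviced in every microframe; the function names are made up for the example.

```c
#include <stdint.h>

/*
 * Bytes a high-speed periodic endpoint may move per 125 us microframe.
 * Returns 0 for the reserved encoding (bits 12:11 == 3).
 */
uint32_t hs_periodic_bytes_per_microframe(uint16_t wMaxPacketSize)
{
    uint32_t size  = wMaxPacketSize & 0x07FFu;        /* bits 10:0 */
    uint32_t extra = (wMaxPacketSize >> 11) & 0x03u;  /* bits 12:11 */
    if (extra == 3)
        return 0;
    return size * (extra + 1);
}

/* Upper bound in bytes per second, assuming service in all 8000 microframes. */
uint32_t hs_periodic_bytes_per_second(uint16_t wMaxPacketSize)
{
    return hs_periodic_bytes_per_microframe(wMaxPacketSize) * 8000u;
}
```

A wMaxPacketSize of 0x1400 decodes to three 1024-byte transactions per microframe, which is why the raw field should be masked before it is treated as a packet size.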

The practical decision: use control transfers for configuration and commands; bulk for large data without timing requirements; interrupt for low-latency small payloads; isochronous for real-time streams that can tolerate loss. Most custom devices use bulk and control only, which is the simplest combination to implement correctly.

The Platform Tax

This is where userspace USB programming diverges from the comfortable fiction that libusb abstracts everything. The library’s API is uniform across platforms; the setup required to reach it is not.

On Linux, the kernel exposes every USB device through usbfs at /dev/bus/usb/. The nodes are owned by root by default. For non-root access, a udev rule is the standard fix:

SUBSYSTEM=="usb", ATTR{idVendor}=="1234", ATTR{idProduct}=="5678", MODE="0666"

Place that in /etc/udev/rules.d/99-mydevice.rules, reload with udevadm control --reload, and replug the device so the rule is applied. If the device already has a kernel driver attached (any HID device or USB audio device will), you also need to detach it before claiming the interface: libusb_detach_kernel_driver(handle, interface_number). libusb can do this automatically if you call libusb_set_auto_detach_kernel_driver(handle, 1) before claiming the interface, but the explicit call makes the lifetime clearer.

On Windows, there is no usbfs equivalent. Every USB device needs a kernel-mode driver, and by default devices either have a vendor-supplied one or Windows has loaded a class driver. To access a device via libusb, you need to install WinUSB, Microsoft’s generic USB kernel driver. The standard tool for this is Zadig, which replaces the existing driver with WinUSB in a few clicks. libusb’s Windows backend then calls WinUsb_ControlTransfer(), WinUsb_ReadPipe(), and WinUsb_WritePipe() internally; the libusb API remains identical, but you have replaced the vendor driver for that device on that machine. That change persists until manually reversed through Device Manager, which matters for devices like game controllers or audio interfaces that Windows also wants to claim.

On macOS, IOKit provides USB access, but the permissions model tightened significantly in macOS 10.15 and again in Ventura. Applications now often need an explicit IOUSBHost entitlement in their code-signing profile, and the behavior of kernel driver detachment differs depending on the security context. Unsigned command-line tools tend to work without the entitlement in development, but distribution through the Mac App Store or notarization changes the picture.

HID Changes the Calculus

HID (USB device class 0x03) warrants separate treatment because every major OS ships a built-in HID driver. By the time your userspace code runs, the kernel has already claimed the HID interface. Using libusb directly requires detaching that kernel driver, which disrupts any other consumer of the device. On Linux this is particularly disruptive because the kernel HID driver also exposes the device through /dev/input/event*, which desktop environments use for input handling.

hidapi solves this cleanly by routing around libusb entirely for HID. On Linux it reads from /dev/hidraw*, which coexists with the kernel HID driver without disrupting it. On Windows it uses the HidD_* and HidP_* Win32 API functions directly, which means no driver swap is needed at all. On macOS it uses IOHIDManager. The API is small and focused:

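hid_device *dev = hid_open(0x1234, 0x5678, NULL); // VID, PID, any serial number
if (dev == NULL) {
    fprintf(stderr, "device not found or not accessible\n");
    return 1; // assumes this runs inside main()
}

unsigned char buf[65] = {0x00}; // first byte is report ID, 0 if unused
buf[1] = MY_COMMAND;
hid_write(dev, buf, sizeof buf); // returns bytes written, or -1 on error

hid_read(dev, buf, 64); // blocks until data arrives; returns bytes read, or -1
hid_close(dev);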
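The 65-byte buffer in that snippet is one report-ID byte followed by a 64-byte report. A small helper can make the layout explicit; this is a sketch under the assumption that the device expects fixed-size, zero-padded output reports, and `build_hid_report` is a hypothetical name, not part of hidapi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: lay out an output report the way hid_write() expects.
 * buf[0] is the report ID (0 if the device does not use numbered reports),
 * followed by the payload, zero-padded to report_size bytes.
 * Returns the number of bytes to pass to hid_write(), or -1 if it does not fit.
 */
int build_hid_report(uint8_t *buf, size_t buf_len, uint8_t report_id,
                     const uint8_t *payload, size_t payload_len,
                     size_t report_size)
{
    assert(buf != NULL && (payload != NULL || payload_len == 0));
    if (payload_len > report_size || report_size + 1 > buf_len)
        return -1;
    buf[0] = report_id;
    if (payload_len > 0)
        memcpy(buf + 1, payload, payload_len);
    /* Zero the remainder so stale bytes never reach the device. */
    memset(buf + 1 + payload_len, 0, report_size - payload_len);
    return (int)(report_size + 1);
}
```

The return value is what you would pass as the length argument of hid_write(); for a 64-byte report that is 65, matching the buffer above.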

Projects like yubikey-personalization, various Trezor bridge tools, and a long list of custom sensor applications use hidapi precisely because it sidesteps the Windows driver installation problem. The rule is straightforward: if the device is HID and you only need interrupt and control transfers, use hidapi. If the device is not HID, needs bulk or isochronous transfers, or you want a single code path that spans HID and non-HID devices, use libusb and accept the platform setup work.

sigrok and its companion application PulseView illustrate the libusb path well. They support dozens of logic analyzers and oscilloscopes, most of which use Cypress FX2-based hardware with custom bulk endpoints, no OS class driver, and firmware that gets loaded at runtime via a control transfer. hidapi would not reach these devices at all.

The Async Path

libusb’s synchronous API covers most use cases, but anything with sustained throughput or multiple simultaneous endpoints benefits from the asynchronous transfer path. The pattern: allocate a libusb_transfer struct with libusb_alloc_transfer(), fill in the endpoint, buffer, callback, and timeout, then submit with libusb_submit_transfer(). Drive completion by calling libusb_handle_events() in a loop, which processes any completed transfers and fires their callbacks.

libfreenect uses exactly this for continuous video: it pre-allocates a ring of isochronous transfer buffers and resubmits each buffer from within its completion callback, keeping the pipeline full without the overhead of per-transfer allocation. For bulk devices with high throughput, the same ring-buffer pattern applies, and the callback structure makes it natural to measure per-transfer latency and detect stalls.

The synchronous functions are wrappers around this same machinery. Understanding the async substrate is worth it once you hit throughput limits or need to multiplex transfers across multiple endpoints without blocking the application thread.

Putting It Together

The USB descriptor hierarchy, the transfer type system, and the platform gap are the three things worth internalizing before writing a userspace driver. The descriptors tell you what the device can do and how it expects to be addressed. The transfer type choice determines throughput, latency, and error behavior in ways that the higher-level API does not abstract away. The platform setup is genuinely different per OS, and ignoring it leads to code that works on Linux but requires a support article on Windows.

The actual libusb calls, once the setup is in place, are straightforward. The decisions that happen before the first libusb_open() call are the ones worth spending time on.
