
Under the Hood of USB: Descriptors, Transfer Types, and Writing Userspace Drivers

Source: hackernews

For most software developers, USB is just a port. Plug something in, drivers appear, the abstraction disappears. This works well until you need to build tooling for a custom microcontroller, write a driver for a device with no official Linux support, or understand why an existing tool talks to hardware the way it does. At that point the abstraction lifts, and you find yourself reading the USB specification, staring at endpoint addresses and descriptor fields that look nothing like the plug-and-play experience you expected.

This introduction to userspace USB development covers the core protocol mechanics well. Three of the topics it opens up are worth exploring further: how the descriptor tree works as a self-describing protocol, how to read setup packets, and how to approach a device you have no documentation for.

The Descriptor Tree

Every USB device carries a tree of descriptors in its firmware. These are not just metadata for the OS; they are the formal specification of how the device expects to be communicated with. The hierarchy is fixed: one device descriptor at the root, one or more configuration descriptors beneath it, one or more interface descriptors per configuration, and zero or more endpoint descriptors per interface. An interface that needs nothing beyond the default control endpoint declares no endpoints at all.
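The hierarchy is easy to see in the raw bytes. A GET_DESCRIPTOR request for a configuration returns more than the configuration descriptor: the device also sends every interface and endpoint descriptor beneath it, all concatenated into one blob of wTotalLength bytes. Each descriptor begins with its own bLength and bDescriptorType, so the host can rebuild the tree by walking those headers. The Python sketch below does exactly that. parse_config_blob is a hypothetical helper, the example bytes are hand-built and were not read from a real device, and class-specific descriptors are skipped.

```python
import struct

# Standard descriptor type codes from the USB 2.0 specification
CONFIGURATION, INTERFACE, ENDPOINT = 0x02, 0x04, 0x05

def parse_config_blob(blob: bytes) -> dict:
    """Rebuild the configuration -> interface -> endpoint tree from the
    concatenated descriptors returned by GET_DESCRIPTOR(CONFIGURATION)."""
    config, intf, offset = None, None, 0
    while offset + 2 <= len(blob):
        length, dtype = blob[offset], blob[offset + 1]
        if length < 2 or offset + length > len(blob):
            raise ValueError(f"malformed descriptor at offset {offset}")
        body = blob[offset:offset + length]
        if dtype == CONFIGURATION:
            total, n_intf, value = struct.unpack_from("<HBB", body, 2)
            config = {"wTotalLength": total, "bNumInterfaces": n_intf,
                      "bConfigurationValue": value, "interfaces": []}
        elif dtype == INTERFACE and config is not None:
            num, alt, n_ep, cls, sub, proto = struct.unpack_from("<6B", body, 2)
            intf = {"bInterfaceNumber": num, "bAlternateSetting": alt,
                    "class": (cls, sub, proto), "endpoints": []}
            config["interfaces"].append(intf)
        elif dtype == ENDPOINT and intf is not None:
            addr, attrs, max_pkt, interval = struct.unpack_from("<BBHB", body, 2)
            intf["endpoints"].append({"bEndpointAddress": addr,
                                      "bmAttributes": attrs,
                                      "wMaxPacketSize": max_pkt,
                                      "bInterval": interval})
        # Anything else (class-specific descriptors, IADs, ...) is skipped
        offset += length
    return config

# Hand-built example: one vendor-specific interface, bulk OUT 0x01 + bulk IN 0x81
blob = bytes([
    9, 0x02, 32, 0, 1, 1, 0, 0x80, 50,        # configuration, wTotalLength = 32
    9, 0x04, 0, 0, 2, 0xFF, 0x00, 0x00, 0,    # interface 0, class 0xFF, 2 endpoints
    7, 0x05, 0x01, 0x02, 0x00, 0x02, 0,       # EP 0x01, bulk, 512 bytes
    7, 0x05, 0x81, 0x02, 0x00, 0x02, 0,       # EP 0x81, bulk, 512 bytes
])
tree = parse_config_blob(blob)
for i in tree["interfaces"]:
    print("interface", i["bInterfaceNumber"], "class", i["class"])
    for ep in i["endpoints"]:
        print(f"  EP {ep['bEndpointAddress']:#04x}, max {ep['wMaxPacketSize']} bytes")
```

Because every descriptor carries its own length, a parser can step over descriptor types it does not recognize and still find the interfaces and endpoints that follow.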

The device descriptor carries the VID (vendor ID) and PID (product ID) that operating systems use to match drivers. It also carries bDeviceClass, bDeviceSubClass, and bDeviceProtocol. If bDeviceClass is 0x00, the class is defined per-interface rather than per-device, so you need to look further down the tree. If it is 0xFF, the device is vendor-specific and no standard protocol applies.
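The device descriptor has a fixed 18-byte layout, so the VID, PID, and class fields can be pulled out with a single struct format. The sketch below assumes a hand-built descriptor for the 1234:5678 device used in this article. parse_device_descriptor and class_scope are hypothetical helper names, and class_scope only applies the 0x00/0xFF rules described above.

```python
import struct

# 18-byte standard device descriptor (USB 2.0, multi-byte fields little-endian):
# bLength, bDescriptorType, bcdUSB, bDeviceClass, bDeviceSubClass,
# bDeviceProtocol, bMaxPacketSize0, idVendor, idProduct, bcdDevice,
# iManufacturer, iProduct, iSerialNumber, bNumConfigurations
DEVICE_DESCRIPTOR = struct.Struct("<BBHBBBBHHHBBBB")

def parse_device_descriptor(raw: bytes) -> dict:
    if len(raw) < DEVICE_DESCRIPTOR.size or raw[1] != 0x01:
        raise ValueError("not a device descriptor")
    fields = DEVICE_DESCRIPTOR.unpack_from(raw)
    return {
        "idVendor": fields[7],
        "idProduct": fields[8],
        "bDeviceClass": fields[3],
        "bNumConfigurations": fields[13],
    }

def class_scope(bDeviceClass: int) -> str:
    """Where the device's class is defined, per the 0x00 / 0xFF rules."""
    if bDeviceClass == 0x00:
        return "defined per interface"
    if bDeviceClass == 0xFF:
        return "vendor-specific"
    return f"device-level class 0x{bDeviceClass:02x}"

# Hand-built example: VID 0x1234, PID 0x5678, vendor-specific class
raw = bytes([0x12, 0x01, 0x00, 0x02, 0xFF, 0x00, 0x00, 0x40,
             0x34, 0x12, 0x78, 0x56, 0x00, 0x01, 0x01, 0x02, 0x03, 0x01])
desc = parse_device_descriptor(raw)
print(f"{desc['idVendor']:04x}:{desc['idProduct']:04x}", class_scope(desc["bDeviceClass"]))
# 1234:5678 vendor-specific
```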

The interface descriptor is where most of the interesting classification happens. A USB audio interface has class 0x01; HID devices have class 0x03. Mass storage uses class 0x08 with subclass 0x06 (SCSI transparent command set) and protocol 0x50 (Bulk-Only Transport). These class codes are defined and published in the USB class code registry, and checking them is the first step when approaching any unfamiliar device.
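Checking an interface against the registry can start as a small lookup table. The sketch below includes only a handful of well-known base class codes, so it is not the full USB-IF list, and describe_interface is a hypothetical name. It spells out the full class/subclass/protocol triple only for the mass storage Bulk-Only case mentioned above.

```python
# A few base class codes from the USB-IF class code list (not exhaustive)
BASE_CLASSES = {
    0x01: "Audio",
    0x02: "CDC Control",
    0x03: "HID",
    0x08: "Mass Storage",
    0x09: "Hub",
    0x0A: "CDC Data",
    0x0E: "Video",
    0xFF: "Vendor Specific",
}

def describe_interface(cls: int, subclass: int, protocol: int) -> str:
    name = BASE_CLASSES.get(cls, f"unknown class 0x{cls:02x}")
    # Mass storage needs the full triple: SCSI transparent + Bulk-Only Transport
    if (cls, subclass, protocol) == (0x08, 0x06, 0x50):
        name += " (SCSI transparent, Bulk-Only Transport)"
    return name

print(describe_interface(0x08, 0x06, 0x50))  # Mass Storage (SCSI transparent, Bulk-Only Transport)
print(describe_interface(0x03, 0x01, 0x02))  # HID
```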

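Endpoint descriptors define the communication channels. Each endpoint has an address (a 4-bit number plus a direction bit), a transfer type, and a maximum packet size. The direction bit is 0x80 for IN (device to host) and 0x00 for OUT (host to device). Endpoint 0 is the exception in three ways:

- It is always a bidirectional control endpoint.
- It has no endpoint descriptor of its own. Its maximum packet size is the bMaxPacketSize0 field of the device descriptor.
- It carries enumeration and configuration traffic before any other endpoint is active, and it stays available for control requests afterward.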
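The address and attribute fields are plain bitfields. Bits 3:0 of bEndpointAddress hold the endpoint number and bit 7 holds the direction. Bits 1:0 of bmAttributes select the transfer type. This minimal sketch decodes them into a label similar to the one lsusb prints. decode_endpoint and endpoint_address are hypothetical helpers.

```python
# Transfer type lives in bits 1:0 of bmAttributes
TRANSFER_TYPES = ("Control", "Isochronous", "Bulk", "Interrupt")

def decode_endpoint(bEndpointAddress: int, bmAttributes: int) -> str:
    number = bEndpointAddress & 0x0F                          # bits 3:0
    direction = "IN" if bEndpointAddress & 0x80 else "OUT"    # bit 7
    transfer = TRANSFER_TYPES[bmAttributes & 0x03]
    return f"EP {number} {direction} {transfer}"

def endpoint_address(number: int, direction_in: bool) -> int:
    # OR the direction bit into the 4-bit endpoint number
    return (0x80 if direction_in else 0x00) | (number & 0x0F)

print(decode_endpoint(0x81, 0x02))     # EP 1 IN Bulk
print(hex(endpoint_address(1, True)))  # 0x81
```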

Reading the full descriptor tree from Linux requires no custom code:

$ lsusb -v -d 1234:5678

  bDeviceClass        255 Vendor Specific Class
  bNumConfigurations    1

  Configuration Descriptor:
    bNumInterfaces      1

    Interface Descriptor:
      bInterfaceClass   255 Vendor Specific Class
      bNumEndpoints       2

      Endpoint Descriptor:
        bEndpointAddress  0x01  EP 1 OUT
        bmAttributes         2
          Transfer Type      Bulk
        wMaxPacketSize    0x0200  512 bytes

      Endpoint Descriptor:
        bEndpointAddress  0x81  EP 1 IN
        bmAttributes         2
          Transfer Type      Bulk
        wMaxPacketSize    0x0200  512 bytes

This tells you almost everything you need to start writing a driver: the device is vendor-specific, it exposes two bulk endpoints (one IN, one OUT), and the maximum packet size is 512 bytes. The 0x81 address for the IN endpoint is the 0x80 direction bit ORed with endpoint number 0x01.

The Setup Packet

The setup packet is the structure at the heart of every control transfer. It deserves careful attention because control transfers handle both standard operations like reading descriptors and vendor-specific initialization commands.

A setup packet is exactly 8 bytes:

+------------+----------+--------+--------+---------+
| bmReqType  | bRequest | wValue | wIndex | wLength |
|   1 byte   |  1 byte  | 2 bytes| 2 bytes| 2 bytes |
+------------+----------+--------+--------+---------+
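The layout maps directly onto a struct format: two single bytes followed by three little-endian 16-bit words. libusb and pyusb build this packet for you. The sketch below only makes the wire format concrete, using the same field values as the vendor request in the libusb call in this section. pack_setup and unpack_setup are hypothetical names.

```python
import struct

# bmRequestType, bRequest (1 byte each), then wValue, wIndex, wLength
# (2 bytes each, little-endian): 8 bytes total
SETUP = struct.Struct("<BBHHH")

def pack_setup(bmRequestType, bRequest, wValue, wIndex, wLength) -> bytes:
    return SETUP.pack(bmRequestType, bRequest, wValue, wIndex, wLength)

def unpack_setup(packet: bytes) -> tuple:
    if len(packet) != SETUP.size:
        raise ValueError("a setup packet is exactly 8 bytes")
    return SETUP.unpack(packet)

pkt = pack_setup(0xC0, 0x01, 0x0000, 0x0000, 64)
print(pkt.hex(" "))   # c0 01 00 00 00 00 40 00
```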

bmRequestType is a bitfield: bits 6:5 encode the type (standard=00, class=01, vendor=10), bits 4:0 encode the recipient (device=00000, interface=00001, endpoint=00010), and bit 7 encodes direction (0=OUT, 1=IN). A value of 0xC0 means direction=IN, type=vendor, recipient=device. A value of 0x40 is identical but direction=OUT.
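Decoding and encoding bmRequestType is a matter of shifting and masking those three fields. This minimal sketch uses hypothetical helper names. It also includes recipient 00011 ("other") and type 11 (reserved), which the USB 2.0 specification defines alongside the values listed above.

```python
DIRECTIONS = {0: "OUT", 1: "IN"}
TYPES = {0: "standard", 1: "class", 2: "vendor", 3: "reserved"}
RECIPIENTS = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}

def decode_request_type(bm: int) -> tuple:
    direction = DIRECTIONS[(bm >> 7) & 0x1]              # bit 7
    req_type = TYPES[(bm >> 5) & 0x3]                    # bits 6:5
    recipient = RECIPIENTS.get(bm & 0x1F, "reserved")    # bits 4:0
    return direction, req_type, recipient

def encode_request_type(direction: str, req_type: str, recipient: str) -> int:
    d = {v: k for k, v in DIRECTIONS.items()}[direction]
    t = {v: k for k, v in TYPES.items()}[req_type]
    r = {v: k for k, v in RECIPIENTS.items()}[recipient]
    return (d << 7) | (t << 5) | r

print(decode_request_type(0xC0))  # ('IN', 'vendor', 'device')
print(decode_request_type(0x40))  # ('OUT', 'vendor', 'device')
```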

In libusb, libusb_control_transfer() takes these fields directly:

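// handle: an open libusb_device_handle, e.g. from
// libusb_open_device_with_vid_pid(NULL, 0x1234, 0x5678)
unsigned char buffer[64];

int ret = libusb_control_transfer(
    handle,
    0xC0,            // bmRequestType: IN, vendor, device
    0x01,            // bRequest: vendor-defined command
    0x0000,          // wValue
    0x0000,          // wIndex
    buffer,          // data buffer (filled on IN transfers)
    sizeof(buffer),  // wLength: maximum number of bytes to receive
    1000             // timeout in milliseconds
);

if (ret < 0) {
    fprintf(stderr, "Transfer failed: %s\n", libusb_error_name(ret));
} else {
    // On success, ret is the number of bytes actually transferred
    printf("Received %d bytes\n", ret);
}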

Standard requests like GET_DESCRIPTOR (bRequest = 0x06) are defined by the USB 2.0 specification and mean the same thing on every device. Vendor requests are not identified by a reserved range of bRequest values. They are marked by the type bits in bmRequestType (10 = vendor), so a vendor can assign any meaning to any bRequest value from 0x00 to 0xFF. Those meanings must be discovered through documentation or protocol capture. The wValue and wIndex fields carry request-specific parameters. For standard descriptor requests, wValue encodes the descriptor type in the high byte and the descriptor index in the low byte.
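Putting the GET_DESCRIPTOR fields together, the request below is a standard, IN, device-recipient request (bmRequestType 0x80) with the descriptor type shifted into the high byte of wValue. For string descriptors, wIndex carries a language ID such as 0x0409 (US English); for other descriptors it is zero. get_descriptor_setup is a hypothetical helper that shows the resulting 8 bytes.

```python
import struct

GET_DESCRIPTOR = 0x06
DEVICE, CONFIGURATION, STRING = 0x01, 0x02, 0x03   # descriptor type codes

def get_descriptor_setup(desc_type: int, index: int, length: int,
                         lang_id: int = 0) -> bytes:
    """Build the 8-byte setup packet for a standard GET_DESCRIPTOR request."""
    w_value = (desc_type << 8) | index     # type in high byte, index in low byte
    # 0x80 = IN, standard, device; wIndex = language ID for strings, else 0
    return struct.pack("<BBHHH", 0x80, GET_DESCRIPTOR, w_value, lang_id, length)

print(get_descriptor_setup(DEVICE, 0, 18).hex(" "))                   # 80 06 00 01 00 00 12 00
print(get_descriptor_setup(STRING, 2, 255, lang_id=0x0409).hex(" "))  # 80 06 02 03 09 04 ff 00
```

A common pattern for configurations is to request the first 9 bytes, read wTotalLength from them, and then issue a second request for the full blob.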

Transfer Types in Practice

The four USB transfer types map to distinct use cases. Control transfers, covered above, handle configuration and commands. For the other three, choosing the wrong one costs you bandwidth, latency guarantees, or data integrity.

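Bulk transfers are the workhorse for data movement where timing does not matter but correctness does. USB mass storage uses them, and so do most custom data-collection devices. They get whatever bus bandwidth remains after periodic (interrupt and isochronous) transfers have been scheduled. That makes them fast on an idle bus, but they come with no timing guarantees. Errors are caught by CRC, and the host controller retries failed transactions automatically. Maximum packet sizes are at most 64 bytes for full-speed (12 Mbps) devices and exactly 512 bytes for high-speed (480 Mbps) devices.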
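The host controller splits a bulk transfer into packets of wMaxPacketSize. A packet shorter than that size marks the end of the transfer. If the transfer length is an exact multiple of the packet size, a zero-length packet (ZLP) may be needed to mark the end, depending on whether the device's protocol already tells the receiver how much to expect. The sketch below only models that rule. split_bulk_transfer is hypothetical, and in practice the hardware does the splitting; for OUT transfers, libusb offers a LIBUSB_TRANSFER_ADD_ZERO_PACKET flag where the platform supports it.

```python
def split_bulk_transfer(data: bytes, max_packet: int, send_zlp: bool = True) -> list:
    """Model how a bulk transfer is cut into packets of max_packet bytes.

    The transfer ends at the first short packet. When the length is an exact
    multiple of max_packet (or zero), an empty packet (ZLP) terminates it.
    """
    packets = [data[i:i + max_packet] for i in range(0, len(data), max_packet)]
    if send_zlp and (not packets or len(packets[-1]) == max_packet):
        packets.append(b"")
    return packets

print([len(p) for p in split_bulk_transfer(bytes(1000), 512)])  # [512, 488]
print([len(p) for p in split_bulk_transfer(bytes(1024), 512)])  # [512, 512, 0]
```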

Interrupt transfers are polled by the host controller at a rate defined by the endpoint’s bInterval field, not interrupt-driven in the hardware sense. Every HID device uses them. For a full-speed device, bInterval specifies the polling period in milliseconds (1 to 255). For high-speed, it specifies a power-of-two number of 125μs microframes. A gaming mouse might request bInterval = 1 for 1ms polling; a keyboard typically uses 8ms. The host reserves bus bandwidth for the polling interval, giving interrupt transfers bounded latency that bulk transfers cannot provide.
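The two bInterval encodings give quite different periods for the same number. The minimal sketch below converts bInterval to a polling period in microseconds for full-speed and high-speed endpoints only. polling_period_us is a hypothetical helper, and low-speed devices, which have their own minimum interval, are left out.

```python
def polling_period_us(bInterval: int, high_speed: bool) -> int:
    """Interrupt endpoint polling period in microseconds."""
    if high_speed:
        # High-speed: 2**(bInterval - 1) microframes of 125 us, bInterval 1..16
        if not 1 <= bInterval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        return 125 * 2 ** (bInterval - 1)
    # Full-speed: bInterval is the period in 1 ms frames, 1..255
    if not 1 <= bInterval <= 255:
        raise ValueError("full-speed bInterval must be 1..255")
    return bInterval * 1000

print(polling_period_us(1, high_speed=False))  # 1000 (1 ms, the gaming-mouse case)
print(polling_period_us(4, high_speed=True))   # 1000 (8 microframes)
```

The same bInterval = 1 means 1 ms at full-speed but 125 µs at high-speed, so the device's speed must be known before the field can be interpreted.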

Isochronous transfers are for streaming data where timing matters more than correctness, such as USB audio capture and video devices. Every USB frame (1ms at full-speed) or microframe (125μs at high-speed) reserves a guaranteed slice of bandwidth for isochronous endpoints. There is no retry: a packet that is corrupted or misses its window is simply lost. libusb supports isochronous transfers through per-packet descriptors. libusb_alloc_transfer(num_iso_packets) allocates a transfer with that many packet slots, and each packet's length must be set before submission; libusb_set_iso_packet_lengths() sets them all to one value. That makes setup considerably more involved than for bulk or interrupt transfers.
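Sizing an isochronous audio endpoint is simple arithmetic: samples per service interval × channels × bytes per sample, rounded up to whole samples. The sketch below assumes uncompressed PCM and one packet per frame or microframe. iso_packet_bytes is a hypothetical helper, and real devices may reserve room for an extra sample to absorb clock drift. The 1023-byte constant is the largest isochronous wMaxPacketSize allowed at full-speed.

```python
FULL_SPEED_ISO_MAX = 1023   # largest isochronous wMaxPacketSize at full-speed

def iso_packet_bytes(sample_rate: int, channels: int, bytes_per_sample: int,
                     high_speed: bool = False) -> int:
    """Largest per-interval packet an uncompressed PCM stream needs,
    assuming one packet per 1 ms frame or 125 us microframe."""
    intervals_per_second = 8000 if high_speed else 1000
    # Round up to whole samples: at 44.1 kHz, full-speed frames carry 44
    # samples most of the time and 45 every tenth frame, so 45 must fit
    samples = -(-sample_rate // intervals_per_second)   # ceiling division
    return samples * channels * bytes_per_sample

print(iso_packet_bytes(48000, 2, 2))    # 192
print(iso_packet_bytes(44100, 2, 2))    # 180
big = iso_packet_bytes(192000, 2, 4)
print(big, big <= FULL_SPEED_ISO_MAX)   # 1536 False
```

The last case shows why high-rate, high-resolution audio needs a high-speed device. At full-speed it would need 1536 bytes per frame, more than one isochronous packet can carry.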

A Working Driver in Python

For prototyping, pyusb wraps libusb with a clean Python API. This finds a device and reads from a bulk IN endpoint:

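import usb.core
import usb.util

# VID:PID of the target device, as shown by lsusb
dev = usb.core.find(idVendor=0x1234, idProduct=0x5678)
if dev is None:
    raise ValueError("Device not found")

# With no argument, selects the first configuration
dev.set_configuration()

cfg = dev.get_active_configuration()
intf = cfg[(0, 0)]  # interface 0, alternate setting 0

# Find the first bulk IN endpoint on that interface
ep_in = usb.util.find_descriptor(
    intf,
    custom_match=lambda e: (
        usb.util.endpoint_direction(e.bEndpointAddress) ==
        usb.util.ENDPOINT_IN and
        usb.util.endpoint_type(e.bmAttributes) ==
        usb.util.ENDPOINT_TYPE_BULK
    )
)
if ep_in is None:
    raise ValueError("No bulk IN endpoint on interface 0")

# Read up to 512 bytes with a 1000 ms timeout
data = ep_in.read(512, timeout=1000)
print(bytes(data).hex())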

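On Linux, grant non-root access with a udev rule in a file ending in .rules under /etc/udev/rules.d/, then run udevadm control --reload-rules and replug the device. The MODE="0666" below opens the device to every local user; on a shared machine, assigning a GROUP together with MODE="0660" is tighter: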

SUBSYSTEM=="usb", ATTR{idVendor}=="1234", ATTR{idProduct}=="5678", MODE="0666"

On Windows, pyusb needs two things: a WinUSB-compatible driver bound to the target device, and the libusb-1.0 DLL available as its backend. The Zadig utility installs WinUSB for any connected device through a GUI. It generates the INF file and a self-signed certificate itself, so no hand-written INF or manual driver signing is needed. This is the fastest path from zero to working USB access on Windows.

Approaching an Unknown Device

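When documentation is unavailable, traffic capture is the most direct path.

- Linux: loading the usbmon kernel module exposes raw USB traffic per bus, both as binary /dev/usbmonN devices that Wireshark can capture from and as text files under /sys/kernel/debug/usb/usbmon/ when debugfs is mounted.
- Windows: Wireshark with USBPcap provides equivalent visibility.
- macOS: some releases expose USB controllers as capture interfaces (for example XHC20) that can be enabled with ifconfig and opened in Wireshark, though support varies between OS versions.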

The reverse-engineering workflow follows a consistent pattern: enumerate descriptors to identify the class and endpoints, capture traffic from the official software while performing known operations, identify request patterns in the capture, then replicate them with libusb. Control transfers typically appear early as initialization sequences; bulk or interrupt transfers carry the actual data exchange.
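The "identify request patterns" step can start as a simple tally. The sketch below assumes the usbmon text format described in the Linux kernel's usbmon documentation. Each line holds an URB tag, a timestamp, an event type (S for submission), and an address word starting with Ci or Co for control transfers. A setup tag of s then precedes the five setup fields as hex words. Such text can be read as root from a file like /sys/kernel/debug/usb/usbmon/1u for bus 1. The sample lines are hand-written for illustration and are not a real capture, and parse_control_submissions is a hypothetical helper.

```python
from collections import Counter

def parse_control_submissions(lines):
    """Yield (bmRequestType, bRequest, wValue, wIndex, wLength) for each
    control-transfer submission in usbmon text output (assumed layout:
    tag, timestamp, event, address, setup tag, then five setup words)."""
    for line in lines:
        words = line.split()
        if len(words) < 10:
            continue
        event, address, setup_tag = words[2], words[3], words[4]
        # 'S' = submission; 'Ci'/'Co' = control; 's' = setup words follow
        if event != "S" or not address.startswith("C") or setup_tag != "s":
            continue
        yield tuple(int(w, 16) for w in words[5:10])

# Hand-written sample lines for illustration, not a real capture
CAPTURE = """\
ffff8800 1000 S Ci:1:005:0 s 80 06 0100 0000 0012 18 <
ffff8801 2000 S Ci:1:005:0 s c0 01 0000 0000 0040 64 <
ffff8801 2150 C Ci:1:005:0 0 64 = 01020304
ffff8802 3000 S Co:1:005:0 s 40 02 0001 0000 0000 0
ffff8803 4000 S Ci:1:005:0 s c0 01 0000 0000 0040 64 <
ffff8804 5000 S Bo:1:005:1 -115 512 = 00000000
"""

records = list(parse_control_submissions(CAPTURE.splitlines()))
# Keep vendor requests only (type bits 6:5 == 0b10) and count each distinct
# (bmRequestType, bRequest, wValue, wIndex) combination
vendor_patterns = Counter(r[:4] for r in records if (r[0] >> 5) & 0x3 == 0b10)
for pattern, count in vendor_patterns.most_common():
    print(count, " ".join(f"{x:#x}" for x in pattern))
```

Repeated vendor requests issued during a known operation in the official software are good candidates to replay with libusb_control_transfer().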

The device class code matters for choosing your tooling. A device advertising HID class (0x03) is better approached with hidapi than raw libusb. Hidapi handles the HID report framing protocol, which sits on top of interrupt transfers and carries its own report ID structure. On macOS, raw libusb access to an HID device requires detaching the kernel driver first (libusb_detach_kernel_driver()), which needs elevated privileges and can cause the OS to lose track of the device in ways that are annoying to recover from.
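One detail of hidapi's report framing trips up many first attempts. hid_write() expects the first byte of the buffer to be the report ID, with 0x00 for devices that do not use numbered reports. hid_read() returns the report ID as the first byte when the device does use them. The sketch below shows that byte handling on plain bytes. frame_output_report and split_input_report are hypothetical helpers, not part of hidapi. Report ID 0 is reserved by the HID specification, so numbered reports use 1 to 255.

```python
from typing import Optional, Tuple

def frame_output_report(payload: bytes, report_id: Optional[int] = None) -> bytes:
    """Prefix a report with the report-ID byte that hid_write() expects."""
    if report_id is None:
        return b"\x00" + payload          # device without numbered reports
    if not 1 <= report_id <= 255:
        raise ValueError("numbered report IDs are 1..255")
    return bytes([report_id]) + payload

def split_input_report(data: bytes, uses_report_ids: bool) -> Tuple[Optional[int], bytes]:
    """Separate the leading report ID, if the device uses numbered reports."""
    if uses_report_ids:
        return data[0], data[1:]
    return None, data

print(frame_output_report(b"\x01\x02").hex())          # 000102
print(split_input_report(b"\x02\x10\x20", True))       # (2, b'\x10 ')
```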

Userspace vs. Kernel: Where the Tradeoff Lands

Kernel USB drivers have access to the full USB stack. They can implement new device classes, respond to power management callbacks, and integrate with the kernel device model. They are also considerably harder to develop: the iteration cycle is slow, and bugs can panic the system. Distribution on modern Windows requires the driver to be signed by Microsoft, either through WHQL certification or through attestation signing in the Hardware Dev Center. Registering for either requires an EV code signing certificate.

Userspace drivers via libusb bypass most of that complexity. They have two costs:

- each transfer takes an extra round trip between userspace and the kernel;
- they cannot intercept OS-level device lifecycle events like suspend and resume.

For custom hardware projects, data collection tools, and device-specific utilities, these costs rarely matter. The added software latency is typically small next to USB's own scheduling granularity of 1ms frames at full-speed and 125μs microframes at high-speed. It becomes a real concern mainly for real-time isochronous audio at high sample rates.

Libusb has maintained API stability since version 1.0 in 2008. The library abstracts over WinUSB on Windows, IOKit on macOS, and usbfs on Linux; application code stays portable across all three without platform-specific conditionals for basic transfers. That is a meaningful achievement given how differently each OS exposes USB to userspace. The libusb documentation is thorough and the error codes are specific enough to diagnose most failure modes without a USB analyzer.

The USB specification is public, the descriptor format is standardized, and introspection tools like lsusb and Wireshark are freely available. The barrier to writing a working userspace USB driver is mostly unfamiliarity with the concepts. Once the descriptor hierarchy makes sense and the setup packet structure is clear, communicating with most devices comes down to a few hundred lines of code and some time with a traffic capture.
