Reading the Map Before Writing the Driver: USB Descriptors for Software Developers

Most software developers have interacted with USB plenty of times without thinking about what happens below the OS device model. You plug in a device, the OS enumerates it, a driver loads, and your application talks to a nice abstraction. But occasionally you need to go lower: a custom hardware device with no official driver, a peripheral whose vendor ships a Windows-only tool you want to replicate on Linux, or a microcontroller project where you want to define your own protocol. At that point you need a real mental model of how USB works.

A recent writeup by werwolv covers the foundations well, but there is one aspect worth dwelling on longer than most introductions do: the descriptor hierarchy. It is the map USB hands you before you write a single line of transfer code, and understanding it properly changes how you approach an unfamiliar device.

The Descriptor Hierarchy Is Metadata You Can Read Before Anything Else

When a USB device connects to a host, the first thing that happens is enumeration. The host reads a series of descriptors from the device, which together describe everything needed to communicate with it. Software can read exactly this metadata through libusb without claiming any interface or sending custom data.

The hierarchy goes: Device > Configuration > Interface > Endpoint.

The device descriptor is 18 bytes. It contains the vendor ID (idVendor) and product ID (idProduct), the USB version (bcdUSB), the device class, and how many configurations the device supports. Most devices have one configuration.
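As a minimal sketch of what those 18 bytes look like to software, the following decodes the fields named above from a raw buffer. The offsets follow the standard device descriptor layout; the struct and function names (dev_summary, parse_device_descriptor) are illustrative assumptions, not libusb types.

```c
#include <stddef.h>
#include <stdint.h>

/* The subset of device descriptor fields discussed in the text.
   Illustrative struct, not a libusb type. */
struct dev_summary {
    uint16_t bcdUSB;             /* USB version in BCD, e.g. 0x0200 = 2.00 */
    uint8_t  bDeviceClass;
    uint16_t idVendor;
    uint16_t idProduct;
    uint8_t  bNumConfigurations;
};

/* Multi-byte descriptor fields are little-endian on the wire. */
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is not a device descriptor. */
int parse_device_descriptor(const uint8_t *buf, size_t len,
                            struct dev_summary *out) {
    if (buf == NULL || out == NULL || len < 18) return -1;
    /* Byte 0 is bLength (18), byte 1 is bDescriptorType (0x01 = device). */
    if (buf[0] != 18 || buf[1] != 0x01) return -1;
    out->bcdUSB             = le16(buf + 2);
    out->bDeviceClass       = buf[4];
    out->idVendor           = le16(buf + 8);
    out->idProduct          = le16(buf + 10);
    out->bNumConfigurations = buf[17];
    return 0;
}
```

In practice libusb does this decoding for you; the sketch is mainly useful for reading descriptor bytes out of a traffic capture or a firmware source file.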

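A configuration descriptor describes one operating mode of the device and contains one or more interface descriptors. Interfaces represent distinct functional units. A USB headset, for example, might have an audio streaming interface and a HID interface for the volume buttons; these are separate interfaces within one configuration. An interface can also define alternate settings: variants of the same interface with different endpoints or bandwidth requirements, of which the host selects one at a time.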

Each interface has endpoint descriptors. Endpoints are the actual communication channels. Endpoint zero is always present and always handles control transfers. Additional endpoints carry bulk, interrupt, or isochronous traffic, and the endpoint descriptor tells you the direction, the transfer type, and the maximum packet size.
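On the wire, a device returns a configuration together with its interface and endpoint descriptors as one contiguous blob, and every descriptor begins with a length byte and a type byte. A rough sketch of walking such a blob by hand, assuming you already have the raw bytes (for example from a capture), might look like this. The names cfg_counts and count_descriptors are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard descriptor type codes. */
enum { DT_CONFIG = 0x02, DT_INTERFACE = 0x04, DT_ENDPOINT = 0x05 };

struct cfg_counts {
    int interfaces;  /* interface descriptors, one per alternate setting */
    int endpoints;   /* endpoint descriptors */
    int other;       /* class-specific or otherwise unrecognized */
};

/* Walk a raw configuration blob descriptor by descriptor.
   Returns 0 on success, -1 if the blob is truncated or malformed. */
int count_descriptors(const uint8_t *buf, size_t len, struct cfg_counts *out) {
    if (buf == NULL || out == NULL) return -1;
    out->interfaces = out->endpoints = out->other = 0;

    size_t pos = 0;
    while (pos < len) {
        if (len - pos < 2) return -1;          /* need bLength + bDescriptorType */
        uint8_t blen = buf[pos];
        uint8_t type = buf[pos + 1];
        if (blen < 2 || blen > len - pos) return -1;

        if (type == DT_INTERFACE)      out->interfaces++;
        else if (type == DT_ENDPOINT)  out->endpoints++;
        else if (type != DT_CONFIG)    out->other++;

        pos += blen;                           /* bLength gives the next offset */
    }
    return 0;
}
```

Skipping by bLength rather than by a fixed size is what lets a parser step over class-specific descriptors it does not understand.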

Here is how you walk this hierarchy with libusb:

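#include <stdio.h>
#include <libusb-1.0/libusb.h>

int main(void) {
    libusb_context *ctx = NULL;
    if (libusb_init(&ctx) != 0) {
        fprintf(stderr, "libusb_init failed\n");
        return 1;
    }

    libusb_device **list;
    ssize_t count = libusb_get_device_list(ctx, &list);
    if (count < 0) {                 /* negative values are libusb error codes */
        libusb_exit(ctx);
        return 1;
    }

    for (ssize_t i = 0; i < count; i++) {
        struct libusb_device_descriptor dev_desc;
        if (libusb_get_device_descriptor(list[i], &dev_desc) != 0)
            continue;

        printf("VID: %04x  PID: %04x\n",
               dev_desc.idVendor, dev_desc.idProduct);

        /* Fails for unconfigured devices; cfg is only valid on success. */
        struct libusb_config_descriptor *cfg;
        if (libusb_get_active_config_descriptor(list[i], &cfg) != 0)
            continue;

        for (int iface = 0; iface < cfg->bNumInterfaces; iface++) {
            const struct libusb_interface *intf = &cfg->interface[iface];
            for (int alt = 0; alt < intf->num_altsetting; alt++) {
                const struct libusb_interface_descriptor *id =
                    &intf->altsetting[alt];
                printf("  Interface %d alt %d (class %d)\n",
                       id->bInterfaceNumber, id->bAlternateSetting,
                       id->bInterfaceClass);
                for (int ep = 0; ep < id->bNumEndpoints; ep++) {
                    const struct libusb_endpoint_descriptor *epd =
                        &id->endpoint[ep];
                    /* Low two bits of bmAttributes are the transfer type. */
                    printf("    Endpoint 0x%02x  type %d  maxpkt %d\n",
                           epd->bEndpointAddress,
                           epd->bmAttributes & 0x03,
                           epd->wMaxPacketSize);
                }
            }
        }
        libusb_free_config_descriptor(cfg);
    }
    libusb_free_device_list(list, 1);    /* 1 = also unreference the devices */
    libusb_exit(ctx);
    return 0;
}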

The endpoint address encodes direction in bit 7: values 0x80 and above mean IN (device to host), values below 0x80 mean OUT. The bottom four bits are the endpoint number. The bmAttributes field’s bottom two bits give the transfer type: 0 = control, 1 = isochronous, 2 = bulk, 3 = interrupt.
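The bit layout just described can be captured in a few small helpers. This is a sketch only; the helper names (ep_is_in, ep_number, ep_transfer_type, ep_type_name) are made up for illustration and are not part of libusb.

```c
#include <stdint.h>

/* Transfer type codes from the low two bits of bmAttributes. */
enum ep_type {
    EP_CONTROL     = 0,
    EP_ISOCHRONOUS = 1,
    EP_BULK        = 2,
    EP_INTERRUPT   = 3
};

/* Bit 7 of bEndpointAddress: 1 = IN (device to host), 0 = OUT. */
static inline int ep_is_in(uint8_t bEndpointAddress) {
    return (bEndpointAddress & 0x80) != 0;
}

/* Bits 3..0 of bEndpointAddress: the endpoint number. */
static inline int ep_number(uint8_t bEndpointAddress) {
    return bEndpointAddress & 0x0F;
}

/* Bits 1..0 of bmAttributes: the transfer type. */
static inline enum ep_type ep_transfer_type(uint8_t bmAttributes) {
    return (enum ep_type)(bmAttributes & 0x03);
}

/* Human-readable name for the transfer type. */
const char *ep_type_name(uint8_t bmAttributes) {
    switch (ep_transfer_type(bmAttributes)) {
    case EP_CONTROL:     return "control";
    case EP_ISOCHRONOUS: return "isochronous";
    case EP_BULK:        return "bulk";
    default:             return "interrupt";
    }
}
```

For an endpoint with address 0x81 and attributes 0x02, these helpers report an IN endpoint, number 1, type bulk.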

Running this against an unknown device before doing anything else is often enough to tell you which interface you should target and what kind of transfers it expects.

Transfer Types Are Not Interchangeable

Developers coming from socket or file I/O backgrounds sometimes treat USB transfer types as interchangeable bandwidth tiers. They are not; each type has behavioral semantics that matter.

Control transfers use endpoint zero and follow a structured three-stage format: setup, optional data, status. Every USB device supports them. They handle device configuration and standard requests such as reading string descriptors or setting the active configuration. Devices often layer vendor-specific commands on top of control transfers as well, using the bRequest field to dispatch different operations.
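The setup stage is an 8-byte packet with a fixed layout, which is easier to see in code than in prose. The sketch below encodes the standard request a host uses to read the device descriptor, plus a vendor-specific request of the kind devices layer on top. The struct and function names are illustrative assumptions, and the vendor request number passed in the usage note is entirely hypothetical; a real device defines its own.

```c
#include <stdint.h>

/* The five fields of a control transfer setup packet. */
struct setup_packet {
    uint8_t  bmRequestType;  /* direction | type | recipient */
    uint8_t  bRequest;       /* which operation */
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;        /* size of the optional data stage */
};

/* bmRequestType building blocks. */
enum { DIR_OUT = 0x00, DIR_IN = 0x80 };                        /* bit 7     */
enum { TYPE_STANDARD = 0x00, TYPE_CLASS = 0x20, TYPE_VENDOR = 0x40 }; /* bits 6..5 */
enum { RCPT_DEVICE = 0x00, RCPT_INTERFACE = 0x01, RCPT_ENDPOINT = 0x02 }; /* bits 4..0 */

/* Serialize to the 8 bytes sent on the wire (16-bit fields little-endian). */
void encode_setup(const struct setup_packet *s, uint8_t out[8]) {
    out[0] = s->bmRequestType;
    out[1] = s->bRequest;
    out[2] = (uint8_t)(s->wValue & 0xFF);
    out[3] = (uint8_t)(s->wValue >> 8);
    out[4] = (uint8_t)(s->wIndex & 0xFF);
    out[5] = (uint8_t)(s->wIndex >> 8);
    out[6] = (uint8_t)(s->wLength & 0xFF);
    out[7] = (uint8_t)(s->wLength >> 8);
}

/* Standard GET_DESCRIPTOR (bRequest 0x06) for the 18-byte device descriptor.
   wValue = descriptor type (0x01) in the high byte, index 0 in the low byte. */
struct setup_packet get_device_descriptor_request(void) {
    struct setup_packet s = {
        DIR_IN | TYPE_STANDARD | RCPT_DEVICE, 0x06, 0x0100, 0, 18
    };
    return s;
}

/* A vendor-specific host-to-device request. The meaning of bRequest and
   wValue is defined by the device firmware, not by the USB specification. */
struct setup_packet vendor_out_request(uint8_t bRequest, uint16_t wValue,
                                       uint16_t wLength) {
    struct setup_packet s = {
        DIR_OUT | TYPE_VENDOR | RCPT_DEVICE, bRequest, wValue, 0, wLength
    };
    return s;
}
```

Encoding get_device_descriptor_request() yields the bytes 80 06 00 01 00 00 12 00, a sequence that shows up at the start of almost any enumeration capture. A call such as vendor_out_request(0x42, 1, 0) would produce a bmRequestType of 0x40.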

Bulk transfers are the workhorse for moving large amounts of data where timing does not matter. A USB mass storage device uses bulk transfers. The host gives bulk traffic whatever bandwidth remains after higher-priority transfers are scheduled, so throughput varies under load. The hardware guarantees delivery through error detection and retransmission, but not timing.

Interrupt transfers have a guaranteed maximum latency. A USB keyboard uses interrupt transfers: the host polls the device on a schedule defined by the endpoint’s bInterval field, and the device sends a report if there is something to say. At low and full speed bInterval is the period in milliseconds (1 to 255); at high speed it is an exponent from 1 to 16, giving a period of 2^(bInterval-1) microframes of 125 microseconds each. Despite the name, the device never interrupts the host; the OS simply surfaces the result of each poll to the driver.
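How bInterval translates into an actual polling period depends on bus speed: at low and full speed it counts 1 ms frames directly, while at high speed it is an exponent over 125 microsecond microframes. A small sketch of that conversion follows; the enum and function names are assumptions for illustration.

```c
#include <stdint.h>

enum usb_speed { SPEED_LOW, SPEED_FULL, SPEED_HIGH };

/* Polling period of an interrupt endpoint in microseconds.
   Returns 0 if bInterval is outside the valid range for that speed. */
uint32_t interrupt_period_us(enum usb_speed speed, uint8_t bInterval) {
    if (bInterval == 0) return 0;

    if (speed == SPEED_HIGH) {
        /* High speed: 2^(bInterval-1) microframes of 125 us, bInterval 1..16. */
        if (bInterval > 16) return 0;
        return 125u << (bInterval - 1);
    }

    /* Low and full speed: bInterval is the period in 1 ms frames. */
    return (uint32_t)bInterval * 1000u;
}
```

A full-speed keyboard reporting bInterval = 10 is polled every 10 ms, whereas a high-speed device reporting bInterval = 4 is polled every 1 ms.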

Isochronous transfers are for time-sensitive streaming data. USB audio uses them. The host schedules a fixed bandwidth allocation and data flows at a constant rate. There is no retransmission on error, because for audio a late-arriving retransmit would be worse than the gap. This is the least commonly needed type in custom device work.
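To make the fixed allocation concrete, here is a back-of-the-envelope sketch of how many bytes an uncompressed audio stream needs per 1 ms full-speed frame, and whether that fits in a single full-speed isochronous packet (at most 1023 bytes). The function names are illustrative, and the calculation ignores any protocol overhead.

```c
#include <stdint.h>

/* Bytes an audio stream produces per 1 ms frame, rounded up. */
uint32_t iso_bytes_per_frame(uint32_t sample_rate_hz, uint32_t channels,
                             uint32_t bytes_per_sample) {
    uint32_t bytes_per_second = sample_rate_hz * channels * bytes_per_sample;
    return (bytes_per_second + 999u) / 1000u;
}

/* A full-speed isochronous endpoint carries at most 1023 bytes per frame. */
int fits_full_speed_iso(uint32_t bytes_per_frame) {
    return bytes_per_frame <= 1023u;
}
```

For 48 kHz, 16-bit stereo audio this works out to 192 bytes per frame, which is the kind of value you would expect to see in the wMaxPacketSize of such an endpoint.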

If you control the firmware on both ends, bulk transfers are usually the right default for command-response protocols. They are straightforward to implement on a microcontroller and handle large payloads cleanly without requiring you to manage a polling interval or think about isochronous scheduling.

libusb and Platform Differences

libusb 1.0 runs on Linux, macOS, Windows, FreeBSD, and several other platforms. On Linux it uses the kernel’s usbfs interface. On macOS it uses IOKit. On Windows it requires a WinUSB, libusbK, or libusb-win32 backend, which typically means installing a driver package using Zadig to attach the right backend to your specific device.

The core API is consistent across platforms once the backend is configured. You initialize a context, enumerate devices, open a handle, claim the interface you need, perform transfers, release, and close. The main portability concerns are on Linux: you may need to either run as root or install a udev rule granting your user access to the device node, and if a kernel driver is already bound to the interface you must detach it before claiming (libusb_set_auto_detach_kernel_driver can do this for you).

For HID class devices specifically, HIDAPI is often the better choice. It wraps the native HID stack on Windows (no Zadig step required) and IOKit on macOS, and offers hidraw and libusb backends on Linux. The API is narrower: open by VID/PID, read and write report buffers. It does not expose the full descriptor tree, but for standard HID devices you rarely need that.

The distinction matters in practice. If you are writing a tool to talk to a game controller or a custom HID device, HIDAPI saves your users the driver installation step on Windows, which is significant for anyone who is not already comfortable with Zadig and driver signing.

Reading What lsusb Shows

The lsusb -v command on Linux prints the complete descriptor tree for every connected device. This is the fastest way to inspect an unfamiliar device before writing any code. The output maps directly to the libusb data structures; every field has the same name and the same units. On macOS, System Information lists VID/PID and descriptor strings under the USB section in Hardware. On Windows, USB Device Tree Viewer gives equivalent detail.

Beyond static descriptor reading, Wireshark with USBPcap on Windows and the kernel’s usbmon module on Linux let you capture live USB traffic. This is the network packet capture equivalent for USB. If a vendor application already talks to the device, capturing that traffic and correlating it with descriptor information gives you most of what you need to replicate the protocol from scratch. The captures show the actual control requests, the bulk payloads, and the timing between transfers.

Where Userspace Drivers Are Used in Production

Several mature projects demonstrate what this approach produces at scale. libgphoto2 implements PTP and MTP over USB for hundreds of camera models entirely from userspace; it is the foundation for camera tethering on Linux. libfprint handles fingerprint readers and ships in most major Linux distributions. Neither requires a kernel module of its own. OpenRazer drives Razer peripherals on Linux using custom report-based protocols over HID, though it is a partial exception: it pairs its userspace daemon with an out-of-tree kernel driver.

The common pattern across all of them: read the descriptors to understand the device structure, identify which interface and endpoints carry the relevant traffic, capture or reverse-engineer the protocol if there is no documentation, then implement the transfer sequences. The descriptor hierarchy told them where to start; protocol analysis or vendor documentation filled in the rest.

The userspace approach also has a meaningful safety advantage over kernel driver development. A bug in a userspace driver crashes your process. A bug in a kernel driver can panic the machine. For most device work outside of performance-critical mass storage or streaming scenarios, userspace is the right layer to be in.

The werwolv article is a solid starting point for getting oriented in this space. The next step after reading it is to pick up a device, run the descriptor enumeration above against it, and see what the hierarchy tells you before writing anything else. That first read is usually more informative than searching for documentation, because the device carries its own map.