USB Reverse Engineering: The Layers That Make Wireshark Captures Readable
Source: lobsters
Reverse-engineering a USB device protocol sounds like it requires specialized hardware and deep kernel knowledge. In practice, the tools are already installed on your Linux machine, and the biggest barrier is knowing what the captured data represents before you start reading bytes. This walkthrough demonstrates the practical workflow well, but what makes it reliable rather than guesswork is the USB protocol’s own self-description mechanism.
What usbmon Actually Captures
On Linux, Wireshark captures USB traffic through usbmon, a kernel facility merged in Linux 2.6.11. It is not a separate driver; it hooks into the URB (USB Request Block) submission and completion paths inside usbcore. When usbcore submits a URB to the host controller and when the controller completes it, usbmon’s registered callbacks fire. The captured events are exposed as character devices at /dev/usbmon0 through /dev/usbmonN, one per bus, with usbmon0 aggregating all buses. libpcap reads from these with the DLT_USB_LINUX (189) or DLT_USB_LINUX_MMAPPED (220) link-layer types.
sudo modprobe usbmon
lsusb
# Bus 003 Device 007: ID 045e:028e Microsoft Corp. Xbox360 Controller
# This device is on usbmon3
sudo wireshark -i usbmon3 &
# or capture to file:
sudo tshark -i usbmon3 -w capture.pcapng
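The bus-to-interface mapping in the comments above is mechanical: the bus number printed by lsusb, with leading zeros dropped, is the usbmon index. A minimal sketch of that mapping, assuming the usual `Bus NNN Device NNN: ID vvvv:pppp` line format; the helper name `usbmon_interface` is illustrative and not part of any tool:

```python
import re

def usbmon_interface(lsusb_line: str) -> str:
    """Return the usbmon interface name for one device line of `lsusb` output."""
    match = re.match(
        r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})",
        lsusb_line,
    )
    if match is None:
        raise ValueError(f"not an lsusb device line: {lsusb_line!r}")
    bus = int(match.group(1))  # int() drops the leading zeros: "003" -> 3
    return f"usbmon{bus}"

line = "Bus 003 Device 007: ID 045e:028e Microsoft Corp. Xbox360 Controller"
print(usbmon_interface(line))
```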
The critical point is that each Wireshark frame in a USB capture corresponds to a URB submission (marked ‘S’) or a URB completion (marked ‘C’), not to a wire-level packet. The token, data, and handshake packets that form each transaction at the physical layer are invisible unless you have dedicated hardware such as a Total Phase Beagle or an Ellisys analyzer. For protocol reverse engineering this almost never matters; URB-level capture carries the setup data and payloads you need.
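One practical consequence is that a request and its result show up as two separate frames that share a URB id (Wireshark exposes it as `usb.urb_id`). A rough sketch of pairing them, assuming the events have already been exported as `(timestamp, urb_id, urb_type)` tuples; that tuple layout and the sample values are hypothetical:

```python
def pair_urbs(events):
    """Match each completion ('C') with its earlier submission ('S') by URB id.

    `events` is a hypothetical list of (timestamp_seconds, urb_id, urb_type)
    tuples in capture order.
    """
    pending = {}  # urb_id -> submission timestamp
    pairs = []    # (urb_id, submit_time, complete_time)
    for timestamp, urb_id, urb_type in events:
        if urb_type == "S":
            pending[urb_id] = timestamp
        elif urb_type == "C" and urb_id in pending:
            pairs.append((urb_id, pending.pop(urb_id), timestamp))
    return pairs

events = [
    (0.000, 0xA1, "S"),
    (0.001, 0xB2, "S"),
    (0.004, 0xA1, "C"),
    (0.009, 0xB2, "C"),
]
for urb_id, submitted, completed in pair_urbs(events):
    print(f"URB {urb_id:#x}: submitted {submitted:.3f}s, completed {completed:.3f}s")
```

Completions with no matching submission (for example, a capture that started mid-transfer) are simply skipped in this sketch.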
On Windows, USBPcap is a kernel-mode filter driver bundled with Wireshark since version 1.12. It exposes \\.\USBPcap0, \\.\USBPcap1, and so on per root hub. On macOS there is no comparable built-in facility; the practical workaround is a Linux VM with USB passthrough and usbmon running inside it.
Transfer Types Tell You the Protocol Shape
USB defines four transfer types, and identifying which type a device uses for each operation tells you the timing model, reliability guarantee, and payload size constraints before you read a single payload byte.
Control transfers use endpoint 0, are bidirectional, and follow a fixed three-phase structure: an 8-byte SETUP packet, an optional DATA phase, then a STATUS acknowledgment. The SETUP packet layout is fixed across all USB devices by the specification:
Field:  bmRequestType  bRequest  wValue   wIndex   wLength
Size:   1 byte         1 byte    2 bytes  2 bytes  2 bytes
bmRequestType bits:
bit 7: direction (0=host->device, 1=device->host)
bits 6:5: type (00=Standard, 01=Class, 10=Vendor)
bits 4:0: recipient (Device, Interface, Endpoint, Other)
Vendor-specific control transfers, where the type bits equal 10, are the first place to look for undocumented command channels. If a device sends anything before data flows on other endpoints, it is almost certainly a control transfer configuring the device state.
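Because the SETUP layout is fixed, decoding it takes only a few lines. The sketch below unpacks the eight bytes as little-endian fields and splits bmRequestType into its three bit groups. The function name is illustrative, and the second sample packet is a made-up vendor request, not one from a real device:

```python
import struct

TYPES = {0: "Standard", 1: "Class", 2: "Vendor", 3: "Reserved"}
RECIPIENTS = {0: "Device", 1: "Interface", 2: "Endpoint", 3: "Other"}

def parse_setup(packet: bytes) -> dict:
    """Decode an 8-byte SETUP packet into named fields."""
    if len(packet) != 8:
        raise ValueError("a SETUP packet is exactly 8 bytes")
    # One byte, one byte, then three little-endian 16-bit words
    bm_request_type, b_request, w_value, w_index, w_length = struct.unpack(
        "<BBHHH", packet
    )
    return {
        "direction": "device->host" if bm_request_type & 0x80 else "host->device",
        "type": TYPES[(bm_request_type >> 5) & 0x03],
        "recipient": RECIPIENTS.get(bm_request_type & 0x1F, "Reserved"),
        "bRequest": b_request,
        "wValue": w_value,
        "wIndex": w_index,
        "wLength": w_length,
    }

# Standard GET_DESCRIPTOR(Device) request: 80 06 00 01 00 00 12 00
get_descriptor = parse_setup(bytes.fromhex("8006000100001200"))
# Hypothetical vendor request, host to device
vendor = parse_setup(bytes.fromhex("4001020000000000"))
print(get_descriptor)
print(vendor)
```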
Interrupt transfers are polled at a fixed interval defined in the endpoint descriptor. The host initiates every transaction; the device responds with data or a NAK. HID devices such as keyboards, mice, and game controllers use interrupt IN for their primary data stream. A gaming mouse configured for 1000 Hz is polled once per millisecond and can return one report (8 bytes, for example) on each poll. Interrupt transfers guarantee how often the host polls, which bounds latency, but they do not deliver data the instant the device has it: a report waits for the next poll.
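How the bInterval value in the endpoint descriptor maps to a polling period depends on bus speed. A minimal sketch, assuming the USB 2.0 rules for interrupt endpoints (whole 1 ms frames at low and full speed, powers of two of 125 µs microframes at high speed); the function name is illustrative:

```python
def interrupt_period_us(b_interval: int, high_speed: bool) -> int:
    """Return the polling period in microseconds for an interrupt endpoint."""
    if high_speed:
        # High speed: period = 2^(bInterval - 1) microframes of 125 us each
        if not 1 <= b_interval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        return (1 << (b_interval - 1)) * 125
    # Low/full speed: bInterval counts 1 ms frames directly
    if not 1 <= b_interval <= 255:
        raise ValueError("low/full-speed bInterval must be 1..255")
    return b_interval * 1000

period = interrupt_period_us(1, high_speed=False)
polls_per_second = 1_000_000 // period
print(f"{period} us per poll, {polls_per_second} polls/s, "
      f"{polls_per_second * 8} bytes/s with 8-byte reports")
```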
Bulk transfers carry large payloads with no timing guarantee and automatic retries on error. Mass storage, USB-serial adapters, USB-Ethernet dongles, and printers use bulk. Full-speed bulk tops out around 1.2 MB/s; high-speed bulk can theoretically reach 53 MB/s. If you see large payloads on non-endpoint-0 endpoints with no periodic pattern, it is almost certainly bulk.
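Those two ceilings follow from simple arithmetic. The sketch assumes the USB 2.0 specification's per-frame limits for bulk: at most 19 transactions of 64 bytes in a 1 ms full-speed frame, and at most 13 transactions of 512 bytes in a 125 µs high-speed microframe. Real devices rarely reach either figure because the bus is shared.

```python
def bulk_ceiling_bytes_per_s(max_packet: int, transactions_per_frame: int,
                             frames_per_second: int) -> int:
    """Theoretical bulk payload ceiling: packet size x packets/frame x frames/s."""
    return max_packet * transactions_per_frame * frames_per_second

full_speed = bulk_ceiling_bytes_per_s(64, 19, 1000)   # 1 ms frames
high_speed = bulk_ceiling_bytes_per_s(512, 13, 8000)  # 125 us microframes
print(f"full speed: {full_speed / 1e6:.2f} MB/s")
print(f"high speed: {high_speed / 1e6:.2f} MB/s")
```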
Isochronous transfers trade error recovery for guaranteed bandwidth and timing. Audio and video devices use this type; a USB microphone sends samples at a fixed rate with no retries because a dropped frame is preferable to a late one.
The Descriptor Hierarchy Is the Map
Before any payload transfer occurs, the host enumerates the device by requesting its descriptor tree. This hierarchy is the device’s machine-readable self-description, and reading it before opening Wireshark gives you a map of every communication channel and its properties.
At the top is the Device Descriptor (18 bytes): VID, PID, USB spec version, device class, and configuration count. Below it sit Configuration Descriptors, each containing Interface Descriptors, each containing Endpoint Descriptors. Every endpoint listed has a defined transfer type, direction, maximum packet size, and polling interval.
Device Descriptor
  idVendor=0x045e idProduct=0x028e bcdUSB=2.00
  bNumConfigurations=1
  Configuration Descriptor
    bNumInterfaces=4
    Interface Descriptor (bInterfaceClass=0x03 -> HID)
      HID Descriptor -> points to Report Descriptor
      Endpoint 0x81: Interrupt IN, wMaxPacketSize=20, bInterval=1ms
      Endpoint 0x01: Interrupt OUT, wMaxPacketSize=8, bInterval=8ms
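The top of that tree is a fixed 18-byte structure, so it can be decoded by hand from the data phase of a GET_DESCRIPTOR response. A sketch under stated assumptions: the sample bytes are constructed to match the example values above (with a made-up bcdDevice and string indexes) rather than dumped from real hardware, and the function name is illustrative:

```python
import struct

def parse_device_descriptor(raw: bytes) -> dict:
    """Decode the 18-byte standard Device Descriptor."""
    if len(raw) != 18 or raw[0] != 18 or raw[1] != 0x01:
        raise ValueError("not an 18-byte Device Descriptor")
    # Field order: bLength, bDescriptorType, bcdUSB, bDeviceClass,
    # bDeviceSubClass, bDeviceProtocol, bMaxPacketSize0, idVendor, idProduct,
    # bcdDevice, iManufacturer, iProduct, iSerialNumber, bNumConfigurations
    (_, _, bcd_usb, dev_class, _, _, max_packet0,
     id_vendor, id_product, _, _, _, _, num_configs) = struct.unpack(
        "<BBHBBBBHHHBBBB", raw
    )
    return {
        "bcdUSB": f"{bcd_usb >> 8:x}.{bcd_usb & 0xFF:02x}",  # binary-coded decimal
        "bDeviceClass": dev_class,
        "bMaxPacketSize0": max_packet0,
        "idVendor": id_vendor,
        "idProduct": id_product,
        "bNumConfigurations": num_configs,
    }

# Hypothetical descriptor bytes matching the example tree
sample = bytes.fromhex("12010002000000085e048e02140101020301")
device = parse_device_descriptor(sample)
print(device)
```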
You can dump this hierarchy without capturing any traffic:
# Verbose dump for a specific device by VID:PID
lsusb -v -d 045e:028e
# Readable summary for all connected devices
usb-devices
The endpoint list from lsusb -v tells you which endpoint numbers exist, which direction each runs, which transfer type each uses, and the maximum packet size. When you set a Wireshark display filter targeting a specific endpoint, you are already working from a known structure rather than guessing at what the numbers mean.
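The same endpoint list can be recovered from raw descriptor bytes, since every descriptor starts with a length byte and a type byte. The sketch below walks a configuration blob and decodes each Endpoint Descriptor (type 0x05). The sample bytes are hypothetical, built to match the two example endpoints above. Note that the transfer-type codes in bmAttributes are numbered differently from Wireshark's `usb.transfer_type` field.

```python
import struct

# Transfer-type codes from bmAttributes bits 1:0 of an Endpoint Descriptor
TRANSFER_TYPES = {0: "Control", 1: "Isochronous", 2: "Bulk", 3: "Interrupt"}

def list_endpoints(config_blob: bytes) -> list:
    """Walk concatenated descriptors and decode each Endpoint Descriptor."""
    endpoints = []
    offset = 0
    while offset + 2 <= len(config_blob):
        length, desc_type = config_blob[offset], config_blob[offset + 1]
        if length < 2 or offset + length > len(config_blob):
            raise ValueError("malformed descriptor")
        if desc_type == 0x05 and length >= 7:
            address, attributes, max_packet, interval = struct.unpack_from(
                "<BBHB", config_blob, offset + 2
            )
            endpoints.append({
                "address": address,
                "direction": "IN" if address & 0x80 else "OUT",
                "type": TRANSFER_TYPES[attributes & 0x03],
                "wMaxPacketSize": max_packet & 0x07FF,  # bits 10:0 hold the size
                "bInterval": interval,
            })
        offset += length  # skip to the next descriptor, whatever its type
    return endpoints

# Hypothetical blob: one interface descriptor followed by two endpoints
blob = bytes.fromhex("090400000203000000" "07058103140001" "07050103080008")
endpoints = list_endpoints(blob)
for endpoint in endpoints:
    print(endpoint)
```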
HID Report Descriptors: The Protocol Specification Embedded in the Device
Standard HID class devices (bInterfaceClass=0x03) carry something vendor-specific devices do not: a Report Descriptor that specifies the exact bit-level layout of every data packet. The device ships with a machine-readable description of its own protocol. This is why HID reverse engineering tends to converge faster than working with proprietary protocols.
The Report Descriptor uses a compact tag/type/size encoding. Each short item is one to five bytes: the first byte encodes a four-bit tag, a two-bit type (Main, Global, or Local), and a two-bit size code, followed by 0, 1, 2, or 4 bytes of value data (a size code of 3 means four bytes, not three).
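The prefix byte can be split with a few shifts and masks. This is a minimal sketch that handles short items only: it maps size code 3 to four data bytes and gives up on long items (prefix 0xFE). The sample bytes are the opening and closing items commonly seen in a mouse descriptor, not a complete descriptor:

```python
ITEM_TYPES = {0: "Main", 1: "Global", 2: "Local", 3: "Reserved"}
SIZE_CODES = {0: 0, 1: 1, 2: 2, 3: 4}  # size code -> number of data bytes

def parse_short_items(descriptor: bytes) -> list:
    """Split a HID report descriptor into (tag, type, value) tuples."""
    items = []
    offset = 0
    while offset < len(descriptor):
        prefix = descriptor[offset]
        if prefix == 0xFE:
            raise NotImplementedError("long items are not handled in this sketch")
        size = SIZE_CODES[prefix & 0x03]
        data = descriptor[offset + 1 : offset + 1 + size]
        if len(data) != size:
            raise ValueError("truncated item")
        tag = prefix >> 4
        item_type = ITEM_TYPES[(prefix >> 2) & 0x03]
        # Value is read as unsigned little-endian; some items are signed in practice
        items.append((tag, item_type, int.from_bytes(data, "little")))
        offset += 1 + size
    return items

# Usage Page (Generic Desktop), Usage (Mouse), Collection (Application), End Collection
sample = bytes.fromhex("05010902a101c0")
items = parse_short_items(sample)
for item in items:
    print(item)
```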
# Extract via sysfs without any packet capture
cat /sys/bus/usb/devices/3-1/3-1:1.0/0003:045E:028E.0001/report_descriptor | xxd
# Or use usbhid-dump from the usbutils package
# (-d selects by VID:PID; -e all dumps both the descriptor and the report stream)
sudo usbhid-dump -d 045e:028e -e all
A three-button mouse report descriptor typically decodes to: 3 bits for button states, 5 padding bits, 1 signed byte for X delta, 1 signed byte for Y delta. Every interrupt IN packet from that device is exactly 3 bytes structured this way. Wireshark dissects it automatically in the usbhid protocol layer for recognized devices.
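A report with that layout decodes in a few lines. The sketch assumes the common button order (bit 0 left, bit 1 right, bit 2 middle); a given device's report descriptor is the authority on the real order, and the function name is illustrative:

```python
import struct

def decode_mouse_report(report: bytes) -> dict:
    """Decode a 3-byte mouse report: button bits, signed X delta, signed Y delta."""
    if len(report) != 3:
        raise ValueError("expected a 3-byte report")
    # B = unsigned button byte, b = signed 8-bit deltas
    buttons, dx, dy = struct.unpack("<Bbb", report)
    return {
        "left": bool(buttons & 0x01),    # assumed bit order, see lead-in
        "right": bool(buttons & 0x02),
        "middle": bool(buttons & 0x04),
        "dx": dx,
        "dy": dy,
    }

# Left button held, moved 5 right and 2 up (0xFE is -2 as a signed byte)
report = decode_mouse_report(bytes([0x01, 0x05, 0xFE]))
print(report)
```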
For gaming peripherals that use vendor-specific report IDs or non-standard layouts, the hid-tools package (hosted on freedesktop.org's GitLab) provides hid-recorder, which captures and decodes HID events live while printing the parsed report descriptor alongside raw event bytes. The usbhid-dump tool from usbutils is often the faster starting point since it queries the descriptor directly without requiring any capture infrastructure.
Filtering the Enumeration Noise
Enumeration traffic fires every time a device connects and adds substantial noise to a capture session. A few focused display filters isolate what matters:
# Only data-carrying packets
usb.data_len > 0
# Interrupt IN transfers (HID primary data stream)
usb.transfer_type == 1 && usb.endpoint_address.direction == 1
# Specific device on bus 3, device address 7
usb.bus_id == 3 && usb.device_address == 7
# Vendor-specific control transfers only
usb.transfer_type == 2 && usb.bmRequestType.type == 2
# Successful completions only
usb.urb_type == 0x43 && usb.urb_status == 0
For scripted analysis, tshark with -T fields extracts raw payload bytes into a form suitable for pattern analysis:
tshark -r capture.pcapng \
-Y "usb.transfer_type==1 && usb.endpoint_address==0x81 && usb.data_len > 0" \
-T fields -e frame.number -e usb.capdata -E separator=,
This produces a CSV of raw payload bytes from interrupt IN transfers. Correlating byte changes with physical actions then becomes straightforward: press a button, observe which bit flips; move an axis, find the byte that changes proportionally.
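The correlation step can be partly automated by diffing consecutive payloads. A rough sketch, assuming two-column CSV text (frame number, hex payload) like the tshark command above produces. It accepts hex with or without colon separators, since the `usb.capdata` formatting differs between Wireshark versions. The sample rows are invented:

```python
import csv
import io

def changed_offsets(csv_text: str) -> list:
    """Return (frame_number, [byte offsets that changed]) for each changed report."""
    previous = None
    changes = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[1]:
            continue  # skip blank lines and frames without payload
        frame = int(row[0])
        payload = bytes.fromhex(row[1].replace(":", ""))
        if previous is not None and len(previous) == len(payload):
            diff = [i for i, (a, b) in enumerate(zip(previous, payload)) if a != b]
            if diff:
                changes.append((frame, diff))
        previous = payload
    return changes

# Invented capture: button pressed at frame 12, movement at frame 14
sample = "10,000000\n12,010000\n14,0105fe\n16,0105fe\n"
for frame, offsets in changed_offsets(sample):
    print(f"frame {frame}: bytes {offsets} changed")
```

Running this while performing one physical action at a time narrows each control down to a byte offset, after which the individual bits can be inspected by hand.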
From Capture to a Working Driver
Once the protocol is understood, libusb 1.0 provides the path from understanding to implementation. On Linux, it can detach the kernel driver from a specific interface and claim it from userspace:
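#include <stdio.h>
#include <libusb-1.0/libusb.h>

int main(void) {
    libusb_context *ctx = NULL;
    if (libusb_init(&ctx) != 0) return 1;
    libusb_device_handle *dev = libusb_open_device_with_vid_pid(ctx, 0x045e, 0x028e);
    if (dev == NULL) { libusb_exit(ctx); return 1; }  // device absent or no permission
    // Detach the kernel driver from interface 0 if one is bound, then claim it
    if (libusb_kernel_driver_active(dev, 0) == 1)
        libusb_detach_kernel_driver(dev, 0);
    if (libusb_claim_interface(dev, 0) != 0) { libusb_close(dev); libusb_exit(ctx); return 1; }
    // Read one interrupt IN report (endpoint 0x81, 20 bytes for Xbox controller)
    unsigned char buf[20];
    int transferred = 0;
    if (libusb_interrupt_transfer(dev, 0x81, buf, sizeof(buf), &transferred, 5000) == 0)
        printf("read %d bytes\n", transferred);
    // Send a rumble command (interrupt OUT, endpoint 0x01)
    unsigned char rumble[8] = {0x00, 0x08, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0x00};
    libusb_interrupt_transfer(dev, 0x01, rumble, sizeof(rumble), &transferred, 5000);
    // Release everything in reverse order of acquisition
    libusb_release_interface(dev, 0);
    libusb_close(dev);
    libusb_exit(ctx);
    return 0;
}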
On Windows, Zadig replaces the native class driver with WinUSB or libusb-win32, after which the same libusb API works without modification. Python projects can use PyUSB, which wraps libusb with a higher-level interface and is often faster for prototyping a protocol decoder before writing production C.
Many of the Linux kernel’s own drivers for common devices grew out of this kind of workflow. Drivers such as xpad for Xbox controllers, ftdi_sio for FTDI serial chips, and asix for ASIX-based USB-Ethernet dongles were developed, at least in part, by observing how existing drivers talk to the hardware and decoding the protocol from the captured traffic. The xboxdrv project went further and implemented a complete userspace driver using libusb rather than a kernel module, which means it runs as an ordinary process with no kernel code to maintain and far less risk of taking down the system on a bug.
The Mental Model That Makes It Work
The tools for USB reverse engineering have been stable and freely available for years. usbmon has been in the Linux kernel since 2005; libpcap’s USB support followed not long after; USBPcap has been bundled with Wireshark since version 1.12 in 2014. Getting from a successful capture to a working driver depends on the mental model more than on access to tools.
Wireshark shows URBs, not wire packets. The descriptor hierarchy describes the device’s communication channels before any payload arrives. Transfer type determines the timing and reliability contract for each channel. HID devices carry their own protocol specification in the report descriptor, accessible through sysfs before you start a capture at all.
With those anchors in place, a capture session becomes a process of correlation rather than excavation, and a screen of hex bytes becomes a protocol conversation with a known structure.