Your Speakers Are Already a Microphone: The Hardware Feature Nobody Talks About
Source: hackernews
Back in 2017, researchers at Ben-Gurion University presented a paper titled “SPEAKE(a)R: Turn Speakers to Microphones for Fun and Profit” at the USENIX Workshop on Offensive Technologies (WOOT). It resurfaced on Hacker News recently and is worth a proper technical read even now, because the underlying hardware behavior it exploits has not gone away.
The core claim is deceptively simple: using only software, you can reconfigure the audio output jack on a typical desktop or laptop into a working microphone input. No hardware modification, no soldering, no physical access. The attack captured intelligible speech from across a room.
The Physics First
Moving-coil speakers and dynamic microphones are, at a fundamental level, the same device. Both consist of a coil of wire suspended in a magnetic field, attached to a diaphragm. In a speaker, current flowing through the coil experiences a Lorentz force, which moves the diaphragm and produces sound. In a microphone, incoming sound pressure moves the diaphragm, which moves the coil through the magnetic field, inducing a small voltage via Faraday’s law of electromagnetic induction.
This reversibility is well known in audio engineering. Studio engineers have used dynamic microphones as headphone drivers in a pinch for decades. Headphones work as surprisingly decent microphones; the small drivers and lightweight diaphragms respond well to sound pressure. Guitar amp speakers used as microphones (recording directly in front of the cone) produce a thick, colored sound that some producers deliberately seek out.
The question SPEAKE(a)R asks is: can this be exploited silently, from software, on commodity hardware?
Jack Retasking: The Feature That Makes It Possible
Modern PC audio is built on the Intel High Definition Audio (HDA) specification, finalized in 2004. HDA replaced the older AC’97 standard and is the basis for virtually every integrated audio codec found in consumer PC hardware today, predominantly chips from Realtek (ALC series) but also Conexant, IDT, and others.
The HDA spec defines a verb-based command system for communicating with the codec over a dedicated bus. Crucially, the spec includes pin capability and configuration commands that allow software to query and set the direction of individual audio pins. This feature exists for legitimate reasons: a single physical 3.5mm jack on a laptop might need to serve as a headphone output in one context and a microphone input in another, or switch between line-in and line-out depending on what is plugged in. The codec handles impedance sensing and jack detection, and the driver can respond by remapping the pin function.
This is called jack retasking or jack remapping. On Linux it has been exposed for years through ALSA’s kernel interface. You can observe the capability yourself:
# List codec nodes and their capabilities
cat /proc/asound/card0/codec#0
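# Use hda-verb (from alsa-tools) to send a raw HDA verb; this normally needs root
hda-verb /dev/snd/hwC0D0 0x14 SET_PIN_WIDGET_CONTROL 0x20
# 0x14 is a typical front speaker node; node IDs vary by codec, so check the dump above
# 0x20 sets the pin's input-enable bit (0x40 would be output enable)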
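To make the 0x20 payload in the hda-verb command less opaque, here is a minimal sketch that decodes a pin widget control byte into its direction flags. The function name decode_pin_ctl is made up for illustration. The bit positions (0x80 headphone amp enable, 0x40 output enable, 0x20 input enable) follow the HDA spec's pin widget control payload as I understand it, and the sketch ignores the low VREF bits.

```sh
# Decode an HDA pin widget control byte into its direction flags.
# decode_pin_ctl is a hypothetical helper name, not part of alsa-tools.
decode_pin_ctl() {
  val=$(( $1 ))
  flags=""
  if [ $(( val & 0x80 )) -ne 0 ]; then flags="$flags HP"; fi
  if [ $(( val & 0x40 )) -ne 0 ]; then flags="$flags OUT"; fi
  if [ $(( val & 0x20 )) -ne 0 ]; then flags="$flags IN"; fi
  # Trim the leading space; print "none" when no direction bit is set.
  flags=${flags# }
  echo "${flags:-none}"
}

decode_pin_ctl 0x40   # a speaker pin in its normal state
decode_pin_ctl 0x20   # the value written by the hda-verb example
```

The first call prints OUT and the second prints IN, which is the whole trick: the same physical pin, with one bit moved.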
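The hdajackretask tool from alsa-tools exposes this through a friendlier graphical interface, and the hdajacksensetest utility reports which jacks the codec currently senses as plugged in. The Linux kernel has supported pin reconfiguration via sysfs under /sys/class/sound/ (the user_pin_configs and reconfig attributes of each codec's hwC*D* device, when the kernel is built with CONFIG_SND_HDA_RECONFIG) since the 2.6 series. Windows exposes equivalent functionality through the audio driver’s private IOCTLs, and the Realtek HD Audio driver in particular processes such commands through its codec control interface.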
The SPEAKE(a)R researchers wrote a small userland program that sends the appropriate HDA verbs to flip a speaker output node into input mode, then reads the resulting PCM stream through the standard audio capture API. No kernel module, no elevated privilege beyond what audio access requires on most desktop configurations.
What the Attack Actually Captured
The paper measured performance using a pair of standard PC speakers placed at varying distances from a sound source. At 6 meters, the captured speech remained intelligible. They analyzed the frequency response of the repurposed speaker output and found usable signal up to roughly 10 kHz, which covers the band that carries most speech intelligibility and then some. For comparison, telephone audio is typically band-limited to 300 Hz to 3.4 kHz; the captured bandwidth comfortably exceeds that.
Signal processing mattered here. The induced voltage in a speaker coil is small, maybe a few millivolts at conversational distances, and the codec’s input stage is not optimized for such a low-impedance, low-voltage source. The researchers applied gain normalization and basic filtering in software to extract clean audio from what the raw ADC saw as a noisy, weak signal. Nothing exotic: standard DSP operations available in any audio processing library.
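As a rough illustration of what gain normalization means here, the sketch below removes the DC offset from a list of integer PCM samples and scales the result so the largest magnitude becomes 1.0. This is not the paper's pipeline, just the simplest version of the idea. The function name normalize_samples and the sample values are invented for the example, and real audio would go through a proper library with filtering as well.

```sh
# normalize_samples: read one integer PCM sample per line on stdin,
# subtract the mean (DC offset), then scale so the largest magnitude is 1.0.
# The name and the text-based sample format are assumptions for illustration.
normalize_samples() {
  awk '
    { x[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit
      mean = sum / NR
      peak = 0
      for (i = 1; i <= NR; i++) {
        x[i] -= mean
        mag = (x[i] < 0) ? -x[i] : x[i]
        if (mag > peak) peak = mag
      }
      if (peak == 0) peak = 1   # all-silent input: avoid dividing by zero
      for (i = 1; i <= NR; i++) printf "%.3f\n", x[i] / peak
    }'
}

# Hypothetical weak capture: small swings riding on an offset of 100 counts.
printf '%s\n' 110 90 100 120 80 | normalize_samples
```

For that input the mean is 100 and the peak deviation is 20 counts, so the output is 0.500, -0.500, 0.000, 1.000, -1.000: a signal that occupied a sliver of the ADC range now spans the full scale. Normalization raises the noise along with the speech, which is why the filtering step matters too.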
Headphones performed even better than full-range speakers as covert microphones. The lighter diaphragms and smaller coils in headphone drivers have better high-frequency sensitivity. A victim plugging headphones into an output jack and then stepping away from the computer leaves behind a working surveillance microphone that no software indicator will flag, because from the OS’s perspective, the jack has been reconfigured to a recognized audio input.
The Air-Gap Angle
The reason the Ben-Gurion group studied this is that it fits into a broader research program on covert exfiltration from air-gapped machines. If a machine has no network connection, no Bluetooth, no Wi-Fi, an attacker who has already compromised it needs another channel to extract data or conduct surveillance. The group has published extensively on acoustic, electromagnetic, and optical covert channels in this context.
SPEAKE(a)R is compelling in this space because speakers are ubiquitous on air-gapped workstations. Security-conscious organizations often physically remove or disable microphones; cameras get taped over. Speakers are generally considered benign output-only devices. The retasking attack subverts that assumption without requiring any hardware modification that a physical inspection would catch.
Mordechai Guri, the lead author, has published a long series of papers in this vein: GAIROSCOPE uses gyroscope sensors to receive ultrasonic transmissions, AirHopper encodes data in FM radio emissions generated by the video card and display cable, MOSQUITO uses ultrasonic speaker-to-speaker communication between two nearby air-gapped machines. SPEAKE(a)R sits comfortably in that tradition.
What Has Changed Since 2017
Not much, architecturally. The HDA spec is still the basis for virtually all consumer PC audio. Realtek ALC codecs are in essentially every Intel and AMD platform shipping today. Jack retasking remains a standard feature.
Some mitigations exist but none are universal. BIOS/UEFI firmware can disable the jack retasking capability for specific pins, preventing software from overriding the default pin configuration. Some enterprise audio policies lock down which applications can access audio hardware at all. The PulseAudio and PipeWire permission models on Linux add some friction but no hard technical barrier, since the underlying ALSA interface still permits direct codec access for processes with sufficient privilege.
Windows has not, to my knowledge, introduced any specific control over jack retasking in the years since the paper. The Windows audio driver model gives Realtek’s driver stack significant latitude in configuring the codec, and third-party audio software can interact with the codec through documented and undocumented paths.
Physical mitigation is the most reliable: using external USB audio adapters disconnects the speaker output hardware from the HDA codec entirely. A USB audio interface has no HDA pins to retask. The speaker output on a USB DAC is handled by firmware on the USB device itself, which is not addressable via HDA verbs from the host.
Why This Still Matters
This paper is nine years old and surfaces periodically on aggregators, each time prompting a round of “I knew speakers could do this” comments mixed with genuine surprise. The reason it keeps feeling relevant is that the threat model it describes remains intact and the defensive gap it exposes is rarely addressed in practice.
Most endpoint security focuses on traditional attack surfaces: process injection, driver exploits, network egress. Audio hardware reconfiguration is not a common detection category. A process that opens an audio capture device after silently retasking a speaker output looks, from an audit log perspective, like an application recording from a microphone that happens to exist on the system. Distinguishing between a legitimately configured microphone input and a retasked speaker output requires either firmware-level attestation of pin configuration or kernel-level auditing of HDA codec verbs, neither of which is standard.
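On Linux, a crude version of that check can be approximated from userland by reading the codec dump and flagging any pin whose default configuration describes an output device but whose current pin control has input enabled. The sketch below is a heuristic under assumptions, not a detection product. The function name audit_pins is hypothetical, the sample text is a hand-written excerpt modeled on the /proc/asound codec dump format, and the exact field layout may differ across kernel versions and codecs.

```sh
# audit_pins: scan an ALSA HDA codec dump on stdin and report pins whose
# default device is an output (Speaker, HP Out, Line Out) but whose current
# Pin-ctls value has the IN flag set. Hypothetical helper, heuristic only.
audit_pins() {
  awk '
    /^Node 0x/    { node = $2; dev = "" }
    /Pin Default/ { dev = $0; sub(/^.*\] /, "", dev) }
    /Pin-ctls:/ {
      is_output_dev = (dev ~ /^(Speaker|HP Out|Line Out)/)
      has_in = ($0 ~ / IN( |$)/)
      if (is_output_dev && has_in) print node ": " dev " -> input enabled"
    }'
}

# Hand-written sample modeled on the codec dump format (an assumption).
sample='Node 0x14 [Pin Complex] wcaps 0x40058d: Stereo Amp-Out
  Pin Default 0x90170110: [Fixed] Speaker at Int N/A
  Pin-ctls: 0x20: IN
Node 0x18 [Pin Complex] wcaps 0x40048b: Stereo Amp-In
  Pin Default 0x01a11830: [Jack] Mic at Ext Rear
  Pin-ctls: 0x24: IN VREF_80
Node 0x21 [Pin Complex] wcaps 0x40058d: Stereo Amp-Out
  Pin Default 0x0221101f: [Jack] HP Out at Ext Front
  Pin-ctls: 0xc0: OUT HP'

printf '%s\n' "$sample" | audit_pins
```

On the sample this reports only node 0x14, the speaker pin with input enabled. On a real machine you would feed it the live dump, for example audit_pins < "/proc/asound/card0/codec#0". A jack that is legitimately retasked as a microphone input would also be flagged, so this only narrows down where to look, and malware with the same privileges could flip the pin back between checks.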
For the Discord bot work I do and the systems-level tinkering that goes with it, this paper is a good reminder that hardware abstractions leak. The kernel presents you a clean audio API; underneath it, codec pins are configurable objects with capabilities that the higher-level model does not fully surface. Security assumptions built at the wrong layer of the stack tend not to hold.
If you have not read the original paper, it is worth the thirty minutes. The signal processing sections are detailed, the experimental setup is straightforward to reproduce, and the threat model section is honest about the constraints. It requires prior compromise of the target machine, which is a significant prerequisite. But for an attacker who already has code execution and is looking for persistent covert audio access on a machine whose microphone has been disabled or removed, the technique is elegant precisely because it requires no new hardware capability: just a different interpretation of one that was already there.