
Pitch Detection Without an OS: The Embedded Rust Engineering Behind a Guitar Tuner

Source: lobsters

Orhun Parmaksiz, the developer behind git-cliff, ratatui, and several other well-known Rust projects, recently published tuitar: a guitar trainer that runs on bare-metal hardware, written entirely in Rust. The project targets a microcontroller, drives a small OLED display, and detects pitch from a microphone in real time. It looks modest in scope, but the engineering decisions underneath it reveal a lot about where embedded Rust stands in 2026 and what it actually costs to do real-time audio without an operating system.

The Hardware and Its Constraints

The project targets the Raspberry Pi RP2040, the dual-core Cortex-M0+ microcontroller at the heart of the Raspberry Pi Pico. The RP2040 has 264KB of SRAM, no on-chip flash (the Pico board pairs it with 2MB of external QSPI flash), no hardware floating-point unit on the M0+ cores, and a 12-bit SAR ADC capable of roughly 500K samples per second. There is no operating system, no dynamic memory allocator by default, no standard library, and no audio subsystem of any kind. The CPU runs at 125MHz by default.

For a guitar tuner, the frequency range that matters spans from 82.4Hz (the low E string, E2) up to 329.6Hz (high E, E4). The Nyquist theorem requires a sample rate of at least twice the highest frequency of interest, but in practice you need to capture harmonics to make pitch detection reliable. A sample rate of 8kHz is sufficient, and 44.1kHz is easily achievable via DMA-fed ADC reads. The constraint is not bandwidth; it is compute and memory. Every algorithm choice traces back to those 264KB and that Cortex-M0+ core.

For audio input, the RP2040’s built-in ADC can read an analog microphone directly, but the INMP441 MEMS I2S microphone is a cleaner option. The RP2040’s PIO (Programmable I/O) state machines can bit-bang I2S at audio sample rates, delivering 24-bit digital samples without the DC offset and noise problems of analog ADC paths.

Why Not FFT

The naive approach to pitch detection is to take a Fast Fourier Transform of the input buffer and find the peak frequency bin. This works for many applications. For a guitar tuner, it runs into a fundamental resolution problem.

FFT frequency resolution is sample_rate / N, where N is the buffer length. At 8kHz with a 1024-point FFT, resolution is 7.8Hz per bin. The low E string sits at 82.4Hz, where a semitone is only about 4.9Hz wide, so a single bin spans more than a half-step. To get sub-1Hz resolution, you would need N greater than 8000 (8192 for a power-of-two FFT), requiring about 32KB just for a buffer of 32-bit floats. The RP2040 has 264KB total, shared among all tasks, display buffers, and the stack.
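The arithmetic is easy to check. The sketch below is illustrative only: the function names are made up for this example, and the semitone ratio is hard-coded so that it does not depend on a std-only math function.

```rust
/// Width of one FFT bin in Hz for a given sample rate and transform length.
fn bin_width_hz(sample_rate_hz: f32, n: usize) -> f32 {
    sample_rate_hz / n as f32
}

/// Distance in Hz from `freq_hz` up to the next equal-tempered semitone.
fn semitone_width_hz(freq_hz: f32) -> f32 {
    // 2^(1/12), the frequency ratio of one semitone.
    const SEMITONE_RATIO: f32 = 1.059_463_1;
    freq_hz * (SEMITONE_RATIO - 1.0)
}

/// Smallest power-of-two FFT length whose bin width is at or below `target_hz`.
/// The search is capped so that a nonsensical target cannot loop forever.
fn min_fft_len(sample_rate_hz: f32, target_hz: f32) -> usize {
    let mut n = 1usize;
    while n < (1 << 24) && bin_width_hz(sample_rate_hz, n) > target_hz {
        n *= 2;
    }
    n
}
```

At 8kHz, `bin_width_hz(8000.0, 1024)` is 7.8125Hz, `semitone_width_hz(82.4)` is a little under 5Hz, and `min_fft_len(8000.0, 1.0)` comes out at 8192 points.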

HPS (Harmonic Product Spectrum) refines the FFT approach by multiplying downsampled spectra together to reinforce the fundamental, but the resolution floor does not change. You are still constrained by N.

There is also a harmonics problem. Guitar strings produce rich overtone series. A naive FFT peak-finder will often land on the second or third harmonic rather than the fundamental, producing an octave error. Harmonic Product Spectrum mitigates this but does not eliminate it, and the compute cost of the full pipeline grows.
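To make the octave-error problem concrete, here is a minimal sketch of a naive peak-picker next to a Harmonic Product Spectrum peak-picker. It assumes a magnitude spectrum has already been computed; `naive_peak` and `hps_peak` are hypothetical helper names for this illustration, not functions from tuitar or any crate.

```rust
/// Index of the largest magnitude bin (the naive approach).
fn naive_peak(mag: &[f32]) -> usize {
    let mut best = 0;
    for (k, &m) in mag.iter().enumerate() {
        if m > mag[best] {
            best = k;
        }
    }
    best
}

/// Harmonic Product Spectrum: for each candidate bin k, multiply the
/// magnitudes at k, 2k, ..., harmonics*k and return the bin with the
/// largest product. Bin 0 (DC) is skipped.
fn hps_peak(mag: &[f32], harmonics: usize) -> Option<usize> {
    if harmonics == 0 {
        return None;
    }
    // Only bins whose highest harmonic still lies inside the spectrum.
    let limit = mag.len() / harmonics;
    let mut best: Option<(usize, f32)> = None;
    for k in 1..limit {
        let mut product = 1.0f32;
        for h in 1..=harmonics {
            product *= mag[k * h];
        }
        if best.map_or(true, |(_, v)| product > v) {
            best = Some((k, product));
        }
    }
    best.map(|(k, _)| k)
}
```

With a weak fundamental in bin 10 and a stronger second harmonic in bin 20, `naive_peak` reports bin 20 while `hps_peak` with three harmonics reports bin 10. The answer is still a bin index, so the resolution limit described above applies unchanged.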

YIN and the Normalized Difference Function

The YIN algorithm, published by de Cheveigné and Kawahara in 2002, takes a different route. Instead of transforming to the frequency domain, it works directly on the autocorrelation structure of the time-domain signal, with a normalization step that addresses the octave-error problem explicitly.

The core is the difference function:

d(τ) = Σ (x[t] - x[t+τ])²

For each lag τ, this measures how dissimilar the signal is from a version of itself shifted by τ samples. When τ equals the true period of the signal, the difference is near zero. The problem is that τ=0 is always a perfect match, so the function has a trivial minimum that must be suppressed.
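A direct, unoptimized version of the difference function might look like the following. This is a sketch under assumptions: samples are signed 16-bit values, the sum runs over a fixed window of `window` samples, and accumulation is done in 64-bit integers so that no floating point is needed at this stage. The name `difference` is chosen for the example.

```rust
/// Fills `out[tau]` with d(tau) = sum over t in 0..window of
/// (x[t] - x[t + tau])^2, for tau in 0..out.len().
///
/// `samples` must hold at least `window + out.len() - 1` values,
/// otherwise the indexing below panics.
fn difference(samples: &[i16], window: usize, out: &mut [u64]) {
    for (tau, d) in out.iter_mut().enumerate() {
        let mut sum = 0u64;
        for t in 0..window {
            // Widen before subtracting: the difference of two i16 values
            // can exceed the i16 range, and its square can exceed i32.
            let diff = i64::from(samples[t]) - i64::from(samples[t + tau]);
            sum += (diff * diff) as u64;
        }
        *d = sum;
    }
}
```

For a signal that repeats every 8 samples, `out[8]` and `out[16]` come out as zero, and so does `out[0]`, which is the trivial minimum.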

YIN solves this with the Cumulative Mean Normalized Difference Function:

d'(τ) = 1                                          if τ = 0
d'(τ) = d(τ) / [(1/τ) × Σ_{j=1}^{τ} d(j)]        otherwise

The denominator is the running mean of d over lags 1 through τ. Because each value is divided by the average of the values up to that lag, d’(τ) starts at 1 and stays near or above 1 until the difference function dips below its own running average, so a true periodicity shows up as a value well below 1 rather than as an absolute minimum in an unnormalized function. The threshold is typically set at 0.1 to 0.15. Any lag where d’(τ) falls below the threshold is a candidate period.
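The normalization itself is a single pass with a running sum. The sketch below assumes the difference values arrive as unsigned integers and writes the normalized values as f32; the name `cmnd` and the handling of an all-zero (silent) input are choices made for this example.

```rust
/// Cumulative mean normalized difference:
///   out[0]   = 1
///   out[tau] = d[tau] * tau / (d[1] + ... + d[tau])
/// Processes as many lags as both slices can hold.
fn cmnd(d: &[u64], out: &mut [f32]) {
    let mut running_sum = 0u64;
    for (tau, (&d_tau, o)) in d.iter().zip(out.iter_mut()).enumerate() {
        if tau == 0 {
            *o = 1.0;
            continue;
        }
        running_sum += d_tau;
        *o = if running_sum == 0 {
            // All differences so far are zero (e.g. silence): report
            // "no evidence of a period" instead of dividing by zero.
            1.0
        } else {
            d_tau as f32 * tau as f32 / running_sum as f32
        };
    }
}
```

Dividing by the mean is the same as multiplying by τ and dividing by the sum, which is why only one division per lag is needed.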

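Subsample precision comes from parabolic interpolation around the minimum, and the algorithm selects the smallest qualifying τ. That choice favors the true period over its integer multiples (2τ, 3τ), which would otherwise be reported as a subharmonic, for example a pitch one octave too low.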
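The selection and refinement steps can be sketched as follows. The helper names (`first_dip`, `refine`, `period_to_hz`) are hypothetical, and the code assumes the normalized difference values are already available as an f32 slice.

```rust
/// First lag whose value drops below `threshold`, followed downhill
/// to the local minimum of that dip. Lag 0 is skipped.
fn first_dip(cmnd: &[f32], threshold: f32) -> Option<usize> {
    let mut tau = 1;
    while tau < cmnd.len() {
        if cmnd[tau] < threshold {
            while tau + 1 < cmnd.len() && cmnd[tau + 1] < cmnd[tau] {
                tau += 1;
            }
            return Some(tau);
        }
        tau += 1;
    }
    None
}

/// Fits a parabola through the three points around `tau` and returns
/// the position of its vertex as a fractional lag.
fn refine(cmnd: &[f32], tau: usize) -> f32 {
    if tau == 0 || tau + 1 >= cmnd.len() {
        return tau as f32; // no neighbor on one side
    }
    let (a, b, c) = (cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]);
    let denom = a - 2.0 * b + c;
    if denom == 0.0 {
        return tau as f32; // the three points are collinear
    }
    tau as f32 + 0.5 * (a - c) / denom
}

/// Converts a period in samples to a frequency in Hz.
fn period_to_hz(sample_rate_hz: f32, period_samples: f32) -> f32 {
    sample_rate_hz / period_samples
}
```

At 8kHz the low E string has a period of roughly 97 samples, so moving the estimate by a single whole sample shifts the reported pitch by nearly 1Hz; the fractional lag from `refine` is what brings the resolution below that.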

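For a 1024-sample buffer, the naive O(N²) implementation evaluates up to 512 lags over a 512-sample window, roughly 262,000 squared differences. At 125MHz that is milliseconds of work rather than microseconds, even in integer arithmetic, and more again if every operation goes through software floating point. The FFT-accelerated variant, which derives the difference function from an autocorrelation computed in the frequency domain, brings the cost down to O(N log N). The real-time budget is set by the sample rate: at 8kHz a 1024-sample buffer takes 128ms to fill, and detection has to finish before the next one arrives.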

The pitch-detection crate on crates.io implements both YIN and the McLeod Pitch Method in no_std-compatible Rust, operating on f32 sample slices. On a no-FPU target, f32 operations are handled in software; the micromath and libm crates provide the necessary math functions without requiring std.

Embassy and the Async Audio Architecture

The Embassy async runtime has become the dominant embedded Rust framework for new projects. Rather than the interrupt-driven RTIC model, Embassy provides an async executor for Cortex-M that makes multitasking feel like ordinary async Rust.

For a guitar tuner, the natural task decomposition is: one task continuously reads ADC samples and fills a buffer, another consumes completed buffers and runs pitch detection, and a third handles display updates. Embassy’s typed channels make the handoffs safe:

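use embassy_rp::adc::{Adc, Async, Channel as AdcChannel};
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::channel::{Channel, Sender};

// A static must be Sync, which rules out NoopRawMutex here.
static CHANNEL: Channel<CriticalSectionRawMutex, [i16; 1024], 2> = Channel::new();

#[embassy_executor::task]
async fn audio_task(
    mut adc: Adc<'static, Async>,
    mut pin: AdcChannel<'static>, // ADC input, not the message channel
    sender: Sender<'static, CriticalSectionRawMutex, [i16; 1024], 2>,
) {
    let mut buf = [0i16; 1024];
    loop {
        for s in buf.iter_mut() {
            // 12-bit reading (0..=4095), re-centered on zero.
            *s = adc.read(&mut pin).await.unwrap() as i16 - 2048;
        }
        sender.send(buf).await; // the array is Copy, so this sends a snapshot
    }
}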

The await points let the executor interleave work cooperatively, without threads or a preemptive scheduler. The buffer is centered on zero by subtracting 2048 (half the 12-bit ADC range) before pitch detection.

Cents offset from the target frequency, needed to show whether a string is sharp or flat, is computed as:

// f32::log2 is a std method; on a no_std target, call libm instead.
fn cents_off(detected: f32, target: f32) -> f32 {
    1200.0 * libm::log2f(detected / target)
}

A ±50 cent range maps cleanly onto a horizontal bar on a 128x64 OLED.
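One way to do that mapping is sketched below. The function name `bar_x` and the rounding choice are assumptions for illustration; how tuitar actually draws its indicator is not shown in the source.

```rust
/// Maps a cents offset onto a horizontal pixel position.
/// -50 cents lands on column 0, +50 cents on the last column,
/// and anything outside that range is pinned to the nearest edge.
fn bar_x(cents: f32, width_px: u32) -> u32 {
    let clamped = cents.clamp(-50.0, 50.0);
    // Normalize to 0.0..=1.0 across the 100-cent span.
    let t = (clamped + 50.0) / 100.0;
    let last_column = width_px.saturating_sub(1);
    // Adding 0.5 before the truncating cast rounds to the nearest column.
    (t * last_column as f32 + 0.5) as u32
}
```

On a 128-pixel-wide display, an in-tune string (0 cents) lands on column 64, and readings beyond the ±50 cent range stay pinned at column 0 or 127.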

The Ecosystem That Makes This Portable

The embedded-hal 1.0 release in December 2023 stabilized the trait set that peripheral driver crates depend on. The ssd1306 display driver talks to the I2C or SPI bus through these traits, and embedded-graphics draws on top of the driver. ADC access is the exception: embedded-hal 1.0 does not include ADC traits, so that part stays HAL-specific. The same driver code compiles for RP2040, STM32, ESP32, and nRF52 targets without modification; only the HAL implementation crate changes.

This portability was not always there. The pre-1.0 embedded-hal landscape had competing versions and driver crates that pinned to specific minor versions, creating dependency conflicts that consumed significant porting effort. The 1.0 stabilization resolved most of that.

Developer Experience

probe-rs handles flashing and debugging without leaving Cargo:

# .cargo/config.toml
[target.thumbv6m-none-eabi]
runner = "probe-rs run --chip RP2040"

$ cargo run --release

This flashes the binary and attaches RTT (Real-Time Transfer) logging, which streams structured defmt log messages from the device over the debug probe. Because defmt defers string formatting to the host, the runtime overhead on the device is very low. Compared to the Arduino workflow of a separate IDE, manual port selection, and blocking Serial.println() calls, this is a substantial quality-of-life improvement.

The compile target is thumbv6m-none-eabi for Cortex-M0+. Rust’s cross-compilation support handles this with a single rustup target add thumbv6m-none-eabi. The memory.x linker script describes the memory layout and places the binary in the Pico’s 2MB of external flash.

What This Represents

Projects like tuitar are interesting as a measure of ecosystem maturity. A few years ago, an embedded Rust guitar tuner on the RP2040 would have required significant manual work to get Embassy, the display driver, and a pitch detection algorithm coordinated in a no_std environment. The toolchain friction alone, particularly around cross-compilation and flashing, discouraged casual projects.

The combination of Embassy’s stable async API, embedded-hal 1.0, probe-rs, and a growing set of no_std-compatible algorithm crates has reduced that friction substantially. Orhun’s tuitar is evidence of that shift: a working, polished embedded audio project from a developer whose primary work is in userspace Rust tooling, not embedded systems.

The algorithmic choices (YIN over FFT, I2S over analog ADC, async tasks over interrupt handlers) are each well-matched to the specific constraints of the RP2040. That kind of constraint-aware design is what embedded work demands, and it is worth studying even if you never plan to pick up a soldering iron.
