
What Pitch Detection Demands From a Microcontroller: Inside an Embedded Rust Guitar Trainer

Source: lobsters

When orhun published tuitar, a guitar trainer built with embedded Rust, the immediate reaction was predictable: appreciation for the hardware demo, interest in the Rust embedded ecosystem, the usual discussion about whether this should have been written in C. The more interesting thread is what pitch detection on a microcontroller actually demands algorithmically, and why those demands expose something real about audio DSP on constrained hardware.

Why Pitch Detection Is Not “Find the Frequency”

Guitar strings produce a complex waveform. When you pluck an open low E string, you hear 82.41 Hz as the fundamental, but the waveform contains energy at 164.82 Hz, 247.23 Hz, 329.63 Hz, and beyond. Depending on how and where the string is plucked, the second or third harmonic can carry more spectral energy than the fundamental itself.

This is the fundamental frequency estimation problem. A naive FFT-based approach finds the spectral peak with the most energy, which may not be the fundamental. Human pitch perception works on the periodicity of the waveform, not on which spectral component is loudest, so frequency-domain approaches need additional machinery to reconstruct the true fundamental from a harmonic series.

Time-domain algorithms sidestep this by working directly with the period of the waveform. The classical autocorrelation function R(τ) measures how similar a signal is to a time-shifted version of itself. A periodic signal with period T produces peaks in the autocorrelation at multiples of T. The problem is that the lag-zero peak is always the largest, and noise creates false peaks at sub-period lags, making the naive approach unreliable in practice.
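To make the lag-zero problem concrete, here is a minimal sketch of plain autocorrelation in Rust. The function names `autocorr` and `naive_period` are illustrative, not from any crate, and the caller is assumed to supply a lag range that excludes lag 0, because otherwise lag 0 would always win.

```rust
/// Autocorrelation of `x` at lag `tau`: the sum of x[j] * x[j + tau]
/// over every j for which both samples exist.
fn autocorr(x: &[f32], tau: usize) -> f32 {
    x.iter().zip(x.iter().skip(tau)).map(|(a, b)| a * b).sum()
}

/// Naive period estimate: the lag in `min_lag..=max_lag` with the largest
/// autocorrelation. Written for clarity, not speed (it recomputes sums).
fn naive_period(x: &[f32], min_lag: usize, max_lag: usize) -> Option<usize> {
    let upper = max_lag.min(x.len().saturating_sub(1));
    (min_lag..=upper).max_by(|&a, &b| autocorr(x, a).total_cmp(&autocorr(x, b)))
}
```

On a clean sine wave this recovers the period, but the result depends entirely on the chosen lag range: widen it toward zero and the lag-zero lobe takes over, which is the weakness the next algorithms address.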

The YIN algorithm, published by de Cheveigné and Kawahara in 2002, addresses both problems. Instead of computing autocorrelation directly, YIN computes a difference function:

d(τ) = Σ (x[j] - x[j+τ])²

For a perfectly periodic signal, d(τ) reaches zero at the true period T. YIN normalizes this with the Cumulative Mean Normalized Difference Function (CMNDF), which divides each value by the running mean of the difference function over all lags up to that point and defines d’(0) = 1. The algorithm then finds the first lag where d’(τ) drops below a threshold, typically 0.1, and takes the local minimum that follows. Because the normalized function starts at 1 instead of 0, the dip at lag zero no longer competes with the true period, and spurious dips at short lags are much less likely to be picked than with plain autocorrelation.
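The steps above can be sketched in a few lines of Rust. This is a simplified illustration rather than a full YIN implementation: it uses `Vec` and f32 for readability, omits the parabolic interpolation the paper applies for sub-sample accuracy, and the function names are made up for this example.

```rust
/// Difference function d(tau) for tau in 0..max_lag, summed over a window
/// of `x.len() - max_lag` samples so every lag sees the same number of terms.
fn difference(x: &[f32], max_lag: usize) -> Vec<f32> {
    assert!(max_lag < x.len(), "window must be longer than the largest lag");
    let w = x.len() - max_lag;
    (0..max_lag)
        .map(|tau| (0..w).map(|j| (x[j] - x[j + tau]).powi(2)).sum())
        .collect()
}

/// Cumulative mean normalized difference: d'(0) = 1, and for tau > 0
/// d'(tau) = d(tau) divided by the mean of d(1)..=d(tau).
fn cmndf(d: &[f32]) -> Vec<f32> {
    let mut out = vec![1.0; d.len()];
    let mut running = 0.0;
    for tau in 1..d.len() {
        running += d[tau];
        out[tau] = if running > 0.0 { d[tau] * tau as f32 / running } else { 1.0 };
    }
    out
}

/// First lag whose normalized difference falls below `threshold`,
/// then walked forward to the bottom of that dip.
fn yin_period(x: &[f32], max_lag: usize, threshold: f32) -> Option<usize> {
    let dn = cmndf(&difference(x, max_lag));
    let mut tau = (1..dn.len()).find(|&t| dn[t] < threshold)?;
    while tau + 1 < dn.len() && dn[tau + 1] < dn[tau] {
        tau += 1;
    }
    Some(tau)
}
```

The estimated frequency is the sample rate divided by the returned lag. Returning `None` when no lag crosses the threshold gives the caller a natural way to treat the frame as unpitched.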

The McLeod Pitch Method takes a similar approach with a different normalization based on the normalized square difference function (NSDF), which handles low signal levels more cleanly than YIN in some edge cases. Both methods are O(N²) in the naive form but can be reduced to O(N log N) using FFT-based convolution for the autocorrelation step, which matters when buffer sizes get large.
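For comparison, here is a naive O(N²) sketch of the NSDF and a simplified version of McLeod-style peak picking. It assumes the usual formulation n(τ) = 2·r(τ) / m(τ), where r is the autocorrelation and m is the sum of squares of both overlapping segments, and a cutoff constant `k` slightly below 1 (0.9 in the test values here is an assumption). The function names are illustrative, and this is not the code used by any particular crate.

```rust
/// Normalized square difference function for lags 0..max_lag.
/// Values lie in [-1, 1], and n(0) is 1 for any non-silent signal.
fn nsdf(x: &[f32], max_lag: usize) -> Vec<f32> {
    (0..max_lag.min(x.len()))
        .map(|tau| {
            let (mut r, mut m) = (0.0f32, 0.0f32);
            for j in 0..x.len() - tau {
                r += x[j] * x[j + tau];
                m += x[j] * x[j] + x[j + tau] * x[j + tau];
            }
            if m > 0.0 { 2.0 * r / m } else { 0.0 }
        })
        .collect()
}

/// Pick one maximum per positive lobe after the lag-zero lobe, then return
/// the first such maximum that reaches `k` times the highest one.
fn mcleod_period(n: &[f32], k: f32) -> Option<usize> {
    let mut peaks: Vec<usize> = Vec::new();
    let mut tau = 1;
    // Skip the positive lobe around lag 0.
    while tau < n.len() && n[tau] > 0.0 {
        tau += 1;
    }
    while tau < n.len() {
        // Skip the non-positive stretch between lobes.
        while tau < n.len() && n[tau] <= 0.0 {
            tau += 1;
        }
        // Track the highest point of the next positive lobe.
        let mut best: Option<usize> = None;
        while tau < n.len() && n[tau] > 0.0 {
            if best.map_or(true, |b| n[tau] > n[b]) {
                best = Some(tau);
            }
            tau += 1;
        }
        if let Some(b) = best {
            peaks.push(b);
        }
    }
    let highest = peaks.iter().map(|&p| n[p]).fold(f32::MIN, f32::max);
    peaks.into_iter().find(|&p| n[p] >= k * highest)
}
```

Choosing the first sufficiently high peak rather than the global maximum is what biases the method toward the fundamental period instead of one of its multiples.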

What the RP2040 Can and Cannot Do for Audio

The Raspberry Pi RP2040 is a reasonable platform for this: dual Cortex-M0+ cores at 133 MHz, 264 KB of SRAM across six banks, a 12-bit ADC capable of 500 kSa/s, and a 12-channel DMA controller. The constraints matter in specific ways.

The Cortex-M0+ has no hardware floating-point unit. Every f32 operation compiles to a call into a software routine costing roughly 20-50 cycles rather than the 1-4 of a hardware FPU. For a pitch detection algorithm running over a 1024-sample buffer, this accumulates. The practical options are to implement the algorithm in fixed-point arithmetic, accept the overhead at low sample rates where there is CPU headroom, or move to the RP2350, whose Cortex-M33 cores include a single-precision FPU.
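As a rough illustration of the fixed-point option, the difference function at the heart of YIN can be computed directly on raw ADC samples with integer arithmetic only. The helper names below are hypothetical. The threshold test is rearranged to avoid division: d’(τ) < num/den is equivalent to d(τ)·τ·den < running_sum·num.

```rust
/// Squared-difference sum at lag `tau` over `window` raw ADC samples.
/// The caller must ensure `window + tau <= x.len()`.
fn difference_u16(x: &[u16], tau: usize, window: usize) -> u64 {
    (0..window)
        .map(|j| {
            // |a - b| fits in 16 bits, so its square fits in a u32.
            let diff = (x[j] as i32 - x[j + tau] as i32).unsigned_abs();
            (diff * diff) as u64
        })
        .sum()
}

/// Integer form of the CMNDF threshold test, where `running_sum` is
/// d(1) + ... + d(tau) and the threshold is `thr_num / thr_den`.
fn below_threshold(d_tau: u64, tau: u64, running_sum: u64, thr_num: u64, thr_den: u64) -> bool {
    d_tau * tau * thr_den < running_sum * thr_num
}
```

The 64-bit accumulator is deliberate: 1024 squared differences of 12-bit samples can exceed the range of a u32. Whether this beats software floating point on a given build is something to measure rather than assume.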

For guitar tuning in standard tuning, you only need to detect fundamentals between roughly 82 Hz and 330 Hz. The Nyquist theorem requires a sample rate above twice the highest frequency you want to capture, so 8 kHz covers guitar fundamentals with comfortable margin, provided an analog low-pass filter ahead of the ADC keeps harmonics above 4 kHz from aliasing back into the band. A 1024-sample buffer at 8 kHz gives 128 ms of audio per analysis window, containing roughly 10 full cycles of the lowest string. That is enough for stable YIN or McLeod detection.
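The arithmetic behind those figures is easy to check. The helpers below are hypothetical names used only to spell out the relationships between sample rate, buffer size, and frequency.

```rust
/// Duration of an `n`-sample window in milliseconds at sample rate `fs` (Hz).
fn window_ms(n: usize, fs: f32) -> f32 {
    n as f32 / fs * 1000.0
}

/// Number of cycles of a tone at `freq` Hz that fit in an `n`-sample window.
fn cycles_in_window(freq: f32, n: usize, fs: f32) -> f32 {
    freq * n as f32 / fs
}

/// Period of a tone at `freq` Hz, expressed in samples (the lag to look for).
fn lag_samples(freq: f32, fs: f32) -> f32 {
    fs / freq
}
```

With 1024 samples at 8 kHz, the window is 128 ms and holds about 10.5 cycles of the 82.41 Hz low E. The lags of interest run from about 24 samples (high E) to about 97 samples (low E). Note that adjacent integer lags of 24 and 25 samples correspond to 333.3 Hz and 320 Hz, roughly 70 cents apart, which is why practical detectors interpolate between lags instead of reporting the integer lag directly.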

The standard guitar string frequencies make the frequency range concrete:

String  Note  Frequency
6th     E2     82.41 Hz
5th     A2    110.00 Hz
4th     D3    146.83 Hz
3rd     G3    196.00 Hz
2nd     B3    246.94 Hz
1st     E4    329.63 Hz

Configuring the ADC through embassy-rp for async sample collection looks roughly like this:

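use embassy_rp::adc::{Adc, Async, Channel, Config, InterruptHandler};
use embassy_rp::bind_interrupts;
use embassy_time::{Duration, Ticker};

// Route the ADC FIFO interrupt to embassy-rp's handler.
bind_interrupts!(struct Irqs {
    ADC_IRQ_FIFO => InterruptHandler;
});

// In main, the driver would be created with
// Adc::new(p.ADC, Irqs, Config::default()) and passed to this task.

#[embassy_executor::task]
async fn audio_task(
    mut adc: Adc<'static, Async>,
    mut pin: Channel<'static>,
) {
    let mut buf = [0u16; 1024];
    // Without pacing, the loop samples as fast as the ADC converts and the
    // sample rate is undefined. One sample every 125 µs gives 8 kHz.
    // Software pacing has some jitter; it is only good enough for a sketch.
    let mut ticker = Ticker::every(Duration::from_micros(125));
    loop {
        for sample in buf.iter_mut() {
            ticker.next().await;
            *sample = adc.read(&mut pin).await.unwrap();
        }
        // hand off buffer to pitch detection
    }
}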

In production you would use DMA to fill the buffer without polling each sample, freeing the CPU for other work between buffer captures. The RP2040’s ADC FIFO paired with a DMA channel allows continuous audio capture at a precise rate with no CPU involvement until the buffer is full.

The pitch-detection Crate

The pitch-detection crate provides both YIN and McLeod implementations in Rust. Its detectors allocate their working buffers on the heap, so on an embedded target you need a global allocator such as embedded-alloc, and it is worth confirming that the crate and its dependencies build for your no_std target before committing to it. The McLeod detector is generally the better default:

use pitch_detection::detector::mcleod::McLeodDetector;
use pitch_detection::detector::PitchDetector;

const SAMPLE_RATE: usize = 8000;
const BUF_SIZE: usize = 1024;
const POWER_THRESHOLD: f32 = 5.0;
const CLARITY_THRESHOLD: f32 = 0.7;

let mut detector = McLeodDetector::new(BUF_SIZE, BUF_SIZE / 2);

if let Some(pitch) = detector.get_pitch(
    &samples_f32,
    SAMPLE_RATE,
    POWER_THRESHOLD,
    CLARITY_THRESHOLD,
) {
    // pitch.frequency in Hz
    // pitch.clarity is 0.0-1.0 confidence
}

The clarity field is useful for filtering noisy frames. When a string rings cleanly, clarity approaches 1.0. During the attack transient or between notes, it drops, and you can skip those frames to avoid displaying unstable readings.
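One way to act on clarity, sketched here under assumptions, is to drop low-clarity frames and report the median of the last few accepted frequencies, which also suppresses the occasional octave error. `PitchSmoother`, the history length of 5, and the 0.7 cutoff in the test values are illustrative choices, not part of the pitch-detection crate.

```rust
const HISTORY: usize = 5;

/// Keeps the last few trusted pitch readings in a fixed-size ring buffer.
struct PitchSmoother {
    recent: [f32; HISTORY],
    count: usize, // number of valid entries, at most HISTORY
    next: usize,  // ring-buffer write index
    min_clarity: f32,
}

impl PitchSmoother {
    fn new(min_clarity: f32) -> Self {
        Self { recent: [0.0; HISTORY], count: 0, next: 0, min_clarity }
    }

    /// Feed one frame. Frames below the clarity cutoff are ignored.
    /// Returns the median of the accepted frames, or None if there are none.
    fn push(&mut self, frequency: f32, clarity: f32) -> Option<f32> {
        if clarity >= self.min_clarity {
            self.recent[self.next] = frequency;
            self.next = (self.next + 1) % HISTORY;
            self.count = (self.count + 1).min(HISTORY);
        }
        if self.count == 0 {
            return None;
        }
        // Sort a copy so the ring buffer's insertion order is preserved.
        let mut sorted = self.recent;
        let valid = &mut sorted[..self.count];
        valid.sort_unstable_by(|a, b| a.total_cmp(b));
        Some(valid[self.count / 2])
    }
}
```

Because the history is a fixed-size array, the smoother needs no heap. The trade-off is latency: a longer history gives a steadier display but reacts more slowly when the player moves to another string.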

The ADC produces u16 samples centered around 2048 for a 12-bit range. Before passing them to the detector, you center and normalize:

let samples_f32: [f32; BUF_SIZE] = core::array::from_fn(|i| {
    (buf[i] as f32 - 2048.0) / 2048.0
});

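A plain fixed-size array works here, so the conversion itself needs no allocation, at the cost of 4 KB of stack or static memory for 1024 f32 values. The detector still needs heap memory for its internal buffers, but creating it once and reusing it keeps allocation out of the per-frame path. This matters on a chip where the total heap might be only a few kilobytes, and fragmentation from repeated allocations can cause hard-to-debug failures.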
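The conversion above assumes the signal rests exactly at mid-scale (2048). In practice the bias point depends on the input circuit, so a slightly more defensive variant subtracts the measured mean of each buffer instead. This is a sketch; the function name `center_and_scale` is hypothetical.

```rust
/// Remove the measured DC offset from raw 12-bit samples and scale the
/// result to roughly [-1.0, 1.0]. Still allocation-free.
fn center_and_scale<const N: usize>(raw: &[u16; N]) -> [f32; N] {
    // A u32 sum is safe for buffers of up to 65_537 u16 samples.
    let sum: u32 = raw.iter().map(|&s| s as u32).sum();
    let mean = sum as f32 / N as f32;
    core::array::from_fn(|i| (raw[i] as f32 - mean) / 2048.0)
}
```

A residual DC offset shows up as a constant term in the autocorrelation and inflates the power estimate, so removing it makes both the power threshold and the clarity value more meaningful.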

Embassy’s Cooperative Task Model for Real-Time Audio

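Embassy uses a cooperative multitasking executor with static task allocation. Within a single executor there is no preemptive scheduler; tasks yield only at .await points. For audio, this means the sample collection task, the pitch processing task, and the display update task coexist without priority inversion or context-switch overhead. The flip side is that a long-running computation such as a pitch analysis pass holds the executor until it returns, so sample capture should not depend on the CPU being free, which is one more reason to fill the buffer with DMA.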

A practical task structure: an audio task fills the sample buffer via ADC or DMA and signals a channel when complete; a processing task receives the buffer, runs McLeod detection, and computes the nearest note name and cent deviation; a display task reads the current pitch state and updates an OLED screen over I2C.
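The note-name and cent-deviation step in the processing task is a short calculation. The sketch below assumes equal temperament with A4 = 440 Hz and uses MIDI note numbering internally; `nearest_note` is an illustrative name. It relies on `f32::log2` and `f32::round` from std, so a no_std build would need a math crate such as libm for those two calls.

```rust
const NOTE_NAMES: [&str; 12] =
    ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

/// Returns (note name, octave, deviation in cents) for a positive frequency.
/// Positive cents means the input is sharp of the nearest note.
fn nearest_note(freq_hz: f32) -> (&'static str, i32, f32) {
    // MIDI note 69 is A4 (440 Hz); each semitone is a factor of 2^(1/12).
    let midi = 69.0 + 12.0 * (freq_hz / 440.0).log2();
    let nearest = midi.round();
    // One semitone is 100 cents.
    let cents = (midi - nearest) * 100.0;
    let n = nearest as i32;
    // MIDI octave numbering puts note 60 (middle C) at C4.
    (NOTE_NAMES[n.rem_euclid(12) as usize], n.div_euclid(12) - 1, cents)
}
```

For example, 82.41 Hz maps to E2 with essentially zero deviation, while 112 Hz maps to A2 at about 31 cents sharp, which is the kind of value a tuner display would show as a needle offset.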

defmt handles debug logging efficiently through a probe connection using binary encoding, which keeps logging overhead low enough to leave in timing-sensitive code. embedded-graphics and the ssd1306 driver provide the display layer, with a reasonably expressive 2D API for drawing the pitch indicator and note name.

What Rust Adds to This Problem

Doing this in C with the Pico SDK would work. The Pico SDK is competent, the C ecosystem for pitch detection is mature, and the performance characteristics would be similar. The difference shows up in specific places.

Rust’s ownership model catches a common bug in real-time audio pipelines: the producer and consumer of a sample buffer accessing it concurrently. In C, protecting the handoff requires discipline and explicit synchronization. In Rust, the compiler refuses to compile code that allows a mutable reference and any other reference to the buffer to coexist across the task boundary without explicit synchronization primitives. Embassy provides channel and signal primitives for exactly this.

The no_std constraint also enforces a clearer mental model. When Vec::push is unavailable without a heap allocator, every buffer and collection needs a size at compile time. This surfaces the relationship between buffer size, latency, and detection accuracy as a compile-time concern rather than a runtime discovery. For audio DSP, where those parameters directly affect both correctness and user experience, the forced explicitness is useful.

Tuitar is a well-scoped project at an interesting intersection. Guitar pitch detection is hard enough to require non-trivial algorithm choices without being so hard that the embedded constraints overwhelm the application logic. The result demonstrates what the embedded Rust ecosystem can deliver today: production-quality crates for pitch detection, async task scheduling, display drivers, and efficient logging, all composable without a runtime or operating system underneath.
