Beyond volatile: How Rust Encodes Hardware Access Semantics in the Type System
Source: lobsters
C’s volatile qualifier has one job: tell the compiler not to optimize away a memory access. It does that job and only that job. The keyword says nothing about whether a register is read-only or write-only, nothing about valid bit-field values, nothing about whether two pieces of code might simultaneously access the same peripheral, and nothing about memory ordering. A volatile cast in C is syntactically trivial and semantically impoverished.
The Ferrous Systems article on hardware access in Rust describes how Rust approaches this domain. What the article shows is not just a safer way to write the same code, but a layered system where each level encodes a class of hardware access semantics that C’s type system cannot represent at all. The floor is explicit unsafe volatile access. The ceiling is portable, async-capable driver code that compiles to the same machine code as a careful C implementation.
The Floor: Making Unsafe Visible
In C, accessing a memory-mapped I/O register looks like this:
volatile uint32_t *const GPIOA_ODR = (volatile uint32_t *)0x40020014;
*GPIOA_ODR = 0x00000001;
uint32_t val = *GPIOA_ODR;
The cast from integer to pointer is silent. Nothing marks this code as particularly dangerous at the call site. The volatile qualifier prevents the compiler from eliding the access, but it carries no information about whether reading this register is valid, whether the bit pattern is legal, or whether concurrent access from an interrupt handler creates a hazard.
Rust exposes volatile semantics through two functions in core::ptr:
pub unsafe fn read_volatile<T>(src: *const T) -> T;
pub unsafe fn write_volatile<T>(dst: *mut T, src: T);
Raw MMIO access in Rust:
const GPIOA_ODR: *mut u32 = 0x4002_0014 as *mut u32;
unsafe {
    core::ptr::write_volatile(GPIOA_ODR, 0x0000_0001);
    let val = core::ptr::read_volatile(GPIOA_ODR);
}
The unsafe block is mandatory. The dangerous operation is visible at the call site, and a reviewer can see exactly where hardware assumptions are being made. That is the complete difference at this level: not safety by default, but clarity about where the safety guarantees end and where you are responsible for correctness yourself.
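One way to keep that boundary small is to confine the unsafe promise to a single constructor. The following is a minimal sketch under stated assumptions: `Reg32` and `demo_roundtrip` are hypothetical names that do not come from any crate, and an ordinary local variable stands in for the register so the snippet runs on a host machine.

```rust
use std::ptr::{read_volatile, write_volatile};

/// Hypothetical wrapper around the address of one 32-bit register.
pub struct Reg32 {
    addr: *mut u32,
}

impl Reg32 {
    /// # Safety
    /// `addr` must be valid, aligned, and safe to read and write for as
    /// long as the returned value is used.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Reg32 { addr }
    }

    /// Volatile read: the compiler may not elide or merge this access.
    pub fn read(&self) -> u32 {
        // SAFETY: validity was promised by the caller of `new`.
        unsafe { read_volatile(self.addr) }
    }

    /// Volatile write: the compiler may not elide or merge this access.
    pub fn write(&mut self, value: u32) {
        // SAFETY: validity was promised by the caller of `new`.
        unsafe { write_volatile(self.addr, value) }
    }
}

/// Host-side stand-in: a local variable plays the part of the register.
pub fn demo_roundtrip() -> u32 {
    let mut fake_register: u32 = 0;
    // The only `unsafe` the caller writes is here, where the address is vouched for.
    let mut reg = unsafe { Reg32::new(&mut fake_register) };
    reg.write(0x0000_0001);
    reg.read()
}
```

On real hardware the pointer would be a fixed address such as the GPIOA_ODR constant above. The wrapper adds no safety of its own; it only moves the hardware assumption to one reviewable line.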
One thing worth being explicit about: neither C volatile nor Rust’s read_volatile/write_volatile implies a memory barrier. On ARM Cortex-M, if you fill a buffer in RAM and then start a DMA transfer by writing a control register, you still need a data memory barrier via cortex_m::asm::dmb() between the buffer writes and the register write that starts the transfer. The volatile primitives guarantee that each access is emitted and that volatile accesses stay in program order relative to one another; they say nothing about ordering relative to non-volatile memory accesses, other bus masters, or hardware state machines.
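The ordering problem can be sketched on a host with the standard library's `fence`. This is an illustration under assumptions, not a driver: `start_transfer` and the "enable" value are hypothetical, the control register is an ordinary variable, and on a real Cortex-M target cortex_m::asm::dmb() would sit where the fence is. Whether a given fence is sufficient for a particular bus master is a platform question this sketch does not answer.

```rust
use std::ptr::write_volatile;
use std::sync::atomic::{fence, Ordering};

/// Fills `buf`, then "starts" a transfer by writing a hypothetical enable
/// value to a control register (here just a `u32` in ordinary memory).
pub fn start_transfer(buf: &mut [u8], ctrl: &mut u32) {
    // Ordinary (non-volatile) writes: the volatile write below does not
    // by itself order these stores ahead of it.
    for (i, byte) in buf.iter_mut().enumerate() {
        *byte = i as u8;
    }

    // Barrier between the buffer writes and the register write.
    // Assumption: stand-in for cortex_m::asm::dmb() on the real target.
    fence(Ordering::SeqCst);

    // SAFETY: `ctrl` is a valid, exclusive reference, so the pointer
    // derived from it is valid for a 4-byte write.
    unsafe { write_volatile(ctrl as *mut u32, 1) };
}

/// Host-side check: returns the control value and the last buffer byte.
pub fn demo() -> (u32, u8) {
    let mut buf = [0u8; 4];
    let mut fake_ctrl = 0u32;
    start_transfer(&mut buf, &mut fake_ctrl);
    (fake_ctrl, buf[3])
}
```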
The PAC Layer: Where the Type System Starts Doing Work
Writing raw volatile access by hand is correct but expresses nothing. There is no type-level distinction between reading a read-only input data register and writing a control register with complex bit fields. A *mut u32 is a *mut u32.
svd2rust changes this. Chip vendors distribute SVD (System View Description) files: XML documents that describe a microcontroller’s complete register map, including peripheral base addresses, register offsets, field positions and widths, access types (read-only, write-only, read-write, write-once), and enumerated values for fields. svd2rust reads an SVD file and generates a Rust Peripheral Access Crate (PAC) from it, with every register backed by volatile reads and writes under the hood.
The generated API for a register looks like this:
let dp = stm32f4::stm32f407::Peripherals::take().unwrap();
let gpioa = &dp.GPIOA;
// Read bit 5 of the input data register
let pin5 = gpioa.idr.read().idr5().bit_is_set();
// Set bit 5 of the output data register, leaving others unchanged
gpioa.odr.modify(|_, w| w.odr5().set_bit());
// Configure PA5 as output (mode bits = 0b01)
gpioa.moder.write(|w| unsafe { w.moder5().bits(0b01) });
Every read() call is backed by read_volatile. Every write() ends in a write_volatile, and modify() performs a read_volatile followed by a write_volatile. But the generated types enforce what C cannot. If a register is marked read-only in the SVD file, the generated type has no write() method, and attempting to call it is a compile error. If a field has enumerated values, the generated writer type accepts only a Rust enum; you cannot pass an arbitrary bit pattern without unsafe. The unsafe on w.moder5().bits(0b01) above is present precisely because MODER5 has no enumerated values in the SVD, so the type system cannot validate the bit pattern and the caller explicitly takes responsibility.
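The mechanism behind "no write() method on a read-only register" can be sketched with marker types. This is not svd2rust's actual generated code; `Reg`, `ReadOnly`, `ReadWrite`, `Mode`, and `set_mode5` are hypothetical names. The sketch also assumes, unlike the MODER5 case described above, that the field has enumerated values. Local variables stand in for the registers.

```rust
use std::marker::PhantomData;
use std::ptr::{read_volatile, write_volatile};

// Hypothetical access markers, playing the role of the SVD access type.
pub struct ReadOnly;
pub struct ReadWrite;

pub trait Readable {}
pub trait Writable {}
impl Readable for ReadOnly {}
impl Readable for ReadWrite {}
impl Writable for ReadWrite {}

pub struct Reg<Access> {
    addr: *mut u32,
    _access: PhantomData<Access>,
}

impl<A> Reg<A> {
    /// # Safety
    /// `addr` must be valid and aligned for volatile 32-bit access.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr, _access: PhantomData }
    }
}

impl<A: Readable> Reg<A> {
    pub fn read(&self) -> u32 {
        unsafe { read_volatile(self.addr) }
    }
}

// `write` and `modify` exist only when the marker is `Writable`.
impl<A: Readable + Writable> Reg<A> {
    pub fn write(&mut self, value: u32) {
        unsafe { write_volatile(self.addr, value) }
    }

    /// Read-modify-write: one volatile read, then one volatile write.
    pub fn modify<F: FnOnce(u32) -> u32>(&mut self, f: F) {
        let old = self.read();
        self.write(f(old));
    }
}

/// Enumerated values for a two-bit pin mode field (assumed for the sketch).
#[derive(Clone, Copy)]
pub enum Mode {
    Input = 0b00,
    Output = 0b01,
    Alternate = 0b10,
    Analog = 0b11,
}

/// Sets the mode field for pin 5 (bits 10 and 11); only a `Mode` is accepted.
pub fn set_mode5(moder: &mut Reg<ReadWrite>, mode: Mode) {
    moder.modify(|r| (r & !(0b11 << 10)) | ((mode as u32) << 10));
}

pub fn demo() -> (u32, u32) {
    let mut fake_idr: u32 = 1 << 5;
    let mut fake_moder: u32 = 0xFFFF_FFFF;
    let idr: Reg<ReadOnly> = unsafe { Reg::new(&mut fake_idr) };
    let mut moder: Reg<ReadWrite> = unsafe { Reg::new(&mut fake_moder) };
    // idr.write(0); // does not compile: `ReadOnly` is not `Writable`
    set_mode5(&mut moder, Mode::Output);
    (idr.read(), moder.read())
}
```

The markers are zero-sized, so `Reg<ReadOnly>` and `Reg<ReadWrite>` are both just a pointer at runtime; the distinction exists only for the type checker.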
The Peripherals::take() singleton pattern addresses peripheral aliasing. Internally it flips a static mut bool and returns None on every call after the first; the unwrap() in the example above is what turns that None into a panic. This means two parts of a program cannot simultaneously hold access to GPIOA without going through Peripherals::steal(), which is unsafe. Aliased mutable peripheral references are not prevented by the borrow checker at the register struct level, but the singleton makes duplication an explicit, visible choice.
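The shape of the singleton can be sketched in a few lines. This is a simplified illustration rather than the generated code: it assumes an `AtomicBool` in place of the flag a PAC guards with a critical section, and `Gpioa` is an empty stand-in for a real register block.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the peripherals have already been handed out.
static TAKEN: AtomicBool = AtomicBool::new(false);

/// Stand-in for a register block; the private field prevents construction
/// from outside this module.
pub struct Gpioa {
    _private: (),
}

pub struct Peripherals {
    pub gpioa: Gpioa,
}

impl Peripherals {
    /// Returns `Some` exactly once; every later call returns `None`.
    pub fn take() -> Option<Self> {
        if TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            // SAFETY: the flag guarantees this branch runs at most once.
            Some(unsafe { Self::steal() })
        }
    }

    /// # Safety
    /// The caller must ensure the duplicated handles are never used in a
    /// way that conflicts with another owner of the same peripherals.
    pub unsafe fn steal() -> Self {
        Peripherals { gpioa: Gpioa { _private: () } }
    }
}
```

Because `take()` returns an `Option`, the caller decides how to handle a second request: `unwrap()` panics, while a `match` can fall back gracefully.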
The HAL Layer: Abstracting Across Chips
PAC crates are chip-specific. Code written against the STM32F4 PAC will not compile for an nRF52840 or an RP2040. The embedded-hal crate provides a solution: a set of traits that abstract hardware peripherals at a functional level, independent of any specific chip’s register layout.
Since its 1.0 release in January 2024, embedded-hal provides stable traits for digital I/O, SPI, I2C, PWM, and delays; byte-oriented serial I/O is covered by the companion embedded-io crate. The core design:
// Simplified: in embedded-hal 1.0 the associated Error type is declared
// on an ErrorType supertrait rather than on each trait directly.
pub trait OutputPin {
    type Error;
    fn set_high(&mut self) -> Result<(), Self::Error>;
    fn set_low(&mut self) -> Result<(), Self::Error>;
}
pub trait SpiBus<Word = u8> {
    type Error;
    fn transfer(&mut self, read: &mut [Word], write: &[Word]) -> Result<(), Self::Error>;
    fn flush(&mut self) -> Result<(), Self::Error>;
    // ...
}
A HAL crate for a specific chip, for example stm32f4xx-hal or rp2040-hal, implements these traits using the underlying PAC. A driver crate for an SSD1306 display or a temperature sensor implements its logic against SpiBus, not against any specific chip’s SPI peripheral. That driver then works on any platform whose HAL implements the trait. The abstraction is zero-cost because generic drivers are monomorphized: the compiler generates a copy of the driver for each concrete peripheral type, sees through the trait boundary at compile time, and emits direct register access code with no dynamic dispatch.
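A driver written against a trait rather than a chip can be sketched as follows. To keep the snippet self-contained, the `OutputPin` trait here is a local stand-in with the simplified shape shown earlier, not an import from embedded-hal; `Led` and `MockPin` are hypothetical names.

```rust
use std::convert::Infallible;

/// Local stand-in mirroring the simplified OutputPin shape.
pub trait OutputPin {
    type Error;
    fn set_high(&mut self) -> Result<(), Self::Error>;
    fn set_low(&mut self) -> Result<(), Self::Error>;
}

/// A chip-independent "driver": it knows only the trait, not the hardware.
pub struct Led<P: OutputPin> {
    pin: P,
}

impl<P: OutputPin> Led<P> {
    pub fn new(pin: P) -> Self {
        Led { pin }
    }

    /// Drives the pin high, then low, propagating any pin error.
    pub fn blink_once(&mut self) -> Result<(), P::Error> {
        self.pin.set_high()?;
        self.pin.set_low()
    }

    /// Hands the pin back to the caller.
    pub fn release(self) -> P {
        self.pin
    }
}

/// Host-side mock that records the level and counts transitions.
#[derive(Default)]
pub struct MockPin {
    pub level: bool,
    pub transitions: u32,
}

impl OutputPin for MockPin {
    type Error = Infallible;

    fn set_high(&mut self) -> Result<(), Self::Error> {
        self.level = true;
        self.transitions += 1;
        Ok(())
    }

    fn set_low(&mut self) -> Result<(), Self::Error> {
        self.level = false;
        self.transitions += 1;
        Ok(())
    }
}

pub fn demo() -> (bool, u32) {
    let mut led = Led::new(MockPin::default());
    led.blink_once().unwrap();
    let pin = led.release();
    (pin.level, pin.transitions)
}
```

`Led<MockPin>` and a `Led` over a real HAL pin type would be compiled as separate concrete types, which is why the trait adds no dispatch cost. The mock also shows a side benefit: driver logic can be tested on a host without hardware.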
The accompanying embedded-hal-async crate mirrors these traits with async fn methods, enabling await-based hardware access in executors like Embassy. A driver crate can implement both blocking and async variants without duplicating its core logic.
The 1.0 release resolved a long period of ecosystem instability. The dominant 0.x version accumulated years of real-world use alongside real-world design mistakes, and the migration broke compatibility throughout the stack. The embedded-hal-compat crate now bridges 0.x and 1.0 during the transition period, but the release settled what the standard interfaces would be.
The Full Stack, Zero Runtime Cost
Put it together and a typical embedded Rust application looks like this:
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use stm32f4xx_hal::{pac, prelude::*};
use panic_halt as _;
#[entry]
fn main() -> ! {
    let dp = pac::Peripherals::take().unwrap();
    let cp = cortex_m::peripheral::Peripherals::take().unwrap();
    let rcc = dp.RCC.constrain();
    let clocks = rcc.cfgr.sysclk(84.MHz()).freeze();
    let gpioa = dp.GPIOA.split();
    let mut led = gpioa.pa5.into_push_pull_output();
    let mut delay = cp.SYST.delay(&clocks);
    loop {
        led.set_high();
        delay.delay_ms(500_u32);
        led.set_low();
        delay.delay_ms(500_u32);
    }
}
The compile-time guarantees packed into this code: pa5 cannot be used as an output before into_push_pull_output() is called; set_high() cannot be called after led is consumed or reconfigured as an input; no other code in the program can alias GPIOA without an explicit unsafe call to Peripherals::steal(). These guarantees cost nothing at runtime. In an optimized build the generated assembly is equivalent to what a careful C programmer would write by hand, because the abstraction layers are thin generic wrappers that the optimizer can see through and typically inlines completely.
The cortex-m crate sits alongside the PAC, providing access to the core peripherals that are common across Cortex-M chips: NVIC for interrupt control, SysTick, the SCB for reset and sleep modes, the DWT cycle counter, and assembly intrinsics. Where a C programmer using CMSIS would write __DMB(), a Rust embedded programmer writes cortex_m::asm::dmb(). The semantics are identical; the wrapping is explicit.
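The pin-mode guarantee rests on a typestate pattern, sketched below. This is not the stm32f4xx-hal implementation: `Pa5`, the marker types, and the `level` field (which stands in for the output register bit) are assumptions made for illustration.

```rust
use std::marker::PhantomData;

// Zero-sized markers for the configuration state of the pin.
pub struct Unconfigured;
pub struct Input;
pub struct Output;

/// Hypothetical pin whose mode is part of its type.
pub struct Pa5<Mode> {
    level: bool, // stand-in for the hardware output bit
    _mode: PhantomData<Mode>,
}

impl Pa5<Unconfigured> {
    pub fn new() -> Self {
        Pa5 { level: false, _mode: PhantomData }
    }

    /// Consumes the unconfigured pin and returns it as an output.
    pub fn into_push_pull_output(self) -> Pa5<Output> {
        Pa5 { level: false, _mode: PhantomData }
    }
}

// Output-only operations live in the `Pa5<Output>` impl.
impl Pa5<Output> {
    pub fn set_high(&mut self) {
        self.level = true;
    }

    pub fn set_low(&mut self) {
        self.level = false;
    }

    pub fn is_set_high(&self) -> bool {
        self.level
    }

    /// Consumes the output pin; `set_high` is unavailable afterwards.
    pub fn into_floating_input(self) -> Pa5<Input> {
        Pa5 { level: self.level, _mode: PhantomData }
    }
}

impl Pa5<Input> {
    pub fn is_high(&self) -> bool {
        self.level
    }
}

pub fn demo() -> (bool, bool) {
    let pin = Pa5::new();
    // pin.set_high(); // does not compile: no such method on Pa5<Unconfigured>
    let mut led = pin.into_push_pull_output();
    led.set_high();
    let was_high = led.is_set_high();
    led.set_low();
    (was_high, led.is_set_high())
}
```

Each conversion takes `self` by value, so the old handle is moved and cannot be used again. That is how "reconfigured as an input" invalidates the output handle at compile time.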
The Higher Layers
Two frameworks compete for the embedded application story above the HAL layer. RTIC (Real-Time Interrupt-driven Concurrency) uses compile-time static analysis to schedule interrupt-driven tasks and enforce resource ownership between them. RTIC 2.0 added native async task support. Embassy takes an async-first approach, providing a no_std task executor, async timer abstractions, and async peripheral drivers that implement embedded-hal-async. Both compile to small, predictable binaries on Cortex-M. Both use embedded-hal at their peripheral interfaces, so driver crates written against the trait work in either framework.
The Linux kernel’s Rust support, which has grown substantially since merging in kernel 6.1 and accelerated through 6.8 and beyond, uses its own MMIO abstraction in kernel::io, wrapping readl/writel equivalents in a bounds-checked IoMem<SIZE> type. The embedded ecosystem and the kernel use separate libraries, but the design reasoning is identical: wrap raw volatile access in a type that encodes valid access patterns, and let the compiler enforce them.
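The idea of a size-carrying MMIO type can be sketched in ordinary Rust with a const generic. This is not the kernel's API: `IoWindow`, `try_read32`, and `try_write32` are hypothetical names, the check here happens at runtime, and a local array stands in for the mapped region.

```rust
use std::ptr::{read_volatile, write_volatile};

/// Hypothetical MMIO window of `SIZE` bytes starting at `base`.
pub struct IoWindow<const SIZE: usize> {
    base: *mut u8,
}

impl<const SIZE: usize> IoWindow<SIZE> {
    /// # Safety
    /// `base` must be 4-byte aligned and valid for `SIZE` bytes of
    /// volatile access for the lifetime of the window.
    pub unsafe fn new(base: *mut u8) -> Self {
        IoWindow { base }
    }

    /// A 32-bit access is allowed only if it is aligned and fits in the window.
    fn offset_ok(offset: usize) -> bool {
        offset % 4 == 0 && offset.checked_add(4).map_or(false, |end| end <= SIZE)
    }

    /// Returns `None` instead of touching memory outside the window.
    pub fn try_read32(&self, offset: usize) -> Option<u32> {
        if !Self::offset_ok(offset) {
            return None;
        }
        // SAFETY: the offset was checked against SIZE and alignment above.
        Some(unsafe { read_volatile(self.base.add(offset) as *const u32) })
    }

    /// Returns `None` instead of touching memory outside the window.
    pub fn try_write32(&mut self, offset: usize, value: u32) -> Option<()> {
        if !Self::offset_ok(offset) {
            return None;
        }
        // SAFETY: the offset was checked against SIZE and alignment above.
        unsafe { write_volatile(self.base.add(offset) as *mut u32, value) };
        Some(())
    }
}

pub fn demo() -> (Option<u32>, Option<u32>, Option<u32>) {
    let mut backing = [0u32; 4]; // 16 bytes standing in for a mapped region
    let mut io = unsafe { IoWindow::<16>::new(backing.as_mut_ptr() as *mut u8) };
    let _ = io.try_write32(8, 0xDEAD_BEEF);
    // In range, past the end, and misaligned, respectively.
    (io.try_read32(8), io.try_read32(16), io.try_read32(6))
}
```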
What This Is Actually Doing
C’s volatile is a necessary primitive. Nothing in Rust replaces the underlying need for volatile semantics when talking to hardware. What the Rust embedded ecosystem does is treat that primitive as a floor, not a ceiling.
Each layer above the raw volatile call encodes something that C can only express through documentation and convention. The PAC layer encodes register access permissions and field value validity. The HAL layer encodes peripheral identity and protocol semantics. The singleton peripheral pattern encodes ownership. The type-level distinction between input and output pins encodes configuration state.
None of this is magic, and none of it is novel as a concept. C programmers have been maintaining these invariants by hand for decades. The difference is that Rust makes the compiler the enforcer rather than the developer. The cost of enforcement is zero at runtime because it happens entirely at compile time. That is the argument for Rust in embedded systems: not that the resulting binary is any different from what a careful C programmer would produce, but that the class of bugs caught before the binary exists is substantially larger.