I bricked my first development board within forty-five minutes of getting it out of the box. Wrote some C code, forgot to configure the clock properly, flashed it, and the thing just… stopped responding. No debugger connection. No serial output. A $15 paperweight.
That experience — the raw, unforgiving nature of hardware programming — is exactly why Rust matters in embedded. Not because it prevents you from writing to the wrong register (it can’t, really), but because it gives you tools to structure hardware access so mistakes become harder to make.
The Embedded Rust Ecosystem
Embedded Rust isn’t some experimental side project anymore. There’s a proper ecosystem with layers:
┌─────────────────────────────────────────┐
│ Your Application                        │
├─────────────────────────────────────────┤
│ HAL Crate (e.g., stm32f4xx-hal)         │ ← Safe hardware API
├─────────────────────────────────────────┤
│ PAC (e.g., stm32f4)                     │ ← Raw register access
├─────────────────────────────────────────┤
│ cortex-m / cortex-m-rt                  │ ← CPU support + runtime
├─────────────────────────────────────────┤
│ Bare Metal (your #![no_std] binary)     │
└─────────────────────────────────────────┘
PAC (Peripheral Access Crate): Auto-generated from SVD files (XML descriptions of chip registers). Gives you raw, typed access to every register on the chip. It’s still low-level, and some writes still require unsafe, but at least the register addresses and bit fields are correct by construction (as long as the SVD file itself is accurate).
HAL (Hardware Abstraction Layer): Builds safe APIs on top of the PAC. Instead of writing bits to three different registers to configure a GPIO pin, you call pin.into_push_pull_output(). Type-state programming ensures you can’t use a pin as output before configuring it that way.
Board Support Packages: An optional crate that ties the HAL together with board-specific details, like which pin is connected to which LED.
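To make the type-state idea concrete, here is a minimal host-runnable sketch. The names (Pin, Floating, PushPullOutput) are my own illustration and not the actual stm32f4xx-hal types; the point is only that the pin’s mode lives in a type parameter, so the output methods exist solely on the output type.

```rust
use core::marker::PhantomData;

// Marker types for pin modes. Illustrative names, not a real HAL's.
pub struct Floating;
pub struct PushPullOutput;

/// A pin whose mode is tracked in the type parameter `Mode`.
pub struct Pin<Mode> {
    number: u8,
    level_high: bool,
    _mode: PhantomData<Mode>,
}

impl Pin<Floating> {
    /// A fresh pin starts out unconfigured.
    pub fn new(number: u8) -> Self {
        Pin { number, level_high: false, _mode: PhantomData }
    }

    /// Consumes the unconfigured pin and returns it typed as an output.
    pub fn into_push_pull_output(self) -> Pin<PushPullOutput> {
        Pin { number: self.number, level_high: false, _mode: PhantomData }
    }
}

// Only output pins get these methods.
impl Pin<PushPullOutput> {
    pub fn set_high(&mut self) {
        self.level_high = true;
    }

    pub fn set_low(&mut self) {
        self.level_high = false;
    }

    pub fn is_set_high(&self) -> bool {
        self.level_high
    }
}

// Available in every mode.
impl<Mode> Pin<Mode> {
    pub fn number(&self) -> u8 {
        self.number
    }
}
```

With this shape, Pin::<Floating>::new(13).set_high() is rejected by the compiler, because set_high is only defined for Pin<PushPullOutput>.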
Setting Up a Real Project
Let’s target an STM32F4 — one of the most common ARM Cortex-M4 chips. If you don’t have hardware, that’s fine — we’ll discuss emulation too.
# Install the target
rustup target add thumbv7em-none-eabihf
# Install cargo-binutils for inspecting binaries
cargo install cargo-binutils
rustup component add llvm-tools-preview
# Install probe-rs for flashing and debugging (replaces openocd + gdb)
cargo install probe-rs-tools
Create the project:
cargo init --name blinky
Cargo.toml:
[package]
name = "blinky"
version = "0.1.0"
edition = "2021"
[dependencies]
cortex-m = "0.7"
cortex-m-rt = "0.7"
panic-halt = "0.2"
stm32f4xx-hal = { version = "0.21", features = ["stm32f411"] }
[profile.release]
opt-level = "s" # Optimize for size
lto = true # Link-time optimization
codegen-units = 1 # Better optimization, slower compile
debug = true # Keep debug info for probe-rs
.cargo/config.toml:
[target.thumbv7em-none-eabihf]
runner = "probe-rs run --chip STM32F411CEUx"
rustflags = ["-C", "link-arg=-Tlink.x"]
[build]
target = "thumbv7em-none-eabihf"
memory.x — This is critical. It tells the linker where your chip’s memory lives:
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}
The Classic Blinky
Every embedded journey starts here. Let’s blink an LED:
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use panic_halt as _;
use stm32f4xx_hal::{
    pac,
    prelude::*,
};
#[entry]
fn main() -> ! {
    // Take ownership of the device peripherals; this succeeds only once
    let dp = pac::Peripherals::take().unwrap();
    let cp = cortex_m::Peripherals::take().unwrap();
    // Configure the clock system
    let rcc = dp.RCC.constrain();
    let clocks = rcc.cfgr
        .sysclk(84.MHz())
        .freeze();
    // Configure GPIO pin C13 as push-pull output (common LED pin)
    let gpioc = dp.GPIOC.split();
    let mut led = gpioc.pc13.into_push_pull_output();
    // Set up a delay provider using the system timer
    let mut delay = cp.SYST.delay(&clocks);
    // Blink forever
    loop {
        led.set_high();
        delay.delay_ms(500u32);
        led.set_low();
        delay.delay_ms(500u32);
    }
}
A few things to notice here:
pac::Peripherals::take() returns Option<Peripherals> and only succeeds once. After that, it returns None. This is singleton enforcement through a runtime check plus ownership: you can’t accidentally end up with two handles to the same hardware register bank.
into_push_pull_output() is a type-state transition. The return type is different from the input type. You literally cannot call set_high() on a pin that hasn’t been configured as an output — it won’t compile. This is Rust’s type system preventing hardware misconfiguration at compile time.
The -> ! return type on main(). Embedded programs don’t return. There’s nowhere to return to. The loop at the end enforces this.
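The take-once behaviour is easy to sketch on the host. This is an assumption-laden illustration, not the PAC’s actual implementation: the Peripherals struct here is a hypothetical stand-in, and I use an atomic flag where a real Cortex-M crate would typically guard a plain flag with a critical section.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a PAC's peripheral struct.
/// The private field stops anyone from constructing it by hand.
pub struct Peripherals {
    _private: (),
}

// Records whether the singleton has already been handed out.
static TAKEN: AtomicBool = AtomicBool::new(false);

impl Peripherals {
    /// Returns the peripherals the first time, `None` on every later call.
    pub fn take() -> Option<Self> {
        // `swap` returns the previous value, which is `false` only for the first caller.
        if TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(Peripherals { _private: () })
        }
    }
}
```

Because the returned value is owned and not Clone, whoever holds it decides who else gets access, and the borrow checker takes over from there.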
Going Lower: Direct Register Access
The HAL is convenient, but sometimes you need to hit registers directly. Here’s what blinky looks like using just the PAC:
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use panic_halt as _;
use stm32f4xx_hal::pac;
// Register access syntax varies with the PAC version: older releases expose
// registers as fields (ahb1enr), newer ones as methods (ahb1enr())
#[entry]
fn main() -> ! {
    let dp = pac::Peripherals::take().unwrap();
    // Enable GPIOC clock (RCC_AHB1ENR register, bit 2)
    dp.RCC.ahb1enr.modify(|_, w| w.gpiocen().enabled());
    // Configure PC13 as output (MODER register)
    dp.GPIOC.moder.modify(|_, w| w.moder13().output());
    // Set output type to push-pull (OTYPER register)
    dp.GPIOC.otyper.modify(|_, w| w.ot13().push_pull());
    // Set speed to low (OSPEEDR register)
    dp.GPIOC.ospeedr.modify(|_, w| w.ospeedr13().low_speed());
    loop {
        // Set PC13 high (BSRR register: write-only, atomic)
        dp.GPIOC.bsrr.write(|w| w.bs13().set());
        cortex_m::asm::delay(8_000_000);
        // Set PC13 low (BSRR register, reset bits)
        dp.GPIOC.bsrr.write(|w| w.br13().reset());
        cortex_m::asm::delay(8_000_000);
    }
}
This is substantially more code, but you can see exactly what’s happening. Every register access is typed — modify gives you a read-modify-write, write gives you a write-only access. The closures provide builder-style access to individual bit fields. You can’t accidentally write to a bit that doesn’t exist.
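The difference between modify and write is worth seeing in isolation. The sketch below uses a mock register in ordinary memory; Reg and moder_output are hypothetical names of mine, and the closures take a plain u32 instead of the generated reader/writer types. It assumes the usual svd2rust behaviour where write starts from the register’s reset value.

```rust
/// A mock 32-bit register living in ordinary memory. Hypothetical type.
pub struct Reg {
    value: u32,
    reset_value: u32,
}

impl Reg {
    pub fn new(reset_value: u32) -> Self {
        Reg { value: reset_value, reset_value }
    }

    pub fn read(&self) -> u32 {
        self.value
    }

    /// Read-modify-write: the closure sees the current contents.
    pub fn modify(&mut self, f: impl FnOnce(u32) -> u32) {
        self.value = f(self.value);
    }

    /// Plain write: the closure starts from the reset value,
    /// so any field it does not touch goes back to its reset state.
    pub fn write(&mut self, f: impl FnOnce(u32) -> u32) {
        self.value = f(self.reset_value);
    }
}

/// Sets the 2-bit MODER field for `pin` to 0b01 (general-purpose output).
pub fn moder_output(bits: u32, pin: u32) -> u32 {
    let shift = pin * 2;
    (bits & !(0b11 << shift)) | (0b01 << shift)
}
```

Calling modify twice for pins 5 and 13 leaves both configured as outputs; a following write for pin 13 alone silently puts pin 5 back to its reset mode. That is why the configuration registers above use modify, while BSRR, where each write is a self-contained command, uses write.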
And Even Lower: Raw Pointers
If the PAC doesn’t cover your chip (rare but it happens), you go fully manual:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
use core::ptr;
// STM32F411 register addresses
const RCC_BASE: usize = 0x4002_3800;
const RCC_AHB1ENR: *mut u32 = (RCC_BASE + 0x30) as *mut u32;
const GPIOC_BASE: usize = 0x4002_0800;
const GPIOC_MODER: *mut u32 = (GPIOC_BASE + 0x00) as *mut u32;
const GPIOC_BSRR: *mut u32 = (GPIOC_BASE + 0x18) as *mut u32;
#[panic_handler]
fn panic(_: &PanicInfo) -> ! {
    loop {}
}
// A Cortex-M core boots through a vector table, so a bare _start symbol is not
// enough; cortex-m-rt still provides the table and the reset handler here
#[cortex_m_rt::entry]
fn main() -> ! {
    unsafe {
        // Enable GPIOC clock
        let val = ptr::read_volatile(RCC_AHB1ENR);
        ptr::write_volatile(RCC_AHB1ENR, val | (1 << 2));
        // Set PC13 as output (bits 27:26 = 01)
        let val = ptr::read_volatile(GPIOC_MODER);
        ptr::write_volatile(GPIOC_MODER, (val & !(3 << 26)) | (1 << 26));
        loop {
            // Set PC13
            ptr::write_volatile(GPIOC_BSRR, 1 << 13);
            for _ in 0..800_000 {
                core::hint::black_box(());
            }
            // Reset PC13
            ptr::write_volatile(GPIOC_BSRR, 1 << (13 + 16));
            for _ in 0..800_000 {
                core::hint::black_box(());
            }
        }
    }
}
ptr::read_volatile and ptr::write_volatile are critical here. Without volatile, the optimizer might remove or reorder your register accesses. The hardware doesn’t care about your optimizer’s opinion — a write to a register is a side effect that must happen, in order.
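If you end up writing a lot of raw-pointer code, it can help to confine the unsafe to one small wrapper. The following is a sketch under my own naming (VolatileReg, bsrr_set, bsrr_reset are hypothetical helpers); it can be exercised on the host by pointing it at an ordinary u32 instead of a real register address.

```rust
use core::ptr;

/// Thin wrapper around a register address. Hypothetical helper.
pub struct VolatileReg {
    addr: *mut u32,
}

impl VolatileReg {
    /// # Safety
    /// `addr` must stay valid for volatile reads and writes of a `u32`
    /// for as long as this wrapper is used.
    pub const unsafe fn new(addr: *mut u32) -> Self {
        VolatileReg { addr }
    }

    pub fn read(&self) -> u32 {
        // SAFETY: validity of `addr` was promised by the caller of `new`.
        unsafe { ptr::read_volatile(self.addr) }
    }

    pub fn write(&self, value: u32) {
        // SAFETY: validity of `addr` was promised by the caller of `new`.
        unsafe { ptr::write_volatile(self.addr, value) }
    }
}

/// BSRR value that sets `pin` (bits 0..=15 are the "set" bits).
pub const fn bsrr_set(pin: u32) -> u32 {
    1 << pin
}

/// BSRR value that resets `pin` (bits 16..=31 are the "reset" bits).
pub const fn bsrr_reset(pin: u32) -> u32 {
    1 << (pin + 16)
}
```

On the target you would construct it once from an address such as GPIOC_BSRR; the rest of the code then calls read and write without repeating unsafe blocks.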
Interrupts — Where It Gets Interesting
Real embedded code is interrupt-driven. You don’t poll in a loop; hardware tells you when something happens:
#![no_std]
#![no_main]
use core::cell::RefCell;
use cortex_m::interrupt::Mutex;
use cortex_m_rt::entry;
use panic_halt as _;
use stm32f4xx_hal::{
    gpio::{Edge, Input, PC13},
    interrupt,
    pac::{self, EXTI},
    prelude::*,
};
// Shared resources between main and interrupt handler
static BUTTON: Mutex<RefCell<Option<PC13<Input>>>> = Mutex::new(RefCell::new(None));
static EXTI_PERIPHERAL: Mutex<RefCell<Option<EXTI>>> = Mutex::new(RefCell::new(None));
static PRESS_COUNT: Mutex<RefCell<u32>> = Mutex::new(RefCell::new(0));
#[entry]
fn main() -> ! {
    let mut dp = pac::Peripherals::take().unwrap();
    let rcc = dp.RCC.constrain();
    let _clocks = rcc.cfgr.sysclk(84.MHz()).freeze();
    let gpioc = dp.GPIOC.split();
    // Configure PC13 as input with interrupt on falling edge
    let mut button = gpioc.pc13.into_pull_up_input();
    let mut syscfg = dp.SYSCFG.constrain();
    button.make_interrupt_source(&mut syscfg);
    button.trigger_on_edge(&mut dp.EXTI, Edge::Falling);
    button.enable_interrupt(&mut dp.EXTI);
    // Move resources into global statics (interrupt handler needs them)
    cortex_m::interrupt::free(|cs| {
        BUTTON.borrow(cs).replace(Some(button));
        EXTI_PERIPHERAL.borrow(cs).replace(Some(dp.EXTI));
    });
    // Enable the interrupt in the NVIC
    unsafe {
        cortex_m::peripheral::NVIC::unmask(pac::Interrupt::EXTI15_10);
    }
    loop {
        cortex_m::asm::wfi(); // Wait for interrupt (saves power)
    }
}
#[interrupt]
fn EXTI15_10() {
    cortex_m::interrupt::free(|cs| {
        if let Some(ref mut button) = *BUTTON.borrow(cs).borrow_mut() {
            if button.is_low() {
                let mut count = PRESS_COUNT.borrow(cs).borrow_mut();
                *count += 1;
            }
            button.clear_interrupt_pending_bit();
        }
    });
}
This looks verbose, and honestly, it is. The Mutex<RefCell<Option<...>>> pattern is the standard way to share state between main and interrupt handlers in embedded Rust. It’s ugly, but it’s correct:
Mutex (from cortex_m, not std) only hands out its contents inside a critical section, which interrupt::free creates by disabling interrupts
RefCell gives you runtime borrow checking
Option lets you initialize the resource in main and then move it to the global
The RTIC framework (Real-Time Interrupt-driven Concurrency) cleans this up dramatically — but understanding the manual approach first matters.
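If the nesting is hard to read, a host-side analogy may help. In this sketch std::sync::Mutex stands in for the cortex_m Mutex plus RefCell pair (it provides both exclusion and mutable access), and Button is a hypothetical stand-in for the configured pin. The Option plays exactly the same role as in the firmware: the static starts empty and is filled in later.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a configured button pin.
pub struct Button {
    pub pressed: bool,
}

// Starts empty because the button does not exist until `init` runs.
static BUTTON: Mutex<Option<Button>> = Mutex::new(None);
static PRESS_COUNT: Mutex<u32> = Mutex::new(0);

/// Plays the role of `main`: build the resource, then move it into the static.
pub fn init(button: Button) {
    *BUTTON.lock().unwrap() = Some(button);
}

/// Plays the role of the interrupt handler.
pub fn on_interrupt() {
    // Does nothing if the handler fires before `init` has stored the button.
    if let Some(button) = BUTTON.lock().unwrap().as_mut() {
        if button.pressed {
            *PRESS_COUNT.lock().unwrap() += 1;
            // Stands in for clearing the pending bit.
            button.pressed = false;
        }
    }
}

pub fn press_count() -> u32 {
    *PRESS_COUNT.lock().unwrap()
}
```

The early-interrupt case is the reason the handler in the firmware uses if let Some(...) instead of unwrapping.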
Debugging Embedded Rust
probe-rs has transformed the debugging experience. You get:
# Flash and run with RTT (Real-Time Transfer) logging
cargo run --release
# Or use defmt for efficient logging
# In Cargo.toml: defmt = "0.3", defmt-rtt = "0.4", panic-probe = "0.3"
With defmt, you get structured logging that compiles down to almost nothing:
#![no_std]
#![no_main]
use cortex_m_rt::entry;
use defmt::info;
use defmt_rtt as _;
use panic_probe as _;
// Placeholder standing in for a real sensor driver
fn read_sensor() -> u16 {
    0
}
#[entry]
fn main() -> ! {
    info!("Boot complete, clock: {} MHz", 84);
    info!("Starting sensor loop");
    loop {
        let reading = read_sensor();
        defmt::debug!("Sensor: {}", reading);
        if reading > 1000 {
            defmt::warn!("Threshold exceeded: {}", reading);
        }
    }
}
defmt encodes log messages as indexes into a table — the actual strings stay on your host machine. So info!("Boot complete, clock: {} MHz", 84) might only transmit 3 bytes over the wire. On a chip with 2KB of RAM, that matters.
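Here is a toy version of that idea, to show where the savings come from. To be clear, this is not defmt’s real wire format; the table, the 2-byte index, and the single 1-byte argument are all assumptions I made for illustration.

```rust
/// Format strings stay on the host. This table is illustrative.
pub const HOST_TABLE: [&str; 2] = [
    "Boot complete, clock: {} MHz",
    "Starting sensor loop",
];

/// Device side: a 2-byte little-endian table index followed by a 1-byte argument.
pub fn encode(index: u16, arg: u8) -> [u8; 3] {
    let [lo, hi] = index.to_le_bytes();
    [lo, hi, arg]
}

/// Host side: look the string up and substitute the argument.
/// Returns `None` if the index is not in the table.
pub fn decode(frame: [u8; 3]) -> Option<String> {
    let index = u16::from_le_bytes([frame[0], frame[1]]) as usize;
    let template = HOST_TABLE.get(index)?;
    Some(template.replacen("{}", &frame[2].to_string(), 1))
}
```

The device never stores or sends the 28-character format string; it sends three bytes, and the host rebuilds the message from its copy of the table.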
Binary Size — Because Every Byte Counts
Check what you’re producing:
cargo size --release -- -A
# Typical output:
# section            size        addr
# .vector_table      1024   0x8000000
# .text              2848   0x8000400
# .rodata             128   0x8000f20
# .data                 8  0x20000000
# .bss                 16  0x20000008
# Total              4024
4KB for a working blinky with interrupts. In C, you’d get roughly the same. Rust’s zero-cost abstractions aren’t marketing — they’re measurable.
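One thing the Total line hides is that flash and RAM are separate budgets. A small sketch of the arithmetic, using the sample numbers above (the Sections struct is a hypothetical helper of mine): .data counts against both, because its initial values are stored in flash and copied to RAM at startup, while .bss only occupies RAM.

```rust
/// Section sizes in bytes. Hypothetical helper for the arithmetic.
pub struct Sections {
    pub vector_table: u32,
    pub text: u32,
    pub rodata: u32,
    pub data: u32,
    pub bss: u32,
}

impl Sections {
    /// Flash holds the vector table, code, constants, and the initial image of .data.
    pub fn flash_bytes(&self) -> u32 {
        self.vector_table + self.text + self.rodata + self.data
    }

    /// Statically allocated RAM: .data plus zero-initialized .bss.
    /// The stack (and any heap) comes on top of this.
    pub fn ram_bytes(&self) -> u32 {
        self.data + self.bss
    }
}

/// The sample figures from the `cargo size` listing.
pub const SAMPLE: Sections = Sections {
    vector_table: 1024,
    text: 2848,
    rodata: 128,
    data: 8,
    bss: 16,
};
```

For the sample output this works out to 4008 bytes of the 512K flash and 24 bytes of static RAM out of 128K.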
If your binary is too big, these release-profile settings are the first things to try:
[profile.release]
opt-level = "z" # Optimize aggressively for size
lto = true
codegen-units = 1
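strip = true # Strip symbols from the ELF (shrinks the file on disk, not what gets flashed; leave this out if you use defmt, which needs the symbols)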
panic = "abort" # Don't include unwinding code
Also check for formatting code: core::fmt is surprisingly large. Pulling in the formatting machinery, whether through write! or a formatted panic message, can add several KB. In extreme cases, use manual byte-level output instead.
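As a sketch of what manual byte-level output can look like, here is a decimal formatter for u32 that touches neither core::fmt nor the heap. The function name format_u32 is my own; the resulting bytes could be handed to whatever UART or RTT write routine you have.

```rust
/// Writes `value` as ASCII decimal into `buf` and returns the used part.
/// `u32::MAX` has 10 digits, so a 10-byte buffer always suffices.
pub fn format_u32(mut value: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut start = buf.len();
    // Fill the buffer from the right, one digit per iteration.
    loop {
        start -= 1;
        buf[start] = b'0' + (value % 10) as u8;
        value /= 10;
        if value == 0 {
            break;
        }
    }
    &buf[start..]
}
```

The do-while shape of the loop makes sure that zero still produces a single "0" digit.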
Testing Without Hardware
You don’t need a physical board to develop embedded Rust:
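# QEMU can emulate some Cortex-M boards, such as the LM3S6965 (machine lm3s6965evb).
# Install it from your OS package manager (it is not a cargo tool), build for that
# chip (thumbv7m-none-eabi), then run something like:
# qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -kernel <path-to-elf>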
# Run tests on the host
cargo test --target x86_64-unknown-linux-gnu
Structure your code so the hardware-dependent layer is thin:
// This module is pure logic, testable on any platform
#[cfg_attr(test, derive(Debug, PartialEq))]
pub struct PidController {
    kp: f32,
    ki: f32,
    kd: f32,
    integral: f32,
    prev_error: f32,
}
impl PidController {
    pub fn new(kp: f32, ki: f32, kd: f32) -> Self {
        Self { kp, ki, kd, integral: 0.0, prev_error: 0.0 }
    }
    pub fn update(&mut self, setpoint: f32, measured: f32, dt: f32) -> f32 {
        let error = setpoint - measured;
        self.integral += error * dt;
        let derivative = (error - self.prev_error) / dt;
        self.prev_error = error;
        self.kp * error + self.ki * self.integral + self.kd * derivative
    }
}
#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn pid_converges_to_setpoint() {
        let mut pid = PidController::new(1.0, 0.1, 0.01);
        let mut value = 0.0;
        // Simulate 50 s: the integral term causes an overshoot that decays
        // slowly, so 10 s (1000 steps) is not enough to settle within 1.0
        for _ in 0..5000 {
            let output = pid.update(100.0, value, 0.01);
            value += output * 0.01;
        }
        assert!((value - 100.0).abs() < 1.0);
    }
}
Keep your business logic in pure Rust. Push hardware interaction to the edges. This isn’t just good embedded practice — it’s good software engineering.
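One way to push hardware to the edges is to put a small trait at the boundary. The sketch below uses names I made up (Led, Blinker, MockLed); in a real project the trait would more likely be embedded-hal’s OutputPin, which HAL pins already implement.

```rust
/// Hypothetical hardware boundary: the only thing the logic needs from a pin.
pub trait Led {
    fn set(&mut self, on: bool);
}

/// Pure logic: toggles the LED once every `period_ticks` calls to `tick`.
pub struct Blinker<L: Led> {
    led: L,
    period_ticks: u32,
    elapsed: u32,
    on: bool,
}

impl<L: Led> Blinker<L> {
    pub fn new(led: L, period_ticks: u32) -> Self {
        Blinker { led, period_ticks, elapsed: 0, on: false }
    }

    pub fn tick(&mut self) {
        self.elapsed += 1;
        if self.elapsed >= self.period_ticks {
            self.elapsed = 0;
            self.on = !self.on;
            self.led.set(self.on);
        }
    }

    pub fn led(&self) -> &L {
        &self.led
    }
}

/// Test double that records every level it was asked to drive.
#[derive(Default)]
pub struct MockLed {
    pub history: Vec<bool>,
}

impl Led for MockLed {
    fn set(&mut self, on: bool) {
        self.history.push(on);
    }
}
```

On the target, a two-line impl of Led for the real pin type is all the hardware-specific code this logic needs; everything else runs under cargo test on your laptop.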
Where Embedded Rust Stands Today
I’ll be honest: embedded Rust has rough edges. The ecosystem is still maturing. HAL crate quality varies by chip family. Some families have excellent support (Nordic nRF, STM32), others are spotty. You’ll occasionally hit a missing peripheral driver and have to write PAC-level code.
But the trajectory is clear. The type-state patterns for pin configuration catch real bugs. The ownership model prevents the shared-mutable-state nightmares that plague C firmware. And the tooling with probe-rs and defmt is genuinely better than the traditional OpenOCD + GDB + printf debugging workflow.
Next lesson, we’re going deeper into hardware communication — memory-mapped I/O, the fundamental mechanism that makes all of this work.