
Atharva Pandey/Lesson 4: vtables — How dyn Trait actually works under the hood

Created Fri, 07 Mar 2025 11:15:00 +0000 Modified Fri, 07 Mar 2025 11:15:00 +0000

I was profiling a parser once and found that a hot path using dyn Iterator was 3x slower than the equivalent code using generics. The algorithm was identical. The difference? Dynamic dispatch — every method call went through a vtable lookup instead of being inlined. That day I learned to respect what dyn really costs. Let’s open the hood.

Static vs Dynamic Dispatch

Rust gives you two ways to dispatch trait method calls: static dispatch (generics) and dynamic dispatch (dyn Trait).

Static dispatch happens at compile time. The compiler generates a separate copy of the function for each concrete type, and calls are resolved directly. No indirection, full inlining, maximum optimization:

trait Drawable {
    fn draw(&self);
}

struct Circle;
struct Square;

impl Drawable for Circle {
    fn draw(&self) { println!("Drawing circle"); }
}

impl Drawable for Square {
    fn draw(&self) { println!("Drawing square"); }
}

// Static dispatch — monomorphized
fn draw_static<T: Drawable>(item: &T) {
    item.draw(); // compiler knows the exact type, inlines the call
}

fn main() {
    draw_static(&Circle);  // calls Circle::draw directly
    draw_static(&Square);  // calls Square::draw directly
}

The compiler generates two versions of draw_static: one for Circle, one for Square. Each version calls the concrete method directly. Fast, but it increases binary size.

Dynamic dispatch happens at runtime. The compiler generates one version of the function that uses a pointer to look up the right method at runtime:

// Dynamic dispatch — one function, runtime lookup
fn draw_dynamic(item: &dyn Drawable) {
    item.draw(); // looks up draw() in the vtable at runtime
}

fn main() {
    draw_dynamic(&Circle);
    draw_dynamic(&Square);
}

Same behavior, but the mechanism is completely different. draw_dynamic receives a fat pointer containing the data and a vtable, and uses the vtable to find the right draw function.

What Is a vtable?

A vtable (virtual method table) is a struct generated by the compiler that contains function pointers for every method of a trait, plus some metadata. There’s one vtable per concrete type per trait — so Circle’s vtable for Drawable is a different static data structure than Square’s vtable for Drawable.

Conceptually, a vtable looks like this:

// This is what the compiler generates — you never write this yourself
struct DrawableVtable {
    // Metadata
    drop_in_place: fn(*mut ()),     // destructor
    size: usize,                     // size of the concrete type
    align: usize,                    // alignment of the concrete type

    // Trait methods
    draw: fn(*const ()),             // pointer to the concrete draw() implementation
}

// vtable for Circle implementing Drawable
static CIRCLE_DRAWABLE_VTABLE: DrawableVtable = DrawableVtable {
    drop_in_place: circle_drop_in_place,
    size: 0,    // Circle is a ZST
    align: 1,
    draw: circle_draw,
};

// vtable for Square implementing Drawable
static SQUARE_DRAWABLE_VTABLE: DrawableVtable = DrawableVtable {
    drop_in_place: square_drop_in_place,
    size: 0,
    align: 1,
    draw: square_draw,
};

The vtable is stored in the read-only data section of the binary. It’s created at compile time, never at runtime. Each vtable always contains at least three entries: drop_in_place, size, and align. The trait’s methods come after these.

The Fat Pointer Structure

When you have a &dyn Drawable, you’re holding a fat pointer — two machine words instead of one:

&dyn Drawable (16 bytes on 64-bit):
+------------------+-------------------+
| data pointer     | vtable pointer    |
| (points to the   | (points to the    |
|  concrete value) |  static vtable)   |
+------------------+-------------------+
   8 bytes              8 bytes

Let’s verify this:

use std::mem;

trait Animal {
    fn speak(&self);
    fn name(&self) -> &str;
}

struct Dog { name: String }
struct Cat;

impl Animal for Dog {
    fn speak(&self) { println!("Woof!"); }
    fn name(&self) -> &str { &self.name }
}

impl Animal for Cat {
    fn speak(&self) { println!("Meow!"); }
    fn name(&self) -> &str { "cat" }
}

fn main() {
    println!("&Dog:        {} bytes", mem::size_of::<&Dog>());        // 8
    println!("&dyn Animal: {} bytes", mem::size_of::<&dyn Animal>()); // 16

    println!("Box<Dog>:        {} bytes", mem::size_of::<Box<Dog>>());        // 8
    println!("Box<dyn Animal>: {} bytes", mem::size_of::<Box<dyn Animal>>()); // 16

    // Even raw pointers show the difference
    println!("*const Dog:        {} bytes", mem::size_of::<*const Dog>());        // 8
    println!("*const dyn Animal: {} bytes", mem::size_of::<*const dyn Animal>()); // 16
}

The extra 8 bytes is always the vtable pointer. This is consistent whether you use references, Box, Rc, Arc, or raw pointers — they all double in size when pointing to a trait object.
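The Rc and Arc claim is easy to check the same way. Here's a quick sketch (Animal and Cat reprise the definitions above):

```rust
use std::mem;
use std::rc::Rc;
use std::sync::Arc;

trait Animal {
    fn speak(&self);
}

struct Cat;

impl Animal for Cat {
    fn speak(&self) { println!("Meow!"); }
}

fn main() {
    // Thin for concrete types: one word.
    assert_eq!(mem::size_of::<Rc<Cat>>(), mem::size_of::<usize>());
    assert_eq!(mem::size_of::<Arc<Cat>>(), mem::size_of::<usize>());

    // Fat for trait objects: two words (data pointer + vtable pointer).
    assert_eq!(mem::size_of::<Rc<dyn Animal>>(), 2 * mem::size_of::<usize>());
    assert_eq!(mem::size_of::<Arc<dyn Animal>>(), 2 * mem::size_of::<usize>());

    println!("Rc and Arc double in size for trait objects, too");
}
```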

How a vtable Call Works

Let’s trace through what happens when you call a method on a trait object:

fn make_sound(animal: &dyn Animal) {
    animal.speak();
}

fn main() {
    let dog = Dog { name: String::from("Rex") };
    make_sound(&dog);
}

Here’s the process, step by step:

  1. &dog is a regular thin pointer to Dog on the stack (8 bytes)
  2. When coerced to &dyn Animal, the compiler creates a fat pointer: (pointer_to_dog, pointer_to_dog_animal_vtable)
  3. Inside make_sound, the call animal.speak() becomes:
    • Extract the vtable pointer from the fat pointer
    • Look up the speak entry in the vtable (at a fixed offset)
    • Call the function pointer, passing the data pointer as self

In pseudo-assembly:

// animal.speak() compiles roughly to:
load vtable_ptr from animal + 8    // get vtable pointer
load speak_fn from vtable_ptr + 24 // offset for speak (after drop, size, align)
call speak_fn(animal.data_ptr)     // call with data pointer as first arg

That’s two pointer dereferences and an indirect call. On modern CPUs, this costs about 2-5 nanoseconds extra compared to a direct call. Not much for a single call, but in a tight loop processing millions of items, it adds up — partly because the indirect call leans on the branch target predictor, and mostly because it prevents the compiler from inlining.
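The three dispatch steps above can be hand-rolled in plain Rust. This is a sketch of what the compiler generates for you — the names (Vtable, FatPtr, dog_speak) are illustrative, not real std items, and the method returns a String here so the result is easy to inspect:

```rust
struct Dog { name: String }

// The concrete method, as a free function taking a type-erased pointer.
fn dog_speak(data: *const ()) -> String {
    // Safety: only ever called with a pointer that really points at a Dog.
    let dog = unsafe { &*(data as *const Dog) };
    format!("{} says Woof!", dog.name)
}

// A minimal vtable with just the method slot (real vtables also carry
// drop_in_place, size, and align before the methods).
struct Vtable {
    speak: fn(*const ()) -> String,
}

static DOG_VTABLE: Vtable = Vtable { speak: dog_speak };

// A hand-rolled fat pointer: data pointer + vtable pointer.
struct FatPtr {
    data: *const (),
    vtable: &'static Vtable,
}

fn make_sound(animal: FatPtr) -> String {
    // Extract the vtable, index the method slot, call the function
    // pointer with the data pointer as `self`.
    (animal.vtable.speak)(animal.data)
}

fn main() {
    let dog = Dog { name: String::from("Rex") };
    // This is what the `&dog as &dyn Animal` coercion builds for you.
    let fat = FatPtr {
        data: &dog as *const Dog as *const (),
        vtable: &DOG_VTABLE,
    };
    println!("{}", make_sound(fat));
}
```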

Multiple Traits and Supertraits

What happens when a trait has supertraits?

trait Named {
    fn name(&self) -> &str;
}

trait Greetable: Named {
    fn greet(&self);
}

struct Person { name: String }

impl Named for Person {
    fn name(&self) -> &str { &self.name }
}

impl Greetable for Person {
    fn greet(&self) { println!("Hi, I'm {}!", self.name()); }
}

fn welcome(guest: &dyn Greetable) {
    // Can call both Greetable and Named methods
    println!("Welcome, {}!", guest.name());
    guest.greet();
}

fn main() {
    let p = Person { name: String::from("Atharva") };
    welcome(&p);
}

The vtable for Person as dyn Greetable includes methods from both Greetable and Named. It’s one vtable, flattened:

Person_Greetable_vtable:
  drop_in_place
  size
  align
  name  (from Named)
  greet (from Greetable)

This means &dyn Greetable is still just 16 bytes — one data pointer and one vtable pointer. The supertrait methods are baked right into the same vtable.

But here’s the limitation: you can’t write dyn TraitA + TraitB for arbitrary trait combinations — a trait object allows only one non-auto trait, plus any number of auto traits like Send and Sync (so dyn TraitA + Send + Sync is fine). Each dyn Trait gets exactly one vtable. If you need an object that implements multiple unrelated traits, you have a few options:

trait CombinedTrait: TraitA + TraitB {}
impl<T: TraitA + TraitB> CombinedTrait for T {}
// Now use dyn CombinedTrait

// Or use separate trait objects:
struct Wrapper {
    a: Box<dyn TraitA>,
    b: Box<dyn TraitB>,
}
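Here’s the first option fleshed out into something runnable. HasArea, HasName, and Shape are stand-in names for this sketch:

```rust
trait HasArea {
    fn area(&self) -> f64;
}

trait HasName {
    fn name(&self) -> &str;
}

// One combined trait with a blanket impl: anything that implements both
// supertraits gets Shape for free, and dyn Shape has a single flattened
// vtable covering all three traits' methods.
trait Shape: HasArea + HasName {}
impl<T: HasArea + HasName> Shape for T {}

struct Square { side: f64 }

impl HasArea for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

impl HasName for Square {
    fn name(&self) -> &str { "square" }
}

fn describe(shape: &dyn Shape) -> String {
    // Both supertraits' methods are reachable through the one vtable.
    format!("{}: {}", shape.name(), shape.area())
}

fn main() {
    let s = Square { side: 3.0 };
    println!("{}", describe(&s));
    // Still an ordinary fat pointer: two words.
    assert_eq!(std::mem::size_of::<&dyn Shape>(), 2 * std::mem::size_of::<usize>());
}
```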

The Performance Reality

Let’s measure the actual cost:

trait Processor {
    fn process(&self, value: u64) -> u64;
}

struct Doubler;
impl Processor for Doubler {
    #[inline(never)] // prevent inlining to make the comparison fair
    fn process(&self, value: u64) -> u64 { value * 2 }
}

fn static_dispatch<P: Processor>(proc: &P, data: &[u64]) -> u64 {
    let mut sum = 0u64;
    for &v in data {
        sum = sum.wrapping_add(proc.process(v));
    }
    sum
}

fn dynamic_dispatch(proc: &dyn Processor, data: &[u64]) -> u64 {
    let mut sum = 0u64;
    for &v in data {
        sum = sum.wrapping_add(proc.process(v));
    }
    sum
}

fn main() {
    let proc = Doubler;
    let data: Vec<u64> = (0..10_000_000).collect();

    let start = std::time::Instant::now();
    let r1 = static_dispatch(&proc, &data);
    println!("Static:  {:?} (result: {})", start.elapsed(), r1);

    let start = std::time::Instant::now();
    let r2 = dynamic_dispatch(&proc, &data);
    println!("Dynamic: {:?} (result: {})", start.elapsed(), r2);
}

Without #[inline(never)], the static version would be dramatically faster because the compiler inlines process. With it, you’re measuring the raw cost of direct vs indirect calls. The dynamic version is typically 20-50% slower in a tight loop like this. In real-world code with more work per iteration, the difference often shrinks to the point of being negligible.

When to Use Dynamic Dispatch

Don’t be afraid of dyn Trait. The performance cost is often irrelevant:

Use dyn Trait when:

  • You need heterogeneous collections (Vec<Box<dyn Widget>>)
  • You want to reduce compile times and binary size (generics create monomorphized copies)
  • You’re building plugin systems or extensible architectures
  • The method calls aren’t in a hot loop

Prefer generics when:

  • You’re in a performance-critical hot loop
  • You want the compiler to inline and optimize across call boundaries
  • You know the concrete types at compile time
  • You need associated types or const generics

In my experience, most applications should default to generics and switch to dyn Trait when they have a concrete reason — heterogeneous storage, faster compilation, or architectural flexibility.
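The heterogeneous-collection case is the one with no real generic substitute. A minimal sketch — Widget, Button, and Checkbox are illustrative names:

```rust
trait Widget {
    fn render(&self) -> String;
}

struct Button { label: String }
struct Checkbox { checked: bool }

impl Widget for Button {
    fn render(&self) -> String { format!("[ {} ]", self.label) }
}

impl Widget for Checkbox {
    fn render(&self) -> String {
        if self.checked { "[x]".to_string() } else { "[ ]".to_string() }
    }
}

fn render_all(widgets: &[Box<dyn Widget>]) -> Vec<String> {
    // Each call dispatches through that element's own vtable.
    widgets.iter().map(|w| w.render()).collect()
}

fn main() {
    // Different concrete types in one Vec — impossible with a generic
    // Vec<T>, which must pick a single T.
    let ui: Vec<Box<dyn Widget>> = vec![
        Box::new(Button { label: String::from("OK") }),
        Box::new(Checkbox { checked: true }),
    ];
    for line in render_all(&ui) {
        println!("{line}");
    }
}
```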

Manually Inspecting vtables

You can’t officially inspect vtables in safe Rust, but in unsafe land, you can poke at the raw representation:

// std::raw::TraitObject used to expose this layout, but it was unstable
// and has since been removed from the standard library. On stable Rust,
// you can do the same thing by transmuting the fat pointer:
trait Greeter {
    fn greet(&self) -> &str;
}

struct English;
struct Spanish;

impl Greeter for English {
    fn greet(&self) -> &str { "Hello!" }
}

impl Greeter for Spanish {
    fn greet(&self) -> &str { "¡Hola!" }
}

fn main() {
    let eng = English;
    let spa = Spanish;

    let dyn_eng: &dyn Greeter = &eng;
    let dyn_spa: &dyn Greeter = &spa;

    // Extract the raw pointers
    let eng_raw: [usize; 2] = unsafe { std::mem::transmute(dyn_eng) };
    let spa_raw: [usize; 2] = unsafe { std::mem::transmute(dyn_spa) };

    println!("English: data={:#x}, vtable={:#x}", eng_raw[0], eng_raw[1]);
    println!("Spanish: data={:#x}, vtable={:#x}", spa_raw[0], spa_raw[1]);

    // Different data pointers (different objects)
    // Different vtable pointers (different types)
}

You’ll see that English and Spanish have different vtable addresses — each concrete type gets its own vtable for each trait it implements. The vtables live in the binary’s read-only section, so their addresses are stable for the lifetime of the program.
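One more check in the same spirit: two values of the same concrete type share a vtable, since vtables are per type, not per instance. (Strictly speaking the compiler doesn’t guarantee a unique vtable per type, but in a simple single-crate program you’ll see one shared address.) The helper name vtable_addr is mine:

```rust
trait Greeter {
    fn greet(&self) -> &str;
}

struct English;

impl Greeter for English {
    fn greet(&self) -> &str { "Hello!" }
}

// Pull the second word (the vtable pointer) out of a fat reference.
fn vtable_addr(g: &dyn Greeter) -> usize {
    let raw: [usize; 2] = unsafe { std::mem::transmute(g) };
    raw[1]
}

fn main() {
    let a = English;
    let b = English;
    // Different objects, same type — same vtable.
    assert_eq!(vtable_addr(&a), vtable_addr(&b));
    println!("shared vtable at {:#x}", vtable_addr(&a));
}
```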

What’s Next

Trait-object pointers are one kind of fat pointer in Rust. But they’re not the only one. Slices (&[T]) and string slices (&str) also use fat pointers — a data pointer plus a length. In Lesson 5, we’ll look at all the fat pointer variants, compare them, and understand the unified concept behind &dyn Trait, &[T], and &str.