
Atharva Pandey/Lesson 1: Collections Deep Dive — Vec, VecDeque, BTreeMap and when to use which

Created Sun, 15 Sep 2024 10:22:00 +0000 Modified Sun, 15 Sep 2024 10:22:00 +0000

I spent two days debugging a performance cliff in a data pipeline — turns out I’d been using a HashMap where a BTreeMap would’ve cut iteration time in half because I needed sorted output downstream. The collection you pick matters more than most people think.

The Big Picture

Rust’s standard library ships a small but deliberate set of collections. Unlike languages that give you seventeen flavors of list, Rust gives you a few well-designed options and expects you to understand the trade-offs. That’s actually a feature — fewer choices means you can develop genuine intuition about when to use what.

Here’s the mental model I use: every collection sits on a spectrum between “fast random access” and “fast insertion/removal.” Your job is to figure out which operations your code actually cares about.

Vec — Your Default Choice

Vec<T> is a growable array. Contiguous memory, cache-friendly, amortized O(1) push to the back. If you’re not sure what to use, start with Vec. You’ll be right more often than not.

fn main() {
    let mut scores: Vec<i32> = Vec::new();

    // Push is amortized O(1) — Vec doubles capacity when full
    scores.push(95);
    scores.push(87);
    scores.push(92);

    // Index access is O(1)
    println!("First score: {}", scores[0]);

    // But .get() is safer — returns Option<&T>
    match scores.get(10) {
        Some(s) => println!("Score: {s}"),
        None => println!("No score at index 10"),
    }

    // Pre-allocate when you know the size
    let mut known_size: Vec<String> = Vec::with_capacity(1000);
    println!("Length: {}, Capacity: {}", known_size.len(), known_size.capacity());

    for i in 0..1000 {
        known_size.push(format!("item-{i}"));
    }
    // No reallocations happened — we allocated exactly once

    // Drain gives you ownership of removed elements
    let removed: Vec<String> = known_size.drain(0..5).collect();
    println!("Removed {} items", removed.len());

    // Retain is your in-place filter
    let mut nums = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    nums.retain(|&x| x % 2 == 0);
    assert_eq!(nums, vec![2, 4, 6, 8, 10]);
}

The thing people miss about Vec is that operations at the front are O(n). Every insert(0, val) shifts the entire contents one slot to the right. I’ve seen codebases where someone builds a queue with Vec::insert(0, ...) for enqueue and Vec::pop() for dequeue — that’s an O(n) enqueue on every call. Don’t do this.

Vec Tricks Worth Knowing

fn main() {
    // Dedup removes *consecutive* duplicates — sort first if you want all dupes gone
    let mut vals = vec![3, 1, 2, 1, 3, 2, 1];
    vals.sort();
    vals.dedup();
    assert_eq!(vals, vec![1, 2, 3]);

    // split_off chops a Vec in two
    let mut first_half = vec![1, 2, 3, 4, 5, 6];
    let second_half = first_half.split_off(3);
    assert_eq!(first_half, vec![1, 2, 3]);
    assert_eq!(second_half, vec![4, 5, 6]);

    // windows and chunks for sliding window patterns
    let data = vec![1, 2, 3, 4, 5];
    let moving_averages: Vec<f64> = data
        .windows(3)
        .map(|w| w.iter().sum::<i32>() as f64 / 3.0)
        .collect();
    println!("Moving averages: {moving_averages:?}");

    // swap_remove is O(1) removal if you don't care about order
    let mut items = vec!["a", "b", "c", "d", "e"];
    items.swap_remove(1); // Swaps "b" with "e", then pops
    assert_eq!(items, vec!["a", "e", "c", "d"]);
}

swap_remove is criminally underused. If you’re removing elements from a Vec and order doesn’t matter, it’s O(1) instead of O(n). I use it constantly in game-loop style code where you’re processing and removing entities.

VecDeque — When You Need Both Ends

VecDeque<T> is a double-ended queue backed by a growable ring buffer. Amortized O(1) push and O(1) pop at both front and back. This is what you want when you’re building a queue, a sliding window, or any FIFO structure.

use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<String> = VecDeque::new();

    // O(1) operations on both ends
    queue.push_back("first".to_string());
    queue.push_back("second".to_string());
    queue.push_back("third".to_string());
    queue.push_front("zeroth".to_string());

    // FIFO: pop from the front
    while let Some(item) = queue.pop_front() {
        println!("Processing: {item}");
    }

    // Sliding window of last N items
    let mut window: VecDeque<f64> = VecDeque::with_capacity(5);
    let readings = [1.0, 2.5, 3.7, 4.2, 5.1, 6.3, 7.0];

    for reading in readings {
        if window.len() == 5 {
            window.pop_front();
        }
        window.push_back(reading);

        let avg: f64 = window.iter().sum::<f64>() / window.len() as f64;
        println!("Window avg: {avg:.2} (size: {})", window.len());
    }
}

One gotcha: VecDeque doesn’t guarantee contiguous memory. If you need to pass its contents to something expecting a slice, call make_contiguous() first, or use as_slices(), which gives you two slices (the ring buffer may have wrapped around).

use std::collections::VecDeque;

fn process_slice(data: &[i32]) {
    println!("Processing {} items", data.len());
}

fn main() {
    let mut deque = VecDeque::from(vec![1, 2, 3, 4, 5]);

    // This might be two separate slices internally
    let (front, back) = deque.as_slices();
    println!("Front: {front:?}, Back: {back:?}");

    // Force it into one contiguous slice
    let slice = deque.make_contiguous();
    process_slice(slice);
}

HashMap and HashSet — Fast Lookup

HashMap<K, V> is your go-to for key-value lookups. O(1) average case for insert, lookup, and removal. Rust’s implementation (the hashbrown crate, a port of Google’s SwissTable design) is genuinely fast.

use std::collections::HashMap;

fn main() {
    let mut word_count: HashMap<String, usize> = HashMap::new();

    let text = "the quick brown fox jumps over the lazy dog the fox";
    for word in text.split_whitespace() {
        // entry() is the idiomatic way to handle "insert if missing"
        *word_count.entry(word.to_string()).or_insert(0) += 1;
    }

    for (word, count) in &word_count {
        println!("{word}: {count}");
    }

    // or_insert_with for expensive default values
    let mut cache: HashMap<String, Vec<u8>> = HashMap::new();
    let key = "large-computation".to_string();
    let result = cache.entry(key).or_insert_with(|| {
        println!("Computing...");
        vec![0u8; 1024] // Only runs if key is missing
    });
    println!("Result length: {}", result.len());
}

The entry API is one of my favorite things about Rust’s HashMap. In most languages, you’d do a lookup, check if the key exists, then insert — that’s two hash computations. entry() hashes once and gives you a handle to either the existing value or the vacant slot.

use std::collections::{HashMap, HashSet};

fn main() {
    // HashSet is just HashMap<T, ()> underneath
    let mut seen: HashSet<String> = HashSet::new();

    let items = vec!["apple", "banana", "apple", "cherry", "banana"];
    for item in items {
        if seen.insert(item.to_string()) {
            println!("New item: {item}");
        } else {
            println!("Already seen: {item}");
        }
    }

    // Set operations
    let a: HashSet<i32> = [1, 2, 3, 4].into_iter().collect();
    let b: HashSet<i32> = [3, 4, 5, 6].into_iter().collect();

    let union: HashSet<_> = a.union(&b).collect();
    let intersection: HashSet<_> = a.intersection(&b).collect();
    let difference: HashSet<_> = a.difference(&b).collect();

    println!("Union: {union:?}");
    println!("Intersection: {intersection:?}");
    println!("A - B: {difference:?}");
}

HashMap Performance Pitfalls

The biggest one: don’t allocate a String just to do a lookup. Writing map.get(&key.to_string()) heap-allocates purely to compute a hash and compare. Thankfully, HashMap<String, V>::get accepts a plain &str thanks to the Borrow trait:

use std::collections::HashMap;

fn main() {
    let mut map: HashMap<String, i32> = HashMap::new();
    map.insert("key".to_string(), 42);

    // This works — no allocation needed for lookup
    let val = map.get("key");
    println!("{val:?}");

    // Pre-size your maps when you know roughly how many entries
    let mut large_map: HashMap<u64, String> = HashMap::with_capacity(10_000);
    for i in 0..10_000u64 {
        large_map.insert(i, format!("value-{i}"));
    }
    // No rehashing happened
}

BTreeMap and BTreeSet — Sorted Collections

BTreeMap<K, V> keeps keys sorted. Operations are O(log n) instead of O(1), but you get ordered iteration, range queries, and first/last access. If you need your data sorted or you need range scans, this is your collection.

use std::collections::BTreeMap;

fn main() {
    let mut scores: BTreeMap<String, Vec<i32>> = BTreeMap::new();

    scores.insert("Charlie".to_string(), vec![85, 92]);
    scores.insert("Alice".to_string(), vec![95, 88]);
    scores.insert("Eve".to_string(), vec![78, 91]);
    scores.insert("Bob".to_string(), vec![90, 87]);
    scores.insert("Diana".to_string(), vec![82, 94]);

    // Iteration is always sorted by key
    for (name, s) in &scores {
        let avg: f64 = s.iter().sum::<i32>() as f64 / s.len() as f64;
        println!("{name}: {avg:.1}");
    }
    // Prints: Alice, Bob, Charlie, Diana, Eve — alphabetical

    // Range queries — all names starting with B, C, or D.
    // Note: String keys compare lexicographically, so ..="D" would
    // exclude "Diana" ("Diana" > "D"). An exclusive "E" bound works.
    println!("\nNames B through D:");
    for (name, _) in scores.range("B".to_string().."E".to_string()) {
        println!("  {name}");
    }

    // First and last
    if let Some((first, _)) = scores.first_key_value() {
        println!("\nFirst: {first}");
    }
    if let Some((last, _)) = scores.last_key_value() {
        println!("Last: {last}");
    }
}

I reach for BTreeMap in three situations: when I need sorted output, when I need range queries, or when my keys don’t implement Hash (since BTreeMap only requires Ord). That third case comes up more than you’d expect with custom types.

use std::collections::BTreeMap;

// This type doesn't implement Hash, but it does implement Ord
#[derive(Debug, Eq, PartialEq, Ord, PartialOrd)]
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
}

fn main() {
    let mut releases: BTreeMap<Version, String> = BTreeMap::new();

    releases.insert(Version { major: 1, minor: 0, patch: 0 }, "Initial release".into());
    releases.insert(Version { major: 1, minor: 2, patch: 0 }, "Added widgets".into());
    releases.insert(Version { major: 1, minor: 1, patch: 0 }, "Bug fixes".into());
    releases.insert(Version { major: 2, minor: 0, patch: 0 }, "Breaking changes".into());

    // Automatically sorted by version
    for (ver, desc) in &releases {
        println!("{}.{}.{}: {desc}", ver.major, ver.minor, ver.patch);
    }
}

LinkedList — You Probably Don’t Want This

Rust has LinkedList<T> in the standard library. I’m mentioning it so I can tell you to almost never use it. Linked lists have terrible cache locality — every node is a separate heap allocation, so traversal causes cache misses on nearly every step. Vec or VecDeque will beat LinkedList in almost every real-world scenario.

The only legitimate use case I’ve encountered is when you need O(1) splicing — merging two lists or splitting one in half. And even then, measure first.

BinaryHeap — Priority Queues

BinaryHeap<T> gives you a max-heap: O(log n) push and pop, and an O(1) peek() that returns a reference to the largest element.

use std::collections::BinaryHeap;
use std::cmp::Reverse;

fn main() {
    // Max-heap by default
    let mut max_heap = BinaryHeap::new();
    max_heap.push(5);
    max_heap.push(1);
    max_heap.push(10);
    max_heap.push(3);

    while let Some(val) = max_heap.pop() {
        print!("{val} "); // 10 5 3 1
    }
    println!();

    // Min-heap: wrap values in Reverse
    let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
    min_heap.push(Reverse(5));
    min_heap.push(Reverse(1));
    min_heap.push(Reverse(10));
    min_heap.push(Reverse(3));

    while let Some(Reverse(val)) = min_heap.pop() {
        print!("{val} "); // 1 3 5 10
    }
    println!();

    // Task scheduling with priorities
    #[derive(Debug, Eq, PartialEq)]
    struct Task {
        priority: u32,
        name: String,
    }

    impl Ord for Task {
        fn cmp(&self, other: &Self) -> std::cmp::Ordering {
            self.priority.cmp(&other.priority)
        }
    }

    impl PartialOrd for Task {
        fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
            Some(self.cmp(other))
        }
    }

    let mut tasks = BinaryHeap::new();
    tasks.push(Task { priority: 3, name: "Send email".into() });
    tasks.push(Task { priority: 10, name: "Fix production bug".into() });
    tasks.push(Task { priority: 1, name: "Update docs".into() });
    tasks.push(Task { priority: 7, name: "Code review".into() });

    while let Some(task) = tasks.pop() {
        println!("[P{}] {}", task.priority, task.name);
    }
}

Decision Framework

Here’s the cheat sheet I keep in my head:

| Need | Use | Why |
| --- | --- | --- |
| Growable array, random access | Vec<T> | Cache-friendly, O(1) index |
| Queue (FIFO) or deque | VecDeque<T> | O(1) both ends |
| Key-value lookup | HashMap<K, V> | O(1) average |
| Sorted key-value | BTreeMap<K, V> | O(log n) but ordered iteration |
| Unique set membership | HashSet<T> | O(1) contains |
| Sorted unique set | BTreeSet<T> | O(log n) but range queries |
| Priority queue | BinaryHeap<T> | O(log n) push/pop, O(1) peek max |

And here’s the question I always ask myself: “What operation does this code do most frequently?” If it’s random access, use Vec. If it’s front-and-back operations, VecDeque. If it’s lookups by key, HashMap. If it’s “give me the top N,” BinaryHeap.
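That “give me the top N” case deserves one more sketch, because the idiom is slightly counterintuitive: to track the N largest values in a stream, you keep a size-bounded *min*-heap (via Reverse), so the root is always the weakest member of the current top N and gets evicted first. The top_n helper below is my own illustration, not a standard-library function:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Keep the n largest values from a stream in O(len * log n)
fn top_n(values: impl IntoIterator<Item = i32>, n: usize) -> Vec<i32> {
    let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(n + 1);
    for v in values {
        heap.push(Reverse(v));
        if heap.len() > n {
            heap.pop(); // evict the smallest of the current top n
        }
    }
    let mut result: Vec<i32> = heap.into_iter().map(|Reverse(v)| v).collect();
    result.sort_unstable_by(|a, b| b.cmp(a)); // largest first
    result
}

fn main() {
    let top = top_n([5, 1, 9, 3, 7, 8, 2], 3);
    assert_eq!(top, vec![9, 8, 7]);
    println!("Top 3: {top:?}");
}
```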

The wrong collection choice won’t crash your program. It’ll just make it slower than it needs to be — and that slowness will compound in production under load, exactly when you need performance most.