Atharva Pandey/Lesson 12: Ring Buffers — Fixed-size queues for real-time systems

Created Mon, 16 Sep 2024 00:00:00 +0000 Modified Mon, 16 Sep 2024 00:00:00 +0000

The ring buffer is the data structure that makes real-time systems possible. Audio processing, network packet capture, kernel I/O buffers, metrics collection — anywhere you have a producer and a consumer that need to exchange data with zero allocation and bounded latency, you’ll find a ring buffer.

It’s also one of the most elegant structures in systems programming: a fixed-size array, two indices, and one invariant. Let me show you how it works and why it appears everywhere from Linux kernel drivers to Disruptor (the LMAX exchange’s million-transactions-per-second queue).

How It Actually Works

A ring buffer (circular buffer) is a fixed-size array where the write and read positions wrap around when they reach the end. Think of a clock face with positions numbered 0 through 11 — after position 11 comes position 0 again.

Two indices: head (next write position) and tail (next read position). The catch: when head == tail, the buffer could be either empty or full. One fix is to sacrifice a slot and treat (head + 1) % size == tail as full; the implementation below instead keeps an explicit full flag, so all size slots are usable.

package main

import (
    "errors"
    "fmt"
)

type RingBuffer[T any] struct {
    data []T
    head int // next write position
    tail int // next read position
    size int
    full bool
}

func NewRingBuffer[T any](capacity int) *RingBuffer[T] {
    return &RingBuffer[T]{
        data: make([]T, capacity),
        size: capacity,
    }
}

func (rb *RingBuffer[T]) Push(val T) error {
    if rb.full {
        return errors.New("buffer full")
    }
    rb.data[rb.head] = val
    rb.head = (rb.head + 1) % rb.size
    rb.full = rb.head == rb.tail
    return nil
}

func (rb *RingBuffer[T]) Pop() (T, error) {
    var zero T
    if rb.IsEmpty() {
        return zero, errors.New("buffer empty")
    }
    val := rb.data[rb.tail]
    rb.tail = (rb.tail + 1) % rb.size
    rb.full = false
    return val, nil
}

func (rb *RingBuffer[T]) IsEmpty() bool {
    return !rb.full && rb.head == rb.tail
}

func (rb *RingBuffer[T]) Len() int {
    if rb.full {
        return rb.size
    }
    if rb.head >= rb.tail {
        return rb.head - rb.tail
    }
    return rb.size - rb.tail + rb.head
}

func main() {
    rb := NewRingBuffer[int](4)

    rb.Push(1); rb.Push(2); rb.Push(3); rb.Push(4)
    fmt.Println("Full:", rb.full)

    v, _ := rb.Pop()
    fmt.Println("Popped:", v) // 1 — FIFO order

    rb.Push(5) // wraps around to position 0
    fmt.Println("Length:", rb.Len())

    for !rb.IsEmpty() {
        v, _ := rb.Pop()
        fmt.Printf("%d ", v)
    }
    // 2 3 4 5
}

The critical property: both Push and Pop are O(1) with zero allocation. There’s no dynamic resizing, no copying, no GC pressure. The same backing array is reused forever.

When to Use It

Use a ring buffer when:

  • You need a bounded FIFO queue with no allocation overhead
  • You have a real-time constraint (audio processing, network capture, metrics)
  • Producer and consumer run at different speeds and you want to absorb bursts
  • You want to drop old data when the buffer fills (sliding window over a stream)
  • You need lock-free SPSC (single-producer, single-consumer) communication between goroutines

Don’t use a ring buffer when:

  • The queue size is truly unbounded (use a dynamic queue or channel)
  • You need random access by index (use a slice)
  • You need ordered iteration without consuming (use a deque)

Production Example

The single-producer, single-consumer (SPSC) ring buffer is a gold standard for inter-goroutine communication without locks. With only one writer and one reader, you can use atomic operations instead of a mutex — eliminating lock contention entirely:

package main

import (
    "fmt"
    "sync/atomic"
    "time"
)

// Lock-free SPSC ring buffer using atomic head/tail.
// Only safe for exactly one goroutine writing and one goroutine reading.
type SPSCBuffer struct {
    data []int64
    size uint64
    _    [56]byte // padding to keep head and tail on separate cache lines
    head uint64   // written by producer only
    _    [56]byte
    tail uint64   // written by consumer only
    _    [56]byte
}

func NewSPSCBuffer(size uint64) *SPSCBuffer {
    return &SPSCBuffer{
        data: make([]int64, size),
        size: size,
    }
}

func (b *SPSCBuffer) Push(val int64) bool {
    head := atomic.LoadUint64(&b.head)
    tail := atomic.LoadUint64(&b.tail)
    if head-tail >= b.size {
        return false // full
    }
    b.data[head%b.size] = val
    atomic.StoreUint64(&b.head, head+1) // publish the slot after writing it
    return true
}

func (b *SPSCBuffer) Pop() (int64, bool) {
    tail := atomic.LoadUint64(&b.tail)
    head := atomic.LoadUint64(&b.head)
    if tail == head {
        return 0, false // empty
    }
    val := b.data[tail%b.size]
    atomic.StoreUint64(&b.tail, tail+1) // release the slot after reading it
    return val, true
}

func main() {
    buf := NewSPSCBuffer(1024)
    consumed := 0
    done := make(chan struct{})
    start := time.Now()

    // Producer goroutine
    go func() {
        for i := int64(0); i < 1_000_000; i++ {
            for !buf.Push(i) {
                // spin if full — in real code, use backoff
            }
        }
    }()

    // Consumer goroutine: consumed is touched by this goroutine only
    go func() {
        for consumed < 1_000_000 {
            if _, ok := buf.Pop(); ok {
                consumed++
            }
        }
        close(done)
    }()

    <-done
    elapsed := time.Since(start)

    fmt.Printf("Transferred 1M messages in %v\n", elapsed)
    fmt.Printf("Throughput: %.1f M msg/sec\n", 1.0/elapsed.Seconds())
}

The padding between head and tail is critical. Both are written by different goroutines. Without padding, they’d share a cache line, and each write would invalidate the other goroutine’s cache — false sharing from Lesson 1. With 56 bytes of padding between them (filling out a 64-byte cache line), each goroutine has its own cache line for its index.

Now let’s look at a sliding window use case — keeping the last N metrics without allocating:

package main

import "fmt"

// SlidingWindow maintains the last N values and computes statistics
type SlidingWindow struct {
    data  []float64
    head  int
    count int
    size  int
}

func NewSlidingWindow(size int) *SlidingWindow {
    return &SlidingWindow{
        data: make([]float64, size),
        size: size,
    }
}

func (sw *SlidingWindow) Add(val float64) {
    sw.data[sw.head] = val
    sw.head = (sw.head + 1) % sw.size
    if sw.count < sw.size {
        sw.count++
    }
}

func (sw *SlidingWindow) Average() float64 {
    if sw.count == 0 {
        return 0
    }
    sum := 0.0
    for i := 0; i < sw.count; i++ {
        sum += sw.data[i]
    }
    return sum / float64(sw.count)
}

func (sw *SlidingWindow) Max() float64 {
    if sw.count == 0 {
        return 0
    }
    max := sw.data[0]
    for i := 1; i < sw.count; i++ {
        if sw.data[i] > max {
            max = sw.data[i]
        }
    }
    return max
}

func main() {
    // Keep last 5 latency measurements
    window := NewSlidingWindow(5)

    latencies := []float64{12, 15, 8, 200, 11, 9, 13, 14}
    for _, l := range latencies {
        window.Add(l)
        fmt.Printf("Added %.0fms — avg: %.1fms, max: %.0fms\n",
            l, window.Average(), window.Max())
    }
}

This pattern appears in rate limiters, circuit breakers, and monitoring agents — compute rolling statistics over a stream without unbounded memory growth.

The Tradeoffs

Fixed size means you must decide what happens on overflow. When the buffer fills you have three choices: block the producer (backpressure), drop the new item, or overwrite the oldest entry (a sliding window over the stream). Which is right depends on your application. A network packet capture tool should drop new packets when full; an audio buffer should block the producer and wait for the consumer to catch up.

Power-of-two sizes are faster. Modulo arithmetic (head % size) can be replaced with bitwise AND (head & (size-1)) when size is a power of two. This is a meaningful optimization in tight loops:

// Slower: general modulo
idx := head % size

// Faster: bitwise mask (requires size to be a power of two)
const size = 1024 // must be power of 2
mask := uint64(size - 1)
idx := head & mask

The SPSC assumption is fragile. Lock-free SPSC ring buffers are only safe with exactly one producer and one consumer. If two goroutines both call Push concurrently, they’ll race on head. For MPMC (multiple-producer, multiple-consumer), you need a more complex design — the LMAX Disruptor pattern, for example, uses a sequence-based claim mechanism.

Memory visibility ordering. The atomic stores and loads in the SPSC example provide the ordering guarantees that make cross-goroutine communication safe. A plain assignment to head would be a data race under the Go memory model: the compiler and CPU are free to reorder or delay the write, so the consumer could see a stale index — or see the new index before the data write it was supposed to publish.

Key Takeaway

The ring buffer is the data structure of real-time systems. Its properties — O(1) operations, zero allocation, predictable memory usage — are exactly what’s needed when you can’t afford to pause for garbage collection or block on a mutex. The SPSC variant is one of the fastest inter-thread communication mechanisms available, and the false-sharing-aware padding trick is a prime example of how hardware knowledge translates directly to performance.

Everywhere you see “circular buffer,” “ring buffer,” or “fixed-size queue” in systems code — from Linux kernel drivers to Disruptor to audio frameworks — this is the structure underneath.

Course Complete

You’ve made it through all 12 lessons in Data Structures for Production Engineers. Here’s what we covered:

Lesson  Topic            Key Insight
1       Arrays           Cache lines decide performance
2       Linked Lists     Almost never the right choice
3       Hash Maps        O(1) with asterisks
4       Stacks & Queues  Hiding in every system
5       Trees & BSTs     Why databases use trees
6       B-Trees          How indexes actually work
7       Heaps            Scheduling and top-K
8       Graphs           You use them daily
9       Tries            Prefix matching at scale
10      Bloom Filters    Probably yes, definitely no
11      Skip Lists       How Redis sorted sets work
12      Ring Buffers     Real-time systems foundation

The goal was never to teach you algorithms in isolation — it was to show you the why behind systems you already use. Every data structure here appears in production software you depend on every day. Recognizing the structure behind a system is the first step to understanding and improving it.