
Atharva Pandey/Lesson 4: Code Generation — From AST to bytecode or machine code

Created Fri, 20 Dec 2024 00:00:00 +0000 Modified Fri, 20 Dec 2024 00:00:00 +0000

The tree-walking interpreter in Lesson 3 works, but it has a ceiling. Every time you evaluate an expression, you traverse the AST from scratch. No caching, no precomputed form, no optimization. For a REPL or a small scripting language this is fine. For a production language runtime — something that runs for hours, executes millions of operations — you want a tighter inner loop. That tighter loop is a virtual machine executing bytecode. And the step that produces bytecode from the AST is code generation.

This lesson walks through compiling the same small language to a stack-based bytecode and executing it on a simple virtual machine. You will see where constants live, how variable access becomes an index operation, how control flow becomes jumps, and why this is fundamentally faster than tree-walking.

The Problem

In a tree-walking interpreter, evaluating 1 + 2 + 3 means:

  1. Traverse the outer InfixExpression(+) node.
  2. Recursively traverse InfixExpression(+) on the left.
  3. Recursively traverse IntegerLiteral(1).
  4. Recursively traverse IntegerLiteral(2).
  5. Compute 1 + 2, return 3.
  6. Recursively traverse IntegerLiteral(3).
  7. Compute 3 + 3, return 6.

Seven steps, with Go function calls, type switches, interface dispatch, and heap allocations at each level. For a tight inner loop in user code, this overhead accumulates.

Bytecode compiles this to a flat sequence of instructions that a virtual machine executes in a tight loop:

OpConstant 0   ; push constant pool[0] (value: 1)
OpConstant 1   ; push constant pool[1] (value: 2)
OpAdd          ; pop two, push sum
OpConstant 2   ; push constant pool[2] (value: 3)
OpAdd          ; pop two, push sum

The VM executes these instructions in a for loop with a single switch. No recursion, no interface dispatch. Just reading bytes and operating on a stack.

How It Works

Instructions and opcodes are the instruction set of your virtual machine:

type Opcode byte

const (
    OpConstant  Opcode = iota // push a constant from the pool
    OpAdd                     // pop two integers, push sum
    OpSub                     // pop two, push difference
    OpMul                     // pop two, push product
    OpDiv                     // pop two, push quotient
    OpTrue                    // push true
    OpFalse                   // push false
    OpNull                    // push null
    OpEqual                   // pop two, push equality result
    OpNotEqual                // pop two, push inequality result
    OpGreaterThan             // pop two, push comparison result
    OpMinus                   // negate
    OpBang                    // logical not
    OpJump                    // unconditional jump
    OpJumpNotTruthy           // conditional jump: taken when the condition is not truthy
    OpSetGlobal               // store top of stack in global slot
    OpGetGlobal               // push value from global slot
    OpSetLocal                // store top of stack in local slot
    OpGetLocal                // push value from local slot
    OpCall                    // call function
    OpReturnValue             // return with a value
    OpReturn                  // return without value (null)
    OpPop                     // discard top of stack
)

Instructions are byte sequences. Some opcodes take operands encoded as big-endian bytes immediately following the opcode:

type Definition struct {
    Name          string
    OperandWidths []int // width of each operand in bytes
}

var definitions = map[Opcode]*Definition{
    OpConstant:  {"OpConstant", []int{2}},     // one 2-byte operand (constant pool index)
    OpAdd:       {"OpAdd", []int{}},
    OpJump:      {"OpJump", []int{2}},         // jump target (2-byte offset)
    OpSetGlobal: {"OpSetGlobal", []int{2}},
    OpGetGlobal: {"OpGetGlobal", []int{2}},
    OpSetLocal:  {"OpSetLocal", []int{1}},     // local slot index (1 byte, max 256 locals)
    OpGetLocal:  {"OpGetLocal", []int{1}},
    OpCall:      {"OpCall", []int{1}},         // number of arguments
    // ...
}

type Instructions []byte

func Make(op Opcode, operands ...int) []byte {
    def, ok := definitions[op]
    if !ok {
        return []byte{}
    }
    out := []byte{byte(op)}
    for i, o := range operands {
        width := def.OperandWidths[i]
        switch width {
        case 2:
            out = append(out, byte(o>>8), byte(o))
        case 1:
            out = append(out, byte(o))
        }
    }
    return out
}

The compiler walks the AST and emits instructions:

type Compiler struct {
    instructions Instructions
    constants    []Object        // constant pool
    globals      SymbolTable     // global variable name → index
    scopes       []CompilationScope
}

type CompilationScope struct {
    instructions        Instructions
    lastInstruction     EmittedInstruction
    previousInstruction EmittedInstruction
}

func (c *Compiler) Compile(node Node) error {
    switch node := node.(type) {
    case *Program:
        for _, s := range node.Statements {
            if err := c.Compile(s); err != nil {
                return err
            }
        }
    case *ExpressionStatement:
        if err := c.Compile(node.Expression); err != nil {
            return err
        }
        c.emit(OpPop) // discard expression result (not a return value)

    case *InfixExpression:
        if err := c.Compile(node.Left); err != nil { return err }
        if err := c.Compile(node.Right); err != nil { return err }
        switch node.Operator {
        case "+": c.emit(OpAdd)
        case "-": c.emit(OpSub)
        case "*": c.emit(OpMul)
        case "/": c.emit(OpDiv)
        case "==": c.emit(OpEqual)
        case "!=": c.emit(OpNotEqual)
        case ">": c.emit(OpGreaterThan)
        default:
            return fmt.Errorf("unknown operator: %s", node.Operator)
        }

    case *IntegerLiteral:
        integer := &Integer{Value: node.Value}
        c.emit(OpConstant, c.addConstant(integer))

    case *LetStatement:
        symbol := c.globals.Define(node.Name.Value)
        if err := c.Compile(node.Value); err != nil { return err }
        c.emit(OpSetGlobal, symbol.Index)

    case *Identifier:
        symbol, ok := c.globals.Resolve(node.Value)
        if !ok {
            return fmt.Errorf("undefined variable %s", node.Value)
        }
        c.emit(OpGetGlobal, symbol.Index)

    case *IfExpression:
        if err := c.Compile(node.Condition); err != nil { return err }
        // emit a conditional jump with a placeholder offset
        jumpNotTruthyPos := c.emit(OpJumpNotTruthy, 9999)
        if err := c.Compile(node.Consequence); err != nil { return err }
        // emit unconditional jump over the else block
        jumpPos := c.emit(OpJump, 9999)
        // back-patch the conditional jump to here
        afterConsequence := len(c.currentInstructions())
        c.changeOperand(jumpNotTruthyPos, afterConsequence)
        if node.Alternative != nil {
            if err := c.Compile(node.Alternative); err != nil { return err }
        }
        // note: a full compiler also strips the consequence's trailing OpPop
        // and emits OpNull when the alternative is missing, so the
        // if-expression always leaves exactly one value on the stack
        afterAlternative := len(c.currentInstructions())
        c.changeOperand(jumpPos, afterAlternative)
    }
    return nil
}
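
Tracing through this compiler, a program like let x = 5; x + 3; would produce roughly the following (constants added in order of appearance, x assigned global slot 0):

OpConstant 0   ; push constant pool[0] (value: 5)
OpSetGlobal 0  ; store top of stack in global slot 0 (x)
OpGetGlobal 0  ; push x
OpConstant 1   ; push constant pool[1] (value: 3)
OpAdd          ; pop two, push sum
OpPop          ; discard expression statement result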

The changeOperand / back-patching trick is how compilers handle forward jumps: emit a jump with a placeholder target, compile the body, then go back and fill in the real target address.

The virtual machine executes the bytecode:

type VM struct {
    constants    []Object
    globals      []Object
    stack        [StackSize]Object
    sp           int // stack pointer: points to next free slot
    frames       [MaxFrames]*Frame
    framesIndex  int
}

type Frame struct {
    fn          *CompiledFunction
    ip          int // instruction pointer
    basePointer int // points to first local in this frame's stack window
}

func (vm *VM) Run() error {
    var ip int
    var ins Instructions
    var op Opcode

    // each frame's ip starts at -1 and is incremented before the fetch,
    // so ip always points at the opcode currently being executed
    for vm.currentFrame().ip < len(vm.currentFrame().Instructions())-1 {
        vm.currentFrame().ip++
        ip = vm.currentFrame().ip
        ins = vm.currentFrame().Instructions()
        op = Opcode(ins[ip])

        switch op {
        case OpConstant:
            constIndex := int(ins[ip+1])<<8 | int(ins[ip+2])
            vm.currentFrame().ip += 2
            vm.push(vm.constants[constIndex])

        case OpAdd, OpSub, OpMul, OpDiv:
            right := vm.pop().(*Integer).Value
            left := vm.pop().(*Integer).Value
            switch op {
            case OpAdd: vm.push(&Integer{Value: left + right})
            case OpSub: vm.push(&Integer{Value: left - right})
            case OpMul: vm.push(&Integer{Value: left * right})
            case OpDiv: vm.push(&Integer{Value: left / right})
            }

        case OpSetGlobal:
            globalIndex := int(ins[ip+1])<<8 | int(ins[ip+2])
            vm.currentFrame().ip += 2
            vm.globals[globalIndex] = vm.pop()

        case OpGetGlobal:
            globalIndex := int(ins[ip+1])<<8 | int(ins[ip+2])
            vm.currentFrame().ip += 2
            vm.push(vm.globals[globalIndex])

        case OpJumpNotTruthy:
            target := int(ins[ip+1])<<8 | int(ins[ip+2])
            vm.currentFrame().ip += 2
            condition := vm.pop()
            if !isTruthy(condition) {
                vm.currentFrame().ip = target - 1 // -1 because loop increments
            }

        case OpCall:
            numArgs := int(ins[ip+1])
            vm.currentFrame().ip++
            fn := vm.stack[vm.sp-1-numArgs].(*CompiledFunction)
            frame := NewFrame(fn, vm.sp-numArgs)
            vm.pushFrame(frame)

        case OpReturnValue:
            returnValue := vm.pop()
            frame := vm.popFrame()
            vm.sp = frame.basePointer - 1 // drop locals, args, and the function object itself
            vm.push(returnValue)
        }
    }
    return nil
}

In Practice

Disassembling bytecode is essential for debugging. Write a Disassemble(ins Instructions) string function that decodes each instruction using the definitions map and prints it as human-readable text. Whenever the compiler or VM behaves unexpectedly, disassemble the generated bytecode and read it.
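A minimal disassembler along those lines, as a self-contained sketch with only a few opcodes defined: decode each instruction from the definitions table and print "offset name operands".

```go
package main

import (
	"fmt"
	"strings"
)

type Opcode byte

const (
	OpConstant Opcode = iota
	OpAdd
	OpPop
)

type Definition struct {
	Name          string
	OperandWidths []int
}

var definitions = map[Opcode]*Definition{
	OpConstant: {"OpConstant", []int{2}},
	OpAdd:      {"OpAdd", []int{}},
	OpPop:      {"OpPop", []int{}},
}

type Instructions []byte

// Disassemble decodes each instruction into one human-readable line.
func Disassemble(ins Instructions) string {
	var out strings.Builder
	i := 0
	for i < len(ins) {
		def, ok := definitions[Opcode(ins[i])]
		if !ok {
			fmt.Fprintf(&out, "%04d ERROR: unknown opcode %d\n", i, ins[i])
			i++
			continue
		}
		fmt.Fprintf(&out, "%04d %s", i, def.Name)
		operandPos := i + 1
		for _, width := range def.OperandWidths {
			var operand int
			switch width {
			case 2: // big-endian decode
				operand = int(ins[operandPos])<<8 | int(ins[operandPos+1])
			case 1:
				operand = int(ins[operandPos])
			}
			fmt.Fprintf(&out, " %d", operand)
			operandPos += width
		}
		out.WriteByte('\n')
		i = operandPos
	}
	return out.String()
}

func main() {
	// OpConstant 1, OpConstant 2, OpAdd, OpPop
	ins := Instructions{0, 0, 1, 0, 0, 2, 1, 2}
	fmt.Print(Disassemble(ins))
}
```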

Local variables vs globals. In the code above, global variables live in a flat array (vm.globals) indexed via a symbol table. Local variables use the stack itself: the base pointer marks where the current frame's locals begin, and OpGetLocal N reads vm.stack[basePointer + N]. Real VMs such as CPython and the JVM use the same idea of indexed local slots per frame, though they keep them in a dedicated locals array rather than a window of the operand stack.
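The base-pointer arithmetic in isolation (a toy sketch with int slots, not the VM's actual Object stack):

```go
package main

import "fmt"

// setLocal mirrors what OpSetLocal does: locals are just slots in the
// value stack, offset from the current frame's base pointer.
func setLocal(stack []int, basePointer, slot, value int) {
	stack[basePointer+slot] = value
}

// getLocal mirrors OpGetLocal: read the slot back out of the window.
func getLocal(stack []int, basePointer, slot int) int {
	return stack[basePointer+slot]
}

func main() {
	stack := make([]int, 16)
	basePointer := 3 // this frame's locals begin at stack[3]

	setLocal(stack, basePointer, 0, 42) // OpSetLocal 0
	setLocal(stack, basePointer, 1, 7)  // OpSetLocal 1

	// OpGetLocal 0 and OpGetLocal 1, then OpAdd
	fmt.Println(getLocal(stack, basePointer, 0) + getLocal(stack, basePointer, 1)) // 49
}
```

No name lookup happens at runtime; the compiler already turned each variable into an integer slot index.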

LLVM as a code generation backend. If you want to target native machine code rather than a custom VM, LLVM is the standard backend. You walk the AST and emit LLVM IR (a textual intermediate representation) or use an LLVM Go binding like llir/llvm. LLVM handles register allocation, instruction selection, and optimization. The Go compiler itself uses a similar multi-stage approach: AST to SSA, SSA to machine code.

The Gotchas

Back-patching and forward references. The compiler emits jumps with placeholder operands and patches them later. This works for one pass over a linear AST but gets complicated with nested control flow. Keep a stack of “patch positions” for each nested scope that needs back-patching.

Stack depth analysis. A stack machine needs a stack large enough for the deepest expression. You can compute the maximum stack depth statically (count push minus pop at each point), or just allocate a generous fixed stack. For production VMs, static analysis prevents stack overflow at compile time.
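A sketch of that static analysis over a straight-line opcode sequence (branching code needs per-jump-target bookkeeping, and the effect table here is illustrative, not complete):

```go
package main

import "fmt"

type Opcode byte

const (
	OpConstant Opcode = iota
	OpAdd
	OpPop
)

// net stack effect per opcode: pushes minus pops
var effect = map[Opcode]int{
	OpConstant: +1, // pushes one value
	OpAdd:      -1, // pops two, pushes one
	OpPop:      -1, // pops one
}

// maxStackDepth walks a straight-line opcode sequence, tracking the
// running depth and returning the high-water mark.
func maxStackDepth(ops []Opcode) int {
	depth, peak := 0, 0
	for _, op := range ops {
		depth += effect[op]
		if depth > peak {
			peak = depth
		}
	}
	return peak
}

func main() {
	// 1 + 2 + 3 as an expression statement: push, push, add, push, add, pop
	ops := []Opcode{OpConstant, OpConstant, OpAdd, OpConstant, OpAdd, OpPop}
	fmt.Println(maxStackDepth(ops)) // 2
}
```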

Function compilation creates sub-bytecodes. Each function compiles to its own CompiledFunction with its own Instructions byte slice. The main program’s bytecode references these function objects as constants. Compiling nested functions requires a scope stack in the compiler to track which instruction list you are emitting into.

Closures over local variables. When an inner function references a local of an outer function, the local must survive after the outer function returns. This requires “upvalues” (Lua’s term) or heap-escaping locals. The compiler must detect captured variables and emit special get/set ops that reference the closure’s upvalue array rather than the stack. This is the hardest part of a realistic compiler.

Key Takeaway

Code generation transforms the AST into a flat sequence of bytecode instructions executed by a virtual machine. The compiler walks the AST exactly as the tree-walking evaluator did, but instead of computing results immediately, it emits instructions. The VM then executes those instructions in a tight loop: no recursion, no interface dispatch, just reading bytes and operating on a stack. The performance difference over tree-walking is significant — often 5–10x for numeric workloads. Understanding this path — source text to tokens to AST to bytecode to execution — gives you a complete mental model of how programming languages work.


Previous: Lesson 3: AST and Evaluation — Walking the tree to compute results


🎓 Course Complete! You have finished Compilers and Interpreters. You built a lexer that turns text into tokens, a Pratt parser that builds an AST, a tree-walking evaluator with closures and environments, and a bytecode compiler with a stack-based virtual machine. These are the foundational pieces of every language runtime in existence. The details change; the structure does not.