
Structural flow for instructions#22

Open
msakuta wants to merge 18 commits into master from strflow


@msakuta msakuta commented Oct 13, 2024

Background

In WebAssembly, a jump instruction does not contain an explicit address, which makes the compiler much easier to implement, because it does not have to "fix up" the jump address when jumping forward.
Calculating the destination address is the responsibility of the runtime virtual machine.
This design is a necessity rather than a preference: since Wasm uses variable-length encoding for integers, filling in an offset later could change its encoded size and shift a whole chunk of instructions.
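
To illustrate the last point, here is a minimal sketch of unsigned LEB128, the variable-length integer encoding Wasm uses. The helper name `encode_uleb128` is hypothetical and is not code from this PR.

```rust
/// Encode `value` as unsigned LEB128: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
/// (Hypothetical helper for illustration only.)
fn encode_uleb128(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: high bit clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: high bit set.
        out.push(byte | 0x80);
    }
}
```

An offset of 127 fits in one byte but 128 needs two, so patching a placeholder offset after the fact can change the length of the instruction and move everything that follows it.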

Another reason Wasm uses structured control flow is safety. A bug in the bytecode cannot break predictable behavior by jumping to a random address, since the jump target is determined by the structure. The target is inferred from the block nesting level, which is easily verified by checking that it is less than the block stack size.

Also, structured control flow is much easier to read and reason about. Raw jump addresses are harder to understand and debug, since the control structure is lost when the AST is compiled to bytecode.

I like this design so much that I stole the idea of structural control flow. Now the bytecode looks like a hybrid of a higher-level language and assembly.

Structural flow instructions

We have the following new "instructions", in quotes because they are not found in real CPUs.

enum OpCode {
    // ...
    /// Marks the beginning of a conditional block, should be followed by Else or End control flow instruction.
    If,
    /// Marks the beginning of an else block of a conditional block, should be followed by an End control flow instruction.
    Else,
    /// Marks the beginning of a block, in which jump instructions will jump to the end.
    Block,
    /// Marks the beginning of a loop, which is the destination instruction when a jump instruction is called.
    Loop,
    /// Marks the end of a control block.
    #[default]
    End,
}

There is a stack for control blocks, pushed by each Block or Loop instruction and popped by End.
Jump instructions jump forward in a Block and backward in a Loop.

We follow the WebAssembly VM model, where loops and branches are implemented as blocks (structured control flow).
The block can be one of the following:

  • Block (jump forward)

    Block
    ...
    End

  • Loop (jump backward)

    Loop
    ...
    End

  • If (skip forward conditionally)

    If
    ...
    End

  • If/Else (skip to else clause conditionally)

    If
    ...
    Else
    ...
    End

Any jump instruction (Jmp, Jt or Jf) in these blocks transfers control to a new address: either the beginning of the block (Loop) or its end (all the other block kinds).
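
As a rough sketch of how such a jump could be resolved with no stored address, the function below scans the instruction stream around the jump. The `Op` enum is a simplified stand-in for the real `OpCode` (jumps carry only a nesting depth, with 1 meaning the innermost block), and `resolve_jump` is a hypothetical name. Returning the index of the `End` instruction itself, rather than the instruction after it, is also an assumption.

```rust
/// Simplified stand-in for the real opcode set (assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Block, Loop, If, Else, End, Jmp(usize), Other }

fn is_opener(op: Op) -> bool {
    matches!(op, Op::Block | Op::Loop | Op::If)
}

/// Resolve the jump at `pc` by scanning. Returns the index of the Loop
/// opener (backward jump) or of the matching End (forward jump).
fn resolve_jump(code: &[Op], pc: usize) -> Option<usize> {
    let depth = match code.get(pc) {
        Some(&Op::Jmp(d)) if d > 0 => d,
        _ => return None,
    };
    // Scan backward to find the `depth`-th enclosing block opener,
    // skipping blocks that were already closed before the jump.
    let mut remaining = depth;
    let mut nested = 0usize;
    let mut opener = None;
    for i in (0..pc).rev() {
        match code[i] {
            Op::End => nested += 1,
            op if is_opener(op) => {
                if nested > 0 {
                    nested -= 1;
                } else {
                    remaining -= 1;
                    if remaining == 0 {
                        opener = Some(i);
                        break;
                    }
                }
            }
            _ => {}
        }
    }
    let opener = opener?;
    if code[opener] == Op::Loop {
        return Some(opener); // Loop: jump back to the beginning.
    }
    // Any other block: scan forward for the End that closes it.
    let mut nested = 0usize;
    for j in opener + 1..code.len() {
        match code[j] {
            op if is_opener(op) => nested += 1,
            Op::End if nested == 0 => return Some(j),
            Op::End => nested -= 1,
            _ => {}
        }
    }
    None
}
```

A jump whose depth exceeds the number of enclosing blocks simply yields `None`, which is the safety property described in the Background section.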

Compiler generated code

In practice, Block and Loop always come in pairs. If you compile a for, loop or while loop, the instructions will look like this:

Loop
  Block
    ...
  End
End

The inner part is the body of the loop control structure. Jmp _ 1 jumps to the end of the inner Block; execution then falls through the End of the Loop and leaves the loop, which implements break. Jmp _ 2 jumps to the beginning of the Loop, which implements continue.
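
Here is a minimal sketch of how a compiler could emit this shape without any address fixup. The `Inst` and `Stmt` types and the `emit` function are hypothetical simplifications: jumps carry only the nesting depth, and break/continue are only handled when they sit directly in the loop body.

```rust
/// Simplified instruction set (assumption; the real instructions have two operands).
#[derive(Debug, Clone, PartialEq)]
enum Inst { Loop, Block, End, Jmp(usize), Jf(usize), Other(&'static str) }

/// Tiny statement AST, for illustration only.
enum Stmt {
    While { cond: &'static str, body: Vec<Stmt> },
    Break,
    Continue,
    Expr(&'static str),
}

/// Emit instructions for `stmt`. No jump address is ever patched:
/// every jump names only how many blocks out it targets.
fn emit(stmt: &Stmt, out: &mut Vec<Inst>) {
    match stmt {
        Stmt::While { cond, body } => {
            out.push(Inst::Loop);
            out.push(Inst::Block);
            out.push(Inst::Other(*cond)); // evaluate the condition
            out.push(Inst::Jf(1));        // false: leave via the end of Block
            for s in body {
                emit(s, out);
            }
            out.push(Inst::Jmp(2));       // go back to the beginning of Loop
            out.push(Inst::End);          // closes Block
            out.push(Inst::End);          // closes Loop
        }
        Stmt::Break => out.push(Inst::Jmp(1)),
        Stmt::Continue => out.push(Inst::Jmp(2)),
        Stmt::Expr(name) => out.push(Inst::Other(*name)),
    }
}
```

Because depths are relative to the enclosing blocks, a nested loop needs no special handling: the inner break and continue still use 1 and 2.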

Disassembly improvement

Now the disassembly is much more ergonomic. It shows the control flow structure by indentation.

Instructions(28):
  [  0] LoadLiteral 0 1 (I64(0))
  [  1] LoadLiteral 1 2 (I64(3))
  [  2] Loop 0 0
  [  3]   Block 0 0
  [  4]     Move 1 3
  [  5]     Lt 3 2
  [  6]     Jf 3 1
  [  7]     LoadLiteral 2 4 (I64(1))
  [  8]     LoadLiteral 1 5 (I64(3))
  [  9]     Loop 0 0
  [ 10]       Block 0 0
  [ 11]         Move 4 6
  [ 12]         Lt 6 5
  [ 13]         Jf 6 1
  [ 14]         Move 1 7
  [ 15]         Mul 7 4
  [ 16]         LoadLiteral 3 8 (Str("print"))
  [ 17]         Move 7 9
  [ 18]         Call 1 8
  [ 19]         Incr 4 0
  [ 20]         Jmp 0 2
  [ 21]       End 0 0
  [ 22]     End 0 0
  [ 23]     Incr 1 0
  [ 24]     Jmp 0 2
  [ 25]   End 0 0
  [ 26] End 0 0
  [ 27] Ret 0 8

Before this change, the same program disassembled like this:

Instructions(20):
  [0] LoadLiteral 0 1 (I64(0))
  [1] LoadLiteral 1 2 (I64(3))
  [2] Move 1 3
  [3] Lt 3 2
  [4] Jf 3 19
  [5] LoadLiteral 2 4 (I64(1))
  [6] LoadLiteral 1 5 (I64(3))
  [7] Move 4 6
  [8] Lt 6 5
  [9] Jf 6 17
  [10] Move 1 7
  [11] Mul 7 4
  [12] LoadLiteral 3 8 (Str("print"))
  [13] Move 7 9
  [14] Call 1 8
  [15] Incr 4 0
  [16] Jmp 0 7
  [17] Incr 1 0
  [18] Jmp 0 2
  [19] Ret 0 8
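
The indentation in the new listing can be produced by tracking the nesting depth while printing. The following is a minimal sketch with a simplified, hypothetical `Op` enum without operands. Lining up Else with its If is my assumption here, since the listing above contains no If block.

```rust
/// Simplified opcodes without operands (assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Block, Loop, If, Else, End, Move, Jmp }

/// Print one instruction per line, indented by block nesting depth.
fn disassemble(code: &[Op]) -> String {
    let mut depth = 0usize;
    let mut out = String::new();
    for (i, op) in code.iter().enumerate() {
        // End and Else close the current body, so they line up with their opener.
        if matches!(op, Op::End | Op::Else) {
            depth = depth.saturating_sub(1);
        }
        out.push_str(&format!("[{:>3}] {}{:?}\n", i, "  ".repeat(depth), op));
        // Everything after an opener (or Else) belongs to a nested body.
        if matches!(op, Op::Block | Op::Loop | Op::If | Op::Else) {
            depth += 1;
        }
    }
    out
}
```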

Benchmark

Since structural flow puts the burden of calculating jump addresses on the bytecode interpreter, we would like to know how much it impacts performance.
Here is the latest measurement of Mandelbrot set ASCII art rendering time, compared with other languages. Error bars are the standard deviation of 5 runs.

[Figure: mandel-time, rendering time compared with other languages]

The relevant entries are the ones named Mascal.

  • Mascal AST interpreter - an interpreter that evaluates the AST directly. It is very slow.
  • Mascal bytecode - the bytecode compiler implementation before this PR. It is fast, but the compiler needs to calculate and "fix up" addresses.
  • Mascal bytecode strflow - the bytecode implementation that uses structural flow. It is not optimal, because it calculates the jump address on every jump by scanning the instructions.
  • Mascal bytecode strflow-cached - the implementation that caches jump addresses in the interpreter. It calculates each jump address only once, when loading the bytecode.

The difference is too small to measure accurately, so I increased the number of iterations from 256 to 1024 to make the task take longer, and extracted only the measurements relevant to Mascal.

[Figure: mandel-time, Mascal implementations only, 1024 iterations]

The difference is still marginal, but the ordering of speed seems consistent: Mascal strflow < Mascal strflow-cached < Mascal bytecode. This makes sense given when each implementation calculates jump addresses:

  • Mascal bytecode
    • Compile
      • Calculate jump address (fixup)
    • Load
    • Execute
  • Mascal strflow-cached
    • Compile
    • Load
      • Calculate jump address (fixup)
    • Execute
  • Mascal strflow
    • Compile
    • Load
    • Execute
      • Calculate jump address (fixup)

The further down the list you go, the closer the calculation moves to execution, so it puts more workload on the runtime.
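
The strflow-cached variant moves the calculation to load time. Below is a minimal sketch of how that could look, not the actual code in this PR: `cache_jump_targets` is a hypothetical name, the `Op` enum is simplified to Block and Loop only, and jump depth 1 means the innermost block.

```rust
/// Simplified opcodes (assumption; If/Else are omitted to keep the sketch short).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Block, Loop, End, Jmp(usize), Other }

/// Compute the target index of every jump once, at load time.
/// `targets[i]` is `Some(addr)` if `code[i]` is a jump, `None` otherwise.
fn cache_jump_targets(code: &[Op]) -> Result<Vec<Option<usize>>, String> {
    // Pass 1: pair every Block/Loop with its End using a block stack.
    let mut end_of = vec![0usize; code.len()];
    let mut stack: Vec<usize> = Vec::new();
    for (i, op) in code.iter().enumerate() {
        match op {
            Op::Block | Op::Loop => stack.push(i),
            Op::End => {
                let open = stack.pop().ok_or_else(|| format!("unmatched End at {i}"))?;
                end_of[open] = i;
            }
            _ => {}
        }
    }
    if !stack.is_empty() {
        return Err("unclosed block".to_string());
    }
    // Pass 2: replay the block stack and resolve each jump against it.
    let mut targets = vec![None; code.len()];
    for (i, op) in code.iter().enumerate() {
        match op {
            Op::Block | Op::Loop => stack.push(i),
            Op::End => {
                stack.pop();
            }
            Op::Jmp(depth) => {
                let depth = *depth;
                // The depth must refer to an enclosing block.
                if depth == 0 || depth > stack.len() {
                    return Err(format!("invalid jump depth {depth} at {i}"));
                }
                let open = stack[stack.len() - depth];
                // Loop jumps back to its beginning, Block jumps forward to its End.
                targets[i] = Some(if code[open] == Op::Loop { open } else { end_of[open] });
            }
            _ => {}
        }
    }
    Ok(targets)
}
```

With such a table, the interpreter's jump becomes a single indexed lookup at execution time, and the depth check doubles as bytecode validation.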

Also note that I added Mascal varlen, which is an implementation of variable-length bytecode instructions in the varlen branch. It is the most compact representation, but not necessarily the fastest.

Conclusion

The performance is marginally better if we calculate the jump address at compile time (as real CPU instructions do), but the overhead can be minimized by caching the jump addresses at load time.

Do we still want to apply this change? We want variable-length instructions to minimize the cache memory required for the instructions, and that would require structural control flow. However, experiments show that variable-length encoding adds significant runtime overhead, which cancels out the benefit of memory locality.

msakuta added 18 commits October 9, 2024 02:31
Now it follows Wasm structured control flow model completely.
It's like operand stack, but for blocks.
Because we resolve the addresses at cache_bytecode(), we don't need to
do it in runtime.
@msakuta msakuta force-pushed the master branch 4 times, most recently from 70003a0 to ba6a556 Compare November 29, 2025 15:56