Structural flow for instructions by msakuta · Pull Request #22 · msakuta/mascal

msakuta · 2024-10-13T04:28:11Z

Background

In WebAssembly, a jump instruction does not contain an explicit address. This makes the compiler much easier to implement, because it does not have to "fix up" the jump address when jumping forward.
Calculating the destination address is the responsibility of the runtime virtual machine.
This design is a necessity rather than a preference: Wasm uses variable-length encoding for integers, so filling in an offset later could change its encoded size and shift a whole chunk of instructions.
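To make the size problem concrete, here is a minimal sketch of unsigned LEB128, the variable-length integer encoding Wasm uses. The function name `encode_uleb128` is made up for this illustration and is not part of Mascal.

```rust
/// Encode `value` as unsigned LEB128 (7 payload bits per byte).
/// `encode_uleb128` is a hypothetical helper, not a Mascal function.
fn encode_uleb128(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit stays clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

An offset of 127 fits in one byte while 128 needs two, so a compiler that reserved one byte for a forward jump and later learned the real offset would have to move every instruction after it.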

Another reason Wasm uses structured control flow is safety. A bug in the bytecode cannot make execution jump to an arbitrary address, because the jump target is determined by the structure. The target is inferred from the block nesting level, which is easily verified by checking that it is less than the block stack size.
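The check described above can be sketched as a single pass over the code. This is an illustration only: the `Op` enum and `validate` function are hypothetical, and the branch depth is 0-based (0 = innermost block) as in Wasm.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Block,
    Loop,
    End,
    /// Branch to the enclosing block `depth` levels out (0 = innermost, Wasm style).
    Br(usize),
}

/// Hypothetical validator: every branch must name a block that is currently open.
fn validate(code: &[Op]) -> Result<(), String> {
    let mut nesting = 0usize; // number of blocks open at this point
    for (ip, op) in code.iter().enumerate() {
        match *op {
            Op::Block | Op::Loop => nesting += 1,
            Op::End => {
                nesting = nesting
                    .checked_sub(1)
                    .ok_or_else(|| format!("unmatched End at {ip}"))?;
            }
            // The whole safety check: depth must be less than the nesting level.
            Op::Br(depth) if depth >= nesting => {
                return Err(format!("branch depth {depth} out of range at {ip}"));
            }
            _ => {}
        }
    }
    if nesting == 0 {
        Ok(())
    } else {
        Err(format!("{nesting} unclosed block(s)"))
    }
}
```

No address arithmetic is involved, so a corrupted operand can at worst be rejected; it cannot point into the middle of unrelated code.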

Structured control flow is also much easier to read and reason about. Raw jump addresses are harder to understand and debug, because information about the control structure is lost when compiling from AST to bytecode.

I like this design so much that I am stealing the idea of structural control flow. The bytecode now looks like a hybrid of a higher-level language and assembly.

Structural flow instructions

We have the following new "instructions", in quotes because they have no counterpart in real CPUs.

enum OpCode {
    // ...
    /// Marks the beginning of a conditional block, should be followed by Else or End control flow instruction.
    If,
    /// Marks the beginning of an else block of a conditional block, should be followed by an End control flow instruction.
    Else,
    /// Marks the beginning of a block, in which jump instructions will jump to the end.
    Block,
    /// Marks the beginning of a loop, which is the destination instruction when a jump instruction is called.
    Loop,
    /// Marks the end of a control block.
    #[default]
    End,
}

There is a stack for control blocks: each Block or Loop instruction pushes an entry, and End pops it.
Jump instructions jump forward in a Block and backward in a Loop.

We follow the WebAssembly VM model, where loops and branches are implemented as blocks (structured control flow).
The block can be one of the following:

Block (jump forward)

Block
...
End

Loop (jump backward)

Loop
...
End

If (skip forward conditionally)

If
...
End

If/Else (skip to else clause conditionally)

If
...
Else
...
End

Any jump instruction (Jmp, Jt or Jf) inside these blocks transfers control to a new address: the beginning of the block for Loop, or the end of the block for all the other block types.
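As a rough sketch of how such a target could be found by scanning the instructions, consider the following. It is not the PR's actual code: `Op`, `matching_end` and `resolve_jump` are hypothetical names, and the depth operand is assumed to be 1-based (1 = innermost enclosing block), which is what the disassembly in this PR suggests.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Block,
    Loop,
    If,
    Else,
    End,
    Other, // stand-in for any non-control instruction
}

/// Index of the End that closes the block opened at `open`.
fn matching_end(code: &[Op], open: usize) -> Option<usize> {
    let mut nest = 0usize;
    for (i, op) in code.iter().enumerate().skip(open) {
        match op {
            Op::Block | Op::Loop | Op::If => nest += 1,
            Op::End => {
                nest = nest.checked_sub(1)?;
                if nest == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}

/// Target index for a jump at `ip` to the `depth`-th enclosing block
/// (1 = innermost; an assumption of this sketch).
fn resolve_jump(code: &[Op], ip: usize, depth: usize) -> Option<usize> {
    let mut skip = 0usize; // already-closed blocks passed while scanning backward
    let mut found = 0usize; // enclosing blocks seen so far
    for (j, op) in code.get(..ip)?.iter().enumerate().rev() {
        match op {
            Op::End => skip += 1,
            Op::Block | Op::Loop | Op::If => {
                if skip > 0 {
                    skip -= 1; // opener of a sibling block, not an enclosing one
                } else {
                    found += 1;
                    if found == depth {
                        // Loop: jump back to its start. Everything else: forward to its End.
                        return if *op == Op::Loop { Some(j) } else { matching_end(code, j) };
                    }
                }
            }
            _ => {}
        }
    }
    None
}
```

Doing this scan on every jump is simple but repeats the same work each time a loop iterates, which is the cost the benchmark section measures.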

Compiler generated code

In practice, the compiler always emits Loop and Block in pairs. If you compile a for, loop or while statement, the instructions look like this:

Loop
  Block
    ...
  End
End

The inner part is the body of the loop control structure. Jmp _ 1 jumps to the end of the Block, which escapes the loop and implements break. Jmp _ 2 jumps to the beginning of the Loop, which implements continue.
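A toy emitter shows the pattern. This is a simplified sketch, not Mascal's compiler: the `Stmt` and `Op` types are invented for the example, loop conditions are left out, and break/continue are assumed to sit directly in the loop body (inside a nested If the depth would have to grow accordingly).

```rust
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Loop,
    Block,
    End,
    /// Jump to the enclosing block `depth` levels out (1 = innermost).
    Jmp(usize),
    /// Stand-in for ordinary instructions.
    Body(&'static str),
}

/// Hypothetical mini AST, just enough to show the Loop/Block pairing.
enum Stmt {
    Work(&'static str),
    Break,
    Continue,
    While(Vec<Stmt>),
}

fn emit(stmts: &[Stmt], out: &mut Vec<Op>) {
    for stmt in stmts {
        match stmt {
            Stmt::Work(name) => out.push(Op::Body(*name)),
            // Depth 1 is the Block: jumping to its end leaves the loop.
            Stmt::Break => out.push(Op::Jmp(1)),
            // Depth 2 is the Loop: jumping to its start begins the next iteration.
            Stmt::Continue => out.push(Op::Jmp(2)),
            Stmt::While(body) => {
                out.push(Op::Loop);
                out.push(Op::Block);
                emit(body, out);
                out.push(Op::Jmp(2)); // implicit jump back at the end of the body
                out.push(Op::End);
                out.push(Op::End);
            }
        }
    }
}
```

The emitter never needs to know an instruction index, so there is nothing to patch after the body has been generated.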

Disassembly improvement

Now the disassembly is much more ergonomic. It shows the control flow structure by indentation.

Instructions(28):
  [  0] LoadLiteral 0 1 (I64(0))
  [  1] LoadLiteral 1 2 (I64(3))
  [  2] Loop 0 0
  [  3]   Block 0 0
  [  4]     Move 1 3
  [  5]     Lt 3 2
  [  6]     Jf 3 1
  [  7]     LoadLiteral 2 4 (I64(1))
  [  8]     LoadLiteral 1 5 (I64(3))
  [  9]     Loop 0 0
  [ 10]       Block 0 0
  [ 11]         Move 4 6
  [ 12]         Lt 6 5
  [ 13]         Jf 6 1
  [ 14]         Move 1 7
  [ 15]         Mul 7 4
  [ 16]         LoadLiteral 3 8 (Str("print"))
  [ 17]         Move 7 9
  [ 18]         Call 1 8
  [ 19]         Incr 4 0
  [ 20]         Jmp 0 2
  [ 21]       End 0 0
  [ 22]     End 0 0
  [ 23]     Incr 1 0
  [ 24]     Jmp 0 2
  [ 25]   End 0 0
  [ 26] End 0 0
  [ 27] Ret 0 8

Before this PR, the same program disassembled like this:

Instructions(20):
  [0] LoadLiteral 0 1 (I64(0))
  [1] LoadLiteral 1 2 (I64(3))
  [2] Move 1 3
  [3] Lt 3 2
  [4] Jf 3 19
  [5] LoadLiteral 2 4 (I64(1))
  [6] LoadLiteral 1 5 (I64(3))
  [7] Move 4 6
  [8] Lt 6 5
  [9] Jf 6 17
  [10] Move 1 7
  [11] Mul 7 4
  [12] LoadLiteral 3 8 (Str("print"))
  [13] Move 7 9
  [14] Call 1 8
  [15] Incr 4 0
  [16] Jmp 0 7
  [17] Incr 1 0
  [18] Jmp 0 2
  [19] Ret 0 8
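The indentation in the structured listing can be derived from the nesting depth alone. The following is a minimal sketch that works on mnemonic strings rather than Mascal's real instruction type; `disassemble` is a hypothetical name, and the handling of Else (aligned with its If) is an assumption, since the listing above contains no If block.

```rust
/// Render one line per instruction, indented by two spaces per open block.
fn disassemble(code: &[&str]) -> Vec<String> {
    let mut depth = 0usize;
    let mut lines = Vec::new();
    for (i, line) in code.iter().enumerate() {
        let mnemonic = line.split_whitespace().next().unwrap_or("");
        // End (and, by assumption, Else) aligns with the instruction that opened the block.
        if matches!(mnemonic, "End" | "Else") {
            depth = depth.saturating_sub(1);
        }
        lines.push(format!("[{:3}] {}{}", i, "  ".repeat(depth), line));
        // Everything after an opener is one level deeper.
        if matches!(mnemonic, "Loop" | "Block" | "If" | "Else") {
            depth += 1;
        }
    }
    lines
}
```

Calling it on `["Loop 0 0", "Block 0 0", "Jmp 0 2", "End 0 0", "End 0 0"]` yields a listing with the same shape as the structured disassembly shown earlier.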

Benchmark

Since structural flow shifts the burden of calculating jump addresses onto the bytecode interpreter, we would like to know how large the impact on performance is.
Here is the latest measurement of Mandelbrot set ASCII art rendering time, compared with other languages. Error bars are the standard deviation of 5 runs.

The relevant entries are the ones named Mascal.

Mascal AST interpreter - an interpreter that evaluates the AST directly. It is very slow.
Mascal bytecode - the bytecode compiler implementation before this PR. It is fast, but the compiler needs to calculate and "fix up" addresses.
Mascal bytecode strflow - the bytecode implementation that uses structural flow. It is not optimal, because it recalculates the jump address on every jump by scanning the instructions.
Mascal bytecode strflow-cached - the structural flow implementation with jump addresses cached in the interpreter. It calculates each jump address only once, when loading the bytecode.

The difference is too small to measure accurately, so I increased the number of iterations from 256 to 1024 to make the task take longer, and extracted only the measurements relevant to Mascal.

The difference is still marginal, but the ordering of speed seems consistent: Mascal strflow < Mascal strflow-cached < Mascal bytecode. This makes sense given when each variant calculates the jump addresses:

Mascal bytecode
- Compile
  - Calculate jump address (fixup)
- Load
- Execute

Mascal strflow-cached
- Compile
- Load
  - Calculate jump address (fixup)
- Execute

Mascal strflow
- Compile
- Load
- Execute
  - Calculate jump address (fixup)

The further down the list, the closer the calculation is to run time, and the more work it adds to execution.

Also note that I added Mascal varlen, an implementation of variable-length bytecode instructions in the varlen branch. It is the most compact representation, but not necessarily the fastest.
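The load-time variant can be sketched as one pass that records every jump target up front. This is an illustration under assumptions, not the PR's cache_bytecode() implementation: `Op`, `Frame` and `cache_jumps` are hypothetical, only Block and Loop are modeled, and the depth operand is taken to be 1-based (1 = innermost).

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Block,
    Loop,
    End,
    /// Jump to the enclosing block `depth` levels out (1 = innermost; an assumption).
    Jmp(usize),
    Other,
}

struct Frame {
    is_loop: bool,
    start: usize,
    pending: Vec<usize>, // forward jumps waiting for this block's End
}

/// Hypothetical load-time pass: map each jump's index to its target index.
/// Returns None if the code is malformed (bad depth or unbalanced blocks).
fn cache_jumps(code: &[Op]) -> Option<HashMap<usize, usize>> {
    let mut targets = HashMap::new();
    let mut stack: Vec<Frame> = Vec::new();
    for (ip, op) in code.iter().enumerate() {
        match *op {
            Op::Block | Op::Loop => stack.push(Frame {
                is_loop: *op == Op::Loop,
                start: ip,
                pending: Vec::new(),
            }),
            Op::Jmp(depth) => {
                let idx = stack.len().checked_sub(depth)?;
                let frame = stack.get_mut(idx)?;
                if frame.is_loop {
                    targets.insert(ip, frame.start); // backward target is already known
                } else {
                    frame.pending.push(ip); // forward target is known only at End
                }
            }
            Op::End => {
                for jump in stack.pop()?.pending {
                    targets.insert(jump, ip);
                }
            }
            Op::Other => {}
        }
    }
    stack.is_empty().then_some(targets)
}
```

With such a map built once at load time, executing a jump becomes a single lookup instead of a scan, which fits the small gap measured between strflow-cached and the original bytecode.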

Conclusion

Performance is marginally better if we calculate jump addresses at compile time (as with real CPU instructions), but the overhead can be minimized by caching the jump addresses at load time.

Do we still want to apply this change? We want variable-length instructions to minimize the cache memory required for instructions, and that would require structural control flow. However, experiments show that variable-length encoding adds significant runtime overhead, which cancels out the benefit of memory locality.


msakuta added 18 commits October 9, 2024 02:31

Fix block scanning and nested for loop

21fde81

Fix while statement

75e1d0e

Add If/Else block and fix loop control flow

675182d

Now it follows Wasm structured control flow model completely.

Properly skip else clause

cdcd420

Show break statement position in error report

3102b32

Fix for loop block stack in compiler

8fab5b1

Fix while block stack

7da4802

Fix block stack size when returned from a function

10185ae

It's like operand stack, but for blocks.

Count If block nest count properly

478bee7

Add indentation to disassembly to show flow structure more clearly

5e7a3d9

Cache jump map before executing code

e2786ba

Make cache_bytecode a method for a more consistent API

7781bb6

Remove debug prints and warnings

48f0760

Fix while loop and tests

5de70d4

Fix missing TryFrom for If and Else opcodes

7231ecd

We don't really need to keep track of block_stack

bba36d9

Because we resolve the addresses at cache_bytecode(), we don't need to do it in runtime.

Remove irrelevant comments

858e25e

msakuta force-pushed the master branch 4 times, most recently from 70003a0 to ba6a556 (November 29, 2025 15:56)

msakuta wants to merge 18 commits into master from strflow
