Benchmarking programs and results
Apps in this repo (l_ prefix for Linux, b_ prefix for BareMetal):
- l_bench.c/b_bench.c - Run cpuid 1000000 times. Results should be similar on Linux and BareMetal, as cpuid is an "expensive" instruction whose cost dominates the loop.
- l_raytrace.c/b_raytrace.c - A CPU-bound raytracing program.
- l_ethernet_bench.c/b_ethernet_bench.c - Poll the network 1000000 times. The Linux version is pinned to a single CPU core to prevent additional delays; this isn't needed on BareMetal.
- Assembly versions for BareMetal are in the assembly directory.
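The list only states that the Linux ethernet benchmark is pinned to a single core, not how. One common way to do that from C on Linux is sched_setaffinity(2). The sketch below is an assumption about the approach, not code from l_ethernet_bench.c, and first_allowed_cpu and pin_to_cpu are hypothetical names. It picks the first CPU the process is already allowed on rather than hard-coding CPU 0, so it also works inside a restricted cpuset.

```c
#define _GNU_SOURCE /* exposes cpu_set_t, the CPU_* macros and sched_*affinity */
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may currently run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/* Restrict the calling thread (pid 0 means "self") to exactly one CPU.
 * Returns 0 on success, -1 on failure with errno set. */
int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return sched_setaffinity(0, sizeof(one), &one);
}
```

Calling pin_to_cpu(first_allowed_cpu()) once at the start of main, before the timing loop, is enough. Running the unmodified binary under taskset -c from util-linux achieves the same thing from the shell.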
netflood was used for network load testing. Both physical systems were wired directly to a 5-port 10Gbit NICGIGA switch (Model S100-0500T).
All Debian systems were installed without the "Debian desktop environment" and "GNOME" options, with "SSH server" added. The following packages were then installed:
apt install git gcc nasm
The physical host used for the VM tests is as follows:
- Framework Desktop
- AMD RYZEN AI MAX+ 395 w/ Radeon™ 8060S × 32
- 128.0 GiB RAM
- Ubuntu 25.10
VMs were started as follows:
Linux:
qemu-system-x86_64 -machine q35 -name "Debian 13.3.0" -smp sockets=1,cpus=1 -cpu host -enable-kvm -m 1024 -drive id=disk0,file=debian1330.img,if=none,format=raw -device virtio-scsi-pci -device scsi-hd,drive=disk0
BareMetal:
qemu-system-x86_64 -machine q35 -name "BareMetal OS" -smp sockets=1,cpus=1 -cpu host -enable-kvm -m 256 -drive id=disk0,file="sys/baremetal_os.img",if=none,format=raw -device ide-hd,drive=disk0
Executing the cpuid instruction in a loop. Times on Linux and BareMetal should be similar.
ian@debian-vm:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 2257.36 ns
ian@debian-vm:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 2246.52 ns
ian@debian-vm:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 2251.37 ns
ian@debian-vm:~/Code/Benchmark$
> load
Enter file number: 4
> exec
Iterations: 1000000
Average: 2223 ns
> exec
Iterations: 1000000
Average: 2190 ns
> exec
Iterations: 1000000
Average: 2196 ns
>
As expected, execution times on BareMetal and Linux were similar.
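l_bench.c itself is not reproduced here. As a rough sketch of what "executing cpuid in a loop" and reporting an average looks like on Linux, assuming GCC or Clang on x86-64 (run_cpuid and cpuid_avg_ns are hypothetical names, and leaf 0 is an arbitrary choice):

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <time.h>

/* Execute CPUID once for the given leaf (subleaf 0). The asm is volatile so
 * the compiler cannot drop it or hoist it out of the timing loop.
 * x86-64 only. */
static inline void run_cpuid(unsigned int leaf)
{
    unsigned int eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "0"(leaf), "2"(0u));
    (void)eax; (void)ebx; (void)ecx; (void)edx;
}

/* Average wall-clock nanoseconds per CPUID over `iterations` runs. */
double cpuid_avg_ns(unsigned long iterations)
{
    assert(iterations > 0); /* avoid dividing by zero below */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++)
        run_cpuid(0);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling cpuid_avg_ns(1000000) and printing the result gives output in the same spirit as the Iterations/Average lines above. Under KVM, CPUID is intercepted by the hypervisor, which likely explains why the VM runs average about 2.2 µs per iteration while the physical machines further down average about 28 to 32 ns.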
Raytrace app generating a 1920x1080x32bpp image in RAM.
ian@debian-vm:~/Code/Benchmark$ ./l_raytrace
raytrace...
Time: 351 s
ian@debian-vm:~/Code/Benchmark$ ./l_raytrace
raytrace...
Time: 350 s
ian@debian-vm:~/Code/Benchmark$
> load
Enter file number: 4
> exec
raytrace...
Time: 323 s
> exec
raytrace...
Time: 323 s
>
The raytrace app executed 8% faster on BareMetal compared to Linux.
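The 8% figure can be checked from the logged times, taking the mean of the two Linux runs:

```latex
\bar{t}_{\mathrm{Linux}} = \frac{351 + 350}{2} = 350.5\,\mathrm{s},
\qquad t_{\mathrm{BareMetal}} = 323\,\mathrm{s}

\frac{\bar{t}_{\mathrm{Linux}}}{t_{\mathrm{BareMetal}}} = \frac{350.5}{323} \approx 1.085,
\qquad
1 - \frac{t_{\mathrm{BareMetal}}}{\bar{t}_{\mathrm{Linux}}} = \frac{27.5}{350.5} \approx 0.078
```

Read as throughput, BareMetal is about 8.5% faster; read as time saved, it finishes about 7.8% sooner. Both are consistent with the quoted figure of roughly 8%.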
Note: This application ran correctly on BareMetal with only 16 MB of RAM assigned to the VM. Debian crashed on startup unless it was given at least 192 MB of RAM. Alpine Linux could run the raytrace app with only 128 MB of RAM assigned to the VM, but took a couple of seconds longer.
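The output image alone accounts for a fixed share of that memory: 1920 × 1080 pixels at 32 bits per pixel is 1920 × 1080 × 4 = 8,294,400 bytes (about 7.9 MiB), roughly half of the 16 MB BareMetal VM. A small sketch of that sizing, assuming one 32-bit word per pixel (frame_bytes and alloc_frame32 are hypothetical names, not taken from l_raytrace.c/b_raytrace.c):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for a width x height image at the given bits per pixel. */
size_t frame_bytes(size_t width, size_t height, unsigned bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0); /* whole bytes per pixel only */
    return width * height * (bits_per_pixel / 8);
}

/* Allocate a zeroed 32bpp frame, one uint32_t per pixel.
 * Returns NULL if the allocation fails; the caller frees it. */
uint32_t *alloc_frame32(size_t width, size_t height)
{
    return calloc(width * height, sizeof(uint32_t));
}
```

Everything else the raytracer needs (scene data, stack, the kernel itself) has to fit in what remains, which is why 16 MB is a tight but workable budget.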
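Calling the function that reads Ethernet packets in a tight loop (polling the network).
Testing was done against the virtio-net-pci interface (enp0s4). The two VMs are linked through a QEMU socket netdev: the Linux VM listens on port 12345 and the BareMetal VM connects to it.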
qemu-system-x86_64 -machine q35 -name "Debian 13.3.0" -smp sockets=1,cpus=4 -cpu host -enable-kvm -m 4096 -drive id=disk0,file=debian1330.img,if=none,format=raw -device virtio-scsi-pci -device scsi-hd,drive=disk0 -netdev user,id=nat0 -device e1000,netdev=nat0 -netdev socket,id=priv0,listen=:12345 -device virtio-net-pci,netdev=priv0
root@debian-vm:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp0s4 -n 1000000
Iterations: 1000000
Average: 107.87 ns
Bytes received: 0
root@debian-vm:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp0s4 -n 1000000
Iterations: 1000000
Average: 111.37 ns
Bytes received: 0
root@debian-vm:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp0s4 -n 1000000
Iterations: 1000000
Average: 108.34 ns
Bytes received: 0
root@debian-vm:/home/ian/Code/Benchmark$
qemu-system-x86_64 -machine q35 -name "BareMetal OS" -smp sockets=1,cpus=4 -cpu host -enable-kvm -m 256 -drive id=disk0,file="sys/baremetal_os.img",if=none,format=raw -device ide-hd,drive=disk0 -netdev socket,id=testnet1,connect=localhost:12345 -device virtio-net-pci,netdev=testnet1,mac=10:11:12:00:1A:F4
BareMetal C version (b_ethernet_bench.c):
> load
Enter file number: 5
> exec
Iterations: 1000000
Average: 25 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 26 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 25 ns
Bytes received: 0
>
BareMetal Assembly version:
> load
Enter file number: 5
> exec
Iterations: 1000000
Average: 13 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 14 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 13 ns
Bytes received: 0
>
Linux imposes significant overhead compared to BareMetal when reading from the network. Calling the BareMetal kernel from C also adds some overhead, likely due to saving and restoring register state.
- BareMetal (Assembly) = ~8.5× faster than Linux
- BareMetal (C) = ~4.4× faster than Linux
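The source of l_ethernet_bench.c is not shown here, so the following is only a sketch of the Linux polling pattern, assuming the benchmark reads through a socket with non-blocking receive calls. To run without root or a spare NIC it swaps the packet socket for an AF_UNIX datagram socket pair, so absolute timings will differ from enp0s4. What it shares with the idle (Bytes received: 0) runs is that every poll is a system call that enters the kernel even when nothing is waiting. poll_avg_ns is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Poll `fd` `iterations` times with a non-blocking recv(). Each call enters
 * the kernel even when nothing is queued. Returns the average ns per call
 * and stores the total number of bytes read in *bytes. */
double poll_avg_ns(int fd, unsigned long iterations, unsigned long long *bytes)
{
    assert(iterations > 0 && bytes != NULL);
    unsigned char buf[2048]; /* large enough for a standard Ethernet frame */
    unsigned long long total = 0;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++) {
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0)
            total += (unsigned long long)n;
        /* n == -1 with errno EAGAIN or EWOULDBLOCK: nothing to read this time */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *bytes = total;
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

To poll a real interface one would instead pass a descriptor from an AF_PACKET socket bound to it. That needs CAP_NET_RAW, which is likely why the Linux runs above are done as root.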
AMD test system specs:
- AMD Ryzen 7 7700X - Zen 4 (Raphael) - 8 cores, base 4.50GHz, boost 5.40GHz
- ASUS PRIME B650M-A II
- 16GiB RAM (1x 16GiB DDR5)
- 240GB SATA (Kingston)
- Intel X540-T1 10Gbit network card (NICGIGA)
- Internal network adaptor disabled in BIOS
ian@debian-amd:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 28.51 ns
ian@debian-amd:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 27.99 ns
ian@debian-amd:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 28.11 ns
ian@debian-amd:~/Code/Benchmark$
> loadr
Enter file number: 4
> exec
Iterations: 1000000
Average: 27 ns
> exec
Iterations: 1000000
Average: 27 ns
> exec
Iterations: 1000000
Average: 27 ns
>
As expected, execution times on BareMetal and Linux were similar.
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 187.04 ns
Bytes received: 0
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 187.50 ns
Bytes received: 0
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 187.28 ns
Bytes received: 0
root@debian-amd:/home/ian/Code/Benchmark$
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 262.24 ns
Bytes received: 322479000
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 261.95 ns
Bytes received: 322227000
root@debian-amd:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 262.30 ns
Bytes received: 322746000
root@debian-amd:/home/ian/Code/Benchmark$
> loadr
Enter file number: 5
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
>
> loadr
Enter file number: 5
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8650500
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8598000
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8581500
>
Linux imposes significant overhead compared to BareMetal when reading from the network.
- BareMetal (no load) = ~62.3× faster than Linux
- BareMetal (load) = ~65.5× faster than Linux
Intel test system specs:
- Intel® Core™ i5-12400 - Alder Lake - 6 cores, base 2.50GHz, boost 4.40GHz
- ASUS PRIME B760M-A AX
- 16GiB RAM (1x 16GiB DDR5)
- 128GB NVMe (Patriot)
- Intel X540-T1 10Gbit network card (Beijing Sinead)
- Internal network adaptors disabled in BIOS
ian@debian-intel:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 32.17 ns
ian@debian-intel:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 32.16 ns
ian@debian-intel:~/Code/Benchmark$ ./l_bench
Iterations: 1000000
Average: 32.42 ns
ian@debian-intel:~/Code/Benchmark$
> loadr
Enter file number: 4
> exec
Iterations: 1000000
Average: 31 ns
> exec
Iterations: 1000000
Average: 31 ns
> exec
Iterations: 1000000
Average: 31 ns
>
As expected, execution times on BareMetal and Linux were similar.
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 113.73 ns
Bytes received: 0
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 113.67 ns
Bytes received: 0
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 113.82 ns
Bytes received: 0
root@debian-intel:/home/ian/Code/Benchmark$
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 146.13 ns
Bytes received: 179830500
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 147.71 ns
Bytes received: 181471500
root@debian-intel:/home/ian/Code/Benchmark$ ./l_ethernet_bench enp1s0 -n 1000000
Iterations: 1000000
Average: 148.75 ns
Bytes received: 182830500
root@debian-intel:/home/ian/Code/Benchmark$
> loadr
Enter file number: 5
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
> exec
Iterations: 1000000
Average: 3 ns
Bytes received: 0
>
> loadr
Enter file number: 5
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8181000
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8188500
> exec
Iterations: 1000000
Average: 4 ns
Bytes received: 8184000
>
Linux imposes significant overhead compared to BareMetal when reading from the network.
- BareMetal (no load) = ~37.7× faster than Linux
- BareMetal (load) = ~36.8× faster than Linux
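The speedup figures divide an average Linux poll time by the corresponding BareMetal one. A small helper makes that arithmetic explicit (mean and speedup are hypothetical names, not from the repo):

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
double mean(const double *samples, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* How many times faster a run taking fast_ns is than one taking slow_ns. */
double speedup(double slow_ns, double fast_ns)
{
    assert(fast_ns > 0.0);
    return slow_ns / fast_ns;
}
```

Fed the Intel averages, it gives about 37.9× idle (113.74 ns / 3 ns) and 36.9× under load (147.53 ns / 4 ns), close to the quoted ~37.7× and ~36.8×; the small gap likely comes from rounding the Linux average before dividing. Since BareMetal reports whole nanoseconds, a reading of 3 or 4 ns can be off by up to a nanosecond, so ratios this large are approximate either way.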