WIP feat (browsers): create throughput benchmark for browser providers by kisernl · Pull Request #115 · computesdk/benchmarks

kisernl · 2026-05-07T15:59:25Z

This pull request introduces a new browser step throughput benchmark to measure and compare how quickly different browser providers can execute a sequence of agent-style actions within a single session. It adds a comprehensive workflow for automated benchmarking, updates documentation, and enhances configuration and reporting for these new benchmarks.

Key changes:

New Benchmarking Capability

Added a new GitHub Actions workflow (.github/workflows/browser-throughput-benchmarks.yml) to automate browser throughput benchmarking across multiple providers, including scheduled daily runs, PR-triggered runs, and result collection/posting.
Introduced new npm scripts in package.json for running browser throughput benchmarks per provider and for generating SVG summary tables.

Documentation

Added THROUGHPUT.md to thoroughly document the new browser step throughput benchmark, including its motivation, methodology, scoring, action sequence, and limitations.

Benchmark Implementation Improvements

Updated src/browser/benchmark.ts to allow a configurable timeout and to correctly derive the iteration count for reporting, improving result accuracy and flexibility.

github-actions · 2026-05-07T16:00:51Z

Browser Benchmark Results

#	Provider	Score	Create	Connect	Navigate	Release	Total	Status
1	Kernel	96.9	0.05s	0.11s	0.12s	0.05s	0.32s	10/10
2	Browserbase	93.7	0.21s	0.17s	0.13s	0.13s	0.65s	10/10
3	Hyperbrowser	89.9	0.30s	0.53s	0.24s	0.10s	1.27s	10/10
4	Steel	75.9	0.47s	0.86s	0.16s	0.16s	1.71s	9/10

View full run · SVG available as build artifact

github-actions · 2026-05-07T16:01:52Z

Browser Throughput Benchmark Results

#	Provider	Score	APS (med)	Task (med)	Task (p95)	Screenshot	Status
1	Kernel	54.5	3.83/s	13.04s	14.73s	305ms	3/3
2	Browserbase	48.1	3.12/s	16.04s	16.40s	209ms	3/3
3	Hyperbrowser	20.6	1.52/s	32.86s	34.97s	857ms	3/3
4	Steel	7.7	1.72/s	29.06s	29.06s	618ms	1/3

View full run · SVG available as build artifact
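
For readers checking the table: the APS (med) column appears to follow directly from the median task time, assuming each task is the fixed 50-action loop described in the review overview. The sketch below is not taken from the PR's code; `ACTIONS_PER_TASK` and `actionsPerSecond` are hypothetical names used only for this check.

```typescript
// Assumption: one task = 50 actions (the "50-action Wikipedia loop"
// mentioned in the review overview). Not copied from the PR's source.
const ACTIONS_PER_TASK = 50;

// Actions per second for a task that took `taskSeconds` of wall-clock time.
function actionsPerSecond(
  taskSeconds: number,
  actions: number = ACTIONS_PER_TASK,
): number {
  if (taskSeconds <= 0) throw new RangeError("taskSeconds must be positive");
  return actions / taskSeconds;
}

// Median task times copied from the Task (med) column above.
const medianTaskSeconds: Record<string, number> = {
  Kernel: 13.04,
  Browserbase: 16.04,
  Hyperbrowser: 32.86,
  Steel: 29.06,
};

for (const [provider, seconds] of Object.entries(medianTaskSeconds)) {
  console.log(`${provider}: ${actionsPerSecond(seconds).toFixed(2)}/s`);
}
// Kernel: 3.83/s
// Browserbase: 3.12/s
// Hyperbrowser: 1.52/s
// Steel: 1.72/s
```

All four values match the APS (med) column. This also explains why Steel shows a higher APS than Hyperbrowser while ranking below it: its single successful run (1/3) was faster, so the lower score presumably reflects the failed runs rather than speed.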

github-actions · 2026-05-07T16:01:55Z

Sandbox Benchmark Results

Sequential

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	98.6	0.03s	0.29s	0.29s	10/10
2	daytona	96.5	0.24s	0.50s	0.50s	10/10
3	e2b	94.1	0.48s	0.74s	0.74s	10/10
4	upstash	93.8	0.56s	0.72s	0.72s	10/10
5	modal	93.2	0.56s	0.86s	0.86s	10/10
6	tensorlake	93.0	0.42s	1.10s	1.10s	10/10
7	blaxel	92.9	0.51s	1.01s	1.01s	10/10
8	vercel	92.6	0.68s	0.84s	0.84s	10/10
9	archil	90.9	0.47s	1.57s	1.57s	10/10
10	hopx	82.2	1.60s	2.06s	2.06s	10/10
11	runloop	81.8	0.78s	3.37s	3.37s	10/10
12	codesandbox	72.8	2.57s	2.96s	2.96s	10/10
13	cloudflare	58.8	2.33s	6.80s	6.80s	10/10
14	northflank	8.5	1.50s	1.50s	1.50s	1/10

Staggered

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	99.7	0.03s	0.03s	0.03s	10/10
2	archil	96.1	0.33s	0.49s	0.49s	10/10
3	daytona	95.6	0.25s	0.72s	0.72s	10/10
4	e2b	94.7	0.48s	0.59s	0.59s	10/10
5	blaxel	93.5	0.57s	0.76s	0.76s	10/10
6	upstash	93.4	0.61s	0.72s	0.72s	10/10
7	tensorlake	93.2	0.57s	0.85s	0.85s	10/10
8	vercel	92.7	0.63s	0.87s	0.87s	10/10
9	modal	86.3	0.57s	2.57s	2.57s	10/10
10	runloop	84.5	1.46s	1.68s	1.68s	10/10
11	hopx	83.9	1.44s	1.86s	1.86s	10/10
12	codesandbox	72.5	2.56s	3.05s	3.05s	10/10
13	cloudflare	64.9	2.83s	4.54s	4.54s	10/10
14	northflank	0.0	0.00s	0.00s	0.00s	0/10

Burst

#	Provider	Score	Median TTI	P95	P99	Status
1	declaw	99.4	0.05s	0.06s	0.06s	10/10
2	daytona	96.7	0.29s	0.39s	0.39s	10/10
3	archil	95.1	0.48s	0.51s	0.51s	10/10
4	tensorlake	95.1	0.42s	0.60s	0.60s	10/10
5	blaxel	93.1	0.62s	0.80s	0.80s	10/10
6	upstash	91.7	0.80s	0.88s	0.88s	10/10
7	vercel	91.5	0.77s	0.97s	0.97s	10/10
8	e2b	91.3	0.67s	1.15s	1.15s	10/10
9	modal	85.2	0.66s	2.70s	2.70s	10/10
10	runloop	83.8	1.47s	1.84s	1.84s	10/10
11	hopx	80.4	1.94s	1.99s	1.99s	10/10
12	codesandbox	69.9	2.82s	3.28s	3.28s	10/10
13	cloudflare	52.5	4.05s	5.81s	5.81s	10/10
14	northflank	41.4	1.64s	1.83s	1.83s	5/10

View full run · SVGs available as build artifacts
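
Note that P95 and P99 are identical in every row above. With only 10 iterations per provider, that is what a nearest-rank percentile would produce, since both ranks round up to the slowest sample. The sketch below illustrates the effect; it assumes the nearest-rank method, which may not be what the repository actually uses, and the sample values are made up for illustration rather than taken from any run.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// the samples are less than or equal to it.
// Assumption: this is an illustrative method, not the PR's implementation.
function percentileNearestRank(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid float drift such as 0.7 * 10 > 7.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Made-up TTI samples in seconds: nine fast runs and one slow outlier.
const illustrativeTti = [0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.04, 0.04, 0.05, 0.29];

console.log(percentileNearestRank(illustrativeTti, 50)); // 0.03
console.log(percentileNearestRank(illustrativeTti, 95)); // 0.29
console.log(percentileNearestRank(illustrativeTti, 99)); // 0.29
```

With 10 samples, P95 and P99 both collapse to the maximum, so a single slow run fully determines the tail columns. Separating the two would take at least 100 samples per provider under this method.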

github-actions · 2026-05-07T16:08:23Z

Storage Benchmark Results

1MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	96.4	0.03s	319.6 Mbps	0.07s	1000/1000
2	Cloudflare R2	94.5	0.16s	52.5 Mbps	0.22s	1000/1000

4MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	97.1	0.07s	514.3 Mbps	0.14s	1000/1000
2	Cloudflare R2	94.7	0.21s	161.1 Mbps	0.35s	1000/1000

10MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	97.9	0.11s	746.5 Mbps	0.35s	1000/1000
2	Cloudflare R2	94.0	0.39s	213.7 Mbps	0.76s	1000/1000

16MB Files

#	Provider	Score	Download	Throughput	Upload	Status
1	Tigris	97.4	0.18s	734.0 Mbps	0.37s	1000/1000
2	Cloudflare R2	93.6	0.56s	239.0 Mbps	0.83s	1000/1000

View full run · SVGs available as build artifacts
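
The Throughput column looks like download throughput, that is, file size divided by download time. A rough cross-check is sketched below. It assumes that "1MB" means 1024 × 1024 bytes and that Mbps means 10^6 bits per second; neither is stated in this thread, and `throughputMbps` is a hypothetical helper, not a function from the repository.

```typescript
// Assumptions (not confirmed by the PR): file sizes are binary megabytes
// (1024 * 1024 bytes) and Mbps is decimal megabits per second.
const BYTES_PER_MIB = 1024 * 1024;

function throughputMbps(sizeMiB: number, seconds: number): number {
  if (seconds <= 0) throw new RangeError("seconds must be positive");
  const bits = sizeMiB * BYTES_PER_MIB * 8;
  return bits / seconds / 1_000_000;
}

// Cloudflare R2 rows from the tables above: [file size, download seconds].
const r2Rows: Array<[number, number]> = [
  [1, 0.16],
  [4, 0.21],
  [10, 0.39],
  [16, 0.56],
];

for (const [sizeMiB, seconds] of r2Rows) {
  console.log(`${sizeMiB}MB: ${throughputMbps(sizeMiB, seconds).toFixed(1)} Mbps`);
}
```

Under these assumptions the Cloudflare R2 rows land close to the reported figures. The Tigris rows are harder to check this way because the download times are rounded to two decimals, and at 0.03s that rounding alone can shift the result by more than 10%.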

Copilot

Pull request overview

Adds a new “browser step throughput” benchmark mode to measure per-action performance within a single long-lived browser session, complementing the existing browser lifecycle benchmark.

Changes:

Introduces a new browser-throughput benchmark runner (50-action Wikipedia loop), result schema, and composite scoring.
Adds provider configs, SVG generation, CLI wiring, and npm scripts for running and reporting throughput benchmarks.
Adds a dedicated GitHub Actions workflow to run/merge throughput results and post PR comments.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.


File	Description
THROUGHPUT.md	Documents the new throughput benchmark methodology, scoring, running, and scheduling.
src/run.ts	Adds a new `browser-throughput` mode to run the benchmark and write results.
src/merge-results.ts	Adds merge + table-printing logic for browser-throughput artifacts.
src/browser/throughput-types.ts	Defines result and provider config types for throughput benchmarking.
src/browser/throughput-scoring.ts	Implements composite scoring + sorting for throughput results.
src/browser/throughput-providers.ts	Adds provider definitions and session options (stealth/headless/viewport).
src/browser/throughput-benchmark.ts	Implements the 50-action throughput benchmark runner and JSON writer.
src/browser/generate-throughput-svg.ts	Generates an SVG leaderboard for throughput results.
results/browser-throughput/.gitkeep	Ensures the results directory exists in-repo.
package.json	Adds bench and SVG generation scripts for browser-throughput.
.github/workflows/browser-throughput-benchmarks.yml	Adds CI workflow to run, merge, render, and publish throughput benchmark results.


open-cla · 2026-06-15T15:04:52Z

Contributor License Agreement

All contributors are covered by a CLA.

kisernl requested a review from Copilot May 7, 2026 16:09

Copilot started reviewing on behalf of kisernl May 7, 2026 16:09

Copilot AI reviewed May 7, 2026


Comment thread src/browser/throughput-scoring.ts Outdated

Comment thread src/run.ts

Comment thread src/browser/throughput-benchmark.ts

Comment thread THROUGHPUT.md

Comment thread THROUGHPUT.md Outdated

Comment thread src/merge-results.ts

kisernl added 2 commits June 15, 2026 14:55

feat (browsers): create throughput benchmark for browser providers

e41c90d

fix: resolve PR comments

c65de32

kisernl force-pushed the step-throughput-benchmark branch from 06440b7 to c65de32 Compare June 15, 2026 15:04

kisernl wants to merge 2 commits into master from step-throughput-benchmark

2 participants
