WIP feat (browsers): create throughput benchmark for browser providers #115

Open: kisernl wants to merge 2 commits into master from step-throughput-benchmark

Conversation

@kisernl (Collaborator) commented May 7, 2026

This pull request introduces a new browser step throughput benchmark to measure and compare how quickly different browser providers can execute a sequence of agent-style actions within a single session. It adds a comprehensive workflow for automated benchmarking, updates documentation, and enhances configuration and reporting for these new benchmarks.

Key changes:

New Benchmarking Capability

  • Added a new GitHub Actions workflow (.github/workflows/browser-throughput-benchmarks.yml) to automate browser throughput benchmarking across multiple providers, including scheduled daily runs, PR-triggered runs, and result collection/posting.
  • Introduced new npm scripts in package.json for running browser throughput benchmarks per provider and for generating SVG summary tables.

Documentation

  • Added THROUGHPUT.md to thoroughly document the new browser step throughput benchmark, including its motivation, methodology, scoring, action sequence, and limitations.

Benchmark Implementation Improvements

  • Updated src/browser/benchmark.ts to allow a configurable timeout and to correctly derive the iteration count for reporting, improving result accuracy and flexibility.
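For readers who want a picture of what a configurable timeout can look like, here is a minimal sketch. It is not taken from src/browser/benchmark.ts: the names `resolveTimeoutMs` and `withTimeout`, and the idea of reading the value from a raw string such as an environment variable, are assumptions made for illustration only.

```typescript
// Hypothetical helpers; the names and behaviour are assumptions,
// not the code in src/browser/benchmark.ts.

/** Parse a raw timeout setting, falling back when it is missing or invalid. */
export function resolveTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

/** Reject if `work` has not settled within `timeoutMs` milliseconds. */
export async function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  label = "operation",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller could then wrap a benchmark phase as `withTimeout(phase(), resolveTimeoutMs(rawSetting, 30000), "navigate")`, where `phase` and `rawSetting` are likewise placeholders.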

@github-actions (bot, Contributor) commented May 7, 2026

Browser Benchmark Results

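| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
|---|----------|-------|--------|---------|----------|---------|-------|--------|
| 1 | Kernel | 96.9 | 0.05s | 0.11s | 0.12s | 0.05s | 0.32s | 10/10 |
| 2 | Browserbase | 93.7 | 0.21s | 0.17s | 0.13s | 0.13s | 0.65s | 10/10 |
| 3 | Hyperbrowser | 89.9 | 0.30s | 0.53s | 0.24s | 0.10s | 1.27s | 10/10 |
| 4 | Steel | 75.9 | 0.47s | 0.86s | 0.16s | 0.16s | 1.71s | 9/10 |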

View full run · SVG available as build artifact

@github-actions (bot, Contributor) commented May 7, 2026

Browser Throughput Benchmark Results

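| # | Provider | Score | APS (med) | Task (med) | Task (p95) | Screenshot | Status |
|---|----------|-------|-----------|------------|------------|------------|--------|
| 1 | Kernel | 54.5 | 3.83/s | 13.04s | 14.73s | 305ms | 3/3 |
| 2 | Browserbase | 48.1 | 3.12/s | 16.04s | 16.40s | 209ms | 3/3 |
| 3 | Hyperbrowser | 20.6 | 1.52/s | 32.86s | 34.97s | 857ms | 3/3 |
| 4 | Steel | 7.7 | 1.72/s | 29.06s | 29.06s | 618ms | 1/3 |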

View full run · SVG available as build artifact
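The APS (med) column appears consistent with dividing the 50 actions of the loop (the count given in the Copilot review summary) by the median task time. The sketch below checks that arithmetic against the four rows above. The function name `actionsPerSecond` is hypothetical, and the benchmark may derive the figure differently, for example from per-run rates.

```typescript
// Assumption: one task is the 50-action loop described in the review summary.
const ACTIONS_PER_TASK = 50;

/** Actions per second for a task that took `taskSeconds` to complete. */
export function actionsPerSecond(taskSeconds: number, actions: number = ACTIONS_PER_TASK): number {
  if (taskSeconds <= 0) throw new RangeError("taskSeconds must be positive");
  return actions / taskSeconds;
}

// Median task times copied from the results table above.
const medianTaskSeconds: Array<[string, number]> = [
  ["Kernel", 13.04],
  ["Browserbase", 16.04],
  ["Hyperbrowser", 32.86],
  ["Steel", 29.06],
];

for (const [provider, seconds] of medianTaskSeconds) {
  console.log(`${provider}: ${actionsPerSecond(seconds).toFixed(2)}/s`);
}
// Kernel: 3.83/s
// Browserbase: 3.12/s
// Hyperbrowser: 1.52/s
// Steel: 1.72/s
```

Note that Steel's APS is higher than Hyperbrowser's even though it ranks lower, so the composite score evidently weighs more than raw throughput; Steel's 1/3 status is one visible difference.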

@github-actions (bot, Contributor) commented May 7, 2026

Sandbox Benchmark Results

Sequential

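| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 98.6 | 0.03s | 0.29s | 0.29s | 10/10 |
| 2 | daytona | 96.5 | 0.24s | 0.50s | 0.50s | 10/10 |
| 3 | e2b | 94.1 | 0.48s | 0.74s | 0.74s | 10/10 |
| 4 | upstash | 93.8 | 0.56s | 0.72s | 0.72s | 10/10 |
| 5 | modal | 93.2 | 0.56s | 0.86s | 0.86s | 10/10 |
| 6 | tensorlake | 93.0 | 0.42s | 1.10s | 1.10s | 10/10 |
| 7 | blaxel | 92.9 | 0.51s | 1.01s | 1.01s | 10/10 |
| 8 | vercel | 92.6 | 0.68s | 0.84s | 0.84s | 10/10 |
| 9 | archil | 90.9 | 0.47s | 1.57s | 1.57s | 10/10 |
| 10 | hopx | 82.2 | 1.60s | 2.06s | 2.06s | 10/10 |
| 11 | runloop | 81.8 | 0.78s | 3.37s | 3.37s | 10/10 |
| 12 | codesandbox | 72.8 | 2.57s | 2.96s | 2.96s | 10/10 |
| 13 | cloudflare | 58.8 | 2.33s | 6.80s | 6.80s | 10/10 |
| 14 | northflank | 8.5 | 1.50s | 1.50s | 1.50s | 1/10 |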

Staggered

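| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.7 | 0.03s | 0.03s | 0.03s | 10/10 |
| 2 | archil | 96.1 | 0.33s | 0.49s | 0.49s | 10/10 |
| 3 | daytona | 95.6 | 0.25s | 0.72s | 0.72s | 10/10 |
| 4 | e2b | 94.7 | 0.48s | 0.59s | 0.59s | 10/10 |
| 5 | blaxel | 93.5 | 0.57s | 0.76s | 0.76s | 10/10 |
| 6 | upstash | 93.4 | 0.61s | 0.72s | 0.72s | 10/10 |
| 7 | tensorlake | 93.2 | 0.57s | 0.85s | 0.85s | 10/10 |
| 8 | vercel | 92.7 | 0.63s | 0.87s | 0.87s | 10/10 |
| 9 | modal | 86.3 | 0.57s | 2.57s | 2.57s | 10/10 |
| 10 | runloop | 84.5 | 1.46s | 1.68s | 1.68s | 10/10 |
| 11 | hopx | 83.9 | 1.44s | 1.86s | 1.86s | 10/10 |
| 12 | codesandbox | 72.5 | 2.56s | 3.05s | 3.05s | 10/10 |
| 13 | cloudflare | 64.9 | 2.83s | 4.54s | 4.54s | 10/10 |
| 14 | northflank | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |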

Burst

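| # | Provider | Score | Median TTI | P95 | P99 | Status |
|---|----------|-------|------------|-----|-----|--------|
| 1 | declaw | 99.4 | 0.05s | 0.06s | 0.06s | 10/10 |
| 2 | daytona | 96.7 | 0.29s | 0.39s | 0.39s | 10/10 |
| 3 | archil | 95.1 | 0.48s | 0.51s | 0.51s | 10/10 |
| 4 | tensorlake | 95.1 | 0.42s | 0.60s | 0.60s | 10/10 |
| 5 | blaxel | 93.1 | 0.62s | 0.80s | 0.80s | 10/10 |
| 6 | upstash | 91.7 | 0.80s | 0.88s | 0.88s | 10/10 |
| 7 | vercel | 91.5 | 0.77s | 0.97s | 0.97s | 10/10 |
| 8 | e2b | 91.3 | 0.67s | 1.15s | 1.15s | 10/10 |
| 9 | modal | 85.2 | 0.66s | 2.70s | 2.70s | 10/10 |
| 10 | runloop | 83.8 | 1.47s | 1.84s | 1.84s | 10/10 |
| 11 | hopx | 80.4 | 1.94s | 1.99s | 1.99s | 10/10 |
| 12 | codesandbox | 69.9 | 2.82s | 3.28s | 3.28s | 10/10 |
| 13 | cloudflare | 52.5 | 4.05s | 5.81s | 5.81s | 10/10 |
| 14 | northflank | 41.4 | 1.64s | 1.83s | 1.83s | 5/10 |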

View full run · SVGs available as build artifacts
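In every row of the sandbox tables P95 and P99 are identical. With only 10 iterations per provider, that is what a nearest-rank percentile would produce, since both ranks round up to the largest sample. The sketch below illustrates the effect. Whether the repository actually uses nearest-rank is an assumption, the function name is hypothetical, and the sample values are made up for the demonstration.

```typescript
/**
 * Nearest-rank percentile: the smallest sample such that at least p percent
 * of the samples are less than or equal to it.
 * Assumption: this may not be the method used by the benchmark itself.
 */
export function percentileNearestRank(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new RangeError("samples must not be empty");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs do not pick up float error.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative timings in seconds (invented, not taken from the run).
const tti = [0.21, 0.22, 0.23, 0.24, 0.24, 0.25, 0.26, 0.27, 0.3, 0.5];

console.log(percentileNearestRank(tti, 95)); // rank ceil(9.5) = 10, the maximum
console.log(percentileNearestRank(tti, 99)); // rank ceil(9.9) = 10, the maximum
```

With so few samples, the tail columns effectively report the slowest successful iteration, so a single outlier can move them substantially.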

@github-actions (bot, Contributor) commented May 7, 2026

Storage Benchmark Results

1MB Files

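| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Tigris | 96.4 | 0.03s | 319.6 Mbps | 0.07s | 1000/1000 |
| 2 | Cloudflare R2 | 94.5 | 0.16s | 52.5 Mbps | 0.22s | 1000/1000 |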

4MB Files

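| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Tigris | 97.1 | 0.07s | 514.3 Mbps | 0.14s | 1000/1000 |
| 2 | Cloudflare R2 | 94.7 | 0.21s | 161.1 Mbps | 0.35s | 1000/1000 |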

10MB Files

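| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Tigris | 97.9 | 0.11s | 746.5 Mbps | 0.35s | 1000/1000 |
| 2 | Cloudflare R2 | 94.0 | 0.39s | 213.7 Mbps | 0.76s | 1000/1000 |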

16MB Files

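| # | Provider | Score | Download | Throughput | Upload | Status |
|---|----------|-------|----------|------------|--------|--------|
| 1 | Tigris | 97.4 | 0.18s | 734.0 Mbps | 0.37s | 1000/1000 |
| 2 | Cloudflare R2 | 93.6 | 0.56s | 239.0 Mbps | 0.83s | 1000/1000 |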

View full run · SVGs available as build artifacts

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new “browser step throughput” benchmark mode to measure per-action performance within a single long-lived browser session, complementing the existing browser lifecycle benchmark.

Changes:

  • Introduces a new browser-throughput benchmark runner (50-action Wikipedia loop), result schema, and composite scoring.
  • Adds provider configs, SVG generation, CLI wiring, and npm scripts for running and reporting throughput benchmarks.
  • Adds a dedicated GitHub Actions workflow to run/merge throughput results and post PR comments.
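To make "per-action performance within a single session" concrete, here is a minimal sketch of a timing loop. It is not the runner in src/browser/throughput-benchmark.ts: the `Action` and `StepTiming` types, the `runActionLoop` function, and the injectable clock are all hypothetical, and real actions would drive a browser page rather than the placeholders used here.

```typescript
// Hypothetical types; the PR's real schema lives in src/browser/throughput-types.ts.
export interface Action {
  name: string;
  run: () => Promise<void>;
}

export interface StepTiming {
  name: string;
  ms: number;
  ok: boolean;
}

/**
 * Run actions one after another in the same session and time each step.
 * A failed action is recorded as ok=false and the loop continues.
 * `now` is injectable so the loop can be tested with a fake clock.
 */
export async function runActionLoop(
  actions: readonly Action[],
  now: () => number = () => Date.now(),
): Promise<{ steps: StepTiming[]; totalMs: number }> {
  const steps: StepTiming[] = [];
  const start = now();
  for (const action of actions) {
    const t0 = now();
    let ok = true;
    try {
      await action.run();
    } catch {
      ok = false;
    }
    steps.push({ name: action.name, ms: now() - t0, ok });
  }
  return { steps, totalMs: now() - start };
}
```

Keeping session creation and teardown outside the timed loop is what separates this kind of measurement from the lifecycle benchmark, which times create, connect, navigate, and release.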

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 6 comments.

| File | Description |
|------|-------------|
| THROUGHPUT.md | Documents the new throughput benchmark methodology, scoring, running, and scheduling. |
| src/run.ts | Adds a new browser-throughput mode to run the benchmark and write results. |
| src/merge-results.ts | Adds merge + table-printing logic for browser-throughput artifacts. |
| src/browser/throughput-types.ts | Defines result and provider config types for throughput benchmarking. |
| src/browser/throughput-scoring.ts | Implements composite scoring + sorting for throughput results. |
| src/browser/throughput-providers.ts | Adds provider definitions and session options (stealth/headless/viewport). |
| src/browser/throughput-benchmark.ts | Implements the 50-action throughput benchmark runner and JSON writer. |
| src/browser/generate-throughput-svg.ts | Generates an SVG leaderboard for throughput results. |
| results/browser-throughput/.gitkeep | Ensures the results directory exists in-repo. |
| package.json | Adds bench and SVG generation scripts for browser-throughput. |
| .github/workflows/browser-throughput-benchmarks.yml | Adds CI workflow to run, merge, render, and publish throughput benchmark results. |

Comment thread src/browser/throughput-scoring.ts Outdated
Comment thread src/run.ts
Comment thread src/browser/throughput-benchmark.ts
Comment thread THROUGHPUT.md
Comment thread THROUGHPUT.md Outdated
Comment thread src/merge-results.ts
@kisernl force-pushed the step-throughput-benchmark branch from 06440b7 to c65de32 on June 15, 2026 15:04
@open-cla (bot) commented Jun 15, 2026

Contributor License Agreement

All contributors are covered by a CLA.
