Feature Request: Environment preflight checks for E2E evals (fail fast)
Current Behavior
Long-running E2E evals fail mid-way if dependencies are missing (ffmpeg, pandoc, wkhtmltopdf, Python modules, etc.).
Real example from property-inspection-video-analysis E2E-1:
- Download 881MB of videos (takes 25+ minutes)
- Extract 74 frames with ffmpeg
- Then fail because
PIL not installed
- Wasted 30+ minutes before discovering missing dependency
Desired Behavior
Add optional env: or preflight: section to EVAL.yaml:
env:
required_commands:
- ffmpeg
- pandoc
- wkhtmltopdf
required_python_modules:
- PIL
- openai
preflight_check: true # Run before test cases start
Expected Behavior
When preflight_check: true, AgentV should:
- Check all
required_commands exist in PATH
- Check all
required_python_modules can be imported
- Fail immediately with clear message if anything missing
- Only proceed to test cases if all checks pass
Current Workaround
Manual setup script before eval run:
#!/bin/bash
set -e
command -v ffmpeg >/dev/null 2>&1 || { echo "ffmpeg required"; exit 1; }
python3 -c "import PIL" 2>/dev/null || { echo "PIL required"; exit 1; }
python3 -c "import openai" 2>/dev/null || { echo "openai required"; exit 1; }
# ... then run agentv pipeline ...
Reproduction
Repo: https://github.com/tsoyang-org/property-inspection-bench
Test cases:
- E2E-1 (video→report): ~30 min runtime, requires ffmpeg, PIL, openai
- E2E-2 (markdown→PDF): ~5 min runtime, requires pandoc, wkhtmltopdf/weasyprint
Both would benefit from fail-fast validation before starting long downloads.
Related
Issue #1207 - numeric shell operators (same project)
Feature Request: Environment preflight checks for E2E evals (fail fast)
Current Behavior
Long-running E2E evals fail mid-way if dependencies are missing (ffmpeg, pandoc, wkhtmltopdf, Python modules, etc.).
Real example from property-inspection-video-analysis E2E-1:
PILnot installedDesired Behavior
Add optional
env:orpreflight:section to EVAL.yaml:Expected Behavior
When
preflight_check: true, AgentV should:required_commandsexist in PATHrequired_python_modulescan be importedCurrent Workaround
Manual setup script before eval run:
Reproduction
Repo: https://github.com/tsoyang-org/property-inspection-bench
Test cases:
Both would benefit from fail-fast validation before starting long downloads.
Related
Issue #1207 - numeric shell operators (same project)