diff --git a/.config/dotnet-tools.json b/.config/dotnet-tools.json
index 871b8f40a13..d7dabcda6df 100644
--- a/.config/dotnet-tools.json
+++ b/.config/dotnet-tools.json
@@ -57,6 +57,13 @@
         "ilverify"
       ],
       "rollForward": false
+    },
+    "microsoft.aitools.binlogmcp": {
+      "version": "1.0.0",
+      "commands": [
+        "binlog-mcp"
+      ],
+      "rollForward": true
     }
   }
 }
diff --git a/.github/agents/agentic-workflows.agent.md b/.github/agents/agentic-workflows.agent.md
index f7e5eb4f1cd..b84bd918b94 100644
--- a/.github/agents/agentic-workflows.agent.md
+++ b/.github/agents/agentic-workflows.agent.md
@@ -1,4 +1,5 @@
 ---
+name: agentic-workflows
 description: GitHub Agentic Workflows (gh-aw) - Create, debug, and upgrade AI-powered workflows with intelligent prompt routing
 disable-model-invocation: true
 ---
diff --git a/.github/agents/compiler-perf-investigator.md b/.github/agents/compiler-perf-investigator.md
index bfad591e479..716821d5ae9 100644
--- a/.github/agents/compiler-perf-investigator.md
+++ b/.github/agents/compiler-perf-investigator.md
@@ -7,6 +7,8 @@ description: Specialized agent for investigating F# build performance issues usi
 
 These are **general investigation instructions** for this agent, a template for perf analysis of slow/problematic F# compilation and build, suitable for a variety of scenarios (repos, snippets, gists).
 
+**Related tools:** the `binlog-analysis` skill fetches a build's MSBuild `.binlog` and analyzes it live via the `binlog-mcp` MCP — structured errors, root-cause diagnosis, and target/task/analyzer timings — a fast first pass over a build log before deeper trace/dump analysis.
+
 ---
 
 ## PRINCIPLES OF OPERATION
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index fbc27d0ef18..693ab41a538 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -16,6 +16,7 @@ Build fails → 99% YOUR previous change broke it. You ARE the compiler.
 DON'T say "pre-existing", "infra issue", "unrelated".
 DO `git clean -xfd artifacts` and rebuild.
 Bootstrap contamination: early commits break compiler → later "fixes" still use broken bootstrap. Clean fully.
+Triage a build failure → `binlog-analysis` skill fetches the binlog (local build or failed AzDo PR build) and analyzes it live via the `binlog-mcp` MCP (structured errors, root-cause diagnosis, MSBuild perf X-ray).
 
 ## Test
 
diff --git a/.github/skills/binlog-analysis/SKILL.md b/.github/skills/binlog-analysis/SKILL.md
new file mode 100644
index 00000000000..2a353e43ceb
--- /dev/null
+++ b/.github/skills/binlog-analysis/SKILL.md
@@ -0,0 +1,93 @@
+---
+name: binlog-analysis
+description: >-
+  Triage a build / compile / restore / WarnAsError failure from its MSBuild
+  binary log. Fetches the binlog (a local build's, or a failed dotnet/fsharp
+  Azure DevOps PR build's published artifact) and analyzes it live via the
+  `binlog-mcp` MCP server — structured errors, root-cause diagnosis, and an
+  MSBuild perf X-ray. NOT for test failures or CheckCodeFormatting: a build
+  binlog has no errors there.
+---
+
+# Binlog Analysis (via the binlog-mcp MCP server)
+
+Boil a failed build down to root causes. This skill does two small things and
+delegates the heavy lifting to an MCP server:
+
+1. **Fetch** the build's `*.binlog` — from your local build, or from a failed
+   `fsharp-ci` Azure DevOps PR build (downloads the published artifact).
+2. **Hand the path to the `binlog-mcp` MCP server** (`Microsoft.AITools.BinlogMcp`),
+   which queries the binlog live (≈38 tools): structured errors, categorized root
+   causes, and an MSBuild X-ray (target/task/analyzer timings, incrementality,
+   double-writes, …).
+
+Because the analysis lives in the MCP server, this skill stays tiny — and gets
+better automatically as that server gains features.
+
+## When to use
+
+- A local build (with `-bl`) or a failed `fsharp-ci` PR build broke and you need
+  to know **why** — compile / restore / analyzer / WarnAsError errors, or build
+  perf.
+
+## When NOT to use
+
+- **Test** failures or **CheckCodeFormatting**. The build binlog has no errors in
+  that case: `binlog_overview` will show the build succeeded / 0 errors — stop and
+  use `pr-build-status` / `flaky-test-detector` instead.
+
+## Step 1 — get the binlog path
+
+```pwsh
+# Local: newest *.binlog under /artifacts/log (build first, e.g. ./build.sh --binaryLog)
+pwsh .github/skills/binlog-analysis/scripts/Get-Binlog.ps1
+
+# Local: a specific file, directory, or glob
+pwsh .github/skills/binlog-analysis/scripts/Get-Binlog.ps1 -BinlogPath artifacts/log/Debug/Build.binlog
+
+# Azure DevOps: latest FAILED fsharp-ci build for a PR (downloads + keeps the binlog)
+pwsh .github/skills/binlog-analysis/scripts/Get-Binlog.ps1 -PrNumber 19941
+
+# Azure DevOps: explicit build id; -AllLegs for every leg; -Json for a path list
+pwsh .github/skills/binlog-analysis/scripts/Get-Binlog.ps1 -BuildId 1462217 -Json
+```
+
+It prints the resolved `*.binlog` path(s) (Azure DevOps artifacts are downloaded
+to a temp folder and kept so the MCP can read them).
+
+## Step 2 — analyze via the binlog-mcp MCP tools
+
+With each path, call the MCP server (the argument is `binlog_file`):
+
+- `binlog_overview` — build status + error/warning counts. **Call this first** to
+  decide whether there's anything to analyze.
+- `binlog_diagnose` — categorized root causes + next-step hints.
+- `binlog_errors` / `binlog_warnings` — structured diagnostics (code / file / line
+  / column / project).
+- `binlog_search` — free-form drill-down.
+- Perf: `binlog_expensive_targets` / `binlog_expensive_tasks` /
+  `binlog_expensive_analyzers`, `binlog_incremental_analysis`,
+  `binlog_double_writes`, `binlog_target_graph`.
+
+> **Multi-targeting note.** Today `binlog_errors` returns one row per target
+> framework, so a single source error in a multi-TFM project (e.g.
+> FSharp.Compiler.Service → `net10.0;netstandard2.0`) appears once per TFM. A
+> lossless dedup (`code,file,line` → set of TFMs) is proposed upstream in
+> `dotnet-microsoft/ai-tools`; when it lands, this skill gets the deduped view for
+> free — no change here.
+
+## Prerequisites
+
+- The **binlog-mcp MCP server** registered with your agent. The tool is pinned in
+  the repo's `.config/dotnet-tools.json` (`Microsoft.AITools.BinlogMcp`); run
+  `dotnet tool restore` once, then register it as an MCP server. For example,
+  Copilot CLI (`~/.copilot/mcp-config.json`):
+  ```jsonc
+  { "mcpServers": { "binlog-mcp": {
+      "command": "dotnet", "args": ["tool", "run", "binlog-mcp"],
+      "tools": ["*"], "deferTools": "auto" } } }
+  ```
+  (gh-aw: a `mcp-servers:` block; VS Code: `.vscode/mcp.json`. Telemetry is on by
+  default — opt out with `DOTNET_CLI_TELEMETRY_OPTOUT=1` if desired.)
+- PowerShell 7+ (`pwsh`) and a .NET 10 SDK (already required by the repo).
+- Azure DevOps modes need network access to `dev.azure.com`.
diff --git a/.github/skills/binlog-analysis/scripts/Get-Binlog.ps1 b/.github/skills/binlog-analysis/scripts/Get-Binlog.ps1
new file mode 100644
index 00000000000..45f4a144e6c
--- /dev/null
+++ b/.github/skills/binlog-analysis/scripts/Get-Binlog.ps1
@@ -0,0 +1,165 @@
+<#
+.SYNOPSIS
+    Resolve a build's .binlog and print its path, for live analysis via the
+    `binlog-mcp` MCP server (Microsoft.AITools.BinlogMcp). Works on a local
+    build's binlog or on a failed dotnet/fsharp Azure DevOps PR build's
+    published binlog.
+
+.DESCRIPTION
+    This skill's job is ACQUISITION only — it does not analyze. It locates (and,
+    for Azure DevOps, downloads) the binary log and prints the path(s). Hand each
+    path to the binlog-mcp MCP tools (binlog_overview, binlog_diagnose,
+    binlog_errors, ...) with `binlog_file: `.
+
+    Source (pick one):
+      * Local — pass -BinlogPath, or run with no arguments to auto-discover the
+        newest *.binlog under /artifacts/log.
+      * Azure DevOps — pass -PrNumber (latest failed `fsharp-ci` build) or an
+        explicit -BuildId; the build-leg binlog artifact is downloaded
+        to a temp folder and KEPT, so the MCP can read it afterwards.
+
+.PARAMETER BinlogPath
+    Local binlog source: a .binlog file, a directory (newest *.binlog inside, or
+    all with -AllLegs), or a glob. No download is performed.
+
+.PARAMETER PrNumber
+    GitHub PR number in dotnet/fsharp. Uses the latest failed build, else the latest completed one.
+
+.PARAMETER BuildId
+    Explicit Azure DevOps build id.
+
+.PARAMETER AllLegs
+    Include every binlog rather than just the build leg / newest: all binlog
+    artifacts for an AzDo build, or every *.binlog in a local directory.
+
+.PARAMETER Json
+    Emit the resolved path(s) as JSON (`{ "binlogs": [ ... ] }`).
+
+.EXAMPLE
+    # Newest local build binlog (build first, e.g. ./build.sh --binaryLog):
+    pwsh Get-Binlog.ps1
+
+.EXAMPLE
+    pwsh Get-Binlog.ps1 -BinlogPath artifacts/log/Debug/Build.binlog
+
+.EXAMPLE
+    pwsh Get-Binlog.ps1 -PrNumber 19941
+
+.EXAMPLE
+    pwsh Get-Binlog.ps1 -BuildId 1462217 -Json
+#>
+[CmdletBinding(DefaultParameterSetName = 'Local')]
+param(
+    [Parameter(ParameterSetName = 'ByPath', Mandatory, Position = 0)]
+    [string[]]$BinlogPath,
+
+    [Parameter(ParameterSetName = 'ByPr', Mandatory, Position = 0)]
+    [int]$PrNumber,
+
+    [Parameter(ParameterSetName = 'ByBuild', Mandatory, Position = 0)]
+    [long]$BuildId,
+
+    [string]$Org = 'dnceng-public',
+    [string]$Project = 'public',
+    [int]$Definition = 90,
+    [switch]$AllLegs,
+    [switch]$Json
+)
+
+$ErrorActionPreference = 'Stop'
+$api = "https://dev.azure.com/$Org/$Project/_apis/build"
+
+function Resolve-BuildId([int]$pr) {
+    $url = "$api/builds?definitions=$Definition&reasonFilter=pullRequest&statusFilter=completed&`$top=100&api-version=7.1"
+    $builds = (Invoke-RestMethod -Uri $url).value |
+        Where-Object { $_.triggerInfo.'pr.number' -eq "$pr" }
+    if (-not $builds) { throw "No completed PR builds found for PR #$pr (definition $Definition)." }
+    $failed = $builds | Where-Object { $_.result -eq 'failed' } | Sort-Object finishTime -Descending
+    $chosen = if ($failed) { $failed[0] } else { ($builds | Sort-Object finishTime -Descending)[0] }
+    Write-Host "PR #$pr -> build $($chosen.id) ($($chosen.result), finished $($chosen.finishTime))"
+    return $chosen.id
+}
+
+function Get-RepoRoot { (Resolve-Path (Join-Path $PSScriptRoot '..\..\..\..')).Path }
+
+function Resolve-LocalBinlogs([string[]]$paths, [bool]$all) {
+    $out = [System.Collections.Generic.List[string]]::new()
+    foreach ($p in $paths) {
+        if (Test-Path -LiteralPath $p -PathType Leaf) {
+            if ([IO.Path]::GetExtension($p) -eq '.binlog') { $out.Add((Resolve-Path -LiteralPath $p).Path) }
+            else { Write-Warning "Skipping non-binlog file: $p" }
+        }
+        elseif (Test-Path -LiteralPath $p -PathType Container) {
+            $found = Get-ChildItem -LiteralPath $p -Recurse -Filter *.binlog -ErrorAction SilentlyContinue |
+                Sort-Object LastWriteTime -Descending
+            if (-not $found) { throw "No *.binlog under directory: $p" }
+            if ($all) { $found | ForEach-Object { $out.Add($_.FullName) } } else { $out.Add($found[0].FullName) }
+        }
+        else {
+            $glob = Get-ChildItem -Path $p -ErrorAction SilentlyContinue | Where-Object { $_.Extension -eq '.binlog' }
+            if (-not $glob) { throw "Path not found or no .binlog match: $p" }
+            $glob | ForEach-Object { $out.Add($_.FullName) }
+        }
+    }
+    return $out
+}
+
+$binlogs = [System.Collections.Generic.List[string]]::new()
+
+switch ($PSCmdlet.ParameterSetName) {
+    'Local' {
+        $logDir = Join-Path (Get-RepoRoot) 'artifacts/log'
+        if (-not (Test-Path $logDir)) {
+            throw "No local build logs at '$logDir'. Build with a binary log first " +
+                "(e.g. ./build.sh --binaryLog or eng/Build.ps1 -binaryLog), or pass -BinlogPath."
+        }
+        Write-Host "Auto-discovering newest binlog under $logDir ..."
+        $binlogs = Resolve-LocalBinlogs @($logDir) $AllLegs.IsPresent
+    }
+    'ByPath' {
+        $binlogs = Resolve-LocalBinlogs $BinlogPath $AllLegs.IsPresent
+    }
+    default {
+        # Azure DevOps modes (ByPr / ByBuild): download the build-leg binlog
+        # artifact and KEEP it so the binlog-mcp MCP server can read it.
+        if ($PSCmdlet.ParameterSetName -eq 'ByPr') { $BuildId = Resolve-BuildId $PrNumber }
+        $artifacts = (Invoke-RestMethod -Uri "$api/builds/$BuildId/artifacts?api-version=7.1").value
+        $selected = if ($AllLegs) {
+            $artifacts | Where-Object { $_.name -match 'binlog' }
+        } else {
+            $build = $artifacts | Where-Object { $_.name -match 'build binlog' }
+            if ($build) { $build } else { $artifacts | Where-Object { $_.name -match 'binlog' } }
+        }
+        if (-not $selected) { throw "Build $BuildId has no binlog artifacts." }
+
+        $downloadDir = Join-Path ([IO.Path]::GetTempPath()) "binlog-analysis-$BuildId"
+        if (Test-Path $downloadDir) { Remove-Item -Recurse -Force $downloadDir }
+        New-Item -ItemType Directory -Force -Path $downloadDir | Out-Null
+        foreach ($a in $selected) {
+            $zip = Join-Path $downloadDir "$($a.name -replace '[^\w.-]', '_').zip"
+            Write-Host "Downloading artifact: $($a.name)"
+            Invoke-WebRequest -Uri $a.resource.downloadUrl -OutFile $zip
+            $dest = Join-Path $downloadDir ($a.name -replace '[^\w.-]', '_')
+            Expand-Archive -Path $zip -DestinationPath $dest -Force
+            Get-ChildItem $dest -Recurse -Filter *.binlog | ForEach-Object { $binlogs.Add($_.FullName) }
+        }
+        Write-Host "Kept under: $downloadDir"
+    }
+}
+
+if ($binlogs.Count -eq 0) { throw "No .binlog files resolved." }
+
+if ($Json) {
+    [pscustomobject]@{ binlogs = @($binlogs) } | ConvertTo-Json
+    return
+}
+
+Write-Host ''
+Write-Host "Resolved $($binlogs.Count) binlog(s):"
+foreach ($b in $binlogs) { Write-Host " $b" }
+Write-Host ''
+Write-Host 'Next: analyze with the binlog-mcp MCP server (arg name is binlog_file):'
+Write-Host ' binlog_overview { binlog_file: "" } # build status + error/warning counts'
+Write-Host ' binlog_diagnose { binlog_file: "" } # categorized root causes + next steps'
+Write-Host ' binlog_errors { binlog_file: "" } # structured errors (code/file/line/project)'
+Write-Host 'If the build succeeded / 0 errors (e.g. a test-only or formatting failure), there is nothing to analyze.'
diff --git a/.github/skills/hypothesis-driven-debugging/SKILL.md b/.github/skills/hypothesis-driven-debugging/SKILL.md
index 8dc28e02412..82af0d4afc8 100644
--- a/.github/skills/hypothesis-driven-debugging/SKILL.md
+++ b/.github/skills/hypothesis-driven-debugging/SKILL.md
@@ -16,6 +16,8 @@ Use this skill when:
 - Troubleshooting performance regressions
 - Examining warning/error message issues
 
+> **Related:** for a build / compile / restore failure, run the `binlog-analysis` skill first — it fetches the build's MSBuild binary log and analyzes it live via the `binlog-mcp` MCP (structured errors + root-cause diagnosis), a fast way to scope the minimal reproduction below.
+
 ## Core Principles
 
 1. **Always start with a minimal reproduction**
diff --git a/.github/skills/pr-build-status/SKILL.md b/.github/skills/pr-build-status/SKILL.md
index 50c5cc139c8..9082989cf63 100644
--- a/.github/skills/pr-build-status/SKILL.md
+++ b/.github/skills/pr-build-status/SKILL.md
@@ -11,6 +11,8 @@ compatibility: Requires GitHub CLI (gh) authenticated with access to dotnet/fsha
 
 Retrieve and systematically analyze Azure DevOps build failures for GitHub PRs.
 
+> **Related:** for build / compile / restore / WarnAsError failures, the `binlog-analysis` skill fetches the failed build's MSBuild binary log (local build or AzDo PR build) and analyzes it live via the `binlog-mcp` MCP — structured errors, root-cause diagnosis, and an MSBuild perf X-ray. Use it once you have the failed build or PR number.
+
 ## CRITICAL: Collect-First Workflow
 
 **DO NOT push fixes until ALL errors are collected and reproduced locally.**
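
A caller might consume the `-Json` output of `Get-Binlog.ps1` (the `{ "binlogs": [ ... ] }` shape described in the script's help) along the following lines. This is only a sketch: it assumes a PowerShell 7 session started at the repository root and a prior build that left `*.binlog` files under `artifacts/log`; the `$script`, `$result`, and `$path` names are illustrative and not part of the patch. The script is invoked in-process with `&` so that its `Write-Host` progress messages go to the information stream rather than into the pipeline being parsed.

```pwsh
# Sketch only: assumes the current directory is the repo root and a binary-log build has run.
$script = '.github/skills/binlog-analysis/scripts/Get-Binlog.ps1'

# -Json emits a single JSON document of the form { "binlogs": [ ... ] }.
$result = & $script -BinlogPath artifacts/log -AllLegs -Json | ConvertFrom-Json

foreach ($path in $result.binlogs) {
    # Each resolved path is the value the MCP tools take as binlog_file.
    Write-Output "binlog_overview { binlog_file: `"$path`" }"
}
```

Running the script in a child process instead (for example `pwsh Get-Binlog.ps1 -Json` from another shell) would likely interleave the progress messages with the JSON on stdout, so a caller in that situation may need to filter the output before parsing it.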