
Add AI credit cap observability attributes to OTLP conclusion spans #38550

Merged
mnkiefer merged 4 commits into main from copilot/add-ai-credit-cap-observability-attributes
Jun 11, 2026


Copilot AI commented Jun 11, 2026

Contributor

This change extends OTLP conclusion-span telemetry with AI credit cap context, so runs can be distinguished by budget configuration and cap-related failure signals alongside the existing gh-aw.aic cost data. It surfaces the cap value and the cap-exceeded/rate-limit state that ai_credits_context.cjs already resolves, without adding provenance or unrelated billing metadata.

  • Conclusion-span AI credit cap attributes

    • send_otlp_span.cjs now imports and uses resolveAICreditsFailureState().
    • Emits gh-aw.max_ai_credits as a numeric attribute only when a valid value is available.
    • Emits gh-aw.max_ai_credits_exceeded and gh-aw.ai_credits_rate_limit_error as boolean attributes on conclusion spans.
  • Scope and behavior constraints preserved

    • Existing gh-aw.aic behavior is unchanged.
    • No source/provenance fields (e.g., gh-aw.ai_credits.source) were added.
    • New attributes are attached only to conclusion spans, not to other spans (including dedicated agent spans).
  • Test coverage updates

    • Added assertions in send_otlp_span.test.cjs for:
      • numeric emission of gh-aw.max_ai_credits
      • boolean emission of gh-aw.max_ai_credits_exceeded
      • boolean emission of gh-aw.ai_credits_rate_limit_error
      • omission of malformed/missing gh-aw.max_ai_credits
      • conclusion-only placement of new fields
  • Observability docs/spec alignment

    • Updated docs/src/content/docs/reference/open-telemetry.mdx attribute table.
    • Updated specs/otel-observability-spec.md conclusion-span conditional attributes.

```javascript
// Resolve the configured cap plus the cap-exceeded and rate-limit flags.
const { maxAICredits, aiCreditsRateLimitError, maxAICreditsExceeded } = resolveAICreditsFailureState();

// Emit the cap only when it normalizes to a valid non-negative number;
// malformed or missing values are omitted.
const maxAICreditsValue = normalizeNonNegativeNumber(maxAICredits);
if (typeof maxAICreditsValue === "number") {
  attributes.push(buildAttr("gh-aw.max_ai_credits", maxAICreditsValue));
}
// Emit each failure flag only when it resolved to an actual boolean.
if (typeof maxAICreditsExceeded === "boolean") {
  attributes.push(buildAttr("gh-aw.max_ai_credits_exceeded", maxAICreditsExceeded));
}
if (typeof aiCreditsRateLimitError === "boolean") {
  attributes.push(buildAttr("gh-aw.ai_credits_rate_limit_error", aiCreditsRateLimitError));
}
```
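
The snippet above depends on helpers defined elsewhere in send_otlp_span.cjs. The following self-contained sketch approximates them so the emit/omit rules can be exercised in isolation. It assumes OTLP/JSON-style attributes of the form { key, value: { intValue | doubleValue | boolValue } } under the proto3 JSON mapping (64-bit integers encoded as strings); the helper bodies and the wrapper name buildAICreditCapAttributes are hypothetical stand-ins, not the repository's actual implementations.

```javascript
// Hypothetical stand-in for normalizeNonNegativeNumber: accepts finite,
// non-negative numbers or numeric strings; anything else yields undefined.
function normalizeNonNegativeNumber(value) {
  if (typeof value !== "number" && typeof value !== "string") return undefined;
  if (typeof value === "string" && value.trim() === "") return undefined;
  const n = Number(value);
  return Number.isFinite(n) && n >= 0 ? n : undefined;
}

// Hypothetical stand-in for buildAttr using OTLP/JSON AnyValue encoding.
function buildAttr(key, value) {
  if (typeof value === "boolean") return { key, value: { boolValue: value } };
  if (typeof value === "number") {
    // Integers go in intValue (as a string); fractional values in doubleValue.
    return Number.isInteger(value)
      ? { key, value: { intValue: String(value) } }
      : { key, value: { doubleValue: value } };
  }
  return { key, value: { stringValue: String(value) } };
}

// Hypothetical wrapper: builds only the AI credit cap attributes for a conclusion span.
function buildAICreditCapAttributes({ maxAICredits, maxAICreditsExceeded, aiCreditsRateLimitError }) {
  const attributes = [];
  const maxAICreditsValue = normalizeNonNegativeNumber(maxAICredits);
  if (typeof maxAICreditsValue === "number") {
    attributes.push(buildAttr("gh-aw.max_ai_credits", maxAICreditsValue));
  }
  if (typeof maxAICreditsExceeded === "boolean") {
    attributes.push(buildAttr("gh-aw.max_ai_credits_exceeded", maxAICreditsExceeded));
  }
  if (typeof aiCreditsRateLimitError === "boolean") {
    attributes.push(buildAttr("gh-aw.ai_credits_rate_limit_error", aiCreditsRateLimitError));
  }
  return attributes;
}

// Usage: the malformed cap is omitted; both boolean flags are still emitted.
console.log(
  JSON.stringify(
    buildAICreditCapAttributes({
      maxAICredits: "not-a-number",
      maxAICreditsExceeded: true,
      aiCreditsRateLimitError: false,
    }),
  ),
);
```

Gating each attribute on its type, rather than on truthiness, keeps a legitimate false flag or a cap of 0 in the span while still dropping undefined or malformed inputs.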

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
@mnkiefer mnkiefer marked this pull request as ready for review June 11, 2026 07:59
Copilot AI review requested due to automatic review settings June 11, 2026 07:59

Copilot AI left a comment

Contributor


Pull request overview

This pull request extends the OTLP conclusion span telemetry emitted by actions/setup to include AI credit cap context (configured cap and cap/rate-limit failure signals), enabling downstream observability to distinguish runs by budget configuration alongside existing gh-aw.aic cost data.

Changes:

  • Emit gh-aw.max_ai_credits, gh-aw.max_ai_credits_exceeded, and gh-aw.ai_credits_rate_limit_error as conclusion-span attributes via resolveAICreditsFailureState().
  • Add Vitest assertions ensuring the new fields are emitted on conclusion spans (and not on dedicated agent spans), including omission behavior for invalid cap values.
  • Update the OpenTelemetry reference docs and the observability spec to include the new attributes.
Summary per file:

  • actions/setup/js/send_otlp_span.cjs: Adds conclusion-span attribute emission for max AI credits and cap/rate-limit failure state.
  • actions/setup/js/send_otlp_span.test.cjs: Adds tests validating emission/omission and conclusion-only placement of the new attributes.
  • docs/src/content/docs/reference/open-telemetry.mdx: Documents the new OTel attributes in the reference table.
  • specs/otel-observability-spec.md: Updates the conclusion-span conditional attributes table to include the new fields.

Copilot's findings


  • Files reviewed: 4/4 changed files
  • Comments generated: 2

Review thread (outdated): actions/setup/js/send_otlp_span.test.cjs
Review thread (outdated): actions/setup/js/send_otlp_span.cjs
@mnkiefer

Collaborator

@copilot Address all remaining review comments.

Copilot AI and others added 2 commits June 11, 2026 09:19
…t to use fs mocks

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot AI commented Jun 11, 2026

Contributor Author


Done in commits b2a7b29 and 1fc3c4c:

Console logging noise (send_otlp_span.cjs:2363): Added a { logProvenance = true } options parameter to resolveAICreditsFailureState() in ai_credits_context.cjs. All [ai-credits] ... console.log lines are now guarded by if (logProvenance), so existing callers that rely on provenance logs are unaffected. The call in send_otlp_span.cjs passes { logProvenance: false } to keep OTLP exports silent.
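
The opt-out described above is the usual destructured-options-with-defaults pattern. A minimal sketch follows; the function name resolveFailureStateSketch, its state argument, and the log text are hypothetical simplifications, since the real resolveAICreditsFailureState() in ai_credits_context.cjs resolves its own inputs and is not reproduced here.

```javascript
// Hypothetical simplified resolver illustrating the { logProvenance = true } option.
// The trailing "= {}" lets callers omit the options argument entirely.
function resolveFailureStateSketch(state, { logProvenance = true } = {}) {
  const result = {
    maxAICredits: state.maxAICredits,
    maxAICreditsExceeded: Boolean(state.maxAICreditsExceeded),
    aiCreditsRateLimitError: Boolean(state.aiCreditsRateLimitError),
  };
  // Logging stays on by default, so existing callers keep their output.
  if (logProvenance) {
    console.log(`[ai-credits] max_ai_credits=${result.maxAICredits}`);
  }
  return result;
}

// Existing callers: unchanged behavior, the provenance line is printed.
resolveFailureStateSketch({ maxAICredits: 100 });
// OTLP export path: same result, no console output.
resolveFailureStateSketch({ maxAICredits: 100 }, { logProvenance: false });
```

Defaulting the flag to true is what keeps the change backward compatible: only the new OTLP call site has to opt out.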

Flaky test (send_otlp_span.test.cjs:3612-3614): Replaced the real-file approach, which was broken because mkdirSync is globally mocked in the describe block's beforeEach, with spies on the low-level fs calls consumed by parseAICreditsExceededFromAgentStdio: existsSync, statSync, openSync, readSync, and closeSync. Each spy is restored in a finally block.
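
The spy-and-restore approach can be illustrated without Vitest by temporarily replacing methods on the fs module and restoring them in a finally block. This is a sketch under assumptions: readHeadOfFile is a hypothetical stand-in for a parser that uses the same five fs calls (it is not parseAICreditsExceededFromAgentStdio), withFsStubs is a hypothetical helper, and the stubbed path and contents are made up for the example.

```javascript
const fs = require("node:fs");

// Hypothetical reader using the same low-level calls as the parser in the PR.
function readHeadOfFile(filePath, maxBytes = 64) {
  if (!fs.existsSync(filePath)) return "";
  const size = Math.min(fs.statSync(filePath).size, maxBytes);
  const buf = Buffer.alloc(size);
  const fd = fs.openSync(filePath, "r");
  try {
    const bytesRead = fs.readSync(fd, buf, 0, size, 0);
    return buf.toString("utf8", 0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Hypothetical helper: swaps in stubs, runs fn, and always restores the originals,
// even if fn throws, so later tests see the real fs methods.
function withFsStubs(stubs, fn) {
  const originals = {};
  for (const name of Object.keys(stubs)) {
    originals[name] = fs[name];
    fs[name] = stubs[name];
  }
  try {
    return fn();
  } finally {
    for (const name of Object.keys(originals)) fs[name] = originals[name];
  }
}

const content = "agent exceeded max AI credits";
const text = withFsStubs(
  {
    existsSync: () => true,
    statSync: () => ({ size: Buffer.byteLength(content) }),
    openSync: () => 42, // fake file descriptor; never touches the disk
    readSync: (fd, buf, offset, length, position) =>
      Buffer.from(content).copy(buf, offset, position, position + length),
    closeSync: () => {},
  },
  () => readHeadOfFile("/virtual/agent-stdio.log"),
);
console.log(text);
```

Because the reader looks up fs.existsSync and friends at call time, replacing the module properties is enough; no real directory has to be created, which sidesteps the globally mocked mkdirSync.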

Copilot AI requested a review from mnkiefer June 11, 2026 09:21
@mnkiefer

Collaborator

@copilot Fix tests

… tests

Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>

Copilot AI commented Jun 11, 2026

Contributor Author


Fixed in the latest commit. The root cause was cross-test contamination in shard 1/4:

  1. sendJobConclusionSpan (called from my new send_otlp_span.test.cjs tests) writes spans to /tmp/gh-aw/otel.jsonl, including "gh-aw.ai_credits_rate_limit_error" as an attribute key.
  2. handle_agent_failure.cjs's hasEngineRateLimit429InOTELMirror() reads that file and applies ENGINE_RATE_LIMIT_429_RE. The pattern rate_limit_(?:error|exceeded) matched the substring rate_limit_error inside the attribute key name, producing a false positive.
  3. This made buildEngineFailureContext() detect a 429 and attempt to render a missing template; the resulting exception was caught and the function returned "".
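
The false positive in step 2 can be reproduced in isolation, together with the effect of word boundaries. The sketch assumes a reduced pattern containing only the rate_limit_(?:error|exceeded) alternative; the full ENGINE_RATE_LIMIT_429_RE is not shown in this thread and may include other alternatives. The span line and engine error payload are hypothetical examples.

```javascript
// Reduced versions of the pattern; only the rate_limit alternative is modeled.
const OLD_RE = /rate_limit_(?:error|exceeded)/;
const BOUNDED_RE = /\brate_limit_(?:error|exceeded)\b/;

// Hypothetical one-line excerpt of otel.jsonl written by a conclusion span.
const spanLine = JSON.stringify({
  attributes: [{ key: "gh-aw.ai_credits_rate_limit_error", value: { boolValue: false } }],
});

// Hypothetical genuine engine error payload that should still be detected.
const engineError = '{"error":{"type":"rate_limit_error","message":"429 Too Many Requests"}}';

console.log(OLD_RE.test(spanLine)); // true: matches inside the attribute key
console.log(BOUNDED_RE.test(spanLine)); // false: "_" before "rate" is a word character
console.log(BOUNDED_RE.test(engineError)); // true: '"' before "rate" gives a boundary
```

The fix works because \b treats the underscore as a word character, so rate_limit embedded in a longer snake_case identifier no longer has a boundary in front of it.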

Two fixes:

  • Regex: Added \b word boundaries around rate_limit_(?:error|exceeded) so it no longer matches when rate_limit is part of a larger identifier like ai_credits_rate_limit_error.
  • Test isolation: Added GH_AW_OTEL_JSONL_PATH to the buildEngineFailureContext with terminal_reason guard describe block's beforeEach/afterEach, following the same pattern already used in the sibling buildEngineFailureContext describe block.
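
The test-isolation fix follows a common save, override, and restore pattern for environment variables. A minimal sketch without Vitest, assuming the mirror path is read from GH_AW_OTEL_JSONL_PATH at call time; readMirrorPath, its /tmp/gh-aw/otel.jsonl fallback, and withIsolatedOtelMirror are hypothetical helpers, not code from the PR.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical consumer that resolves the mirror path at call time.
function readMirrorPath() {
  return process.env.GH_AW_OTEL_JSONL_PATH || "/tmp/gh-aw/otel.jsonl";
}

// Points GH_AW_OTEL_JSONL_PATH at a fresh temp file while fn runs, then
// restores the previous value (or unsets it) and deletes the temp directory.
function withIsolatedOtelMirror(fn) {
  const previous = process.env.GH_AW_OTEL_JSONL_PATH; // undefined when unset
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-otel-"));
  process.env.GH_AW_OTEL_JSONL_PATH = path.join(dir, "otel.jsonl");
  try {
    return fn(process.env.GH_AW_OTEL_JSONL_PATH);
  } finally {
    if (previous === undefined) delete process.env.GH_AW_OTEL_JSONL_PATH;
    else process.env.GH_AW_OTEL_JSONL_PATH = previous;
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

withIsolatedOtelMirror((mirrorPath) => {
  // Spans written here cannot leak into the shared default file.
  fs.writeFileSync(mirrorPath, '{"name":"conclusion"}\n');
  console.log(readMirrorPath() === mirrorPath); // true
});
```

Using delete rather than assigning undefined matters here: assigning undefined to a process.env key stores the string "undefined", which would itself leak into later tests.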

3 participants