A. Executive Summary
Overall Status: WARN
The gh-aw.agent.setup span is correctly shaped, fully attributed, and successfully exported to Sentry. However, the Grafana Tempo endpoint — the second configured OTLP fan-out target — received zero connections during this run. This represents a silent fan-out failure: no export errors were written, yet no spans reached Grafana. The run is still in-flight (conclusion and agent spans are pending).
Main risks:
- Grafana Tempo is receiving no telemetry from gh-aw workflows (silent data loss)
- If the failure is systemic, historical span coverage in Grafana is zero for this workflow
Most likely root cause: Missing /v1/traces path suffix on the Grafana endpoint URL (https://otlp-gateway-prod-eu-west-2.grafana.net/otlp should be .../otlp/v1/traces), which would cause a 404 that is neither retried nor logged to the error file. Alternatively, the export error is swallowed before reaching /tmp/gh-aw/otlp-export-errors.jsonl. Because the firewall log shows zero TCP connections to *.grafana.net, the export may be abandoned before any HTTP request is made (see Section F).
B. Trace Completeness
| Metric | Value |
| --- | --- |
| Validation window | 2026-05-26 05:32:44 UTC onward (run in-flight) |
| service.name | gh-aw.otlp-data-quality-validator |
| github.run_id | 26434253996 |
| Unique traceId in JSONL | 1 (1bc136b146e27e5d4f880a21b39aaca3) |
| Unique span identity (traceId+spanId) | 1 |
| Duplicate spans | 0 |
| Spans in JSONL mirror | 1 (gh-aw.agent.setup) |
| Spans exported to Sentry | 13 TCP connections observed (all status 200), covering activation.setup, activation.conclusion, agent.setup, and prior retries |
| Spans exported to Grafana | 0 connections observed |
| Confidence | High for Sentry (firewall evidence); High for Grafana gap (no *.grafana.net connections in 257-entry audit log) |
Note: The conclusion span (gh-aw.agent.conclusion) and the agent span (gh-aw.agent.agent) have not been emitted yet because the agent job is still running. Their absence is expected at validation time; they are not missing.
JSONL mirror path discrepancy: The workflow prompt and spec reference /tmp/gh-aw/agent/otel.jsonl, but the actual mirror is at /tmp/gh-aw/otel.jsonl. The agent/ subdirectory is empty. This path mismatch would cause any JSONL-based tooling pointed at the documented path to report false negatives.
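Until the documented and actual paths are aligned, JSONL-based tooling could probe both locations rather than trusting only the documented one. The following is a minimal sketch of that idea, not the validator's actual logic: the two paths come from this report, while `CANDIDATE_MIRRORS` and `resolve_mirror` are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations in preference order. Both paths are taken from this
# report: the documented path first, then the path where the file was found.
CANDIDATE_MIRRORS = (
    Path("/tmp/gh-aw/agent/otel.jsonl"),
    Path("/tmp/gh-aw/otel.jsonl"),
)


def resolve_mirror(candidates: Iterable[Path] = CANDIDATE_MIRRORS) -> Optional[Path]:
    """Return the first candidate that exists and is non-empty, else None."""
    for path in candidates:
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None
```

A caller would treat a `None` result as "no mirror found" and report that explicitly, instead of reporting zero spans.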
C. Span Hierarchy Validation
Single-job agent workflow (activation + agent jobs).
| Check | Result |
| --- | --- |
| Setup spans share global parent (8b496f47c4e6810f) | ✅ PASS: gh-aw.agent.setup parents under global root span ID |
| Conclusion spans parent under their setup span | ⏳ PENDING: agent still running |
| Agent spans parent under conclusion span | ⏳ PENDING: agent still running |
| Span naming pattern gh-aw.<job>.<op> | ✅ PASS: gh-aw.agent.setup matches |
| parentSpanId of setup span ≠ empty | ✅ PASS: 8b496f47c4e6810f present |
| GITHUB_AW_OTEL_PARENT_SPAN_ID == setup spanId | ✅ PASS: both equal c4daf0dc08e1d824 (conclusion span will parent correctly) |
Per spec §9.3, the single observed setup span correctly parents under the global root span ID.
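Once the pending spans arrive, the three parenting rules in the table could be re-checked mechanically. The sketch below is one possible way to do that, assuming the mirror holds one OTLP/JSON document per line and that span names follow the gh-aw.<job>.<op> pattern; `load_spans` and `check_hierarchy` are hypothetical helpers, not part of gh-aw.

```python
import json
from typing import Dict, Iterable, List


def load_spans(lines: Iterable[str]) -> List[dict]:
    """Flatten OTLP/JSON export lines (one document per line) into span dicts."""
    spans: List[dict] = []
    for line in lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        for rs in doc.get("resourceSpans", []):
            for ss in rs.get("scopeSpans", []):
                spans.extend(ss.get("spans", []))
    return spans


def check_hierarchy(spans: List[dict], root_span_id: str) -> Dict[str, str]:
    """Return PASS/FAIL per span name for the three parenting rules."""
    # Index span IDs by full name so children can look up their expected parent.
    ids = {s["name"]: s["spanId"] for s in spans}
    results: Dict[str, str] = {}
    for s in spans:
        prefix, _, op = s["name"].rpartition(".")
        parent = s.get("parentSpanId", "")
        if op == "setup":
            expected = root_span_id
        elif op == "conclusion":
            expected = ids.get(f"{prefix}.setup")
        elif op == "agent":
            expected = ids.get(f"{prefix}.conclusion")
        else:
            continue  # other operations are outside this sketch
        results[s["name"]] = "PASS" if parent and parent == expected else "FAIL"
    return results
```

A span whose expected parent has not been emitted yet is reported as FAIL here; a real check would probably distinguish that case as PENDING.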
D. Attribute Contract Validation
Setup span (gh-aw.agent.setup) — spec §10.1
All 7 required attributes present:
| Attribute | Value |
| --- | --- |
| gh-aw.job.name | agent |
| gh-aw.workflow.name | OTLP Data Quality Validator |
| gh-aw.run.id | 26434253996 |
| gh-aw.run.attempt | 1 |
| gh-aw.run.actor | mnkiefer |
| gh-aw.repository | github/gh-aw |
| gh-aw.staged | false (boolValue) |
Additional attributes present: gh-aw.cli.version, gen_ai.system (github_models), gh-aw.engine.id, gh-aw.event_name, gh-aw.episode.id, gh-aw.episode.kind, gh-aw.hop.id, gh-aw.workflow_call.id.
Conclusion span — spec §10.2
⏳ PENDING (agent still running). Cannot validate required attributes: gh-aw.run.status, gh-aw.error_count, gh-aw.warning_count, gh-aw.action_minutes, gh-aw.output.item_count, gh-aw.otlp.export_errors.
Agent span — spec §10.3
⏳ PENDING. Cannot validate GenAI semantic conventions: gen_ai.system, gen_ai.request.model, gen_ai.operation.name, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens.
Resource attributes — spec §11.1
All 6 required resource attributes present:
| Attribute | Value |
| --- | --- |
| service.name | gh-aw.otlp-data-quality-validator ✅ |
| service.version | 1.0.52 ✅ |
| github.repository | github/gh-aw ✅ |
| github.run_id | 26434253996 ✅ |
| github.run_attempt | 1 ✅ |
| github.actions.run_url | https://github.com/github/gh-aw/actions/runs/26434253996 ✅ |
Additional resource attributes present: github.event_name, github.ref, github.ref_name, github.sha, github.job, github.workflow_ref, github.actor_id, runner.*, gh-aw.awf.version, deployment.environment.
Instrumentation scope — spec §11.3
✅ PASS — scope.name = gh-aw, scope.version = 1.0.52 (matches service.version).
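The resource-attribute and instrumentation-scope checks above can be expressed as a small script. This is a sketch under the assumption that each mirror line is an OTLP/JSON document with typed value wrappers such as stringValue or boolValue; the required attribute list is copied from the resource table in this section, and the function names are hypothetical.

```python
from typing import Dict, List

# Required resource attributes, as listed in the table above (spec §11.1).
REQUIRED_RESOURCE_ATTRS = [
    "service.name",
    "service.version",
    "github.repository",
    "github.run_id",
    "github.run_attempt",
    "github.actions.run_url",
]


def resource_attrs(doc: dict) -> Dict[str, object]:
    """Collect resource attributes, unwrapping the OTLP/JSON typed value."""
    out: Dict[str, object] = {}
    for rs in doc.get("resourceSpans", []):
        for attr in rs.get("resource", {}).get("attributes", []):
            value = attr.get("value", {})
            out[attr["key"]] = next(iter(value.values()), None)
    return out


def missing_resource_attrs(doc: dict) -> List[str]:
    """Return the required resource attributes that are absent."""
    present = resource_attrs(doc)
    return [key for key in REQUIRED_RESOURCE_ATTRS if key not in present]


def scope_version_matches(doc: dict) -> bool:
    """True when every instrumentation scope version equals service.version."""
    service_version = resource_attrs(doc).get("service.version")
    versions = [
        ss.get("scope", {}).get("version")
        for rs in doc.get("resourceSpans", [])
        for ss in rs.get("scopeSpans", [])
    ]
    return bool(versions) and all(v == service_version for v in versions)
```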
E. Export and Fan-out Health
| Endpoint | URL | Status | Firewall connections |
| --- | --- | --- | --- |
| Sentry | https://o205451.ingest.us.sentry.io/api/4511347087179777/integration/otlp | ✅ Reachable (TCP_TUNNEL 200) | 13 connections observed |
| Grafana Tempo | https://otlp-gateway-prod-eu-west-2.grafana.net/otlp | ⚠️ Zero connections | 0 in 257 audit entries |
Sentry header rewrite: ✅ Applied — Authorization header correctly rewritten to x-sentry-auth in GH_AW_OTLP_ENDPOINTS.
Export error log: No /tmp/gh-aw/agent/otlp-export-errors.jsonl or /tmp/gh-aw/agent/otlp-export-errors.count found. This means either Grafana errors are not being captured, or the export attempt never reaches the HTTP layer.
JSONL mirror write: ✅ /tmp/gh-aw/otel.jsonl contains the gh-aw.agent.setup span with correct content.
Multi-endpoint fan-out independence: Sentry receives spans independently of the Grafana failure, so fan-out independence is structurally preserved on the Sentry side. The Grafana side fails silently without affecting Sentry delivery.
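The fan-out comparison in this section (configured endpoints versus hosts seen in the firewall audit log) could be automated along the following lines. This is a sketch only: it assumes GH_AW_OTLP_ENDPOINTS is a JSON array of objects with a url field and that each audit line is a JSON object whose host field may carry a :port suffix, as the validation commands in Section H suggest. The function names are hypothetical.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def connections_per_endpoint(endpoints_json: str, audit_lines: Iterable[str]) -> Dict[str, int]:
    """Map each configured endpoint URL to the number of audit entries for its host."""
    hosts: Counter = Counter()
    for line in audit_lines:
        if line.strip():
            host = json.loads(line).get("host", "")
            hosts[host.split(":")[0]] += 1
    result: Dict[str, int] = {}
    for entry in json.loads(endpoints_json):
        url = entry.get("url", "")
        # urlsplit().hostname drops the port and lowercases the host.
        result[url] = hosts.get(urlsplit(url).hostname or "", 0)
    return result


def silent_endpoints(counts: Dict[str, int]) -> List[str]:
    """Endpoints that are configured but never appear in the firewall log."""
    return [url for url, n in counts.items() if n == 0]
```

An endpoint listed by `silent_endpoints` with no matching record in the export error log is exactly the silent-failure condition this report describes.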
F. Root-Cause Hypothesis
Primary hypothesis (confidence: MEDIUM-HIGH)
Grafana endpoint URL missing /v1/traces suffix.
- Configured: https://otlp-gateway-prod-eu-west-2.grafana.net/otlp
- Expected: https://otlp-gateway-prod-eu-west-2.grafana.net/otlp/v1/traces
Grafana Cloud's OTLP gateway requires the full path. Without /v1/traces, the server returns an HTTP 404. If the send_otlp_span.cjs error handler swallows non-retryable 4xx errors without writing to the error log, that would explain the missing error records. It would not explain the zero firewall connections, however, because a request that ends in a 404 still opens a TCP connection. The absence of any TCP connection to *.grafana.net suggests the export attempt is abandoned before the HTTP call is made.
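To rule the path suffix in or out independently of the firewall evidence, the configured URL can be normalized before use. The sketch below appends /v1/traces only when it is missing; `with_traces_path` is a hypothetical helper, and whether every backend (Sentry included) expects this suffix is an assumption that would need checking per endpoint.

```python
from urllib.parse import urlsplit, urlunsplit

TRACES_SUFFIX = "/v1/traces"


def with_traces_path(url: str) -> str:
    """Append /v1/traces to the URL path unless it is already present."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(TRACES_SUFFIX):
        path += TRACES_SUFFIX
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

The function is idempotent, so applying it to a URL that already carries the suffix leaves it unchanged.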
Secondary hypothesis (confidence: MEDIUM)
Grafana GH_AW_OTEL_GRAFANA_ENDPOINT secret resolves to an empty string at compile time.
If the secret was missing when the lock file was compiled, the URL in GH_AW_OTLP_ENDPOINTS would be an empty string. The parseOTLPEndpoints() function in send_otlp_span.cjs filters out entries with an empty .url (.filter(e => e && typeof e.url === "string" && e.url.trim() !== "")). However, GH_AW_OTLP_ENDPOINTS shows a non-empty Grafana URL, so this is less likely.
Tertiary hypothesis (confidence: LOW)
Exception in Grafana export branch swallowed before fetch() call.
A JavaScript error (e.g., header parsing failure) in the Grafana branch could abort the export before TCP connects. The lack of any error log makes this possible but unconfirmed.
Alternative explanations ruled out:
- Network allowlist blocking: *.grafana.net is in shared/otlp.md network allowlist ✅
- if-missing: error blocking: Would fail the job entirely, not silently skip Grafana ✅
- Ingestion delay: Zero TCP connections eliminates this ✅
G. Recommended Fixes (Prioritized)
- Investigate and fix Grafana fan-out silence (P0: data loss)
  - Check whether send_otlp_span.cjs actually iterates over every GH_AW_OTLP_ENDPOINTS entry, including the Grafana entry
  - Verify that the Grafana URL in the secret GH_AW_OTEL_GRAFANA_ENDPOINT has the correct /v1/traces suffix
  - Add explicit per-endpoint error logging to /tmp/gh-aw/agent/otlp-export-errors.jsonl for all HTTP failures, including 4xx responses
- Fix JSONL mirror path discrepancy (P1: observability tooling breakage)
  - The spec and workflow prompt reference /tmp/gh-aw/agent/otel.jsonl, but the actual file is at /tmp/gh-aw/otel.jsonl
  - Update specs/otel-observability-spec.md or the runtime so that the documented and actual paths match
- Ensure export errors are always written (P2: silent failure prevention)
  - The absence of an export errors file despite apparent Grafana data loss indicates that error capture is incomplete
  - send_otlp_span.cjs should write a record to otlp-export-errors.jsonl for every failed endpoint, including pre-HTTP failures (DNS, URL parse errors, empty URL after filtering)
- Add backend visibility verification to validation workflow (P3: diagnostic coverage)
  - Once Grafana export is restored, add a step that queries the Grafana Tempo backend via API to confirm span ingestion after the run completes
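The P0 and P2 recommendations amount to one behavior: every endpoint is attempted independently, and every failure, including one that occurs before any HTTP request, leaves a record. The following Python sketch illustrates that behavior only. The real exporter is send_otlp_span.cjs (JavaScript), and the record fields (ts, url, stage, error), the `send` callable, and `export_to_all` are all hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable


def export_to_all(endpoints: Iterable[object], send: Callable[[str], int], error_log: Path) -> int:
    """Try every endpoint independently; append one JSONL record per failure.

    `send` is a caller-supplied function that posts the payload to a URL and
    returns the HTTP status code. Returns the number of failed endpoints.
    """
    failures = 0
    for entry in endpoints:
        url = entry.get("url") if isinstance(entry, dict) else None
        stage = "pre-http"  # updated as the attempt progresses
        try:
            if not isinstance(url, str) or not url.strip():
                raise ValueError("empty or missing endpoint URL")
            stage = "connect"
            status = send(url)
            stage = "http"
            if status >= 400:
                raise RuntimeError(f"HTTP {status}")
        except Exception as exc:
            # One failing endpoint must not stop delivery to the others.
            failures += 1
            record = {
                "ts": time.time(),
                "url": url if isinstance(url, str) else "",
                "stage": stage,
                "error": str(exc),
            }
            error_log.parent.mkdir(parents=True, exist_ok=True)
            with error_log.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(record) + "\n")
    return failures
```

Recording the stage makes the two cases discussed in Section F distinguishable after the fact: a 404 shows up as stage "http", while an attempt abandoned before the request shows up as "pre-http".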
H. Validation Queries and Commands Used
```shell
# JSONL mirror span summary
jq -c '.resourceSpans[].scopeSpans[].spans[] | {name, traceId, spanId, parentSpanId, kind}' /tmp/gh-aw/otel.jsonl

# Resource attributes
jq -c '.resourceSpans[].resource.attributes[] | {(.key): .value}' /tmp/gh-aw/otel.jsonl | sort -u

# Setup span required attribute check (parses one JSON document per line)
python3 -c "
import json
attrs = {}
for line in open('/tmp/gh-aw/otel.jsonl'):
    if not line.strip():
        continue
    data = json.loads(line)
    for rs in data['resourceSpans']:
        for ss in rs['scopeSpans']:
            for s in ss['spans']:
                for a in s.get('attributes', []):
                    attrs[a['key']] = list(a['value'].values())[0]
required = ['gh-aw.job.name','gh-aw.workflow.name','gh-aw.run.id','gh-aw.run.attempt','gh-aw.run.actor','gh-aw.repository','gh-aw.staged']
print('Missing:', [k for k in required if k not in attrs])
"

# Trace ID consistency check
echo "JSONL:"; jq -r '.resourceSpans[].scopeSpans[].spans[].traceId' /tmp/gh-aw/otel.jsonl | sort -u
echo "ENV: $GITHUB_AW_OTEL_TRACE_ID"

# Firewall OTLP endpoint connections (top 10 hosts by connection count)
python3 -c "
import json
entries = {}
for line in open('/tmp/gh-aw/sandbox/firewall/logs/audit.jsonl'):
    if not line.strip():
        continue
    d = json.loads(line)
    h = d.get('host', '').split(':')[0]
    entries[h] = entries.get(h, 0) + 1
for h, c in sorted(entries.items(), key=lambda x: -x[1])[:10]:
    print(h, c)
"

# Export errors
cat /tmp/gh-aw/agent/otlp-export-errors.jsonl 2>/dev/null || echo "No export errors file"
cat /tmp/gh-aw/agent/otlp-export-errors.count 2>/dev/null || echo "0"
```
Validated against: specs/otel-observability-spec.md §9.3, §10.1, §11.1, §11.3
Workflow run: 26434253996
Validation time: 2026-05-26T05:35 UTC (run in-flight)
Generated by 🧭 OTLP Data Quality Validator · sonnet46 2.2M