update(link): Relax link schemas to support domain-level identifiers#292
xibz wants to merge 1 commit into cdevents:main
This change updates all link schemas (START, END, RELATION, and embedded variants) to allow references to either a CDEvent contextId, a domainId, or both.

Previously, links could only reference event context IDs. This limited cross-system connectivity and encouraged embedding execution identifiers in customData purely for graph reconstruction.

By allowing domainId alongside contextId:
- Links can represent relationships between domain executions (e.g., pipelinerun) as well as individual events.
- Connectivity metadata no longer needs to be embedded in event payloads.
- Chain-first modeling constraints are relaxed, enabling relation-first graph modeling.
- The change remains backward compatible.

At least one of contextId or domainId is now required for link endpoints. AdditionalProperties are restricted to prevent schema drift.

This preserves existing semantics while improving flexibility and reducing customData pollution.

Signed-off-by: xibz <bjp@apple.com>
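The endpoint rule in the description (at least one of contextId or domainId, no extra properties) can be sketched in plain Python as below. This is only an illustration of the constraint, not the schema from the PR: the function name `validate_link_endpoint` is hypothetical, and the non-empty-string check is an assumption that the description does not state.

```python
from typing import Any, List, Mapping

# Property names come from the PR description; the actual schema files
# may define more detail than is modeled in this sketch.
ALLOWED_KEYS = {"contextId", "domainId"}


def validate_link_endpoint(endpoint: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the endpoint passes."""
    problems: List[str] = []

    # Mirrors the "additional properties are restricted" rule.
    extra = sorted(set(endpoint) - ALLOWED_KEYS)
    if extra:
        problems.append(f"unexpected properties: {extra}")

    # Mirrors the "at least one of contextId or domainId" rule.
    present = [key for key in ("contextId", "domainId") if key in endpoint]
    if not present:
        problems.append("at least one of contextId or domainId is required")

    # Assumption (not stated in the PR): identifiers are non-empty strings.
    for key in present:
        value = endpoint[key]
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")

    return problems
```

An endpoint carrying only contextId still passes this check, which is what keeps the change backward compatible.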
Thanks @xibz - could you clarify the definition of
The Core Problem

contextId requires the publisher to know the parent event's context ID. But if the parent isn't a CDEvent, there is no context ID to know.

Today: GitHub doesn't emit CDEvents. So we use domainId to link to GitHub PRs explicitly.

Solution

Introduce a domain-specific identifier which can be used to relate information. URNs will be used for domain IDs, where it follows the format of

Examples:
Example 1 (GH to CI)

Build event wants to link to GitHub PR

Publisher asks: "What is the GitHub PR's contextId?"

How domainId solves this

Example 2 (Jira to CI)

Imagine CircleCI wants to relate a Jira ticket

Result: CircleCI task can't link to Jira. Forced into customData.

Example 3: Datadog Alert Triggers Rollback Pipeline

Imagine you have a Datadog alert that monitors system health during releases. The rollback pipeline needs to link back to the alert that triggered it:

Example 4: Linking to Events Without Knowing Their Context ID

Imagine a consumer (like a dashboard or audit system) receives an event and wants to query for all related events, but doesn't know their context IDs upfront.

A deployment fails. You want to find:
Without domainId:

Problem: You have to parse customData and hope the IDs are there. There is no standardized way to query back.

With domainId, you can link forward AND backward:

with this:
This shows that domainId isn't just for "non-CDEvent systems"; it's also useful for querying across systems when you don't have context IDs.

Why it works

Each system uses what it knows. Each system knows its own context IDs (contextId) and how to identify triggering systems (domainId URN). No system needs to know another system's internal IDs or context IDs.

FAQs
Because causality exists outside CDEvents.
If you don't link them, you lose that causality. If you can't link them with contextId (because they're not CDEvents), you're forced to hide it in customData. domainId lets you link anything, anywhere. That's why it matters.
Your engineers will. They'll put it in customData. Because causality is real whether CDEvents acknowledges it or not.
No. domainId is a stopgap until systems emit CDEvents natively.
Thanks, I now understand your proposal better, I guess.
Side questions: Are "links" just for "trigger" and "causeby", or can they be used to define other types of relation (e.g., for a test, to define what the system under test is (a source, a change, an artifact), in which context (CI, environment), and triggered by what (a scheduler, a change, a deployment, ...))?
The proposal does not require links to always be present. It simply provides a standardized way to express causality when it is known. If a producer does not know the trigger, it emits no link. The key difference is: Today: With domainId: This proposal does not require perfect causality capture. It enables correct modeling when information exists.
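A minimal sketch of that producer behavior, assuming a simplified link shape: the helper name `build_links` and the field names `linkKind` and `target` are illustrative choices here, not taken from the schema in this PR.

```python
from typing import Dict, List, Optional


def build_links(trigger_context_id: Optional[str] = None,
                trigger_domain_id: Optional[str] = None) -> List[Dict[str, object]]:
    """Return link objects for a known trigger, or an empty list if unknown."""
    target: Dict[str, str] = {}
    if trigger_context_id:
        target["contextId"] = trigger_context_id
    if trigger_domain_id:
        target["domainId"] = trigger_domain_id

    if not target:
        # Trigger unknown: emit no link rather than guess.
        return []

    # "linkKind" and "target" are illustrative field names (assumption).
    return [{"linkKind": "TRIGGER", "target": target}]
```

A producer that knows only the external identifier passes `trigger_domain_id`; one that knows nothing calls `build_links()` and attaches no links at all.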
Correlation is exactly one of the motivations. The intention is that domainId represents the canonical identity of an entity within its domain. subject.id is too flexible, hence the strict URN format. If a system later emits a native CDEvent for that entity, the subject.id of that event should correspond to the same logical identifier represented in the domainId. This allows dashboards, SIEM systems, and audit systems to correlate across both: domainId is not meant to replace subject.id, but to provide a stable cross-domain reference when contextId is unavailable.
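To illustrate the kind of lookup a dashboard or audit system might build on top of this, here is a rough sketch that indexes events by the domainId values their links point at. The event shape (a top-level `contextId` plus a `links` list whose entries have a `target`) is a simplification assumed for the example, and `index_by_domain_id` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_by_domain_id(events: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each domainId to the context IDs of the events that link to it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        for link in event.get("links", []):
            # Links that carry only a contextId are skipped by this index.
            domain_id = link.get("target", {}).get("domainId")
            if domain_id:
                index[domain_id].append(event["contextId"])
    return dict(index)
```

Given such an index, a consumer that knows only the domain-level identifier of an entity can find every event that references it, without parsing customData.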
Links are not limited to trigger/cause relationships. They are intended to model typed relationships between entities. Examples include: The goal is not only causality modeling, but explicit relationship modeling. This allows us to describe: