fix(sdk): regenerate service.instance.id post-fork in MeterProvider and TracerProvider#5000

Open
sterchelen wants to merge 2 commits into open-telemetry:main from sterchelen:fix/post-fork-service-instance-id

@sterchelen

Description

When a prefork server (e.g. gunicorn) forks workers, all workers inherit the same Resource from the master process, including the same service.instance.id. The SDK already restarts background threads post-fork (PeriodicExportingMetricReader, BatchSpanProcessor) but never updates the resource identity. This causes metric and trace collisions in OTLP backends: when multiple workers export with the same resource identity, their data is aggregated incorrectly (last-write-wins) instead of being summed.

This PR registers an os.register_at_fork(after_in_child=...) hook on both MeterProvider and TracerProvider that replaces service.instance.id with a fresh UUID in each forked worker. All other resource attributes are preserved via Resource.merge(). WeakMethod is used for the hook reference, consistent with the existing pattern in PeriodicExportingMetricReader and BatchSpanProcessor.
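
As a rough illustration of the mechanism described above, here is a minimal stdlib-only sketch. It is not the SDK code from this PR: SketchProvider is a hypothetical stand-in for MeterProvider/TracerProvider, and a plain dict stands in for Resource (the dict merge plays the role of Resource.merge(), assuming the updating side wins on key conflicts).

```python
import os
import uuid
import weakref


class SketchProvider:
    """Hypothetical stand-in for a provider; not the SDK's MeterProvider."""

    def __init__(self, attributes):
        # A plain dict stands in for the SDK Resource in this sketch.
        self.resource = dict(attributes)
        self.resource.setdefault("service.instance.id", str(uuid.uuid4()))

        if hasattr(os, "register_at_fork"):  # not available on Windows
            # WeakMethod keeps the fork hook from pinning the provider in memory.
            weak_reinit = weakref.WeakMethod(self._at_fork_reinit)

            def _hook():
                method = weak_reinit()
                if method is not None:  # provider may already be collected
                    method()

            os.register_at_fork(after_in_child=_hook)

    def _at_fork_reinit(self):
        # Later keys win, so only service.instance.id changes;
        # every other attribute is carried over untouched.
        self.resource = {
            **self.resource,
            "service.instance.id": str(uuid.uuid4()),
        }
```

Calling _at_fork_reinit() directly is enough to check the behaviour without forking: the instance ID changes while attributes such as service.name stay the same.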

Fixes #4390
Related: #3885

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Two new test files covering both providers:

  • tests/metrics/test_meter_provider_fork.py
  • tests/trace/test_tracer_provider_fork.py

Each covers:

  • _at_fork_reinit assigns a new service.instance.id
  • Other resource attributes are preserved
  • A real fork() produces a distinct ID in the child vs the parent
  • 4 concurrent forks each produce a unique ID
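
The fork-based checks listed above could take roughly the following shape. This is a sketch under assumptions, not the test code in this PR: the module-level state dict and the helper child_instance_ids are hypothetical, and each child reports its ID to the parent over a pipe. It needs a platform with os.fork (*nix only).

```python
import os
import uuid

# Hypothetical module-level state standing in for a provider's resource.
state = {"service.name": "api", "service.instance.id": str(uuid.uuid4())}


def _reinit_instance_id():
    # Runs in each child right after fork(); the parent's state is untouched.
    state["service.instance.id"] = str(uuid.uuid4())


os.register_at_fork(after_in_child=_reinit_instance_id)


def child_instance_ids(count):
    """Fork `count` children up front and return the ID each one reports."""
    children = []
    for _ in range(count):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the post-fork ID, then exit immediately
            # without running the parent's cleanup handlers.
            os.close(read_fd)
            os.write(write_fd, state["service.instance.id"].encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end and move on to the next fork.
        os.close(write_fd)
        children.append((pid, read_fd))

    ids = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as reader:
            ids.append(reader.read())
        os.waitpid(pid, 0)
    return ids
```

A test can then assert that child_instance_ids(4) yields four distinct values, none of which equals the parent's state["service.instance.id"].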

Run with:
pytest opentelemetry-sdk/tests/metrics/test_meter_provider_fork.py
pytest opentelemetry-sdk/tests/trace/test_tracer_provider_fork.py

  • Unit tests (fork-based, *nix only)

Does This PR Require a Contrib Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

This hook runs post-fork in each worker and replaces service.instance.id
with a fresh UUID, ensuring each worker is a distinct instance.
"""
self._sdk_config.resource = self._sdk_config.resource.merge(
Contributor

We are not setting this in the resource attributes, also not sure the issue is metrics specific.

Author

@sterchelen sterchelen Mar 20, 2026

We are not setting this in the resource attributes

I'm not sure I get your point, but if the user didn't set it, workers would be completely indistinguishable in the backend after fork, which is worse than having no ID at all. By always generating one post-fork, we ensure every worker has a distinct identity regardless of whether the user configured it upfront.

On the second point: agreed, the problem affects both metrics and traces 👍🏼

Contributor

@xrmx xrmx Mar 23, 2026

My point is that if we have a solution that works fine after fork(), then, since the semantic convention is now stable, we can add this to the default resource detector. This way we have the very same attributes before and after fork.

Author

@xrmx I just pushed the changes as two distinct commits:

  1. 4b7334a to auto-generate service.instance.id
  2. 3256bf0 to fix post-fork issue

The changelog has been updated as well 👍🏼

@sterchelen sterchelen force-pushed the fix/post-fork-service-instance-id branch 2 times, most recently from 63676db to 70b6fff Compare March 23, 2026 08:02
@xrmx xrmx moved this to Reviewed PRs that need fixes in Python PR digest Mar 23, 2026
The Python SDK did not auto-generate service.instance.id, unlike the
Java SDK and the stable semantic convention recommendation. Add it to
_DEFAULT_RESOURCE so every process gets a unique instance identity at
startup without any user configuration.
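
A minimal sketch of the idea in this commit message, using plain dicts rather than the SDK's Resource and _DEFAULT_RESOURCE. The helper build_resource is hypothetical, and letting user-supplied attributes override the generated default is an assumption of this sketch, not something the commit message states.

```python
import uuid


def build_resource(user_attributes=None):
    """Hypothetical helper: merge generated defaults with user attributes."""
    defaults = {
        "service.name": "unknown_service",
        # Generated per call here; a fresh random UUID per process at startup.
        "service.instance.id": str(uuid.uuid4()),
    }
    # Assumption: user-supplied values take precedence over the defaults.
    return {**defaults, **(user_attributes or {})}
```

With no arguments, build_resource() returns a resource that already carries a random service.instance.id; passing {"service.instance.id": "fixed"} keeps the caller's value instead.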
…nd TracerProvider

When a prefork server (e.g. gunicorn) forks workers, all workers inherit
the same Resource from the master process, including the same
service.instance.id. Register an os.register_at_fork(after_in_child=...)
hook on both MeterProvider and TracerProvider that replaces
service.instance.id with a fresh UUID in each forked worker, ensuring
distinct resource identities without any user configuration.

Resource.merge() preserves all other resource attributes. WeakMethod is
used for the hook reference, consistent with the existing pattern in
PeriodicExportingMetricReader and BatchSpanProcessor.

Fixes: open-telemetry#4390
Related: open-telemetry#3885
@sterchelen sterchelen force-pushed the fix/post-fork-service-instance-id branch from 70b6fff to 3256bf0 Compare March 24, 2026 12:30
@sterchelen sterchelen requested a review from xrmx March 25, 2026 17:09
Projects

Status: Reviewed PRs that need fixes

Development

Successfully merging this pull request may close these issues.

[Auto-instrumentation] Add unique instance.id to all post-fork processes