Add sampler API, use in SDK tracer #225
Conversation
This reverts commit e743fdc.
```python
@classmethod
def get_bound_for_rate(cls, rate: float) -> int:
    return round(rate * (cls.CHECK_BYTES + 1))
```
If rate is less than 0.0000000000000000000271, bound will be rounded to 0 which means it never gets sampled.
Should we consider rounding up, or is this just too pedantic?
Actually it is 0.0000000000000000000813 since bound is rounded to 1 (and all-zero trace id is illegal according to the TraceContext spec). 😄
I used round intentionally here for what I thought were pedantic reasons: this way each of the 2^64 sampling rate "buckets" are split exactly in half:
```
r <= 2^-65                 : 0
2^-65 < r <= 3 * 2^-65     : 1
3 * 2^-65 < r <= 5 * 2^-65 : 2
```
and so on.
I think if you pass in a sampling rate <= 2^-65 it's more correct for us to make the actual rate 0 than 2^-64.
This doesn't exactly work because float precision is worse than 2^-64 for values close to 1, but it works near 0 and is at least theoretically correct (or should be, there may still be bugs here).
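For concreteness, here is a minimal runnable sketch of the rounding behavior discussed above. `CHECK_BYTES` and `get_bound_for_rate` are reconstructed from the diff; the `ProbabilitySampler` class name is an assumption.

```python
class ProbabilitySampler:
    # Mask for the low 8 bytes of the trace ID (as in the diff above).
    CHECK_BYTES = 0xFFFFFFFFFFFFFFFF

    @classmethod
    def get_bound_for_rate(cls, rate: float) -> int:
        # round() splits each 2^-64-wide "bucket" in half, so rates at or
        # below 2^-65 map to bound 0 and are never sampled.
        return round(rate * (cls.CHECK_BYTES + 1))

print(ProbabilitySampler.get_bound_for_rate(2 ** -65))  # 0: never sampled
print(ProbabilitySampler.get_bound_for_rate(2 ** -64))  # 1
print(ProbabilitySampler.get_bound_for_rate(0.5))       # 2 ** 63
```

One caveat: Python's `round()` uses round-half-to-even, so exact `.5` boundaries (e.g. `rate = 3 * 2^-65`, which maps to 1.5) round to the even neighbor rather than always down.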
all-zero trace id is illegal according to the TraceContext spec.
Oh that's interesting, so the trace ID space is [0x1, 0xf...f], not [0x0, 0xf...f]. So would you expect this line to be:
`return 1 + round(rate * cls.CHECK_BYTES)`
I think that'd give us the same behavior as above, but always sample trace ID 0x0.
Should we have special handling for 0x0 trace IDs?
I'm also happy to leave this as-is and treat 0x0 as a regular ID here, but treat it differently at span creation time. Up to you. :D
I don't have any preference here as I cannot think of a scenario where people would need such low but non-zero sample rate (unless someone is instrumenting SETI@home). 😆
Something I missed above: there are 2^64 - 1 trace IDs besides 0x0 that end in 0x0000000000000000. If we want to always/never sample the all-zero trace ID, we have to make that decision before masking.
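A quick illustration of that masking point (variable names are illustrative, not from the PR): every 128-bit trace ID whose low 64 bits are zero collapses to the same masked value as the all-zero ID.

```python
CHECK_BYTES = 0xFFFFFFFFFFFFFFFF  # mask for the low 8 bytes, as in the diff

all_zero = 0x0
zero_tail = 0xDEADBEEF << 64  # nonzero 128-bit ID whose low 64 bits are zero

# After masking, the two IDs are indistinguishable, so any special-casing
# of the all-zero trace ID has to happen before the mask is applied.
print((all_zero & CHECK_BYTES) == (zero_tail & CHECK_BYTES))  # True
```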
```python
if parent_context is not None:
    return Decision(parent_context.trace_options.recorded, {})

return Decision(trace_id & self.CHECK_BYTES < self.bound, {})
```
This could be a problem for systems that generate shorter trace_ids and use zero-padding / a constant part, especially in the low sample rate case.
Do you think we should check fewer bytes? Or make it configurable?
This follows the OC behavior; I actually don't know why we only check the lower 8 bytes.
I haven't dug into this.
If we're at the same bar as OpenCensus, we probably should be okay.
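To make the concern concrete, here is a hedged sketch (the helper name and the 1% rate are illustrative): if a system pads a short random ID into the high bytes and leaves the low 8 bytes zero, the masked value is always 0, which is below any nonzero bound, so every trace gets sampled regardless of the configured rate.

```python
CHECK_BYTES = 0xFFFFFFFFFFFFFFFF
bound = round(0.01 * (CHECK_BYTES + 1))  # 1% sampling rate

def sampled(trace_id: int) -> bool:
    return trace_id & CHECK_BYTES < bound

# 64-bit random part zero-padded on the right: the low 8 bytes are always 0.
padded = 0x1234ABCD5678EF90 << 64
print(sampled(padded))  # True -- every such ID is sampled, not 1% of them
```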
If everyone follows the W3C spec here, we should be covered:
When a system operates with a trace-id that is shorter than 16 bytes, it SHOULD fill-in the extra bytes with random values rather than zeroes.
As long as they follow the SHOULDs, which they MAY not. :P
reyang left a comment
Overall structure looks great!
I've left some math related comments.
Co-Authored-By: Reiley Yang <reyang@microsoft.com>
```python
# The sampler checks the last 8 bytes of the trace ID to decide whether to
# sample a given trace.
CHECK_BYTES = 0xFFFFFFFFFFFFFFFF
```
According to the W3C spec, you should use the high ("left") 8 bytes instead:
Many unique identification generation algorithms create IDs where one part of the value is constant (often time- or host-based), and the other part is a randomly generated value. Because tracing systems may make sampling decisions based on the value of trace-id, for increased interoperability vendors MUST keep the random part of trace-id ID on the left side.
But I wonder if we can find a more robust way to maybe randomly mix some bits here and there together.
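One hypothetical way to "mix some bits together" (not part of this PR): pass the checked bytes through a bijective integer finalizer such as the splitmix64 mixing step, so sequential or partially-constant trace IDs still map to well-spread values.

```python
def mix64(x: int) -> int:
    """splitmix64 finalizer: a bijective scramble of a 64-bit value."""
    x &= 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & 0xFFFFFFFFFFFFFFFF
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & 0xFFFFFFFFFFFFFFFF
    return x ^ (x >> 31)

# Sequential IDs (e.g. from a counter-based generator) spread out:
print([hex(mix64(i) >> 56) for i in range(1, 4)])
```

One caveat: this finalizer maps 0 to 0, so the all-zero ID discussed above would still need special handling.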
I considered this but decided to stick to the OC behavior (or at least the intended behavior: this also fixes a rounding/off-by-one bug). FWIW, checking the high bytes also seems more correct to me, but in practice -- if people are using short trace IDs in the wild -- sampling every request seems worse than sampling based on the non-random part.
I think you're right about this. I suggested it as a change to the spec in open-telemetry/opentelemetry-specification#331.
Another fun benefit of checking bytes high-to-low is that the sampling decision should be mostly consistent between samplers that check different numbers of bytes. Unlike checking low-to-high where every incremental bit is effectively another coin toss. Ultimately we'll probably just put this number in the spec, but it's a neat property in any case. :D
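That consistency property can be illustrated with a hypothetical helper (not from the PR): when the rate is exactly representable in the smaller number of bits (e.g. 0.25 in 2 bits), samplers that check different numbers of high bits make identical decisions.

```python
import random

def sampled_high(high64: int, rate: float, bits: int) -> bool:
    # Compare only the top `bits` bits of the (random) high 64 bits.
    prefix = high64 >> (64 - bits)
    return prefix < round(rate * (1 << bits))

random.seed(0)
ids = [random.getrandbits(64) for _ in range(1000)]
# An 8-bit check and a 64-bit check agree on every ID at rate 0.25:
print(all(sampled_high(i, 0.25, 8) == sampled_high(i, 0.25, 64)
          for i in ids))  # True
```

Checking low-to-high instead would make each extra bit an independent coin toss, and the two checks would disagree on roughly a quarter of the IDs.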
FYI - keep an eye on https://github.com/w3c/trace-context/pull/344/files.
Let's revisit this once open-telemetry/opentelemetry-specification#331 and w3c/trace-context#344 are resolved.
In particular, the following errors are fixed in this commit:

* Don't return False in `__exit__`. Returning a literal causes a mypy error when combined with the `typing.Optional[bool]` type hint. Furthermore, exception handling is the same when returning `False` and when returning `None` (the exception is re-raised). Therefore, it's simpler to remove the return statement and change the type hint to `None`.
* Correctly initialize nested tuple. Tuples of length 1 should be initialized with a trailing comma to be properly interpreted.
* Pass correct type to `use_context()` in test.
* Add type annotations for test helper functions. Since we have `disallow_untyped_calls = True` in our mypy config for tests, we must add type annotations to any function that is called from a test.

Additionally, bump minimal mypy version to 0.740 to consistently reproduce these errors.
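Two of those fixes are easy to illustrate with a minimal sketch (the `Scope` class name is hypothetical): a context manager whose `__exit__` returns `None` re-raises exceptions exactly as `return False` would, and a 1-tuple needs its trailing comma.

```python
from types import TracebackType
from typing import Optional, Type

class Scope:
    def __enter__(self) -> "Scope":
        return self

    # Not returning anything (i.e. returning None) re-raises any exception,
    # just like `return False`, and type-checks cleanly as -> None.
    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        pass

single = ("only-element",)      # a 1-tuple: the trailing comma matters
not_a_tuple = ("only-element")  # just a parenthesized string
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple str
```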
This PR adds the sampler API and changes the SDK tracer so that it creates either a `Span` or `DefaultSpan` depending on the sampling decision at span creation time.

The sampling spec is a moving target. The recently-merged open-telemetry/opentelemetry-specification#296 moves samplers out of the API, adds a distinction between "sampled" and "recorded" traces, and changes the behavior of the tracer with respect to links and attributes.
This PR doesn't implement the recent spec changes yet. I thought it'd be easier to write and review this in two stages: one to add basic samplers, and another to bring samplers up to speed with the spec.
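As a rough sketch of the flow this PR sets up (`Span`, `DefaultSpan`, and `Decision` are named in the PR; the `start_span` helper and the `AlwaysOn` sampler here are simplified assumptions, not the actual implementation):

```python
class Decision:
    def __init__(self, sampled: bool, attributes: dict) -> None:
        self.sampled = sampled
        self.attributes = attributes

class Span:
    """A real span that records its name, attributes, events, etc."""
    def __init__(self, name: str) -> None:
        self.name = name

class DefaultSpan:
    """A no-op span: propagates context but records nothing."""
    def __init__(self, name: str) -> None:
        self.name = name

class AlwaysOn:
    """Trivial sampler that samples every trace."""
    @staticmethod
    def should_sample(parent_context, trace_id: int) -> Decision:
        return Decision(True, {})

def start_span(sampler, parent_context, trace_id: int, name: str):
    # The sampler decides at span creation time which span type to create.
    decision = sampler.should_sample(parent_context, trace_id)
    return Span(name) if decision.sampled else DefaultSpan(name)

print(type(start_span(AlwaysOn, None, 0x1, "request")).__name__)  # Span
```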
See also: