Fix CounterGroup timer to use Stopwatch instead of DateTime.UtcNow#127303
Conversation
noahfalk
left a comment
There was a problem hiding this comment.
This looks reasonable to me. Any concerns @tarekgh?
The previous truncation behavior for the initial nextPollingTimestamp does look like a bug and I'm fine to see it fixed. Sampling with fractional second periods is possible, but I expect it was rarely done in practice so its likely the fix will go unobserved.
🤖 Copilot Code Review — PR #127303Note This review was AI/Copilot-generated. Holistic AssessmentMotivation: Replacing Approach: The core approach is sound. However, the manual conversion between TimeSpan ticks and Stopwatch ticks introduces a formula that is repeated 3 times and is vulnerable to integer overflow on Linux, as detailed below. Summary: Detailed Findings❌ Integer Overflow in Stopwatch-Tick Conversion (flagged by 2/2 models)The expression Concrete numbers (Linux,
With a 60-second polling interval, a laptop sleep, VM suspension, or long GC pause causing just 15 missed cycles (~15 minutes) silently overflows On Windows ( Suggested fix: Cache the per-interval Stopwatch-tick value once in private long _pollingIntervalSwTicks;
// In EnableTimer:
_pollingIntervalSwTicks = _pollingInterval.Ticks * Stopwatch.Frequency / TimeSpan.TicksPerSecond;
_nextPollingTimeStamp = now + _pollingIntervalSwTicks;
// In OnTimer:
_nextPollingTimeStamp += (long)Math.Ceiling(delta / _pollingInterval) * _pollingIntervalSwTicks;This keeps the multiplicand bounded and eliminates 2 of the 3 copies of the fragile formula.
|
| Finding | Opus 4.7 | Sonnet 4.5 | Consensus |
|---|---|---|---|
| Overflow in OnTimer chain | ❌ | ❌ | Strong |
| Sub-second behavioral change | Strong | ||
| No tests | Strong | ||
| Cache SW ticks field | 💡 | 💡 | Strong |
| Thread safety OK | ✅ | ✅ | Strong |
| PollForValues correct | ✅ | ✅ | Strong |
| FP precision (pre-existing) | noted | Moderate |
tarekgh
left a comment
There was a problem hiding this comment.
modulo the copilot comments, LGTM!
|
The expression should be cleaned up here, but the math given by @copilot is all wrong. At a frequency of In fact, you'd need a frequency of nearly The key really is in that we're only scaling the |
Agree with your assessment. But does it hurt to check the overflow anyway (in correct way)? or you think this will add complication to the code? |
|
@tarekgh , @tannergooding do you mean keeping the scheduling math entirely in stopwatch ticks (drop Stopwatch.GetElapsedTime + TimeSpan division for delta), and only converting to TimeSpan where it actually leaves the timer (i.e. for WritePayload)? Want to confirm before doing any changes. Also, do you want me to add a defensive |
|
I was thinking the inverse rather, do it entirely in TimeSpan units instead, as that's the "canonical" form for .NET The general point being that mixing two formats can be error prone, so keeping it as one format (ideally the canonical one) helps limit bugs/risk. |
tannergooding
left a comment
There was a problem hiding this comment.
LGTM. CC. @tarekgh for the changes
tarekgh
left a comment
There was a problem hiding this comment.
Added a minor comment. Otherwise, the latest changes LGTM.
Background
DateTime.UtcNowcan jump due to NTP sync, causing elapsed time reported to EventCounter subscribers to be incorrect for that interval — affecting rate calculations like requests/sec in monitoring dashboards. Stopwatch is monotonic and not subject to clock adjustments.CounterGroupis only directly referenced fromDiagnosticCounter.DiagnosticCounteris the base class ofEventCounter,PollingCounter,IncrementingPollingCounter,IncrementingEventCounter, so they all are affected.