Fix MAG L1C gap-fill bugs for validation tests T015 and T016 by sapols · Pull Request #2899 · IMAP-Science-Operations-Center/imap_processing

sapols · 2026-04-01T07:18:36Z

Summary

Closes issue #1993

Six localized bug fixes to the existing MAG L1C gap-filling code. These fixes un-skip and pass validation tests T015 and T016 (both mago and magi) while preserving the existing passes on T013, T014, and T024. All 10 L1C validation cases now pass with exact row-count matches against the checked-in expected outputs.

This PR intentionally keeps the current production algorithm shape. It does not rewrite the gap-fill pipeline, add new dataclasses, or replace the simplified L1C flow that was developed collaboratively between SDC and the MAG team. It only patches the specific correctness bugs that were causing T015 and T016 to fail.

Background

What was failing and why

Current dev skips T015 and T016 because they produce wrong output. Running them un-skipped shows three concrete failures:

Test	Sensor	Failure	Root cause fixes
T015	magi	+5 extra rows	Fix 5 alone
T016	mago	-663 missing rows	Fix 1 + Fix 3 + Fix 4 jointly
T016	magi	+330 extra rows	Fix 1 + Fix 3 + Fix 4 jointly

T015-mago and T024 (both sensors) already passed on dev.

The validation tests exercise increasingly complex mode transitions:

T013/T014: Simple NM-to-BM-to-NM transitions, no Config mode. Already passing because the NM rate is always 2 Hz (matching the old hardcoded cadence) and there are no Config-mode boundary artifacts.
T015: NM-to-BM-to-NM with a Config mode transition at the start of burst. NM rate changes from 2 Hz to 4 Hz (mago) or 2 Hz to 1 Hz (magi).
T016: NM-to-BM-to-NM with a Config mode transition at the end of burst. NM rate changes from 4 Hz to 2 Hz (mago) or 1 Hz to 2 Hz (magi).
T024: Burst-only, no normal data at all.

Why T015 magi failed (+5 rows): When MAG transitions through Config mode, the vectors_per_second metadata timestamp doesn't perfectly align with where the cadence actually changes in the data. The old find_all_gaps() split the timeline at the declared metadata timestamp, and find_gaps() misidentified the boundary misalignment as 1-sample phantom gaps, generating 5 spurious gap-fill rows. T015 mago was unaffected because its Config transition boundary happened to align cleanly. Fix 5 (_find_rate_segments walkback) eliminates these phantom gaps at the root.

Why T016 failed (-663 mago / +330 magi rows): Three bugs interacted to produce the T016 failures. No single fix alone was sufficient:

Fix 1 (rate-aware cadence): Gap-fill timestamps were generated at a hardcoded 0.5s (2 Hz) regardless of the actual NM rate. T016 mago has a 4 Hz pre-gap rate (needs 0.25s cadence) and T016 magi has a 1 Hz pre-gap rate (needs 1.0s cadence). The wrong cadence produced the wrong number of gap-fill timestamps. Reverting Fix 1 alone: T016 mago -662, T016 magi +332.
Fix 3 (off-by-one burst endpoint): The interpolation window silently dropped the last burst sample, producing wrong interpolated values near the burst-resume boundary. Reverting Fix 3 alone: row counts matched, but first value mismatch at index 2475 (mago) and 618 (magi).
Fix 4 (CIC filter clipping): The CIC delay compensation clipped the final interpolated timestamp at the tail of the burst region. Reverting Fix 4 alone: T016 mago -1, T016 magi -1.

Reverting all three together reproduces the exact dev failures (-663 mago, +330 magi), confirming they interact non-linearly.

Why T024 was already passing but needed Fix 6: T024 passed on dev because it's burst-only and doesn't exercise the broken transition paths. However, Fix 4 (extrapolate=True) causes T024 to produce one extra trailing sample. Fix 6 (burst-only trailing-edge trim) compensates for this, preventing a regression.

How this was investigated

This issue was investigated using two AI coding agents (Claude Code and Codex) in parallel, guided by manual review. The process was:

Initial exploration: Both agents independently identified the same set of bugs from the failing tests. Their proposed fixes overlapped significantly but differed in approach.
Spec review: The MAG Science Algorithm Document (IMAP-MAG-SW-009-01B, i5r1) and the SDC Data Validation Document (IMAP-OPS-TP-ICL-001) were consulted. Both agents initially proposed full spec-aligned rewrites of section 7.3.4 step 3, but this was scaled back after MAG team feedback (see below).
MAG team feedback: @alastairtree and @maxinelasp clarified that the current code was developed in collaboration with the MAG team and that deviations from the algorithm document may be intentional simplifications. The team requested a conservative fix rather than a full rewrite.
Minimal patches: Both agents produced minimal draft PRs that preserved the existing algorithm shape. These were iteratively compared and refined until both converged on identical code.
CDF validation on flight data: The code was run on real commissioning data (2025-09-27, 2025-09-28, 2025-10-03, 2025-12-05, 2025-12-06) to compare outputs between dev and the patched code. An exhaustive search of the MAG L1B archive through 2026-03-31 found only 8 same-day norm+burst sensor-day cases. Unfortunately, none of these real-flight cases separated current dev from the patched code (they both produced identical CDFs; the available real data does not exercise the exact broken transition patterns).
Semi-synthetic validation: Because real flight data was insufficient to demonstrate the fix, semi-synthetic input data was derived from real 2025-09-27 MAG transitions to create cases that specifically exercise the T015/T016 failure modes. These cases demonstrated clear semantic differences between dev and the patched code, with dev producing incorrect row counts and the patched code matching expectations.

Relationship to the algorithm document

The MAG Science Algorithm Document (section 7.3.4) describes a six-step L1C process. The current code implements a simplified version of these steps that was developed collaboratively between SDC and the MAG team. This PR preserves that simplified implementation. Specifically:

Step 3 (Generate a new NM time sequence) is the area with the largest gap between the document and the code. The document describes a burst-driven approach: find the BM anchor time tC closest to the gap start tA, decimate BM timestamps, shift by (tA - tC), and trim. The current code instead generates gap timestamps arithmetically from the gap boundaries at the appropriate NM cadence, then interpolates burst data onto those timestamps.

Both approaches produce a regular NM time sequence that bridges the gap. The current approach is simpler and was explicitly approved by the MAG team. The key fix in this PR is that the cadence used for gap-fill timestamps now respects the actual NM rate declared in the gap metadata, rather than always using 0.5s (2 Hz). This aligns with the document's intent that "the rate depends on the rate of the NM data before the gap."

Step 5 (Identify remaining gaps with >1.1s threshold) is not changed by this PR. The current implementation uses a per-rate tolerance check rather than a fixed 1.1s threshold, which is arguably more correct for non-2 Hz rates.

The six bugs and their fixes

Fix 1: Rate-aware gap cadence (`generate_missing_timestamps`)

Bug: Gap-fill timestamps were always generated at 0.5s intervals (2 Hz) regardless of the actual normal mode rate. For non-2 Hz gaps, this was wrong by construction.

Fix: Read the rate from gap[2] (the third element of each gap tuple) and compute the cadence as int(1e9 / rate). Fall back to 0.5s if no rate is provided. Use integer arithmetic throughout to avoid float precision loss at TTJ2000-scale nanosecond timestamps.

Which tests this fixed: T016 (jointly with Fixes 3 and 4). Reverting this fix alone produces T016 mago -662 rows and T016 magi +332 rows -- the dominant contributor to the T016 failure. Also covered by the new test_generate_missing_timestamps_uses_gap_rate() and the updated rate-aware case in test_generate_timeline().

Spec alignment: This directly implements the document's instruction that the interpolated rate should depend on "the rate of the NM data before the gap."

Fix 2: 2D array indexing in `interpolate_gaps`

Bug: filled_norm_timeline is an (n, 8) array, but the gap-timeline extraction compared the entire 2D array against scalar timestamps without specifying a column. The boolean mask broadcasted across all 8 columns instead of just the epoch column (index 0).

Fix: Extract norm_epochs = filled_norm_timeline[:, 0] once at the top of the function and filter against it explicitly.

Which tests this fixed: No isolated validation delta reproduced when this line was reverted in the current patch set. This is still a real correctness fix: it makes the gap-timeline mask operate on the epoch column by construction instead of relying on non-epoch columns being numerically unrelated to TTJ2000 timestamps.

Fix 3: Off-by-one burst slice endpoint

Bug: burst_end was computed as an inclusive index (via min(len-1, ...)), but then used in a Python slice burst_epochs[burst_start:burst_end] which is end-exclusive, silently dropping the last intended burst sample from the interpolation window.

Fix: Make burst_end exclusive from the start: burst_end = min(len(burst_epochs), burst_gap_end + burst_buffer + 1).

Which tests this fixed: T016. In isolated replay, reverting this fix preserved row counts but changed the first mismatched interpolated vector at the burst-resume boundary for both mago and magi.

Fix 4: CIC filter edge clipping (`linear_filtered`)

Bug: After the CIC filter compensates for group delay, the filtered timestamp range is shorter than the input by delay samples at the tail. linear() then calls remove_invalid_output_timestamps(), which clips output timestamps to this shortened filtered range, dropping valid gap timestamps near the burst boundary.

Fix: Pass extrapolate=True to linear() inside linear_filtered(). This matches the behavior of the spec's reference CIC implementation, which uses InterpolatedUnivariateSpline (extrapolates by default).

Which tests this fixed: T016. In isolated replay, reverting this fix alone causes both T016-mago and T016-magi to lose the final interpolated row.

Interaction with T024: Adding extrapolate=True alone would cause a +1 row regression on T024 (burst-only). Fix 6 below compensates for this.

Fix 5: Spurious micro-gaps at Config mode transitions (`_find_rate_segments`)

Bug: When MAG transitions between modes via Config mode, the vectors_per_second metadata records a transition timestamp that does not perfectly align with where the cadence actually changes in the data. The existing find_all_gaps() split the timeline at the declared metadata timestamps, and find_gaps() misidentified the boundary misalignment as 1-sample phantom gaps. These caused extra rows in the T015 output.

Fix: New _find_rate_segments() function that walks each configured transition backward while the observed cadence already matches the new rate. This aligns rate segment boundaries to the actual cadence in the data, preventing spurious micro-gaps from being generated in the first place. Also extracts the existing 0.075 tolerance magic number into a shared GAP_TOLERANCE constant used by find_gaps().

Which tests this fixed: T015-magi (had 5 extra rows due to phantom gaps at the Config boundary). Reverting the walkback logic alone reintroduces that exact +5 row failure.

Why not a post-hoc filter: An earlier approach filtered out gaps shorter than 2 cadence periods after detection. This is simpler (around 5 lines vs around 35 lines) but cannot distinguish Config-transition artifacts from genuine single-sample data loss. The root-cause fix via _find_rate_segments() preserves detection of legitimate short gaps while eliminating only the artifacts.

Fix 6: Burst-only trailing-edge trim for T024

Bug: Fix 4 (extrapolate=True) allows interpolation slightly past the CIC-filtered range, which is correct for T016 but causes T024 (burst-only, no normal data) to produce one extra trailing sample beyond the expected output.

Fix: Detect the burst-only case by checking whether any samples in the filled timeline have ModeFlags.NORM status (has_norm_context). When there is no normal-mode context, shorten the burst coverage upper bound by one output cadence to compensate for the CIC filter delay. This removes the extra trailing sample without affecting the normal+burst path.

Which tests this fixed: T024-mago and T024-magi (regressed to 687 rows with Fix 4 alone, expected 686). Reverting this trim alone reproduces that exact +1 row failure.

Additional cleanup: Dead `np.delete` removal

The existing code called np.delete(filled_norm_timeline, timeline_index) without assigning the result. Since np.delete() returns a new array, this was a no-op. Removed.

Defensive additions

Guard clause in find_gaps(): Returns empty when timeline_data has fewer than 2 elements, preventing crashes on single-timestamp rate segments.
Integer dtype consistency: Gap arrays use dtype=np.int64 throughout to prevent silent float precision loss on nanosecond timestamps.
generate_timeline() bug fix: Update last_index when generated timestamps are empty, preventing duplicate epoch data copies with adjacent gaps.

Test changes

Un-skip T015 and T016: Removed the pytest.skip() guard in test_mag_l1c_validation.
Exact row-count assertion: Added an assert checking len(expected_output.index) == len(l1c["epoch"].data) before the per-row value checks. This catches off-by-one regressions immediately rather than silently passing when the output has extra rows.
Updated test_process_mag_l1c expectations: The existing unit test now expects rate-aware cadence (17 timestamps instead of 15 in the 4 Hz gap region, with updated flag indices).
New: test_find_all_gaps_uses_observed_transition_boundary(): Regression test for the _find_rate_segments() micro-gap fix.
New: test_generate_missing_timestamps_uses_gap_rate(): Regression test for rate-aware cadence, including the 2 Hz fallback.
New: Rate-aware gap fill case in test_generate_timeline(): Verifies 4 Hz gap timestamps are spaced at 0.25s.
New: Adjacent-boundary case in test_generate_timeline(): Verifies no duplicate timestamps when adjacent gaps share a real epoch boundary, with a mixed-rate transition.

Files changed

File	Lines	What
`imap_processing/mag/l1c/mag_l1c.py`	`+131/-32`	Fixes 1-3, 5-6, guard clauses, cleanup, and helper docstrings required by `numpydoc`
`imap_processing/mag/l1c/interpolation_methods.py`	`+1/-1`	Fix 4 (`extrapolate=True`)
`imap_processing/tests/mag/test_mag_l1c.py`	`+73/-6`	Updated expectations + 4 new regression tests
`imap_processing/tests/mag/test_mag_validation.py`	`+1/-4`	Remove skip guard + add exact row-count assertion
`poetry.lock`	`+10/-0`	Required `poetry-lock` hook refresh

What this PR does NOT change

No rewrite of the gap-fill pipeline architecture
No GapFillPlan dataclass or BM-anchor/shift/trim step-3 implementation
No day-boundary / inherited-timeline work (T017-T019 scope)
No changes to public API or CLI interfaces
No timestamp precision fixes outside the gap-fill path

Validation results

Rerun locally on mag-l1c-fix:

imap_processing/tests/mag/test_mag_l1c.py: 18 passed in 0.84s
compare_mag_l1c_validation.py on T013/T014/T015/T016/T024 (mago,magi): all 10 cases PASS
pre-commit run --all-files: PASS

Exact row-count validation (previously skipped T015/T016 now pass):

Test	Sensor	Expected	Got	Status
T013	mago	1856	1856	MATCH
T013	magi	1856	1856	MATCH
T014	mago	1792	1792	MATCH
T014	magi	1792	1792	MATCH
T015	mago	2498	2498	MATCH
T015	magi	1538	1538	MATCH
T016	mago	3052	3052	MATCH
T016	magi	1195	1195	MATCH
T024	mago	686	686	MATCH
T024	magi	686	686	MATCH

Six localized bug fixes to the existing MAG L1C gap-filling algorithm, plus targeted regression tests. Keeps the current production algorithm shape — no architectural rewrite. Fixes: - Rate-aware gap cadence in generate_missing_timestamps() - 2D array indexing in interpolate_gaps() (column 0 explicitly) - Off-by-one burst slice endpoint - CIC filter edge clipping via extrapolate=True in linear_filtered() - Spurious micro-gaps at Config transitions via _find_rate_segments() - Burst-only trailing-edge trim for T024 exact row count - Remove dead np.delete() no-op Refs IMAP-Science-Operations-Center#1993 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: Codex <noreply@openai.com>

sapols · 2026-04-01T09:26:56Z

One scope clarification: My read is that this PR is enough to close the issue, but we haven't completed every historical MAG L1C TODO that came up in the discussion.

Relative to @maxinelasp’s earlier checklist:

done: one-input-file handling
done/effectively already present: CIC delay compensation and 2x buffer sizing
partial: beginning/end-of-day handling (synthetic day bounds are handled, but previous/next-day files are still not loaded)
not done here: using FILLVAL instead of zero placeholders

Relative to @alastairtree’s broader “L1C should” list:

this PR improves the validated transition/gap-fill behavior substantially
but it does not implement true previous/next-day context stitching
it does not add broader proof for out-of-order data handling
it does not claim comprehensive MAG OFF / no-science coverage

sapols · 2026-04-01T13:13:00Z

Another clarification: we're probably sweeping a lot under the rug by not counting day-boundary / inherited-timeline work (T017-T019) as in-scope for this ticket. E.g. Alastair's comment about gaps in the Dec 5-6 data appears to be directly related to L1C not treating leading/trailing burst-only intervals on mixed norm+burst days as fillable gaps. In those windows the parent burst L1B data does exist, but the current mixed-input L1C path collapses the output window to the existing normal-mode coverage instead of extending it across those burst-only spans.

maxinelasp · 2026-04-01T14:40:09Z

@sapols Agreed, would you be able to open a follow-up ticket for the day boundary coverage if one does not exist?

maxinelasp · 2026-04-01T15:20:42Z

Initial thoughts from reading through the PR description only:

We should get a thumbs up from the MAG team around using extrapolation in the CIC filter only. Am I correct in understanding that these extrapolated data points are later clipped?
for _find_rate_segements - sounds like a nice change. Does this still use the vecsec transitions attribute in any way?
I'm a little confused by step 6 - what is the root of this extra timestamp getting produced? I worry this solution is a little too closely aligned with the validation test, will there ever be more than one extra timestamps?

Nice work! I'll review the code more thoroughly in a bit.

sapols · 2026-04-01T20:23:24Z

@maxinelasp

Nah, they're not later clipped, and that's intentional. linear_filtered() passes extrapolate=True to linear(), which skips the remove_invalid_output_timestamps() clipping; that's the whole point of the fix. interpolate_gaps() bounds the candidate gap timestamps to the burst window via the short mask (lines 506/514) before interpolation, and within that window the boundary samples get extrapolated, filled, marked burst, and kept. So the T016 boundary sample is intentionally preserved.
The MAG team should know that in this small window (less than one output cadence), the linear spline is extrapolating beyond the CIC-filtered input rather than interpolating within it.
Yes, _find_rate_segments still uses vectors_per_second. process_mag_l1c() parses the attribute and passes it through find_all_gaps() into _find_rate_segments(), which iterates the configured (start_time, rate) pairs. The only change is that it walks each boundary backward while the observed cadence already matches the new rate—so it still uses the metadata, just not as an exact sample-level split point.
The root of that extra timestamp is the CIC filter's group delay shortening the filtered range by delay = decimation_factor - 1 burst-rate samples at the tail. With extrapolate=True, the burst-only path can fill trailing NM timestamps past that shortened range. The delay in output-cadence terms is (decimation_factor - 1) / decimation_factor, which is always < 1.0 (e.g. 0.75 for 4x, 0.5 for 2x decimation). So at most one extra output timestamp can appear—this is a mathematical bound, not just empirically matched to the test.

maxinelasp · 2026-04-01T15:58:34Z

    input_filtered, vectors_filtered = cic_filter(
        input_vectors, input_timestamps, output_timestamps, input_rate, output_rate
    )
-    return linear(vectors_filtered, input_filtered, output_timestamps)


Probably, we should add extrapolate=true to all the _filtered methods if we're doing it for one.

Good catch, you're right, the same post-CIC tail-clipping bug exists in quadratic_filtered and cubic_filtered. It's dormant because L1C_INTERPOLATION_METHOD = "linear_filtered", but it'd bite us if the config changed. I'll add it to the other methods, and add an edge-case test that hits all three at the tail boundary.

maxinelasp · 2026-04-01T15:59:26Z


 logger = logging.getLogger(__name__)

+GAP_TOLERANCE = 0.075


Should go in mag/constants.py. Might need a clearer name too (L1C_GAP_TOLERANCE) and a comment describing what this is.

Done. Moved to constants.py under the new name L1C_CADENCE_TOLERANCE (to make clear that it applies to cadence matching generally, not just gap detection) and I added this comment:

# Relative tolerance (7.5%) for L1C cadence checks; allows small clock-drift # variation around the expected cadence before a spacing is treated as a gap.

(If you like L1C_GAP_TOLERANCE better we can use it just lmk)

maxinelasp · 2026-04-01T16:04:14Z

        6-7 - compression flags.
    """
    burst_epochs = burst_dataset["epoch"].data
+    norm_epochs = filled_norm_timeline[:, 0]


Could you rename this variable to make it clear it's the full or filled timeline?

Renamed norm_epochs to filled_timeline_epochs

maxinelasp · 2026-04-01T16:05:26Z

+            # In the burst-only fallback, CIC delay compensation shortens the usable
+            # filtered range at the trailing edge by roughly one output cadence.
+            short_end -= int(1e9 / norm_rate.value)


See my comment in the main section about my concerns about this section.

Assuming this refers to your question 3, I replied above. Happy to expand here too if useful.

maxinelasp · 2026-04-01T16:06:47Z

-        ]
+        gap_timeline = norm_epochs[(norm_epochs > gap[0]) & (norm_epochs < gap[1])]
+
+        short_end = burst_epochs[burst_end - 1]


The name of this variable is kind of confusing to me

Totally fair. It was called short_end because it’s the shortened upper bound used for the short mask after the optional one-cadence trim in the burst-only fallback, but that name wasn’t clear about what was being shortened or that it was an epoch. So I renamed it to usable_burst_end_epoch.

maxinelasp · 2026-04-01T16:07:40Z

        )
        generated_timestamps = generate_missing_timestamps(gap)
        if generated_timestamps.size == 0:
+            last_index = int(np.searchsorted(epoch_data, gap[1], side="left"))


maxinelasp · 2026-04-02T14:34:21Z

-    difference_ns = 0.5 * 1e9
-    output: np.ndarray = np.arange(gap[0], gap[1], difference_ns)
+    difference_ns = int(0.5 * 1e9)
+    if len(gap) > 2:


What is the reasoning for this if statement? Add a comment plz

The if len(gap) > 2 is there to support both legacy (start, end) gaps, which keep the historical 0.5 s cadence, and newer (start, end, rate) gaps, which use the declared cadence.

I added this comment:

# Support both legacy (start, end) gaps, which use the historical 0.5 s cadence, # and newer (start, end, rate) gaps, which use the declared cadence.

sapols · 2026-04-06T22:34:33Z

@sapols Agreed, would you be able to open a follow-up ticket for the day boundary coverage if one does not exist?

Created #2925

sapols · 2026-04-09T20:51:16Z

@maxinelasp I addressed all comments. Ready for review again!

(I also updated the PR description to better explain exactly which changes fixed which tests)

alastairtree · 2026-04-27T17:25:04Z

        Not required for this interpolation method.
+    extrapolate : bool, optional
+        Whether to allow extrapolation of output timestamps outside the range of input
+        timestamps. Default is False.


Can you convince/reassure me this does not lead to any extrapolated science. We are only allowed to filter/decimate science between any two MAG timestamps. those timestamps could be organic/real ones or ones where we have created a timestamp between 2 real MAG timestamps because we are downsampling/decimating the data.
Wherever we are gap filling we should usually have a period where we have data in both BM and NM streams. this allows you throw away data in the overlap period when filtering (because you need to throw away points like in the CIC filter) and still have a complete series across the gap because of the overlap.
Sometimes we change mode and do not overlap because if instrument buffer requirements. In these cases there is a short gap (up to a few secs) between streams. This should appear as a real gap and not be extrapolated.
So at least the science should never be extrapolated, the overlapping data streams allow that to be possible without gaps most of the time, and gaps should not be filled with extrapolation usually. Adding an extrapolation option makes me nervous as it sounds like making up science data which is obliviously a bad look.

I agree with this concern - from our discussions, I believe Shawn that this will never extrapolate timestamps that end up in the final output, but maybe we could add another step in here that tracks and removes all extrapolated timestamps before returning anything, to ensure they are always cropped out. It's my understanding that in this method, the extrapolated values would never be passed out of the interpolation step - so it shouldn't affect anything to call remove_invalid_output_timestamps as a final step, right?

@alastairtree @maxinelasp Okay, I think the requirement Alastair is describing here is stricter than I had been interpreting. So unfortunately, we need to stop here and discuss before I know how to move forward.

The current code is not extrapolating outside the original burst timestamp range, but it is technically extrapolating science beyond the CIC-filtered support a tiny amount (by less than one output cadence). For T016, the difference is exactly one L1C output row in each affected product. Without extrapolate=True, T016 is short by one row (and therefore fails); with it, that final boundary row is kept (so it passes).

That extrapolation is bounded by the CIC group delay. The filtered range is shortened at the trailing edge by decimation_factor - 1 burst-rate samples, which is (decimation_factor - 1) / decimation_factor output cadences. That is always less than one output cadence, so at most one output timestamp can land in that tail interval. In the T016 case, the extrapolated row is ~21.4 ms past the last CIC-filtered MAGO sample and ~5.8 ms past the last CIC-filtered MAGI sample.

That seems to put two expectations in conflict. The T016 validation case expects a full gapless Normal product across this Config-mode transition, including that final boundary row. But the requirement Alastair is describing says that an output row should only be produced where the CIC-filtered burst stream has real support; if the BM/NM streams do not overlap enough after filtering, the remaining boundary interval should stay a real gap.

So I think that leaves us with two clean paths:

If MAG considers this one sub-cadence CIC-edge row acceptable, then the current extrapolate=True approach is doing what T016 expects, and I can tighten the comments/tests around that bounded behavior.

If MAG says absolutely no science may be produced outside the CIC-filtered support, then I should remove the extrapolate=True changes from the filtered interpolation methods. That would shrink this PR and guarantee no post-CIC extrapolated samples leave interpolation, but T016 will go back to failing by one row in each affected product because the available burst data does not contain enough CIC-supported samples to produce that final boundary row.

So I need MAG direction before changing this further: should the T016 gapless-output expectation win for this one bounded edge sample, or should the no-extrapolated-science rule win even though that means T016 is one row short with the current validation inputs/expected output?

alastairtree · 2026-04-27T17:32:05Z

 FILLVAL = -1e31
+# Relative tolerance (7.5%) for L1C cadence checks; allows small clock-drift
+# variation around the expected cadence before a spacing is treated as a gap.
+L1C_CADENCE_TOLERANCE = 0.075


What are the units of this? Percentage of what? Might be useful to see what this actually is for 1/2/4/8Hz products. We are pretty sensitive to timestamp jitter (appears as sin waves in our data due to s/c spin)

This is a percentage of gap, which is defined as expected_gap = 1 / vectors_per_second * 1e9. So this attempts to compensate for very small amounts of timestamp jitter which may appear as a data gap to a more strict gap-checking system.

This is already in the existing code as a magic number. I believe we landed on that percentage when meeting in person about L1C to compensate for potential clock drift.

@sapols, can you expand the comment a little to address Alastair's question?

And possibly rename to something like, L1C_TIMESTAMP_GAP_TOLERANCE?

Sure thing! Renamed to L1C_TIMESTAMP_GAP_TOLERANCE and updated the comment to:

# Relative tolerance for L1C timestamp-gap checks; allows small clock-drift # variation around the expected cadence before a spacing is treated as a gap. # This is 7.5% of expected_gap = 1e9 / vectors_per_second ns # (75, 37.5, 18.75, or 9.375 ms at 1, 2, 4, or 8 Hz, respectively). L1C_TIMESTAMP_GAP_TOLERANCE = 0.075

alastairtree · 2026-04-27T17:33:32Z

Thanks for this. Have added a couple of comments inline. Feel free to send over some CDFs generated with this to the MAG team and we can take a look at the results if that helps.

maxinelasp

This looks good to me, as long as we're able to address @alastairtree's concerns. I'll approve once we get Alastair's thumbs up, but no major changes requested from me :)
Thanks Shawn!!

maxinelasp · 2026-04-28T20:04:37Z

        Not required for this interpolation method.
+    extrapolate : bool, optional
+        Whether to allow extrapolation of output timestamps outside the range of input
+        timestamps. Default is False.


I agree with this concern - from our discussions, I believe Shawn that this will never extrapolate timestamps that end up in the final output, but maybe we could add another step in here that tracks and removes all extrapolated timestamps before returning anything, to ensure they are always cropped out. It's my understanding that in this method, the extrapolated values would never be passed out of the interpolation step - so it shouldn't affect anything to call remove_invalid_output_timestamps as a final step, right?

maxinelasp · 2026-04-28T20:25:49Z

 FILLVAL = -1e31
+# Relative tolerance (7.5%) for L1C cadence checks; allows small clock-drift
+# variation around the expected cadence before a spacing is treated as a gap.
+L1C_CADENCE_TOLERANCE = 0.075


This is a percentage of gap, which is defined as expected_gap = 1 / vectors_per_second * 1e9. So this attempts to compensate for very small amounts of timestamp jitter which may appear as a data gap to a more strict gap-checking system.

This is already in the existing code as a magic number. I believe we landed on that percentage when meeting in person about L1C to compensate for potential clock drift.

@sapols, can you expand the comment a little to address Alastair's question?

maxinelasp · 2026-04-28T20:26:00Z

+    )
+
+
+def _find_rate_segments(


maxinelasp · 2026-04-28T20:26:36Z

 FILLVAL = -1e31
+# Relative tolerance (7.5%) for L1C cadence checks; allows small clock-drift
+# variation around the expected cadence before a spacing is treated as a gap.
+L1C_CADENCE_TOLERANCE = 0.075


And possibly rename to something like, L1C_TIMESTAMP_GAP_TOLERANCE?

maxinelasp · 2026-04-28T20:29:58Z

+    )
+    assert np.array_equal(output_rate_gap, expected_rate_gap)
+
+    # Test Case: Adjacent gaps share a real epoch boundary at gap[0].


Nice, I like this test case

sapols and others added 3 commits April 1, 2026 01:12

Fix MAG L1C pre-commit failures

251f43f

Remove redundant empty gap_timeline guard

66fc696

sapols marked this pull request as ready for review April 1, 2026 09:37

sapols requested a review from maxinelasp April 1, 2026 09:38

maxinelasp reviewed Apr 2, 2026

View reviewed changes

sapols mentioned this pull request Apr 6, 2026

BUG - MAG L1C validation tests #1993

Open

sapols mentioned this pull request Apr 6, 2026

BUG - MAG L1C does not support T017-T019 (day-boundary / inherited-timeline cases) #2925

Open

sapols added 6 commits April 9, 2026 11:26

Move and rename constant

f613d16

Format constant rename

e58e871

Add extrapolate=true to filtered interpolation methods

a1a4e2d

Rename norm_epochs to filled_timeline_epochs

43eb575

Rename short_end to usable_burst_end_epoch

d0b0959

Clarify if statement with comment

9352b98

sapols mentioned this pull request Apr 15, 2026

Mag L1C - Fix mixed norm+burst day-boundary gap collapse (#2925) sapols/imap_processing#5

Draft

6 tasks

alastairtree reviewed Apr 27, 2026

View reviewed changes

maxinelasp reviewed Apr 28, 2026

View reviewed changes

Clarify MAG L1C timestamp gap tolerance

d874dc2

		)


		def _find_rate_segments(

Conversation

sapols commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

What was failing and why

How this was investigated

Relationship to the algorithm document

The six bugs and their fixes

Fix 1: Rate-aware gap cadence (generate_missing_timestamps)

Fix 2: 2D array indexing in interpolate_gaps

Fix 3: Off-by-one burst slice endpoint

Fix 4: CIC filter edge clipping (linear_filtered)

Fix 5: Spurious micro-gaps at Config mode transitions (_find_rate_segments)

Fix 6: Burst-only trailing-edge trim for T024

Additional cleanup: Dead np.delete removal

Defensive additions

Test changes

Files changed

What this PR does NOT change

Validation results

Uh oh!

sapols commented Apr 1, 2026

Uh oh!

sapols commented Apr 1, 2026

Uh oh!

maxinelasp commented Apr 1, 2026

Uh oh!

maxinelasp commented Apr 1, 2026

Uh oh!

sapols commented Apr 1, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sapols Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sapols commented Apr 6, 2026

Uh oh!

sapols commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sapols Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

sapols commented Apr 1, 2026 •

edited

Loading

Fix 1: Rate-aware gap cadence (`generate_missing_timestamps`)

Fix 2: 2D array indexing in `interpolate_gaps`

Fix 4: CIC filter edge clipping (`linear_filtered`)

Fix 5: Spurious micro-gaps at Config mode transitions (`_find_rate_segments`)

Additional cleanup: Dead `np.delete` removal

sapols Apr 9, 2026 •

edited

Loading

sapols commented Apr 9, 2026 •

edited

Loading

sapols Apr 29, 2026 •

edited

Loading