fix(platform): dedupe overlapping PII matches and add CVC pattern #1624
The phone regex matches a 14-char prefix of any credit-card number, so detectPii returned both ranges with the same start. maskPii then spliced each replacement, using original-text indices, into a string already mutated by an earlier splice. That corrupted the output and could swallow the next [EMAIL] token entirely (e.g. "4532123456789010 test@example.com 123" became "[CREDIT_CARD] 123"; the email vanished). Fixed by sweeping overlapping matches in detectPii: the longer match wins, and on equal length the earlier-inserted pattern wins. Once detectPii returns no overlaps, the existing end-to-start splice in maskPii is correct.

Also adds a context-anchored CVC pattern. Bare 3-digit numbers are intentionally NOT detected (they would false-positive on ages, room numbers, error codes); Microsoft Presidio, AWS Comprehend, and Cloudflare WAF skip CVV detection for the same reason. The pattern catches labeled cases like "my CVC is 123", "cvv: 456", and "card security code 789". Existing orgs need to toggle the new "Card security codes (CVC/CVV)" pattern on in Settings → Guardrails; new orgs pick it up via the PATTERN_NAMES default.
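The dedupe rule described in the commit message can be sketched in a few lines. This is a minimal sketch, not the code in convex/governance/pii: the PiiMatch shape (including an order field standing in for pattern insertion order), dedupeOverlaps, and this simplified maskPii are hypothetical. Instead of a linear sweep it makes a greedy pass in priority order (longest first, ties to the earlier pattern), keeping a match only if it overlaps nothing already kept, then splices end-to-start as the message describes.

```typescript
// Hypothetical match shape; the real type in convex/governance/pii may differ.
interface PiiMatch {
  type: string; // pattern name, e.g. "phone" or "credit_card"
  start: number; // inclusive offset into the original text
  end: number; // exclusive offset into the original text
  order: number; // position of the pattern in the built-in list (lower = earlier)
}

// Greedy overlap removal: visit candidates longest-first (ties go to the
// earlier pattern) and keep a match only if it overlaps nothing already kept.
function dedupeOverlaps(matches: PiiMatch[]): PiiMatch[] {
  const byPriority = [...matches].sort(
    (a, b) => b.end - b.start - (a.end - a.start) || a.order - b.order,
  );
  const kept: PiiMatch[] = [];
  for (const m of byPriority) {
    // Half-open ranges [start, end) overlap iff each starts before the other ends.
    if (!kept.some((k) => m.start < k.end && k.start < m.end)) kept.push(m);
  }
  return kept.sort((a, b) => a.start - b.start); // back to text order
}

// Simplified masker: splice end-to-start using original-text offsets.
// Only correct when no two matches overlap, which dedupeOverlaps guarantees.
function maskPii(text: string, matches: PiiMatch[]): string {
  let result = text;
  for (const m of [...matches].sort((a, b) => b.start - a.start)) {
    result = result.slice(0, m.start) + `[${m.type.toUpperCase()}]` + result.slice(m.end);
  }
  return result;
}

// Repro input with a hypothetical pattern order: phone (0-14) is a prefix of the card (0-16).
const repro = "4532123456789010 test@example.com 123";
const reproMatches: PiiMatch[] = [
  { type: "email", start: 17, end: 33, order: 0 },
  { type: "phone", start: 0, end: 14, order: 1 },
  { type: "credit_card", start: 0, end: 16, order: 2 },
];
console.log(maskPii(repro, dedupeOverlaps(reproMatches))); // → [CREDIT_CARD] [EMAIL] 123
```

On the repro input, the 14-char phone match is dropped because the 16-char credit-card match covers it, and the end-to-start splice then leaves the email placeholder intact. The greedy pass is quadratic in the number of matches, which should be negligible for short chat messages.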
Closes #1618.
Summary
Dedupe overlapping matches in detectPii so the masker never sees two ranges sharing a span (longer match wins; on tie, the earlier built-in pattern wins).
Add a cvc pattern: matches CVC: 123, my cvv is 4567, card security code 890, etc. Bare 3-digit numbers are intentionally not detected.

Why the bug looks like "email wasn't masked"
The reporter said only the credit card got masked and email/CVC came through. Tracing it on real inputs revealed two distinct issues:
1. The masker was eating the [EMAIL] token. The built-in phone regex matches a 14-char prefix of any credit-card number (e.g. 4532-1234-5678 inside 4532-1234-5678-9010). detectPii returned both ranges; maskPii sorted DESC by start and spliced each replacement using original-text indices into a string already mutated by earlier splices. Once two matches shared a start, the later splice's result.slice(match.end) landed at the wrong offset and consumed adjacent characters, sometimes the [EMAIL] placeholder entirely.

Concrete repro (verified against the unfixed code):

"4532123456789010 test@example.com 123" → "[CREDIT_CARD] 123" (email vanishes)
"cc 4111-1111-1111-1111 mail user@foo.org cvv 999" → "cc [CREDIT_CARD]EMAIL] cvv 999" ("[" of [EMAIL] eaten)
"My credit card is 4532-1234-5678-9010, my email is alice@example.com." → "My credit card is [CREDIT_CARD]ail is [EMAIL]." (", my em" eaten)

Fix: dedupe overlaps in detectPii via a sort-and-sweep. Once the input has no overlaps, the existing end-to-start splice in maskPii is correct, so maskPii itself is unchanged.

2. CVC was never detected. BUILT_IN_PII_PATTERNS had no CVC entry. Adding one is non-trivial because bare 3-digit numbers can't be reliably distinguished from ages, room numbers, error codes, etc.; Microsoft Presidio, AWS Comprehend, and Cloudflare WAF all skip CVV detection for the same reason. We add a context-anchored pattern that requires a label keyword (cvc/cvv/cv2/card security code) near the digits. Bare 123 is intentionally not detected.

Out of scope (called out for the reviewer)
Behavior change for existing orgs
Existing orgs' saved enabledPatterns arrays don't list cvc, so it's off by default for them; admins toggle the new "Card security codes (CVC/CVV)" switch on in Settings → Guardrails. New orgs pick it up via the PATTERN_NAMES default in pii-config.tsx. No DB migration; the schema stores enabledPatterns as z.array(z.string()) (no enum).

Test plan
npm run lint --workspace=@tale/platform → 0 warnings, 0 errors
npx tsc --noEmit from services/platform/ → clean
npx vitest run convex/governance/pii → 33/33 (12 new + 21 existing)
npx vitest run convex/agents/__tests__/unified_chat_ttft.test.ts → 3/3
npx vitest run app/features/settings/governance/components/pii-config.test.tsx → 1/1
"My card is 4532-1234-5678-9010, email user@example.com, CVC: 987" → all three masked, no eaten text.
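The last test-plan input exercises the new CVC pattern. Below is a minimal sketch of a context-anchored pattern in that spirit, assuming a label keyword (cvc, cvv, or cv2, each with an optional trailing 2, or "card security code"), optional separators or the word "is", then 3 or 4 digits. CVC_PATTERN, findCvcDigits, maskCvc, and the [CVC] placeholder are hypothetical names for illustration; the actual regex in BUILT_IN_PII_PATTERNS may differ, and this sketch masks only the digits, which may not match what maskPii does.

```typescript
// Hypothetical context-anchored CVC pattern (not the actual BUILT_IN_PII_PATTERNS entry):
// a label keyword, optional whitespace / ":" / "#" / "=" / "-", an optional "is",
// then exactly 3 or 4 digits. The trailing \b rejects longer digit runs.
const CVC_PATTERN =
  /\b(?:cvc2?|cvv2?|cv2|card\s+security\s+code)\b[\s:#=-]*(?:is\s+)?(\d{3,4})\b/gi;

interface DigitRange {
  start: number; // inclusive offset of the first digit
  end: number; // exclusive offset after the last digit
}

// Finds the digit spans that follow a CVC label; the label itself is left alone.
function findCvcDigits(text: string): DigitRange[] {
  const ranges: DigitRange[] = [];
  // matchAll iterates over a copy of the regex, so the shared /g pattern's lastIndex is untouched.
  for (const m of text.matchAll(CVC_PATTERN)) {
    const digits = m[1] ?? "";
    // The digits are the tail of the match, so they end where the whole match ends.
    const end = (m.index ?? 0) + m[0].length;
    ranges.push({ start: end - digits.length, end });
  }
  return ranges;
}

// Replaces each labeled CVC with a hypothetical [CVC] placeholder, splicing
// right-to-left so offsets computed on the original text stay valid.
function maskCvc(text: string): string {
  let out = text;
  for (const r of findCvcDigits(text).reverse()) {
    out = out.slice(0, r.start) + "[CVC]" + out.slice(r.end);
  }
  return out;
}

console.log(maskCvc("My card is 4532-1234-5678-9010, email user@example.com, CVC: 987"));
// → My card is 4532-1234-5678-9010, email user@example.com, CVC: [CVC]
console.log(maskCvc("I am 123 years old, room 456")); // unchanged: no label
```

Requiring the label is what keeps bare numbers safe: "room 456" or "I am 123 years old" produce no match, and "cvc 12345" is rejected because the trailing \b forbids a fifth digit.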