Skip to content

fix: preserve string literals during schema qualification stripping (#371)#372

Open
tianzhou wants to merge 1 commit intomainfrom
fix/issue-371-preserve-string-literals-in-schema-stripping
Open

fix: preserve string literals during schema qualification stripping (#371)#372
tianzhou wants to merge 1 commit intomainfrom
fix/issue-371-preserve-string-literals-in-schema-stripping

Conversation

@tianzhou
Copy link
Contributor

Summary

  • stripSchemaQualificationsFromText used regex patterns that matched schema prefixes inside single-quoted SQL string literals, corrupting e.g. 's.manage' into 'manage' when the target schema was s
  • This caused pgschema plan to generate destructive false-positive ALTER POLICY statements that silently truncated function arguments in USING/WITH CHECK expressions
  • Added stripSchemaQualificationsPreservingStrings which splits text on single-quoted string boundaries before applying schema stripping, preserving string literal content
  • Added fast-path strings.Contains check to skip all work when the schema name is absent

Fixes #371

Test plan

  • Unit tests: go test -v ./internal/postgres -run TestStripSchemaQualifications_PreservesStringLiterals (6 cases covering string preservation, escaped quotes, mixed identifiers/literals)
  • Diff tests: PGSCHEMA_TEST_FILTER="create_policy/" go test -v ./internal/diff -run TestDiffFromFiles (added scope_check policy with has_scope('public.manage') to existing alter_policy_using test case)
  • Integration tests: PGSCHEMA_TEST_FILTER="create_policy/" go test -v ./cmd -run TestPlanAndApply (all 10 policy tests pass, no false-positive diff for string literal policy)

🤖 Generated with Claude Code

…371)

stripSchemaQualificationsFromText used regex patterns that matched
schema prefixes inside single-quoted SQL string literals. For example,
with schema "s", Pattern 4 treated the single quote in 's.manage' as a
valid non-double-quote prefix character, corrupting 's.manage' into
'manage'. This caused pgschema plan to generate destructive false-positive
ALTER POLICY statements that silently truncated function arguments.

Add stripSchemaQualificationsPreservingStrings which splits text on
single-quoted string boundaries (handling '' escapes) and applies schema
stripping only to non-string segments. Also add a fast-path
strings.Contains check to skip all work when the schema name is absent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 16:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes schema-qualification stripping during pgschema plan so that schema prefixes inside single-quoted SQL string literals (e.g. 's.manage') are preserved, preventing destructive false-positive diffs for RLS policies.

Changes:

  • Adds string-literal–aware schema qualification stripping by splitting non–dollar-quoted SQL on single-quoted boundaries before applying regex-based stripping.
  • Adds a fast-path to skip processing when the schema name does not appear in the SQL text.
  • Adds unit tests and extends diff fixtures to cover the RLS policy string-literal regression (Issue #371).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal/postgres/desired_state.go Updates schema qualification stripping to preserve single-quoted string literals and adds a strings.Contains fast-path.
internal/postgres/desired_state_test.go Adds unit tests validating schema stripping while preserving string literals and escaped quotes.
testdata/diff/create_policy/alter_policy_using/old.sql Extends policy diff fixture with a policy using a dotted string literal ('public.manage').
testdata/diff/create_policy/alter_policy_using/new.sql Extends policy diff fixture with the same string-literal policy to ensure no false-positive diff.
testdata/diff/create_policy/alter_policy_using/plan.json Updates the expected plan fingerprint hash after fixture changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sql: "SELECT 'public.a', public.t, 'public.b';",
schema: "public",
expected: "SELECT 'public.a', t, 'public.b';",
},
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new unit tests cover basic string-literal preservation, but they don’t cover the failure mode where apostrophes appear inside SQL comments (e.g., -- don't ...) and interfere with the string-splitting state machine. Adding a test case with a comment containing a single quote followed by a schema-qualified identifier would help prevent regressions once comment-handling is fixed.

Suggested change
},
},
{
name: "handles apostrophe in comment followed by schema-qualified identifier",
sql: "SELECT 1; -- don't drop public.t\nDROP TABLE public.t;",
schema: "public",
expected: "SELECT 1; -- don't drop public.t\nDROP TABLE t;",
},

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +153
inString := false

i := 0
segStart := 0
for i < len(text) {
ch := text[i]
if ch == '\'' {
if !inString {
// End of non-string segment — strip schema qualifications from it
result.WriteString(stripSchemaQualificationsFromText(text[segStart:i], schemaName))
segStart = i
inString = true
i++
} else {
// Check for escaped quote ('')
if i+1 < len(text) && text[i+1] == '\'' {
i += 2 // skip ''
} else {
// End of string literal — write it as-is
inString = false
i++
result.WriteString(text[segStart:i])
segStart = i
}
}
} else {
i++
}
}
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripSchemaQualificationsPreservingStrings treats every single quote as the start/end of a SQL string literal, but it doesn’t account for quotes appearing inside SQL comments (e.g., -- don't ...). A single apostrophe in a -- or /* ... */ comment can flip inString and cause the rest of the SQL text to be treated as “string”, preventing schema qualification stripping for subsequent statements (potentially creating objects in the real schema instead of the temp schema). Consider extending the splitter to recognize and skip over line/block comments (and only toggle inString when not inside a comment).

Suggested change
inString := false
i := 0
segStart := 0
for i < len(text) {
ch := text[i]
if ch == '\'' {
if !inString {
// End of non-string segment — strip schema qualifications from it
result.WriteString(stripSchemaQualificationsFromText(text[segStart:i], schemaName))
segStart = i
inString = true
i++
} else {
// Check for escaped quote ('')
if i+1 < len(text) && text[i+1] == '\'' {
i += 2 // skip ''
} else {
// End of string literal — write it as-is
inString = false
i++
result.WriteString(text[segStart:i])
segStart = i
}
}
} else {
i++
}
}
inString := false
inLineComment := false
inBlockComment := false
i := 0
segStart := 0
for i < len(text) {
ch := text[i]
if inString {
if ch == '\'' {
// Check for escaped quote ('')
if i+1 < len(text) && text[i+1] == '\'' {
i += 2 // skip ''
continue
}
// End of string literal — write it as-is
inString = false
i++
result.WriteString(text[segStart:i])
segStart = i
continue
}
// Inside string literal, just advance
i++
continue
}
if inLineComment {
// Line comment ends at newline
if ch == '\n' {
inLineComment = false
}
i++
continue
}
if inBlockComment {
// Block comment ends at */
if ch == '*' && i+1 < len(text) && text[i+1] == '/' {
inBlockComment = false
i += 2
} else {
i++
}
continue
}
// Not in string or comment
if ch == '-' && i+1 < len(text) && text[i+1] == '-' {
// Start of line comment
inLineComment = true
i += 2
continue
}
if ch == '/' && i+1 < len(text) && text[i+1] == '*' {
// Start of block comment
inBlockComment = true
i += 2
continue
}
if ch == '\'' {
// Start of string literal. End of non-string segment — strip schema qualifications from it.
result.WriteString(stripSchemaQualificationsFromText(text[segStart:i], schemaName))
segStart = i
inString = true
i++
continue
}
i++
}

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +135
if !inString {
// End of non-string segment — strip schema qualifications from it
result.WriteString(stripSchemaQualificationsFromText(text[segStart:i], schemaName))
segStart = i
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripSchemaQualificationsFromText compiles multiple regexes on every call. With the new string-literal splitting, it can now be invoked many times per SQL statement (one per non-string segment), which can significantly increase CPU/time on large schemas. Consider compiling the regexes once per schemaName per stripSchemaQualifications invocation (or caching by schemaName) and reusing them across segments.

Copilot uses AI. Check for mistakes.
@greptile-apps
Copy link

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR fixes a correctness bug in stripSchemaQualifications where regex-based schema prefix removal was applied to the contents of single-quoted SQL string literals, corrupting expressions like has_scope('public.manage') into has_scope('manage') and causing false-positive ALTER POLICY statements in plan output.

Key changes:

  • Adds stripSchemaQualificationsPreservingStrings — a character-level scanner that splits text on single-quoted string boundaries, applying stripSchemaQualificationsFromText only to non-string segments and writing string literals through untouched.
  • Adds a strings.Contains fast-path in stripSchemaQualifications to skip all processing when the schema name is absent from the input.
  • Adds 6 unit tests covering string preservation, ''-escaped embedded quotes, mixed identifier/literal inputs, and multiple literals in a single statement.
  • Updates the alter_policy_using diff test case with a scope_check policy that exercises the fix end-to-end; the policy correctly produces no diff.

Two minor edge cases to be aware of:

  • A single quote appearing inside a -- line comment will cause the scanner to enter inString = true, suppressing schema stripping for all subsequent text until a matching closing quote is found. This is unlikely in practice but worth documenting.
  • The E'...' escape-string form (where \' embeds a quote) is not recognised. The failure mode is conservative — identifiers after the real end of such a string may not be stripped — but it's not a correctness regression over the prior code.

Confidence Score: 5/5

  • Safe to merge — fixes a real false-positive corruption bug with correct logic, comprehensive tests, and no regressions.
  • The core fix (splitting on single-quoted string boundaries before applying regex stripping) is logically sound and thoroughly tested for all common SQL string forms. The two noted edge cases (-- comment quotes and E'...' backslash escapes) are pre-existing limitations of the parsing pipeline that fail conservatively (missed stripping, never corruption). The diff test fixture directly validates the end-to-end scenario from the bug report. No existing tests are broken.
  • internal/postgres/desired_state.go — review the two P2 edge cases in stripSchemaQualificationsPreservingStrings (comment-embedded quotes and E'...' escape strings) before expanding the feature further.

Important Files Changed

Filename Overview
internal/postgres/desired_state.go Introduces stripSchemaQualificationsPreservingStrings which splits on single-quoted string boundaries before applying regex-based schema stripping. Logic is correct for standard ''-escaped strings; two minor edge cases exist (single quotes inside -- comments, and E'...' backslash-escape form) that are pre-existing limitations and fail conservatively (missed stripping, never corruption). Also adds a strings.Contains fast-path optimization.
internal/postgres/desired_state_test.go Adds 6 targeted unit tests for stripSchemaQualifications covering: basic stripping, single-quoted string preservation, short schema names, mixed identifier/literal, ''-escaped quotes, and multiple literals. Coverage is thorough for the stated fix.
testdata/diff/create_policy/alter_policy_using/new.sql Adds has_scope helper function and scope_check policy with has_scope('public.manage') to serve as the regression fixture for Issue #371. Correctly absent from plan.json because old.sql and new.sql are identical for this policy.
testdata/diff/create_policy/alter_policy_using/old.sql Mirrors new.sql by adding the same has_scope function and scope_check policy, ensuring no false-positive diff is generated. Only the user_tenant_isolation USING expression differs between the two files.
testdata/diff/create_policy/alter_policy_using/plan.json Hash updated for the new source SQL; plan still only contains the expected ALTER POLICY user_tenant_isolation step, confirming scope_check produces no false-positive diff.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["stripSchemaQualifications(sql, schemaName)"] --> B{schemaName == empty\nor not in sql?}
    B -- Yes --> C[Return sql unchanged\nfast-path]
    B -- No --> D["splitDollarQuotedSegments(sql)"]
    D --> E{Segment\ndollar-quoted?}
    E -- Yes --> F[Write segment as-is\npreserve function bodies]
    E -- No --> G["stripSchemaQualificationsPreservingStrings(seg, schemaName)"]
    G --> H{Scan character by character}
    H --> I{char == single-quote?}
    I -- Not in string --> J["Flush non-string segment\nto stripSchemaQualificationsFromText()\nEnter string mode"]
    I -- In string, next is also quote --> K["Skip '' pair\nstay in string mode"]
    I -- In string, next is not quote --> L["Write string literal as-is\nExit string mode"]
    I -- Not a quote --> M[Advance i]
    J --> H
    K --> H
    L --> H
    M --> H
    H -- End of text --> N{Still inString?}
    N -- Yes --> O[Write remaining as-is\nunterminated literal]
    N -- No --> P["Apply stripSchemaQualificationsFromText()\nto trailing segment"]
    F --> Q[Reassemble result]
    O --> Q
    P --> Q
Loading

Comments Outside Diff (1)

  1. internal/postgres/desired_state.go, line 120-164 (link)

    P2 Single-quote inside -- comment mis-parses subsequent identifiers

    If a SQL -- line comment contains a single quote (e.g. -- it's ready), the parser enters inString = true and treats every subsequent character on all following lines as inside a string literal. This means schema qualifications on those lines will silently not be stripped.

    Example:

    -- it's a note
    SELECT public.users;

    With the current code, public.users will not be stripped because the ' in the comment triggers the "start of string" branch.

    This is a pre-existing limitation of the un-comment-aware parsing pipeline, but the introduction of stripSchemaQualificationsPreservingStrings makes this concern real for multi-statement DDL files containing annotated SQL (which the test data in this very PR does — see the -- Policy with string literal… comments). Since those comments do not contain quotes there is no regression in the test cases, but the boundary case is unguarded.

    Consider either:

    • Skipping characters after -- until \n before entering the quote-scanning loop, or
    • Documenting this limitation in the function's doc comment.

Reviews (1): Last reviewed commit: "fix: preserve string literals during sch..." | Re-trigger Greptile

Comment on lines +138 to +142
} else {
// Check for escaped quote ('')
if i+1 < len(text) && text[i+1] == '\'' {
i += 2 // skip ''
} else {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 E'...' backslash-escaped quote not handled

The escape-string syntax E'it\'s public.test' uses \' (backslash + single quote) to embed a quote rather than '' (doubling). The current parser only recognises the '' form:

if i+1 < len(text) && text[i+1] == '\'' {
    i += 2 // skip ''
}

With E'content\'' suffix, the \ is consumed as an ordinary character; then when ' (the real escaped quote) is reached and the following char is ' (the string closer), the code treats both as a '' pair and skips them — leaving inString = true beyond the true end of the string. Any schema identifiers after the closing quote will therefore be treated as string content and not stripped.

The failure mode is a false-negative (missed stripping) rather than a false-positive (corruption), so it is safe but subtly incorrect. A doc comment or a test case capturing this limitation would make the boundary explicit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

plan generates destructive ALTER POLICY when USING expression contains dot in string literal

2 participants