Enforce SQLToolset allowed_tables on queries, not just discovery #68487
Merged
kaxil merged 2 commits on Jun 16, 2026
allowed_tables previously restricted only metadata discovery (list_tables /
get_schema); the query and check_query tools never checked it, so an agent
could read any table by name. It is now enforced on the query tools as well:
the SQL is parsed with sqlglot and rejected before execution if it reaches a
table that is not on the list, resolved with its database/catalog and with CTE
references excluded by lexical scope.
Constructs a schema.table list cannot describe are rejected while a list is
active: table-valued functions, TABLE('name') row sources, the TABLE <name>
shorthand, SHOW, dynamic SQL, quoted identifiers, cross-database references,
and inline comments (where MySQL executable /*! ... */ comments hide).
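
To illustrate why comments and quoted identifiers can be refused before any parsing, here is a minimal standard-library sketch of a lexical pre-check. It is not the PR's implementation (which relies on sqlglot); the function name first_forbidden_lexeme is hypothetical, and the scan assumes standard single-quoted string literals only. Engine-specific literal forms such as dollar-quoting are not handled, and backslashes inside literals are rejected because escape rules differ between engines.

```python
from typing import Optional


def first_forbidden_lexeme(sql: str) -> Optional[str]:
    """Return a reason to reject the SQL text, or None if nothing was found."""
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if ch == "'":
            # Walk over a single-quoted string literal so that comment
            # markers inside the literal are not misread as comments.
            i += 1
            closed = False
            while i < n:
                if sql[i] == "\\":
                    # Backslash escapes are engine-specific: stay conservative.
                    return "backslash in string literal"
                if sql[i] == "'":
                    if i + 1 < n and sql[i + 1] == "'":
                        i += 2  # '' is an escaped quote inside the literal
                        continue
                    closed = True
                    i += 1
                    break
                i += 1
            if not closed:
                return "unterminated string literal"
            continue
        if sql.startswith("--", i) or sql.startswith("/*", i):
            # Covers MySQL executable comments (/*! ... */) as well.
            return "comment"
        if ch in ('"', "`"):
            return "quoted identifier"
        i += 1
    return None


if __name__ == "__main__":
    print(first_forbidden_lexeme("SELECT * FROM orders/*!UNION SELECT * FROM secret*/"))
```

Rejecting the whole class up front means the checker never has to decide which comment variants a particular engine would execute.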
This is application-level defense-in-depth, not a substitute for database
permissions: data reached through a function whose argument is itself SQL or a
path (pg_read_file, query_to_xml, scalar dblink) is out of its reach, so a
least-privilege DB role remains the hard boundary.
gopidesupavan (Member) approved these changes on Jun 14, 2026 and left a comment:
LGTM, one comment; it would be good to resolve it. :)
A CTE used as a DML source (WITH src AS (...) INSERT INTO orders SELECT * FROM src) was falsely rejected: CTE scoping was disabled for the whole DML statement, so src was checked against allowed_tables as if it were a base table. Now only the DML target is exempt from CTE resolution (you cannot write to a CTE, so a same-named CTE never shadows the target); sources follow normal lexical CTE scoping. The target, and any off-list table a CTE body actually reads, are still enforced.
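
The rule in this fix can be modelled on already-extracted references, as in the sketch below. It is a simplified model, not the merged code: DmlStatement and off_list are hypothetical names, the statement is assumed to have one flat CTE scope, and each CTE body is assumed to read only base tables.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class DmlStatement:
    """Hypothetical, pre-extracted view of a DML statement."""

    target: str                     # table being written to
    sources: List[str]              # names the statement body reads from
    ctes: Dict[str, List[str]] = field(default_factory=dict)  # CTE -> tables its body reads


def off_list(stmt: DmlStatement, allowed: Iterable[str]) -> Set[str]:
    """Return the referenced tables that are not on the allow-list."""
    # The target is never resolved as a CTE: you cannot write to a CTE,
    # so a same-named CTE must not shadow it.
    to_check = {stmt.target}
    for name in stmt.sources:
        if name in stmt.ctes:
            continue  # CTE reference in scope, not a base table
        to_check.add(name)
    # Whatever a CTE body actually reads is still enforced.
    for body_tables in stmt.ctes.values():
        to_check.update(body_tables)
    return to_check - set(allowed)


if __name__ == "__main__":
    # WITH src AS (SELECT * FROM staging) INSERT INTO orders SELECT * FROM src
    stmt = DmlStatement(target="orders", sources=["src"], ctes={"src": ["staging"]})
    print(off_list(stmt, ["orders", "staging"]))
```

Under the old behaviour described above, src itself would have been added to the set to check and the statement rejected even though every base table was on the list.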
RulerChen pushed a commit to RulerChen/airflow that referenced this pull request on Jun 16, 2026
SQLToolset(allowed_tables=[...]) previously restricted only metadata discovery (list_tables / get_schema); the query and check_query tools never checked it, so an agent could read any table by name (SELECT * FROM secret). This makes allowed_tables an enforced boundary on the query tools too: the SQL is parsed with sqlglot and rejected before execution if it reaches any table not on the list.
What it catches
Every table a query reaches must be on the list, resolved with its database/catalog: direct, subquery, CTE body, JOIN, set operations, DESCRIBE, information_schema / pg_catalog, and DML (with allow_writes=True). CTE references are excluded by lexical scope, so a same-named CTE in another scope cannot hide a real table.
Constructs a schema.table list cannot describe are rejected while the list is active: table-valued functions (dblink), TABLE('name') row sources, the TABLE <name> shorthand, SHOW, dynamic SQL (EXEC), quoted identifiers (case-sensitive on the engine), cross-database references (otherdb.public.orders), and inline comments.
Design rationale
SELECT * FROM orders/*!UNION SELECT * FROM secret*/ runs the UNION arm on MySQL/MariaDB but is inert to sqlglot and other engines (a parser/engine differential). Rejecting comments closes the class instead of chasing variants.
Tradeoffs and limitations
Application-level guardrail (parse-then-check): strong defense-in-depth, not a substitute for database permissions. It cannot police data reached through a function whose argument is itself SQL or a path: pg_read_file('...') (a file) or query_to_xml('SELECT ... FROM other', ...) / scalar dblink (a table, via a string the parser cannot read). For a hard boundary, run the connection as a least-privilege role with SELECT limited to the same tables. Quoted identifiers and comments are rejected while a list is active, so agents should send unquoted, comment-free SQL on restricted connections.
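
To show why the engine itself is the hard boundary, the sketch below uses SQLite's per-connection authorizer from the Python standard library as a stand-in for a least-privilege role. This is a swapped-in analogy, not something the PR does: SQLite has no roles or GRANT, the helper names restrict_reads and can_run are hypothetical, and only reads are restricted here. The point it illustrates is that the engine authorizes after its own parsing, so comments, quoting, and CTE names do not matter.

```python
import sqlite3
from typing import Iterable


def restrict_reads(conn: sqlite3.Connection, allowed_tables: Iterable[str]) -> None:
    """Deny column reads on any table outside the allow-list (reads only)."""
    allowed = {t.lower() for t in allowed_tables}

    def authorizer(action, arg1, arg2, db_name, source):
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if action == sqlite3.SQLITE_READ and (arg1 or "").lower() not in allowed:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER);
    CREATE TABLE secret (token TEXT);
    INSERT INTO orders VALUES (1);
    INSERT INTO secret VALUES ('s3cr3t');
    """
)
restrict_reads(conn, ["orders"])


def can_run(sql: str) -> bool:
    """Return True if the engine lets the statement run on this connection."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.DatabaseError:
        return False


if __name__ == "__main__":
    for sql in ("SELECT * FROM orders", 'SELECT * FROM "secret" /* hidden */'):
        print(sql, "->", can_run(sql))
```

On PostgreSQL or MySQL the equivalent boundary would be a dedicated role granted SELECT on the allowed tables only, as the paragraph above recommends.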