[fix](fe) Fix VSlotRef invalid slot id in INTERSECT/EXCEPT with ExprId reuse#62296
yujun777 wants to merge 2 commits into apache:master
run buildall
/review
Findings
- High - The fix is incomplete because the same ExprId->SlotRef overwrite pattern still exists in the functionally parallel recursive CTE translator path (visitPhysicalRecursiveUnion()), so recursive unions can still bind child result expressions to the wrong slots when ExprIds are reused.
Critical checkpoints
- Goal of current task: Partially achieved. visitPhysicalSetOperation() is fixed and covered by regression tests, but the same translator invariant is still violated in visitPhysicalRecursiveUnion().
- Modification size/focus: Yes, the patch is small and focused.
- Concurrency: Not involved.
- Special lifecycle/static init: No special lifecycle concerns in the changed code.
- Configuration: None added.
- Compatibility: No FE/BE protocol or storage compatibility change.
- Parallel code paths: Not fully covered. visitPhysicalRecursiveUnion() is a parallel path that still uses the old translation order.
- Special conditional checks: The new ordering comment is clear; no extra conditional-risk issue found.
- Test coverage: Added good regression coverage for non-recursive set operations, but there is no recursive CTE regression for the same ExprId-reuse invariant.
- Observability: No additional observability needed for this translator-only fix.
- Transaction/persistence: Not involved.
- Data writes/modifications: Not involved.
- FE/BE variable passing: Not involved.
- Performance: Acceptable; the change is still linear and does not add meaningful overhead.
- Other issues: None beyond the missed recursive CTE path.
@Override
public PlanFragment visitPhysicalSetOperation(
PhysicalSetOperation setOperation, PlanTranslatorContext context) {
// Translate each child's result exprs and distribute exprs immediately after
High: this only fixes visitPhysicalSetOperation(), but visitPhysicalRecursiveUnion() still follows the old order: visit both children, call generateTupleDesc(recursiveCte.getOutput(), ...), then translate recursiveCte.getRegularChildrenOutputs() (see lines 2302-2328 in the current file). That path uses the same global PlanTranslatorContext.exprIdToSlotRef map, so later child visits and the parent tuple creation still overwrite earlier mappings. Because LogicalRecursiveUnion.buildNewOutputs() can preserve an existing output ExprId, a recursive CTE with reused ExprIds can still translate to the wrong SlotRef and hit the same invalid-slot failure. Please apply the same interleaving strategy here and add a regression under recursive_cte.
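The overwrite hazard described in this comment can be shown with a toy model. The sketch below is not Doris code: the class, its methods, and the integer ExprIds and slot ids are hypothetical stand-ins for the shared ExprId-to-SlotRef map in the translator context. It only illustrates why resolving each child's expressions immediately after visiting that child gives a different binding than resolving them after all children have been visited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not the Doris translator classes.
final class InterleavedTranslationSketch {
    // Shared, mutable ExprId -> slot id map, standing in for the translator context.
    private final Map<Integer, Integer> exprIdToSlot = new HashMap<>();
    private int nextSlotId = 0;

    // "Visiting" a child registers a fresh slot for each of its output ExprIds,
    // overwriting any earlier entry for the same ExprId.
    private void visitChild(List<Integer> childOutputExprIds) {
        for (int exprId : childOutputExprIds) {
            exprIdToSlot.put(exprId, nextSlotId++);
        }
    }

    // Resolve ExprIds against whatever the map holds at this moment.
    private List<Integer> resolve(List<Integer> exprIds) {
        List<Integer> slots = new ArrayList<>();
        for (int exprId : exprIds) {
            slots.add(exprIdToSlot.get(exprId));
        }
        return slots;
    }

    // Old order: visit every child first, resolve afterwards.
    static List<List<Integer>> deferred(List<List<Integer>> children) {
        InterleavedTranslationSketch t = new InterleavedTranslationSketch();
        for (List<Integer> child : children) {
            t.visitChild(child);
        }
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> child : children) {
            result.add(t.resolve(child));
        }
        return result;
    }

    // Interleaved order: resolve each child's ExprIds right after visiting it.
    static List<List<Integer>> interleaved(List<List<Integer>> children) {
        InterleavedTranslationSketch t = new InterleavedTranslationSketch();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> child : children) {
            t.visitChild(child);
            result.add(t.resolve(child));
        }
        return result;
    }

    public static void main(String[] args) {
        // Both children expose ExprId 7, i.e. the ExprId is reused.
        List<List<Integer>> children = List.of(List.of(7), List.of(7));
        System.out.println("deferred:    " + deferred(children));    // [[1], [1]]
        System.out.println("interleaved: " + interleaved(children)); // [[0], [1]]
    }
}
```

In the deferred order the first child ends up bound to the second child's slot, which is the kind of wrong binding the review describes. The interleaved order keeps each child bound to its own slot even though the ExprId is shared.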
bf4213e to 8aa64f0
run buildall
/review
One issue found.
regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out encodes the wrong expected result for intersect_nulls. Doris already treats NULL as equal for set operations: the existing query_p0/set_operations/set_with_null suite expects INTERSECT ... SELECT NULL to return \N, not to drop it. The new expected output only contains 1, so this regression test would bless a semantic regression instead of protecting against it.
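To make the expected result concrete, here is a small sketch of distinct INTERSECT over a single column in which NULL compares equal to NULL, as the reviewer describes for set operations. It is a model of the semantics only, using Java null for SQL NULL and a hypothetical helper name; it says nothing about how Doris implements the operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of set-operation semantics; null stands in for SQL NULL.
final class IntersectNullSketch {
    // Distinct INTERSECT: keep each value that appears on both sides.
    // Java sets treat null as equal to null, which matches the set semantics
    // described in the review (unlike the SQL "=" predicate).
    static List<Integer> intersectDistinct(List<Integer> left, List<Integer> right) {
        Set<Integer> result = new LinkedHashSet<>(left);
        result.retainAll(new LinkedHashSet<>(right));
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(1, null, 2);
        List<Integer> right = Arrays.asList(null, 1);
        // Both the NULL row and the 1 row survive.
        System.out.println(intersectDistinct(left, right)); // [1, null]
    }
}
```

Under this model, an expected output that contains only 1 drops a row that both inputs share.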
Critical checkpoint conclusions:
- Goal / correctness / test proof: The goal is to fix ExprId -> SlotRef overwrite during set-operation translation. The code change in PhysicalPlanTranslator.visitPhysicalSetOperation() does address that overwrite hazard by translating each child immediately after visiting it, and the added suite exercises the reported failing shapes. However, one expected result is incorrect, so the current tests do not yet prove the fix safely.
- Scope / minimality: The code change is small and focused. Extending the same interleaving pattern to visitPhysicalRecursiveUnion() is also consistent with the same mutable-map mechanism.
- Concurrency: No new concurrency or lock-safety concerns in the touched code path.
- Lifecycle / static initialization: No special lifecycle or static-init concerns introduced here.
- Config changes: None.
- Compatibility: No FE/BE protocol or storage-format compatibility changes detected.
- Parallel code paths: The analogous recursive CTE path was updated too, which is good. I did not find another remaining set-operation translation path with the same overwrite pattern in this file.
- Special condition checks: The new comments explain the ExprId reuse hazard clearly enough.
- Test coverage: Coverage improved for INTERSECT / EXCEPT / UNION DISTINCT and nested shapes, but the intersect_nulls oracle is inconsistent with existing Doris semantics and should be corrected. No recursive CTE regression was added for the newly touched parallel path.
- Observability: No additional observability seems necessary for this FE translation fix.
- Transaction / persistence: Not applicable.
- Data writes / modifications: Not applicable beyond regression-test DDL/DML setup.
- FE/BE variable passing: No new variables added.
- Performance: The interleaving avoids extra retranslation hazards without adding meaningful overhead.
- Other issues: None beyond the incorrect expected result above.
regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out
morrySnow left a comment:
A plan should not contain the same ExprId for different slot references.
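The invariant stated here can be expressed as a simple check. The following sketch is illustrative only: the Slot class, the producer labels, and the findReusedExprIds helper are hypothetical and do not correspond to any Doris API. It reports every ExprId that is claimed by more than one producer.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy invariant check; Slot and the producer labels are hypothetical.
final class ExprIdUniquenessSketch {
    static final class Slot {
        final int exprId;
        final String producer; // label of the plan node that defines this slot

        Slot(int exprId, String producer) {
            this.exprId = exprId;
            this.producer = producer;
        }
    }

    // Returns the ExprIds that are defined by more than one distinct producer.
    static Set<Integer> findReusedExprIds(List<Slot> slots) {
        Map<Integer, String> firstProducer = new HashMap<>();
        Set<Integer> reused = new LinkedHashSet<>();
        for (Slot slot : slots) {
            String previous = firstProducer.putIfAbsent(slot.exprId, slot.producer);
            if (previous != null && !previous.equals(slot.producer)) {
                reused.add(slot.exprId);
            }
        }
        return reused;
    }

    public static void main(String[] args) {
        List<Slot> slots = List.of(
                new Slot(1, "setOperationOutput"),
                new Slot(2, "scan"),
                new Slot(1, "promotedConstantChild"));
        System.out.println(findReusedExprIds(slots)); // [1]
    }
}
```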
8aa64f0 to 843dadf
run buildall
/review
PushDownFilterThroughSetOperation introduces duplicate ExprIds; fix it.
Findings:
- High - regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out records the wrong expected result for the new INTERSECT NULL case. Doris' existing set_with_null regression suite treats NULL as equal in set semantics, so order_qt_intersect_nulls should return both \N and 1, not just 1. As written, this new regression would bless a semantic regression instead of proving the ExprId fix.
Critical checkpoint conclusions:
- Goal of the task: Partially accomplished. The code change is aimed at removing ExprId collisions for promoted UNION constant rows, and the FE unit test covers that helper behavior. However, the new end-to-end regression currently has an incorrect golden result, so it does not reliably prove the user-visible fix.
- Modification size/focus: Yes. The Java change is small and focused on PushDownFilterThroughSetOperation.
- Concurrency: Not involved in this diff.
- Special lifecycle/static init: Not involved.
- Configuration changes: None.
- Compatibility/incompatible changes: None observed.
- Functionally parallel paths: No additional unaddressed path found in the current diff beyond the added regression issue.
- Special conditional checks: The new project-wrapping branch is understandable and locally justified to generate fresh ExprIds.
- Test coverage: FE unit coverage is good for the rewrite helper, but regression coverage is not yet trustworthy because of the wrong NULL expectation.
- Observability: No additional observability needed for this FE rewrite fix.
- Transaction/persistence: Not involved.
- Data writes/modifications: Not involved.
- FE-BE variable passing: Not involved.
- Performance: No obvious performance regression; the new project wrapper is only on the promoted-constant rewrite path.
- Other issues: The incorrect regression output above is the remaining blocker I found.
I did not run FE or regression tests locally in this runner; conclusions are based on code inspection and existing Doris regression semantics.
regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out
…d in UNION

### What problem does this PR solve?

Issue Number: close #CIR-19889

Related PR: apache#56366

Problem Summary: PR apache#56366 (master, commit e27ceb3, 2025-09-28) introduced a bug in PushDownFilterThroughSetOperation.addFiltersToNewChildren(). When a UNION constant expression cannot be statically eliminated by a filter (e.g., NULL IN (4,2) evaluates to UNKNOWN), it is promoted from constantExprsList to a regular child via LogicalOneRowRelation + LogicalFilter. The else branch directly used eliminateFilter.getOutput() as regularChildrenOutputs, inheriting the same ExprId as the UNION output slot.

This ExprId collision is then manifested in PhysicalPlanTranslator.visitPhysicalSetOperation(): generateTupleDesc(output) registers ExprId->SlotRef for the UNION output, which overwrites the child's earlier ExprId->SlotRef mapping (both share the same ExprId). The child result expression then resolves to the wrong slot, causing BE crash: "VSlotRef have invalid slot id".

The fix wraps the promoted constant child in a LogicalProject with fresh ExprIds (via new Alias(ne.toSlot(), ne.getName()) which auto-generates a fresh ExprId), so that regularChildrenOutputs never share ExprIds with the UNION output.

Also removed dead code: a Builder<List<SlotReference>> newChildrenOutput that was constructed but never used.

### Release note

Fix INTERSECT/EXCEPT/UNION DISTINCT query crash with "VSlotRef have invalid slot id" when a UNION constant row cannot be statically eliminated during filter push-down.

### Check List (For Author)

- Test: Regression test (query_p0/set_operations/set_operation_exprid_reuse.groovy) + FE unit test (PushDownFilterThroughSetOperationTest, added testConstantPromotionGeneratesFreshExprIds)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
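The idea behind the fix in this commit message, giving the promoted child new ExprIds while keeping names and order, can be sketched in isolation. The types below are hypothetical stand-ins (NamedSlot, an AtomicInteger used as the id generator, wrapWithFreshAliases); the real change uses LogicalProject and Alias as described above, and this sketch does not reproduce their APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of "wrap the child in a projection with fresh ids".
// NamedSlot and the id generator are hypothetical, not Doris classes.
final class FreshAliasSketch {
    static final class NamedSlot {
        final int exprId;
        final String name;

        NamedSlot(int exprId, String name) {
            this.exprId = exprId;
            this.name = name;
        }
    }

    // Produces one new slot per input slot: same name, same position,
    // but an ExprId drawn from the generator instead of the inherited one.
    static List<NamedSlot> wrapWithFreshAliases(List<NamedSlot> childOutput, AtomicInteger idGenerator) {
        List<NamedSlot> projected = new ArrayList<>();
        for (NamedSlot slot : childOutput) {
            projected.add(new NamedSlot(idGenerator.getAndIncrement(), slot.name));
        }
        return projected;
    }

    public static void main(String[] args) {
        // The promoted child initially shares ExprIds 1 and 2 with the set-operation output.
        List<NamedSlot> promoted = List.of(new NamedSlot(1, "a"), new NamedSlot(2, "b"));
        for (NamedSlot slot : wrapWithFreshAliases(promoted, new AtomicInteger(100))) {
            System.out.println(slot.name + " -> ExprId " + slot.exprId);
        }
    }
}
```

Because the generator never hands out an id already used by the set-operation output in this model, the child outputs and the parent output can no longer collide in an ExprId-keyed map.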
843dadf to cac8ff4
run buildall
/review
No issues found in the reviewed changes. Critical checkpoints:
Residual risk:
Refresh the regression output for set_operation_exprid_reuse after the set operation ExprId reuse fix changed NULL handling in the result.

Key changes:
- regenerate regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out
- update expected rows for intersect_cte_expr, except_cte_expr, and intersect_nulls

Unit Test:
- run-regression-test.sh --run --conf regression-test/conf/regression-conf-custom.groovy -g p0 -d query_p0/set_operations -s set_operation_exprid_reuse -forceGenOut

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
run buildall
What problem does this PR solve?
Issue Number:
Problem Summary:
Introduced by #56366.
The real bug is not in the physical translator. It starts earlier in PushDownFilterThroughSetOperation when a filter is pushed through a set operation that still contains constant children.

For UNION/INTERSECT/EXCEPT, constant children intentionally reuse the set operation output ExprIds. When filter push-down cannot fold one of those constant children away, the rewrite promotes that constant expression into a regular child. Before this fix, the promoted child kept the same ExprIds as the set operation output, so regularChildrenOutputs and the set operation output collided on ExprId. That duplicate ExprId state later surfaced as wrong slot binding and errors such as VSlotRef have invalid slot id.

The fix handles the problem at the source: when PushDownFilterThroughSetOperation promotes a constant child into a regular child, it wraps the child with LogicalProject and fresh Alias outputs so the promoted child gets new ExprIds while preserving names and output order.

Reproducing query:
Release note
Fixed a bug where set-operation queries could fail because filter push-down promoted a constant child into a regular child while reusing the set operation output ExprIds.
Check List (For Author)