[fix](fe) Fix VSlotRef invalid slot id in INTERSECT/EXCEPT with ExprId reuse by yujun777 · Pull Request #62296 · apache/doris

yujun777 · 2026-04-09T12:43:07Z

What problem does this PR solve?

Issue Number:

Problem Summary:

Introduced by #56366.

The real bug is not in the physical translator. It starts earlier in PushDownFilterThroughSetOperation when a filter is pushed through a set operation that still contains constant children.

For UNION/INTERSECT/EXCEPT, constant children intentionally reuse the set operation output ExprIds. When filter push-down cannot fold one of those constant children away, the rewrite promotes that constant expression into a regular child. Before this fix, the promoted child kept the same ExprIds as the set operation output, so regularChildrenOutputs and the set operation output collided on ExprId. That duplicate ExprId state later surfaced as wrong slot binding and errors such as VSlotRef have invalid slot id.

The fix handles the problem at the source: when PushDownFilterThroughSetOperation promotes a constant child into a regular child, it wraps the child with LogicalProject and fresh Alias outputs so the promoted child gets new ExprIds while preserving names and output order.

Reproducing query:

WITH
  tbl0(`n`) AS (SELECT 1 UNION ALL SELECT 3 UNION ALL SELECT NULL),
  tbl1(`n`) AS (SELECT 2 UNION ALL SELECT NULL UNION ALL SELECT 1)
(
  SELECT (`n` * 2) AS `n` FROM tbl0
  INTERSECT
  SELECT (`n` * 2) AS `n` FROM tbl1
);

Release note

Fixed a bug where set-operation queries could fail because filter push-down promoted a constant child into a regular child while reusing the set operation output ExprIds.

Check List (For Author)

Test: Unit test (PushDownFilterThroughSetOperationTest) / Regression test (set_operation_exprid_reuse, forceGenOut)
Behavior changed: No
Does this need documentation: No

Thearas · 2026-04-09T12:43:14Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

yujun777 · 2026-04-09T12:43:14Z

run buildall

yujun777 · 2026-04-09T12:43:16Z

/review

github-actions

Findings

High - The fix is incomplete because the same ExprId->SlotRef overwrite pattern still exists in the functionally parallel recursive CTE translator path (visitPhysicalRecursiveUnion()), so recursive unions can still bind child result expressions to the wrong slots when ExprIds are reused.

Critical checkpoints

Goal of current task: Partially achieved. visitPhysicalSetOperation() is fixed and covered by regression tests, but the same translator invariant is still violated in visitPhysicalRecursiveUnion().
Modification size/focus: Yes, the patch is small and focused.
Concurrency: Not involved.
Special lifecycle/static init: No special lifecycle concerns in the changed code.
Configuration: None added.
Compatibility: No FE/BE protocol or storage compatibility change.
Parallel code paths: Not fully covered. visitPhysicalRecursiveUnion() is a parallel path that still uses the old translation order.
Special conditional checks: The new ordering comment is clear; no extra conditional-risk issue found.
Test coverage: Added good regression coverage for non-recursive set operations, but there is no recursive CTE regression for the same ExprId-reuse invariant.
Observability: No additional observability needed for this translator-only fix.
Transaction/persistence: Not involved.
Data writes/modifications: Not involved.
FE/BE variable passing: Not involved.
Performance: Acceptable; the change is still linear and does not add meaningful overhead.
Other issues: None beyond the missed recursive CTE path.

github-actions · 2026-04-09T13:05:10Z

fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java

    @Override
    public PlanFragment visitPhysicalSetOperation(
            PhysicalSetOperation setOperation, PlanTranslatorContext context) {
+        // Translate each child's result exprs and distribute exprs immediately after


High: this only fixes visitPhysicalSetOperation(), but visitPhysicalRecursiveUnion() still follows the old order: visit both children, call generateTupleDesc(recursiveCte.getOutput(), ...), then translate recursiveCte.getRegularChildrenOutputs() (see lines 2302-2328 in the current file). That path uses the same global PlanTranslatorContext.exprIdToSlotRef map, so later child visits and the parent tuple creation still overwrite earlier mappings. Because LogicalRecursiveUnion.buildNewOutputs() can preserve an existing output ExprId, a recursive CTE with reused ExprIds can still translate to the wrong SlotRef and hit the same invalid-slot failure. Please apply the same interleaving strategy here and add a regression under recursive_cte.

hello-stephen · 2026-04-09T14:18:40Z

FE UT Coverage Report

Increment line coverage 100.00% (12/12) 🎉
Increment coverage report
Complete coverage report

yujun777 · 2026-04-09T14:48:39Z

run buildall

yujun777 · 2026-04-09T14:48:41Z

/review

yujun777 · 2026-04-09T14:50:24Z

Re: review comment on visitPhysicalRecursiveUnion() — Applied the same interleaving fix. Result exprs and distribute exprs are now translated immediately after each child visit, before generateTupleDesc() can overwrite the ExprId→SlotRef mappings. Verified with recursive CTE tests.

github-actions

One issue found.

regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out encodes the wrong expected result for intersect_nulls. Doris already treats NULL as equal for set operations: the existing query_p0/set_operations/set_with_null suite expects INTERSECT ... SELECT NULL to return \N, not to drop it. The new expected output only contains 1, so this regression test would bless a semantic regression instead of protecting against it.

Critical checkpoint conclusions:

Goal / correctness / test proof: The goal is to fix ExprId -> SlotRef overwrite during set-operation translation. The code change in PhysicalPlanTranslator.visitPhysicalSetOperation() does address that overwrite hazard by translating each child immediately after visiting it, and the added suite exercises the reported failing shapes. However, one expected result is incorrect, so the current tests do not yet prove the fix safely.
Scope / minimality: The code change is small and focused. Extending the same interleaving pattern to visitPhysicalRecursiveUnion() is also consistent with the same mutable-map mechanism.
Concurrency: No new concurrency or lock-safety concerns in the touched code path.
Lifecycle / static initialization: No special lifecycle or static-init concerns introduced here.
Config changes: None.
Compatibility: No FE/BE protocol or storage-format compatibility changes detected.
Parallel code paths: The analogous recursive CTE path was updated too, which is good. I did not find another remaining set-operation translation path with the same overwrite pattern in this file.
Special condition checks: The new comments explain the ExprId reuse hazard clearly enough.
Test coverage: Coverage improved for INTERSECT / EXCEPT / UNION DISTINCT and nested shapes, but the intersect_nulls oracle is inconsistent with existing Doris semantics and should be corrected. No recursive CTE regression was added for the newly touched parallel path.
Observability: No additional observability seems necessary for this FE translation fix.
Transaction / persistence: Not applicable.
Data writes / modifications: Not applicable beyond regression-test DDL/DML setup.
FE/BE variable passing: No new variables added.
Performance: The interleaving avoids extra retranslation hazards without adding meaningful overhead.
Other issues: None beyond the incorrect expected result above.

regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out

hello-stephen · 2026-04-09T16:15:34Z

FE UT Coverage Report

Increment line coverage 100.00% (25/25) 🎉
Increment coverage report
Complete coverage report

morrySnow

should not has same exprid for different slot reference in a plan

yujun777 · 2026-04-11T15:37:22Z

run buildall

yujun777 · 2026-04-11T15:37:23Z

/review

yujun777 · 2026-04-11T15:41:41Z

should not has same exprid for different slot reference in a plan

PushDownFilterThroughSetOperation introduce duplicate exprid, fix it

github-actions

Findings:

High - regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out records the wrong expected result for the new INTERSECT NULL case. Doris' existing set_with_null regression suite treats NULL as equal in set semantics, so order_qt_intersect_nulls should return both \N and 1, not just 1. As written, this new regression would bless a semantic regression instead of proving the ExprId fix.

Critical checkpoint conclusions:

Goal of the task: Partially accomplished. The code change is aimed at removing ExprId collisions for promoted UNION constant rows, and the FE unit test covers that helper behavior. However, the new end-to-end regression currently has an incorrect golden result, so it does not reliably prove the user-visible fix.
Modification size/focus: Yes. The Java change is small and focused on PushDownFilterThroughSetOperation.
Concurrency: Not involved in this diff.
Special lifecycle/static init: Not involved.
Configuration changes: None.
Compatibility/incompatible changes: None observed.
Functionally parallel paths: No additional unaddressed path found in the current diff beyond the added regression issue.
Special conditional checks: The new project-wrapping branch is understandable and locally justified to generate fresh ExprIds.
Test coverage: FE unit coverage is good for the rewrite helper, but regression coverage is not yet trustworthy because of the wrong NULL expectation.
Observability: No additional observability needed for this FE rewrite fix.
Transaction/persistence: Not involved.
Data writes/modifications: Not involved.
FE-BE variable passing: Not involved.
Performance: No obvious performance regression; the new project wrapper is only on the promoted-constant rewrite path.
Other issues: The incorrect regression output above is the remaining blocker I found.

I did not run FE or regression tests locally in this runner; conclusions are based on code inspection and existing Doris regression semantics.

regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out

…d in UNION ### What problem does this PR solve? Issue Number: close #CIR-19889 Related PR: apache#56366 Problem Summary: PR apache#56366 (master, commit e27ceb3, 2025-09-28) introduced a bug in PushDownFilterThroughSetOperation.addFiltersToNewChildren(). When a UNION constant expression cannot be statically eliminated by a filter (e.g., NULL IN (4,2) evaluates to UNKNOWN), it is promoted from constantExprsList to a regular child via LogicalOneRowRelation + LogicalFilter. The else branch directly used eliminateFilter.getOutput() as regularChildrenOutputs, inheriting the same ExprId as the UNION output slot. This ExprId collision is then manifested in PhysicalPlanTranslator .visitPhysicalSetOperation(): generateTupleDesc(output) registers ExprId->SlotRef for the UNION output, which overwrites the child's earlier ExprId->SlotRef mapping (both share the same ExprId). The child result expression then resolves to the wrong slot, causing BE crash: "VSlotRef have invalid slot id". The fix wraps the promoted constant child in a LogicalProject with fresh ExprIds (via new Alias(ne.toSlot(), ne.getName()) which auto-generates a fresh ExprId), so that regularChildrenOutputs never share ExprIds with the UNION output. Also removed dead code: a Builder<List<SlotReference>> newChildrenOutput that was constructed but never used. ### Release note Fix INTERSECT/EXCEPT/UNION DISTINCT query crash with "VSlotRef have invalid slot id" when a UNION constant row cannot be statically eliminated during filter push-down. ### Check List (For Author) - Test: Regression test (query_p0/set_operations/set_operation_exprid_reuse.groovy) + FE unit test (PushDownFilterThroughSetOperationTest, added testConstantPromotionGeneratesFreshExprIds) - Behavior changed: No - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

yujun777 · 2026-04-13T07:44:29Z

run buildall

yujun777 · 2026-04-13T07:44:30Z

/review

github-actions · 2026-04-13T07:49:53Z

No issues found in the reviewed changes.

Critical checkpoints:

Goal and correctness: The patch addresses the reported ExprId collision in by assigning fresh ExprIds when a constant child is promoted to a regular child. The downstream flow now receives distinct child/output slot mappings, so the reported failure path is covered.
Scope and minimality: The code change is narrowly scoped to the constant-promotion branch and removes one dead local builder. I did not find unrelated behavioral changes.
Concurrency: Not applicable. The modified code is planner rewrite logic with no new shared-state or lock interactions.
Lifecycle / initialization: Not applicable. No special lifecycle or static initialization changes.
Configuration: None added.
Compatibility: No incompatible FE/BE protocol or storage-format change. The fix stays within existing logical-plan semantics.
Parallel code paths: I checked the related set-operation rewrite/translation flow and did not find another sibling promotion path in this area that still reuses the colliding ExprId mapping.
Conditional checks: The new conditional branch is justified and documented inline; it matches the specific constant-to-regular-child promotion case.
Test coverage: Adequate for this fix. The FE unit tests cover fresh-ExprId generation on promotion, and the regression case exercises INTERSECT/EXCEPT/UNION DISTINCT queries including the reported reproducer.
Observability: No additional observability appears necessary for this planner-only fix.
Transaction / persistence: Not applicable.
Data write / atomicity: Not applicable.
FE/BE variable passing: No new variables added.
Performance: The added is only introduced on the rare promoted-constant path and is an appropriate, minimal cost for preserving ExprId correctness.
Other issues: None identified during review.

Residual risk:

I did not run FE unit tests or regression tests in this runner, so the conclusion is based on static review of the changed paths and tests.

hello-stephen · 2026-04-13T09:19:52Z

FE UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉
Increment coverage report
Complete coverage report

Refresh the regression output for set_operation_exprid_reuse after the set operation ExprId reuse fix changed NULL handling in the result. Key changes: - regenerate regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out - update expected rows for intersect_cte_expr, except_cte_expr, and intersect_nulls Unit Test: - run-regression-test.sh --run --conf regression-test/conf/regression-conf-custom.groovy -g p0 -d query_p0/set_operations -s set_operation_exprid_reuse -forceGenOut Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

yujun777 · 2026-04-13T10:35:31Z

run buildall

hello-stephen · 2026-04-13T11:57:25Z

FE UT Coverage Report

Increment line coverage 100.00% (8/8) 🎉
Increment coverage report
Complete coverage report

github-actions bot reviewed Apr 9, 2026

View reviewed changes

yujun777 force-pushed the fix/intersect-exprid-reuse branch from bf4213e to 8aa64f0 Compare April 9, 2026 14:48

github-actions bot reviewed Apr 9, 2026

View reviewed changes

regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out Show resolved Hide resolved

morrySnow requested changes Apr 10, 2026

View reviewed changes

yujun777 force-pushed the fix/intersect-exprid-reuse branch from 8aa64f0 to 843dadf Compare April 11, 2026 15:35

github-actions bot reviewed Apr 11, 2026

View reviewed changes

regression-test/data/query_p0/set_operations/set_operation_exprid_reuse.out Show resolved Hide resolved

yujun777 force-pushed the fix/intersect-exprid-reuse branch from 843dadf to cac8ff4 Compare April 13, 2026 07:32

Conversation

yujun777 commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Release note

Check List (For Author)

Uh oh!

Thearas commented Apr 9, 2026

Uh oh!

yujun777 commented Apr 9, 2026

Uh oh!

yujun777 commented Apr 9, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

hello-stephen commented Apr 9, 2026

FE UT Coverage Report

Uh oh!

yujun777 commented Apr 9, 2026

Uh oh!

yujun777 commented Apr 9, 2026

Uh oh!

yujun777 commented Apr 9, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

hello-stephen commented Apr 9, 2026

FE UT Coverage Report

Uh oh!

morrySnow left a comment

Choose a reason for hiding this comment

Uh oh!

yujun777 commented Apr 11, 2026

Uh oh!

yujun777 commented Apr 11, 2026

Uh oh!

yujun777 commented Apr 11, 2026

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

yujun777 commented Apr 13, 2026

Uh oh!

yujun777 commented Apr 13, 2026

Uh oh!

github-actions bot commented Apr 13, 2026

Uh oh!

hello-stephen commented Apr 13, 2026

FE UT Coverage Report

Uh oh!

yujun777 commented Apr 13, 2026

Uh oh!

hello-stephen commented Apr 13, 2026

FE UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yujun777 commented Apr 9, 2026 •

edited

Loading