[feat](search) Support field-grouped query syntax field:(term1 OR term2)#60786
airborne12 merged 6 commits into apache:master
Cherry-pick missing test cases from selectdb/selectdb-core#7369.

All code logic was already in master; only these two test methods covering
the lucene mode + best_fields combination were absent.
- testMultiFieldExplicitFieldNotExpanded: verifies that explicit field:term
  syntax (e.g., title:music) is not expanded across fields in
  lucene+best_fields mode, matching ES query_string behavior.
- testMultiFieldMixedExplicitAndBareTerms: verifies that an explicit field
  prefix is pinned while bare terms are expanded across fields in
  lucene+best_fields mode.
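The rule these two tests exercise can be sketched in a few lines of Java. This is a minimal illustration, not the Doris implementation: the `Node` class, `expand()`, and `render()` are hypothetical stand-ins for the real AST and `MultiFieldExpander`, the input `title:music AND jazz` is made up for the example, and the OR-across-fields shape is assumed from the expansion table in the PR description. Scoring and any lucene-mode specifics are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini-AST and expander; not the Doris QsNode or MultiFieldExpander classes. */
final class MultiFieldExpandSketch {

    static final class Node {
        final String op;             // "AND" / "OR" for groups, null for leaves
        final String field;          // null for a bare term
        final String value;          // term text, null for groups
        final boolean explicitField; // true when the field is already pinned
        final List<Node> children;   // empty for leaves

        private Node(String op, String field, String value, boolean explicitField, List<Node> children) {
            this.op = op;
            this.field = field;
            this.value = value;
            this.explicitField = explicitField;
            this.children = children;
        }

        static Node bare(String value) {
            return new Node(null, null, value, false, new ArrayList<Node>());
        }

        static Node explicit(String field, String value) {
            return new Node(null, field, value, true, new ArrayList<Node>());
        }

        static Node group(String op, List<Node> children) {
            return new Node(op, null, null, false, children);
        }

        boolean isLeaf() {
            return op == null;
        }
    }

    /** Expand bare leaves across all fields; leave explicit leaves untouched. */
    static Node expand(Node node, List<String> fields) {
        if (node.isLeaf()) {
            if (node.explicitField) {
                return node; // pinned: title:music stays title:music
            }
            List<Node> alternatives = new ArrayList<>();
            for (String field : fields) {
                alternatives.add(Node.explicit(field, node.value));
            }
            return Node.group("OR", alternatives);
        }
        List<Node> expanded = new ArrayList<>();
        for (Node child : node.children) {
            expanded.add(expand(child, fields));
        }
        return Node.group(node.op, expanded);
    }

    /** Render the tree as query text; nested groups are parenthesized. */
    static String render(Node node, boolean topLevel) {
        if (node.isLeaf()) {
            return node.field == null ? node.value : node.field + ":" + node.value;
        }
        List<String> parts = new ArrayList<>();
        for (Node child : node.children) {
            parts.add(render(child, false));
        }
        String joined = String.join(" " + node.op + " ", parts);
        return topLevel ? joined : "(" + joined + ")";
    }

    public static void main(String[] args) {
        Node query = Node.group("AND", Arrays.asList(Node.explicit("title", "music"), Node.bare("jazz")));
        Node expanded = expand(query, Arrays.asList("title", "content"));
        System.out.println(render(expanded, true));
        // prints: title:music AND (title:jazz OR content:jazz)
    }
}
```

Returning the explicit leaf unchanged is what keeps `title:music` from being duplicated onto `content`; only the bare term fans out.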
run buildall
Add support for ES query_string field-grouped syntax where all terms
inside parentheses inherit the field prefix, e.g.:

  title:(rock OR jazz) → (title:rock OR title:jazz)
  title:(rock jazz)    → (+title:rock +title:jazz)  [with AND operator]

Previously this syntax caused a parse error because the grammar only
allowed leaf values after the colon in fieldQuery.

Changes:
- SearchParser.g4: add fieldGroupQuery rule (fieldPath COLON LPAREN clause RPAREN)
  and add it as an alternative in atomClause before fieldQuery
- SearchDslParser.java:
  - Add markExplicitFieldRecursive() helper to mark all leaf nodes in a
    group as explicit (prevents multi-field expander from re-expanding them)
  - Modify visitBareQuery() in both QsAstBuilder and QsLuceneModeAstBuilder
    to use currentFieldName as field group context when set
  - Add visitFieldGroupQuery() to both AST builders: sets field context,
    visits inner clause, then marks all leaves as explicit
  - Update visitAtomClause() and collectTermsFromNotClause() in both
    builders to handle the new fieldGroupQuery alternative
- SearchDslParserTest.java: add 10 unit tests covering simple OR/AND,
  phrase inside group, wildcard+regexp, mixed with bare query, multi-field
  explicit preservation, lucene mode, and subcolumn dot-notation paths
- regression-test: add test_search_field_group_query.groovy with end-to-end
  tests against a running cluster
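The rewrite described in this commit message amounts to distributing the field prefix over every term inside the parentheses. The Java sketch below shows that idea on a toy tree. The `Node` class and `applyFieldGroup()` are hypothetical names chosen for illustration and are not the ANTLR visitor code in SearchDslParser.java, and the sketch assumes every leaf inside the group is a bare term.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-AST; not the Doris AST classes. */
final class FieldGroupSketch {

    static final class Node {
        final String op;           // "OR" / "AND" for groups, null for leaves
        final String field;        // null means a bare term with no field yet
        final String value;        // term text, null for groups
        final List<Node> children; // empty for leaves

        private Node(String op, String field, String value, List<Node> children) {
            this.op = op;
            this.field = field;
            this.value = value;
            this.children = children;
        }

        static Node term(String field, String value) {
            return new Node(null, field, value, new ArrayList<Node>());
        }

        static Node group(String op, Node... children) {
            List<Node> list = new ArrayList<>();
            for (Node child : children) {
                list.add(child);
            }
            return new Node(op, null, null, list);
        }

        boolean isLeaf() {
            return op == null;
        }
    }

    /** Return a copy of the tree in which every leaf carries fieldName. */
    static Node applyFieldGroup(String fieldName, Node node) {
        if (node.isLeaf()) {
            // Assumption: leaves inside the group are bare terms.
            return Node.term(fieldName, node.value);
        }
        Node[] rewritten = new Node[node.children.size()];
        for (int i = 0; i < rewritten.length; i++) {
            rewritten[i] = applyFieldGroup(fieldName, node.children.get(i));
        }
        return Node.group(node.op, rewritten);
    }

    /** Render the tree as query text with groups in parentheses. */
    static String render(Node node) {
        if (node.isLeaf()) {
            return node.field == null ? node.value : node.field + ":" + node.value;
        }
        List<String> parts = new ArrayList<>();
        for (Node child : node.children) {
            parts.add(render(child));
        }
        return "(" + String.join(" " + node.op + " ", parts) + ")";
    }

    public static void main(String[] args) {
        // Inner clause of title:(rock OR jazz), before the field is applied.
        Node inner = Node.group("OR", Node.term(null, "rock"), Node.term(null, "jazz"));
        System.out.println(render(applyFieldGroup("title", inner)));
        // prints: (title:rock OR title:jazz)
    }
}
```

The real change threads the field through the visitor as context (currentFieldName) rather than rewriting a finished tree; the sketch uses a post-pass only because it is shorter to show.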
TPC-H: Total hot run time: 28825 ms
TPC-DS: Total hot run time: 183787 ms
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…ssing brace, exception types

- Fix markExplicitFieldRecursive overriding inner explicit field bindings
  (e.g., title:(content:foo OR bar) now correctly keeps content:foo)
- Add recursion depth limit (MAX_FIELD_GROUP_DEPTH=32) to prevent StackOverflow
- Replace RuntimeException with SearchDslSyntaxException for consistency
- Fix missing closing brace in testFieldGroupQuerySubcolumnPath (merge artifact)
- Add test cases for inner explicit field preservation and NOT operator in groups
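The first two fixes in this commit can be pictured with a small recursive marker. The Java below is a sketch under assumptions, not the actual markExplicitFieldRecursive(): the `Node` class and `DepthExceededException` are hypothetical (the real code throws SearchDslSyntaxException), and only the constant name and value MAX_FIELD_GROUP_DEPTH=32 come from the commit message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of depth-limited explicit-field marking. */
final class ExplicitMarkSketch {

    static final int MAX_FIELD_GROUP_DEPTH = 32; // value taken from the commit message

    /** Stand-in for SearchDslSyntaxException, which lives in the Doris code base. */
    static final class DepthExceededException extends RuntimeException {
        DepthExceededException(String message) {
            super(message);
        }
    }

    static final class Node {
        String field;               // null for a bare term
        final String value;         // term text, null for groups
        boolean explicitField;      // true once the field is pinned
        final List<Node> children = new ArrayList<>();

        private Node(String field, String value, boolean explicitField) {
            this.field = field;
            this.value = value;
            this.explicitField = explicitField;
        }

        static Node leaf(String field, String value, boolean explicitField) {
            return new Node(field, value, explicitField);
        }

        static Node group(Node... children) {
            Node node = new Node(null, null, false);
            for (Node child : children) {
                node.children.add(child);
            }
            return node;
        }

        boolean isLeaf() {
            return value != null;
        }
    }

    /** Pin groupField on bare leaves, keep inner explicit fields, stop past the depth limit. */
    static void markExplicit(Node node, String groupField, int depth) {
        if (depth > MAX_FIELD_GROUP_DEPTH) {
            throw new DepthExceededException("field group nested deeper than " + MAX_FIELD_GROUP_DEPTH);
        }
        if (node.isLeaf()) {
            if (!node.explicitField) {
                // Only bare terms inherit the group field; content:foo stays content:foo.
                node.field = groupField;
                node.explicitField = true;
            }
            return;
        }
        for (Node child : node.children) {
            markExplicit(child, groupField, depth + 1);
        }
    }

    public static void main(String[] args) {
        // Inner clause of title:(content:foo OR bar)
        Node foo = Node.leaf("content", "foo", true);
        Node bar = Node.leaf(null, "bar", false);
        markExplicit(Node.group(foo, bar), "title", 0);
        System.out.println(foo.field + ":" + foo.value + " " + bar.field + ":" + bar.value);
        // prints: content:foo title:bar
    }
}
```

Checking the depth before descending turns a pathological input into a regular syntax error instead of a StackOverflowError, which is the stated motivation for the limit.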
…ent CI flakiness

When CI fuzzy testing sets default_variant_enable_doc_mode=true, variant
subcolumns are stored in document mode, causing inverted index iterators to
be unavailable in BE (VSearchExpr: No indexed columns available). This
results in empty query results for search() on variant subcolumns.

Fix: explicitly set default_variant_enable_doc_mode=false in both variant
search tests, following the pattern from variant_p0/predefine/ tests.
451344c to f073270
TPC-H: Total hot run time: 29027 ms
TPC-DS: Total hot run time: 183853 ms
…n tests

CI fuzzy testing randomly sets enable_common_expr_pushdown=false, which
prevents search() expressions from being pushed to the inverted index
evaluation path, causing "SearchExpr should not be executed without
inverted index" errors.

Pin the variable to true at session level in all 15 search test files that
were missing this protection.
TPC-H: Total hot run time: 28745 ms
TPC-DS: Total hot run time: 183470 ms
…m2) (apache#60786)

### What problem does this PR solve?

Issue Number: close #N/A

Problem Summary:

The `search()` function did not support ES `query_string` field-grouped syntax where all terms inside parentheses inherit the field prefix:

```sql
-- Previously failed with syntax error
SELECT * FROM t WHERE search('title:(rock OR jazz)', '{"fields":["title","content"]}');
```

ES semantics:

| Input | Expansion |
|-------|-----------|
| `title:(rock OR jazz)` | `(title:rock OR title:jazz)` |
| `title:(rock jazz)` with `default_operator:AND` | `(+title:rock +title:jazz)` |
| `title:(rock OR jazz) AND music` with `fields:[title,content]` | `(title:rock OR title:jazz) AND (title:music OR content:music)` |
| `title:("rock and roll" OR jazz)` | `(title:"rock and roll" OR title:jazz)` |

### Root cause

The ANTLR grammar `SearchParser.g4` defined `fieldQuery : fieldPath COLON searchValue` where `searchValue` only accepts leaf values (TERM, QUOTED, etc.), not a parenthesized sub-clause. So `title:(` caused a syntax error.

### Solution

**Grammar** (`SearchParser.g4`):
- Add `fieldGroupQuery : fieldPath COLON LPAREN clause RPAREN` rule
- Add it as alternative in `atomClause` before `fieldQuery`

**Visitor** (`SearchDslParser.java`):
- Add `markExplicitFieldRecursive()` helper: marks all leaf nodes in a group as `explicitField=true` to prevent `MultiFieldExpander` from re-expanding them across unintended fields
- Modify `visitBareQuery()` in both `QsAstBuilder` and `QsLuceneModeAstBuilder` to use `currentFieldName` as field group context when set
- Add `visitFieldGroupQuery()` to both AST builders: sets field context, visits inner clause, marks all leaves explicit
- Update `visitAtomClause()` and `collectTermsFromNotClause()` to handle
…o branch-4.0

Squashed backport of the following master PRs:
- apache#59747 [fix](search) Make AND/OR/NOT operators case-sensitive in search DSL
- apache#60654 [refactor](search) Refactor SearchDslParser to single-phase ANTLR parsing and fix ES compatibility issues
- apache#60782 [fix](search) Upgrade query type for variant subcolumns with analyzer-based indexes
- apache#60784 [fix](search) Fix MATCH_ALL_DOCS query failing in multi-field search mode
- apache#60786 [feat](search) Support field-grouped query syntax field:(term1 OR term2)
- apache#60790 [fix](search) Add searcher cache reuse and DSL result cache for search() function
- apache#60793 [fix](search) Fix wildcard query on variant subcolumns returning empty results
- apache#60798 [fix](search) Use FE-provided analyzer key for multi-index columns in search()
- apache#60814 [fix](search) Fix implicit conjunction incorrectly modifying preceding term in lucene mode
- apache#60834 [test](search) Add regression test for wildcard query on variant subcolumns with multi-index
- apache#60873 [fix](search) fix MATCH_ALL_DOCS losing occur attribute in multi-field expansion
- apache#60891 [fix](search) inject MATCH_ALL_DOCS for multi-MUST_NOT queries in lucene mode
… bug fixes (#61028)

### What problem does this PR solve?

Squashed backport of all search() function improvements and bug fixes from master to branch-4.0.

This PR combines the following master PRs into a single backport:

| Master PR | Type | Description |
|-----------|------|-------------|
| #59747 | fix | Make AND/OR/NOT operators case-sensitive in search DSL |
| #60654 | refactor | Refactor SearchDslParser to single-phase ANTLR parsing and fix ES compatibility issues |
| #60782 | fix | Upgrade query type for variant subcolumns with analyzer-based indexes |
| #60784 | fix | Fix MATCH_ALL_DOCS query failing in multi-field search mode |
| #60786 | feat | Support field-grouped query syntax field:(term1 OR term2) |
| #60790 | fix | Add searcher cache reuse and DSL result cache for search() function |
| #60793 | fix | Fix wildcard query on variant subcolumns returning empty results |
| #60798 | fix | Use FE-provided analyzer key for multi-index columns in search() |
| #60814 | fix | Fix implicit conjunction incorrectly modifying preceding term in lucene mode |
| #60834 | test | Add regression test for wildcard query on variant subcolumns with multi-index |
| #60873 | fix | fix MATCH_ALL_DOCS losing occur attribute in multi-field expansion |
| #60891 | fix | inject MATCH_ALL_DOCS for multi-MUST_NOT queries in lucene mode |

### Release note

Backport search() function improvements including DSL parser refactoring, multi-field search fixes, variant subcolumn support, query caching, and field-grouped query syntax.

### Check List (For Author)

- Test
    - [x] Regression test
    - [x] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [ ] No.
    - [x] Yes. New search() function features and bug fixes backported from master.

- Does this need documentation?
    - [x] No.
    - [ ] Yes.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
### What problem does this PR solve?

Issue Number: close #N/A

Problem Summary:

The `search()` function did not support ES `query_string` field-grouped syntax, where all terms inside parentheses inherit the field prefix. A query such as `title:(rock OR jazz)` failed with a syntax error.

ES semantics:

| Input | Expansion |
|-------|-----------|
| `title:(rock OR jazz)` | `(title:rock OR title:jazz)` |
| `title:(rock jazz)` with `default_operator:AND` | `(+title:rock +title:jazz)` |
| `title:(rock OR jazz) AND music` with `fields:[title,content]` | `(title:rock OR title:jazz) AND (title:music OR content:music)` |
| `title:("rock and roll" OR jazz)` | `(title:"rock and roll" OR title:jazz)` |

### Root cause

The ANTLR grammar `SearchParser.g4` defined `fieldQuery : fieldPath COLON searchValue` where `searchValue` only accepts leaf values (TERM, QUOTED, etc.), not a parenthesized sub-clause. So `title:(` caused a syntax error.

### Solution

**Grammar** (`SearchParser.g4`):
- Add `fieldGroupQuery : fieldPath COLON LPAREN clause RPAREN` rule
- Add it as alternative in `atomClause` before `fieldQuery`

**Visitor** (`SearchDslParser.java`):
- Add `markExplicitFieldRecursive()` helper: marks all leaf nodes in a group as `explicitField=true` to prevent `MultiFieldExpander` from re-expanding them across unintended fields
- Modify `visitBareQuery()` in both `QsAstBuilder` and `QsLuceneModeAstBuilder` to use `currentFieldName` as field group context when set
- Add `visitFieldGroupQuery()` to both AST builders: sets field context, visits inner clause, marks all leaves explicit
- Update `visitAtomClause()` and `collectTermsFromNotClause()` to handle the new rule

### Release note

Support ES `query_string` field-grouped syntax in `search()` function: `field:(term1 OR term2)` now correctly expands to `(field:term1 OR field:term2)`, matching Elasticsearch behavior. Supports standard mode, lucene mode, multi-field mode, and all value types (terms, phrases, wildcards, regexps).

### Check List (For Author)

- Test
    - Regression test (`test_search_field_group_query.groovy`, 23/23 search suites pass)
    - Unit Test (`SearchDslParserTest.java`, 132/132 tests pass, +10 new tests)

- Behavior changed: `title:(rock OR jazz)` previously threw a syntax error; now it is parsed as `(title:rock OR title:jazz)`.

- Does this need documentation?

### Check List (For Reviewer who merge this PR)