Route analytics queries by index setting by bowenlan-amzn · Pull Request #5432 · opensearch-project/sql

bowenlan-amzn · 2026-05-11T17:21:00Z

Summary

Route queries to the analytics engine based on index settings instead of table-name prefix convention
Check both index.pluggable.dataformat.enabled=true AND index.pluggable.dataformat=composite before routing to the analytics path
If pluggable dataformat is enabled but the value is not composite (e.g. lucene), fall through to the standard Calcite→DSL path
Add pluginSettings to RestUnifiedQueryAction to support cluster-level setting overrides (rex max_match limit)

Test plan

Unit tests for routing logic (RestUnifiedQueryActionTest)
- Composite format routes to analytics
- Lucene format routes to old path
- Missing index routes to old path
- Null/empty queries route to old path

github-actions · 2026-05-11T17:22:29Z

PR Reviewer Guide 🔍

(Review updated until commit `28ac22a`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Misleading test name The test `pluggableDataformatIndexRoutesToAnalytics` registers an index named `parquet_logs` but now sets the dataformat to `composite`. This creates confusion between the index name and its actual format, potentially misleading future maintainers about what the test validates. public void pluggableDataformatIndexRoutesToAnalytics() { registerIndex( "parquet_logs", Settings.builder() .put(IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.getKey(), true) .put(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.getKey(), "composite") .build());

github-actions · 2026-05-11T17:22:50Z

PR Code Suggestions ✨

Latest suggestions up to 28ac22a
Explore these optional code suggestions:

Category	Suggestion	Impact
General	Update test index name for clarity The test index name `parquet_logs` is misleading since the dataformat value has been changed from "parquet" to "composite". Consider renaming the index to reflect the actual dataformat being tested, such as `composite_logs`, to maintain consistency and avoid confusion. plugin/src/test/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryActionTest.java [56-61] registerIndex( - "parquet_logs", + "composite_logs", Settings.builder() .put(IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.getKey(), true) .put(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.getKey(), "composite") .build()); Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that the test index name `parquet_logs` is now misleading after changing the dataformat value from "parquet" to "composite". Renaming to `composite_logs` would improve code clarity and maintainability, though this is not a critical functional issue.	Medium

Previous suggestions

Suggestions up to commit d504f77

Category	Suggestion	Impact
General	Replace hardcoded string with constant The hardcoded string `"composite"` should be replaced with a constant to prevent typos and improve maintainability. Consider defining a constant like `COMPOSITE_FORMAT` or using an existing constant if available in `IndexSettings`. plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [106-114] +private static final String COMPOSITE_FORMAT = "composite"; + private boolean isPluggableDataformatIndex(String indexName) { var indexMetadata = clusterService.state().metadata().index(indexName); if (indexMetadata == null) { return false; } var settings = indexMetadata.getSettings(); return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings) - && "composite".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings)); + && COMPOSITE_FORMAT.equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings)); } Suggestion importance[1-10]: 5 __ Why: While replacing the hardcoded `"composite"` string with a constant improves maintainability and reduces typo risk, this is a minor code quality improvement. The suggestion is valid and follows best practices, but has moderate impact.	Low

Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type. An index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path; an index named `parquet_foo` without the setting is mis-dispatched to analytics. Use the authoritative signal instead: the `index.pluggable.dataformat.enabled` setting on cluster-state metadata. This is the same setting integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to create analytics-backed indices, and it's what `FieldStorageResolver` reads to decide field-level storage. Behavior: - `index.pluggable.dataformat.enabled=true` → analytics engine (DataFusion) - flag absent / false / index missing → Calcite→OpenSearch DSL path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

github-actions · 2026-05-11T17:28:11Z

Persistent review updated to latest commit 28ac22a

…search-project#5432) Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics engine when the source index name starts with `parquet_`. That's brittle — it conflates naming convention with storage type. An index created without the prefix but with pluggable dataformat enabled is silently sent to the Lucene path; an index named `parquet_foo` without the setting is mis-dispatched to analytics. Use the authoritative signal instead: the `index.pluggable.dataformat.enabled` setting on cluster-state metadata. This is the same setting integration tests (`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to create analytics-backed indices, and it's what `FieldStorageResolver` reads to decide field-level storage. Behavior: - `index.pluggable.dataformat.enabled=true` → analytics engine (DataFusion) - flag absent / false / index missing → Calcite→OpenSearch DSL path Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>

@ignore

Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (opensearch-project#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (opensearch-project#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417); CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416) - plugins.calcite.enabled=true defaulted on the unified query path (opensearch-project#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418) - Calcite tolerance fixes: array() default type (opensearch-project#5421), containsNestedAggregator flat-leaf schemas (opensearch-project#5423) - Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426) ### Feature-branch commits squashed (22) opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413, opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267, opensearch-project#5397, opensearch-project#5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419, opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (opensearch-project#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (opensearch-project#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

@ignore

* Land analytics-engine PPL integration into main Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (#5397). The feature branch f006e29 is retained for individual-commit lineage. ### What this delivers Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (#5429, #5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (#5400, #5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (#5407, #5417); CalciteReplaceCommandIT column-order-agnostic (#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (#5416) - plugins.calcite.enabled=true defaulted on the unified query path (#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (#5418) - Calcite tolerance fixes: array() default type (#5421), containsNestedAggregator flat-leaf schemas (#5423) - Sandbox deps switched to analytics-api JDK 21 surface (#5426) ### Feature-branch commits squashed (22) #5432, #5429, #5426, #5423, #5421, #5418, #5403, #5417, #5415, #5416, #5413, #5407, #5409, #5406, #5400, #5396, #5317, #5302, #5285, #5275, #5267, #5397, #5286 ### Main commits absorbed via the merge (54) Brings the branch up to current upstream/main (54 commits since the last catch-up at #5397, divergence point 513e1b2). Highlights: #5419, #5408, #5414, #5399, #5394, #5361, #5360, #5240, #5266, #5278, plus 44 others (bugfixes, doc updates, infra). ### Conflict resolutions (7) Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (#5298) preserved as-is. ### Compatibility / opt-in The analytics-engine path is gated by the extendedPlugins extension being installed (#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (#5429 — by index setting). ### Verification - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com> * Address @penghuo: revert stray blank line in doctest/build.gradle After 'apply plugin: opensearch.testclusters', one blank line is enough — restoring the single-blank spacing to match upstream/main. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

Adds a small, opt-in test infrastructure slice so the PPL integration test suite can run end-to-end against the analytics-engine backend without per-test rewiring. `-Dtests.analytics.parquet_indices=true` makes `TestUtils.createIndexByRestClient` back every test-created index with single-shard composite/parquet storage: index.pluggable.dataformat.enabled = true index.pluggable.dataformat = "composite" index.composite.primary_data_format = "parquet" `RestUnifiedQueryAction.isAnalyticsIndex` (post-opensearch-project#5432) reads these settings and routes any query against such indices to the analytics-engine planner (DataFusion). No additional cluster setting or routing override required — the production routing logic is the single source of truth. Also adds `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` as a per-test predicate so individual tests can branch their assertions on engine semantics (DataFusion follows different ordering and null-bucket semantics than the legacy V2 and Calcite-DSL paths). Bulk loads on parquet-backed indices use `refresh=true` because `analytics-backend-lucene`'s `LuceneCommitter.getSafeCommitInfo` is a `TODO:: with index deleter` stub that hangs `refresh=wait_for` until the test framework request timeout (~60s). `integ-test/build.gradle` forwards the property to `:integTestRemote` so the gradle command line is the single knob. Default behavior is unchanged — with the flag unset, every test-created index is Lucene-backed and every IT runs through the existing V2 / Calcite path exactly as before. Signed-off-by: Kai Huang <ahkcs@amazon.com>

@timestamp

…verage (#5436) * Add tests.analytics.parquet_indices test toggle Adds a small, opt-in test infrastructure slice so the PPL integration test suite can run end-to-end against the analytics-engine backend without per-test rewiring. `-Dtests.analytics.parquet_indices=true` makes `TestUtils.createIndexByRestClient` back every test-created index with single-shard composite/parquet storage: index.pluggable.dataformat.enabled = true index.pluggable.dataformat = "composite" index.composite.primary_data_format = "parquet" `RestUnifiedQueryAction.isAnalyticsIndex` (post-#5432) reads these settings and routes any query against such indices to the analytics-engine planner (DataFusion). No additional cluster setting or routing override required — the production routing logic is the single source of truth. Also adds `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` as a per-test predicate so individual tests can branch their assertions on engine semantics (DataFusion follows different ordering and null-bucket semantics than the legacy V2 and Calcite-DSL paths). Bulk loads on parquet-backed indices use `refresh=true` because `analytics-backend-lucene`'s `LuceneCommitter.getSafeCommitInfo` is a `TODO:: with index deleter` stub that hangs `refresh=wait_for` until the test framework request timeout (~60s). `integ-test/build.gradle` forwards the property to `:integTestRemote` so the gradle command line is the single knob. Default behavior is unchanged — with the flag unset, every test-created index is Lucene-backed and every IT runs through the existing V2 / Calcite path exactly as before. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Branch 7 stats tests on isAnalyticsParquetIndicesEnabled When -Dtests.analytics.parquet_indices=true is set, every test-created index is parquet-backed and RestUnifiedQueryAction.isAnalyticsIndex routes all queries through the analytics-engine backend (DataFusion). Seven tests need analytics-specific assertions because DataFusion follows different null-bucket, SQL-spec, and TDigest interpolation semantics than the legacy V2 / Calcite-DSL paths: SQL-spec semantics (DataFusion follows the spec; legacy DSL returns 0): - testSumWithNull — SUM of all-null returns null on analytics; the existing isPushdownDisabled branch already handles the Calcite-no- pushdown case. OR analytics into that branch. - testStatsPercentileByNullValue + NonNullBucket — percentile of an all-null / empty group returns null on analytics. Same OR pattern. Non-deterministic head ordering: - testStatsWithLimit Q1+Q2 — head 5 over a 6-bucket result picks a different null-balance row than the legacy / Calcite-DSL path; same effect cascades into `head 2 from 1`. Reuse the size-only assertion branch already present for the no-pushdown case. TDigest interpolation divergence (genuine impl difference): - testStatsPercentileWithNull — DataFusion TDigest interpolates p50 to 35413 vs OpenSearch's 39225; both within compression-bound error. Per-engine expected value. - testStatsPercentileBySpan — same TDigest diff on the age=30 bucket (33194 vs 39225). Coordinated with opensearch-project/OpenSearch#21584 commit 5 which fixes the upstream planner-side type mismatch so the query reaches the data assertion at all. Semantic divergence pending team decision: - testDisableLegacyPreferred — under PPL_SYNTAX_LEGACY_PREFERRED=false, V2 / Calcite-DSL drop the null-age bucket (5 rows) while DataFusion keeps it (6 rows). Skipped on the analytics path via Assume.assumeFalse until the team decides which behaviour is the intended new-syntax semantics. Out of scope for this PR (intentionally left failing on the analytics path so the gap is visible in CI): - testStatsBySpanTimeWithNullBucket — span(@timestamp, 12h) multi-unit time interval unsupported in current SpanAdapter (only interval=1 is rewritten to DATE_TRUNC; multi-unit needs DataFusion's date_bin). Signed-off-by: Kai Huang <ahkcs@amazon.com> * Consolidate analytics parquet config into AnalyticsIndexConfig Per @dai-chen's review feedback — the index-creation settings, the bulk-load refresh strategy, and the system-property gate were spread across three places in TestUtils (the `ANALYTICS_PARQUET_INDICES_PROP` constant, the inline check in `createIndexByRestClient`, the inline check in `loadDataByRestClient`, and the private `makeParquetBacked` helper). That's the same spread-out-conditional pattern the OS-Spark repo's AOSS configs ended up with. Collect everything into a single `TestUtils.AnalyticsIndexConfig` nested helper: * `ENABLED_PROP` — the system property name. * `isEnabled()` — single source for the gate. * `applyIndexCreationSettings(jsonObject)` — injects the parquet-backed composite settings; no-op when disabled. * `bulkLoadRefreshParam()` — returns the right `_bulk` refresh query string for the active index type. `createIndexByRestClient` and `loadDataByRestClient` call these unconditionally — the methods themselves short-circuit when the config is off — so the parquet-specific branching is gone from the helpers and concentrated in one place. `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` now delegates to `AnalyticsIndexConfig.isEnabled()`. The legacy `ANALYTICS_PARQUET_INDICES_PROP` constant is kept as a forwarding alias for any external callers. Verified end-to-end against the analytics-engine cluster — same 50 / 63 pass + 1 skip + 12 fail as the pre-refactor run, so the consolidation is behavior-preserving. Signed-off-by: Kai Huang <huangkaics@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> * Revert per-engine assertions on percentile tests Per @dai-chen — since the companion OpenSearch PR (#21584) no longer wires `percentile_approx` on the analytics path (Sandesh's "repurpose for SPAN" scope), all `percentile(...)` queries fail at execution before reaching the assertion. The per-engine branches that this PR introduced for `testStatsPercentile{WithNull,BySpan,ByNullValue, ByNullValueNonNullBucket}` were doing nothing in practice — the tests fail regardless. Reverts those four tests to their pre-PR state. The three remaining intentional changes stand: * `testStatsWithLimit` Q1+Q2 — non-deterministic head ordering on DataFusion hash-bucket order; size-only branch on the analytics path. * `testSumWithNull` — extends the existing `isPushdownDisabled() ? null : 0` pattern. SUM works on the analytics path and DataFusion returns null per SQL spec. * `testDisableLegacyPreferred` — `Assume.assumeFalse` skip for the pending-team-decision null-bucket semantic divergence. Signed-off-by: Kai Huang <huangkaics@gmail.com> Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: Kai Huang <huangkaics@gmail.com>

@ignore

) * Land analytics-engine PPL integration into main Single squashed delivery of the long-running feature/mustang-ppl-integration branch into main, consolidating 22 feature-branch PRs plus the conflict-resolved merge of current main. Squashed because the feature branch's history includes commits with missing or mismatched Signed-off-by trailers that block DCO at this scope — the equivalent issue documented for the catch-up squashes (opensearch-project#5397). The feature branch f006e29 is retained for individual-commit lineage. Analytics-engine PPL integration — a new execution path that routes Parquet-backed (non-Lucene) indices through an analytics engine while keeping Lucene-backed indices on the existing v2 / Calcite paths. Headline pieces: - Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices hand off to the analytics-engine execution path; Lucene-backed indices continue through the legacy path - Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path - Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the unified parser and wires profiling metrics through the analytics path - extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an OpenSearch extension via SPI - SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork applied to the SQL transport, plus delegateToV2Engine extraction in RestSqlAction - Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine plans + version bump to OpenSearch 3.7 - Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional runtime dep so the SQL bundle is shippable without it - Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier table-name-prefix heuristic with an authoritative index-setting check Supporting infrastructure: - Gradle wrapper bump to 9.4.1 (opensearch-project#5406) - Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 / httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409) - IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417); CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416) - plugins.calcite.enabled=true defaulted on the unified query path (opensearch-project#5413) - PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418) - Calcite tolerance fixes: array() default type (opensearch-project#5421), containsNestedAggregator flat-leaf schemas (opensearch-project#5423) - Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426) Brings the branch up to current upstream/main (54 commits since the last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419, others (bugfixes, doc updates, infra). Resolved during the merge of main into the feature branch. Resolution kept the feature branch's analytics-engine-path semantics where main's changes would have regressed them. - api/.../UnifiedQueryContext.java Blank-line-only conflict; took main's tighter formatting. - core/.../executor/QueryService.java Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...) wrapping (required for analytics-engine classloader isolation) and the matching import. - integ-test/build.gradle Kept feature's detailed root-cause comment on the Gradle 9.4.1 TestEventReporterAsListener workaround; kept ASCII ordering of JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT entries. - integ-test/.../CalciteEvalCommandIT.java Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards on test_eval and test_eval_agent setup (needed for the helper-managed index analytics-engine compatibility run). - legacy/.../RestSqlAction.java Kept feature's delegateToV2Engine(...) (extracted from the analytics-engine routing path). Both sides added handleException / getRestStatus / getRawErrorCode; removed the duplicate set git produced. - plugin/.../SQLPlugin.java Took the union of imports: ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType. - plugin/.../transport/TransportPPLQueryAction.java Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and feature's local pluginSettings / pluginSettingsRef wiring. EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved as-is. The analytics-engine path is gated by the extendedPlugins extension being installed (opensearch-project#5403 makes the dep optional). Clusters without analytics-engine installed see no behavior change. Clusters with analytics-engine installed route only Parquet-backed indices through the new path (opensearch-project#5429 — by index setting). - ./gradlew :api:compileJava :core:compileJava :legacy:compileJava :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes locally Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com> * Address @penghuo: revert stray blank line in doctest/build.gradle After 'apply plugin: opensearch.testclusters', one blank line is enough — restoring the single-blank spacing to match upstream/main. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

bowenlan-amzn force-pushed the experiment/routing-by-dataformat-setting branch from d504f77 to 28ac22a Compare May 11, 2026 17:26

bowenlan-amzn marked this pull request as ready for review May 11, 2026 17:27

bowenlan-amzn requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 11, 2026 17:27

alchemist51 approved these changes May 11, 2026

View reviewed changes

ahkcs approved these changes May 11, 2026

View reviewed changes

ahkcs merged commit 32d7358 into opensearch-project:feature/mustang-ppl-integration May 11, 2026
56 of 59 checks passed

bowenlan-amzn deleted the experiment/routing-by-dataformat-setting branch May 11, 2026 18:25

ahkcs mentioned this pull request May 12, 2026

Add tests.analytics.parquet_indices toggle for analytics-engine IT coverage #5436

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Route analytics queries by index setting#5432

Route analytics queries by index setting#5432
ahkcs merged 1 commit into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:experiment/routing-by-dataformat-setting

bowenlan-amzn commented May 11, 2026

Uh oh!

github-actions Bot commented May 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bowenlan-amzn commented May 11, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 28ac22a)

Uh oh!

github-actions Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions Bot commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions Bot commented May 11, 2026 •

edited

Loading

(Review updated until commit `28ac22a`)

github-actions Bot commented May 11, 2026 •

edited

Loading