Skip to content

Route analytics queries by index setting#5432

Merged
ahkcs merged 1 commit into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:experiment/routing-by-dataformat-setting
May 11, 2026
Merged

Route analytics queries by index setting#5432
ahkcs merged 1 commit into
opensearch-project:feature/mustang-ppl-integrationfrom
bowenlan-amzn:experiment/routing-by-dataformat-setting

Conversation

@bowenlan-amzn
Copy link
Copy Markdown
Member

Summary

  • Route queries to the analytics engine based on index settings instead of table-name prefix convention
  • Check both index.pluggable.dataformat.enabled=true AND index.pluggable.dataformat=composite before routing to the analytics path
  • If pluggable dataformat is enabled but the value is not composite (e.g. lucene), fall through to the standard Calcite→DSL path
  • Add pluginSettings to RestUnifiedQueryAction to support cluster-level setting overrides (rex max_match limit)

Test plan

  • Unit tests for routing logic (RestUnifiedQueryActionTest)
    • Composite format routes to analytics
    • Lucene format routes to old path
    • Missing index routes to old path
    • Null/empty queries route to old path

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

PR Reviewer Guide 🔍

(Review updated until commit 28ac22a)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Misleading test name

The test pluggableDataformatIndexRoutesToAnalytics registers an index named parquet_logs but now sets the dataformat to composite. This creates confusion between the index name and its actual format, potentially misleading future maintainers about what the test validates.

public void pluggableDataformatIndexRoutesToAnalytics() {
  registerIndex(
      "parquet_logs",
      Settings.builder()
          .put(IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.getKey(), true)
          .put(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.getKey(), "composite")
          .build());

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 11, 2026

PR Code Suggestions ✨

Latest suggestions up to 28ac22a
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Update test index name for clarity

The test index name parquet_logs is misleading since the dataformat value has been
changed from "parquet" to "composite". Consider renaming the index to reflect the
actual dataformat being tested, such as composite_logs, to maintain consistency and
avoid confusion.

plugin/src/test/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryActionTest.java [56-61]

 registerIndex(
-    "parquet_logs",
+    "composite_logs",
     Settings.builder()
         .put(IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.getKey(), true)
         .put(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.getKey(), "composite")
         .build());
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the test index name parquet_logs is now misleading after changing the dataformat value from "parquet" to "composite". Renaming to composite_logs would improve code clarity and maintainability, though this is not a critical functional issue.

Medium

Previous suggestions

Suggestions up to commit d504f77
CategorySuggestion                                                                                                                                    Impact
General
Replace hardcoded string with constant

The hardcoded string "composite" should be replaced with a constant to prevent typos
and improve maintainability. Consider defining a constant like COMPOSITE_FORMAT or
using an existing constant if available in IndexSettings.

plugin/src/main/java/org/opensearch/sql/plugin/rest/RestUnifiedQueryAction.java [106-114]

+private static final String COMPOSITE_FORMAT = "composite";
+
 private boolean isPluggableDataformatIndex(String indexName) {
   var indexMetadata = clusterService.state().metadata().index(indexName);
   if (indexMetadata == null) {
     return false;
   }
   var settings = indexMetadata.getSettings();
   return IndexSettings.PLUGGABLE_DATAFORMAT_ENABLED_SETTING.get(settings)
-      && "composite".equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
+      && COMPOSITE_FORMAT.equals(IndexSettings.PLUGGABLE_DATAFORMAT_VALUE_SETTING.get(settings));
 }
Suggestion importance[1-10]: 5

__

Why: While replacing the hardcoded "composite" string with a constant improves maintainability and reduces typo risk, this is a minor code quality improvement. The suggestion is valid and follows best practices, but has moderate impact.

Low

Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics
engine when the source index name starts with `parquet_`. That's brittle —
it conflates naming convention with storage type. An index created without
the prefix but with pluggable dataformat enabled is silently sent to the
Lucene path; an index named `parquet_foo` without the setting is
mis-dispatched to analytics.

Use the authoritative signal instead: the `index.pluggable.dataformat.enabled`
setting on cluster-state metadata. This is the same setting integration tests
(`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to
create analytics-backed indices, and it's what `FieldStorageResolver` reads
to decide field-level storage.

Behavior:
- `index.pluggable.dataformat.enabled=true`  → analytics engine (DataFusion)
- flag absent / false / index missing         → Calcite→OpenSearch DSL path

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 28ac22a

@ahkcs ahkcs merged commit 32d7358 into opensearch-project:feature/mustang-ppl-integration May 11, 2026
56 of 59 checks passed
ahkcs pushed a commit to ahkcs/sql that referenced this pull request May 11, 2026
…search-project#5432)

Today `RestUnifiedQueryAction.isAnalyticsIndex` dispatches to the analytics
engine when the source index name starts with `parquet_`. That's brittle —
it conflates naming convention with storage type. An index created without
the prefix but with pluggable dataformat enabled is silently sent to the
Lucene path; an index named `parquet_foo` without the setting is
mis-dispatched to analytics.

Use the authoritative signal instead: the `index.pluggable.dataformat.enabled`
setting on cluster-state metadata. This is the same setting integration tests
(`CoordinatorReduceIT`, `CompositeCommitDeletionIT`, etc.) already use to
create analytics-backed indices, and it's what `FieldStorageResolver` reads
to decide field-level storage.

Behavior:
- `index.pluggable.dataformat.enabled=true`  → analytics engine (DataFusion)
- flag absent / false / index missing         → Calcite→OpenSearch DSL path

Signed-off-by: bowenlan-amzn <bowenlan23@gmail.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 11, 2026
Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

### Feature-branch commits squashed (22)

opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
@bowenlan-amzn bowenlan-amzn deleted the experiment/routing-by-dataformat-setting branch May 11, 2026 18:25
ahkcs added a commit that referenced this pull request May 11, 2026
* Land analytics-engine PPL integration into main

Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (#5429, #5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (#5400, #5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (#5407, #5417);
  CalciteReplaceCommandIT column-order-agnostic (#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (#5418)
- Calcite tolerance fixes: array() default type (#5421),
  containsNestedAggregator flat-leaf schemas (#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (#5426)

### Feature-branch commits squashed (22)

#5432, #5429, #5426, #5423, #5421, #5418, #5403, #5417, #5415, #5416, #5413,
#5407, #5409, #5406, #5400, #5396, #5317, #5302, #5285, #5275, #5267,
#5397, #5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at #5397, divergence point 513e1b2). Highlights: #5419,
#5408, #5414, #5399, #5394, #5361, #5360, #5240, #5266, #5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

* Address @penghuo: revert stray blank line in doctest/build.gradle

After 'apply plugin: opensearch.testclusters', one blank line is enough
— restoring the single-blank spacing to match upstream/main.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 12, 2026
Adds a small, opt-in test infrastructure slice so the PPL integration
test suite can run end-to-end against the analytics-engine backend
without per-test rewiring.

`-Dtests.analytics.parquet_indices=true` makes `TestUtils.createIndexByRestClient`
back every test-created index with single-shard composite/parquet storage:

    index.pluggable.dataformat.enabled = true
    index.pluggable.dataformat = "composite"
    index.composite.primary_data_format = "parquet"

`RestUnifiedQueryAction.isAnalyticsIndex` (post-opensearch-project#5432) reads these settings
and routes any query against such indices to the analytics-engine planner
(DataFusion). No additional cluster setting or routing override required —
the production routing logic is the single source of truth.

Also adds `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` as a
per-test predicate so individual tests can branch their assertions on
engine semantics (DataFusion follows different ordering and null-bucket
semantics than the legacy V2 and Calcite-DSL paths).

Bulk loads on parquet-backed indices use `refresh=true` because
`analytics-backend-lucene`'s `LuceneCommitter.getSafeCommitInfo` is a
`TODO:: with index deleter` stub that hangs `refresh=wait_for` until the
test framework request timeout (~60s).

`integ-test/build.gradle` forwards the property to `:integTestRemote` so
the gradle command line is the single knob.

Default behavior is unchanged — with the flag unset, every test-created
index is Lucene-backed and every IT runs through the existing V2 /
Calcite path exactly as before.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit that referenced this pull request May 15, 2026
…verage (#5436)

* Add tests.analytics.parquet_indices test toggle

Adds a small, opt-in test infrastructure slice so the PPL integration
test suite can run end-to-end against the analytics-engine backend
without per-test rewiring.

`-Dtests.analytics.parquet_indices=true` makes `TestUtils.createIndexByRestClient`
back every test-created index with single-shard composite/parquet storage:

    index.pluggable.dataformat.enabled = true
    index.pluggable.dataformat = "composite"
    index.composite.primary_data_format = "parquet"

`RestUnifiedQueryAction.isAnalyticsIndex` (post-#5432) reads these settings
and routes any query against such indices to the analytics-engine planner
(DataFusion). No additional cluster setting or routing override required —
the production routing logic is the single source of truth.

Also adds `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()` as a
per-test predicate so individual tests can branch their assertions on
engine semantics (DataFusion follows different ordering and null-bucket
semantics than the legacy V2 and Calcite-DSL paths).

Bulk loads on parquet-backed indices use `refresh=true` because
`analytics-backend-lucene`'s `LuceneCommitter.getSafeCommitInfo` is a
`TODO:: with index deleter` stub that hangs `refresh=wait_for` until the
test framework request timeout (~60s).

`integ-test/build.gradle` forwards the property to `:integTestRemote` so
the gradle command line is the single knob.

Default behavior is unchanged — with the flag unset, every test-created
index is Lucene-backed and every IT runs through the existing V2 /
Calcite path exactly as before.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Branch 7 stats tests on isAnalyticsParquetIndicesEnabled

When -Dtests.analytics.parquet_indices=true is set, every test-created
index is parquet-backed and RestUnifiedQueryAction.isAnalyticsIndex
routes all queries through the analytics-engine backend (DataFusion).
Seven tests need analytics-specific assertions because DataFusion follows
different null-bucket, SQL-spec, and TDigest interpolation semantics
than the legacy V2 / Calcite-DSL paths:

SQL-spec semantics (DataFusion follows the spec; legacy DSL returns 0):
- testSumWithNull — SUM of all-null returns null on analytics; the
  existing isPushdownDisabled branch already handles the Calcite-no-
  pushdown case. OR analytics into that branch.
- testStatsPercentileByNullValue + NonNullBucket — percentile of an
  all-null / empty group returns null on analytics. Same OR pattern.

Non-deterministic head ordering:
- testStatsWithLimit Q1+Q2 — head 5 over a 6-bucket result picks a
  different null-balance row than the legacy / Calcite-DSL path; same
  effect cascades into `head 2 from 1`. Reuse the size-only assertion
  branch already present for the no-pushdown case.

TDigest interpolation divergence (genuine impl difference):
- testStatsPercentileWithNull — DataFusion TDigest interpolates p50 to
  35413 vs OpenSearch's 39225; both within compression-bound error.
  Per-engine expected value.
- testStatsPercentileBySpan — same TDigest diff on the age=30 bucket
  (33194 vs 39225). Coordinated with opensearch-project/OpenSearch#21584
  commit 5 which fixes the upstream planner-side type mismatch so the
  query reaches the data assertion at all.

Semantic divergence pending team decision:
- testDisableLegacyPreferred — under PPL_SYNTAX_LEGACY_PREFERRED=false,
  V2 / Calcite-DSL drop the null-age bucket (5 rows) while DataFusion
  keeps it (6 rows). Skipped on the analytics path via
  Assume.assumeFalse until the team decides which behaviour is the
  intended new-syntax semantics.

Out of scope for this PR (intentionally left failing on the analytics
path so the gap is visible in CI):
- testStatsBySpanTimeWithNullBucket — span(@timestamp, 12h) multi-unit
  time interval unsupported in current SpanAdapter (only interval=1 is
  rewritten to DATE_TRUNC; multi-unit needs DataFusion's date_bin).

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Consolidate analytics parquet config into AnalyticsIndexConfig

Per @dai-chen's review feedback — the index-creation settings, the
bulk-load refresh strategy, and the system-property gate were spread
across three places in TestUtils (the `ANALYTICS_PARQUET_INDICES_PROP`
constant, the inline check in `createIndexByRestClient`, the inline
check in `loadDataByRestClient`, and the private `makeParquetBacked`
helper). That's the same spread-out-conditional pattern the OS-Spark
repo's AOSS configs ended up with.

Collect everything into a single `TestUtils.AnalyticsIndexConfig`
nested helper:

  * `ENABLED_PROP` — the system property name.
  * `isEnabled()` — single source for the gate.
  * `applyIndexCreationSettings(jsonObject)` — injects the
    parquet-backed composite settings; no-op when disabled.
  * `bulkLoadRefreshParam()` — returns the right `_bulk` refresh query
    string for the active index type.

`createIndexByRestClient` and `loadDataByRestClient` call these
unconditionally — the methods themselves short-circuit when the config
is off — so the parquet-specific branching is gone from the helpers
and concentrated in one place. `PPLIntegTestCase.isAnalyticsParquetIndicesEnabled()`
now delegates to `AnalyticsIndexConfig.isEnabled()`.

The legacy `ANALYTICS_PARQUET_INDICES_PROP` constant is kept as a
forwarding alias for any external callers.

Verified end-to-end against the analytics-engine cluster — same
50 / 63 pass + 1 skip + 12 fail as the pre-refactor run, so the
consolidation is behavior-preserving.

Signed-off-by: Kai Huang <huangkaics@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Revert per-engine assertions on percentile tests

Per @dai-chen — since the companion OpenSearch PR (#21584) no longer
wires `percentile_approx` on the analytics path (Sandesh's "repurpose
for SPAN" scope), all `percentile(...)` queries fail at execution
before reaching the assertion. The per-engine branches that this PR
introduced for `testStatsPercentile{WithNull,BySpan,ByNullValue,
ByNullValueNonNullBucket}` were doing nothing in practice — the tests
fail regardless.

Reverts those four tests to their pre-PR state. The three remaining
intentional changes stand:

  * `testStatsWithLimit` Q1+Q2 — non-deterministic head ordering on
    DataFusion hash-bucket order; size-only branch on the analytics
    path.
  * `testSumWithNull` — extends the existing
    `isPushdownDisabled() ? null : 0` pattern. SUM works on the
    analytics path and DataFusion returns null per SQL spec.
  * `testDisableLegacyPreferred` — `Assume.assumeFalse` skip for the
    pending-team-decision null-bucket semantic divergence.

Signed-off-by: Kai Huang <huangkaics@gmail.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <huangkaics@gmail.com>
dai-chen pushed a commit to dai-chen/sql-1 that referenced this pull request May 19, 2026
)

* Land analytics-engine PPL integration into main

Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
others (bugfixes, doc updates, infra).

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

* Address @penghuo: revert stray blank line in doctest/build.gradle

After 'apply plugin: opensearch.testclusters', one blank line is enough
— restoring the single-blank spacing to match upstream/main.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants