Added UDT for IP and Binary support by vinaykpud · Pull Request #5463 · opensearch-project/sql

vinaykpud · 2026-05-22T17:00:14Z

Description

Companion change for opensearch-project/OpenSearch#21807 which adds Calcite UDTs (IpType, BinaryType) for OpenSearch ip and binary columns to the analytics-engine path. This PR teaches the SQL plugin to recognize the new UDTs at the response-schema boundary so:

byte[] cells from the analytics backend get rendered as canonical IP strings / base64 strings (was: unsupported object class [B).
The response schema reports "type": "ip" / "type": "binary" correctly (was: both labelled "binary").
Existing comparison and CIDRMATCH query shapes keep working unchanged — the UDT is invisible to Calcite's planner-internal coercion machinery.

Background

OpenSearchSchemaBuilder (in the OpenSearch sandbox) used to map both ip and binary field types to plain SqlTypeName.VARBINARY. That collapse caused three downstream bugs in this plugin's PPL path through the analytics engine:

The DataFusion backend ships back a 16-byte ipv4-mapped-ipv6 buffer; AnalyticsExecutionEngine.convertRows calls ExprValueUtils.fromObjectValue(byte[]) which has no byte[] case → unsupported object class [B.
The response schema reports "type": "binary" for an IP column (because convertRelDataTypeToExprType(VARBINARY) → BINARY ExprType).
CIDRMATCH against a column was registered with PPLTypeChecker.family(SqlTypeFamily.BINARY, SqlTypeFamily.STRING), with the byte-range expansion living inline at the registration site.

The companion OpenSearch PR introduces IpType / BinaryType (Calcite UDTs that extend AbstractSqlType with VARBINARY underneath) and moves CIDRMATCH dispatch into a backend ScalarFunctionAdapter.

Approach

Two response-boundary changes plus one cleanup, all minimal:

1. Result-column UDT recognition (OpenSearchTypeFactory).
New sibling function convertAnalyticsEngineRelDataTypeToExprType does an instanceof dispatch:

if (type instanceof IpType) return IP;
if (type instanceof BinaryType) return BINARY;
return convertRelDataTypeToExprType(type);

The original convertRelDataTypeToExprType is deliberately unchanged — Calcite's coercion machinery round-trips through it, so returning IP ExprType for a VARBINARY column synthesizes IP(string) casts that DataFusion can't resolve. Keeping UDT recognition in a sibling function isolates it to the response-schema path.

2. Per-column UDT dispatch in row conversion (AnalyticsExecutionEngine.convertRows).
New static helper toExprValue(Object value, RelDataType type):

byte[] + IpType → InetAddresses.toAddrString(InetAddress.getByAddress(bytes)) (matches IpFieldMapper.valueFetcher semantics: dotted-quad for IPv4 / IPv4-mapped, RFC 5952 for pure IPv6).
byte[] + BinaryType → Base64.getEncoder().encodeToString(bytes) (matches the OpenSearch binary field wire contract).
Otherwise → ExprValueUtils.fromObjectValue(value) unchanged.

UnknownHostException from a malformed IP buffer falls through to the default handler so the user sees a clear error rather than a malformed address string.

3. CIDRMATCH cleanup (PPLFuncImpTable).

Removes udf/ip/CidrMatchAdapter and its inline registration. CIDRMATCH dispatch now lives in CidrMatchFunctionAdapter on the backend (in the companion OpenSearch PR), which means it serves both the production SQL-plugin variant and the sandbox test front-end with a single implementation.
Collapses the two-line registration (typecheck + withRexBuilderShim) to a single typecheck-only registerOperator(CIDRMATCH, PPLBuiltinOperators.CIDRMATCH).
The runtime CidrMatchFunction UDF stays as the dynamic last-resort fallback.

Testing

Unit tests added:

OpenSearchTypeFactoryTest:
- testConvertResultColumnIpTypeReturnsIpExprType — IpType → ExprCoreType.IP.
- testConvertResultColumnBinaryTypeReturnsBinaryExprType — BinaryType → ExprCoreType.BINARY.
- testConvertResultColumnPlainVarbinaryFallsBackToBinary — plain VARBINARY (no UDT) keeps returning BINARY.
- testConvertResultColumnDelegatesParityForNonUdtTypes — for every non-UDT RelDataType, the result-column variant must agree with the planner-internal variant. Drift here would mean response-schema labels diverge from what Calcite's coercion sees.
AnalyticsExecutionEngineTest:
- executeRelNode_ipColumnRendersAsAddressString — both ipv4-mapped IPv6 (::ffff:1.2.3.4 → "1.2.3.4") and pure IPv6 (::1) buffers render to the right canonical form, schema reports "ip".
- executeRelNode_binaryColumnRendersAsBase64 — byte[] payload base64-encodes to match OpenSearch wire format, schema reports "binary".

Existing tests still pass. [B-class regression caught at: physicalPlanExecute_callsOnFailure (already in the file, exercises the same converter dispatch).

End-to-end (against a single-node gradle run cluster with the companion OpenSearch PR applied):

cast(host as STRING) returns "1.2.3.4" / "::1" (was a 16-byte garbage buffer).
cast(blob as STRING) matches | fields blob (base64-encoded).
where host = '1.2.3.4' coerces cleanly with no Unable to convert call IP(string) regression.
cidrmatch(host, '1.2.3.0/24') (column form) and CIDRMATCH('1.2.3.4', '1.2.3.0/24') (literal form) both return correct row counts via the backend adapter.
Response schema labels: "type": "ip" for ip columns, "type": "binary" for binary columns.

github-actions · 2026-05-22T17:01:52Z

PR Reviewer Guide 🔍

(Review updated until commit `c854bbd`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue The `toExprValue` method catches `UnknownHostException` when converting IP byte arrays but wraps it in an `IllegalStateException` with a message about "invalid IP buffer length". However, `InetAddress.getByAddress(byte[])` only throws `UnknownHostException` when the byte array length is not 4 or 16. This exception handling is correct but the error message could be misleading if the actual cause is something other than length (though in practice length is the only cause). More critically, if the byte array has an invalid length, the exception is thrown but the method continues to fall through to `ExprValueUtils.fromObjectValue(value)` in the catch block's absence of a return/rethrow, which would then fail with the original "unsupported object class [B" error. The catch block should rethrow or the method should not continue past the catch. try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e); } Design Concern The new `convertAnalyticsEngineRelDataTypeToExprType` method is documented as "result-schema-only" and warns against using it in the general path because it would cause Calcite to synthesize CAST operations that the substrait converter cannot handle. However, there is no runtime enforcement to prevent accidental use in the planner path. If a developer mistakenly calls this method during query planning instead of schema building, it could introduce subtle bugs that only manifest when IP/Binary UDTs are involved. Consider adding a runtime check or making the method package-private with a more restrictive name to reduce misuse risk. public static ExprType convertAnalyticsEngineRelDataTypeToExprType(RelDataType type) { if (type instanceof IpType) { return IP; } if (type instanceof BinaryType) { return BINARY; } return convertRelDataTypeToExprType(type); }

github-actions · 2026-05-22T17:02:09Z

PR Code Suggestions ✨

Latest suggestions up to c854bbd

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Add null value handling When `value` is `null`, the method will pass it to `fromObjectValue`, which may not handle null values correctly for UDT types. Add an explicit null check at the beginning to ensure consistent null handling across all type conversions. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [156-170] private static ExprValue toExprValue(Object value, RelDataType type) { + if (value == null) { + return ExprValueUtils.nullValue(); + } if (value instanceof byte[] bytes) { if (type instanceof IpType) { try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e); } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } return ExprValueUtils.fromObjectValue(value); } Suggestion importance[1-10]: 7 __ Why: The suggestion addresses a potential issue where `null` values might not be handled consistently for UDT types. Adding explicit null handling at the beginning ensures consistent behavior and prevents potential issues with `fromObjectValue`, making this a valuable defensive programming practice.	Medium
General	Improve error message detail The `UnknownHostException` is thrown only when the byte array length is invalid (not 4 or 16 bytes). However, wrapping it in `IllegalStateException` loses the original exception context. Consider rethrowing the original exception or providing more detailed error information including the actual byte array content for debugging. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-164] if (value instanceof byte[] bytes) { if (type instanceof IpType) { try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { - throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e); + throw new IllegalStateException( + String.format("invalid IP buffer length: %d (expected 4 or 16 bytes)", bytes.length), e); } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } Suggestion importance[1-10]: 4 __ Why: The suggestion improves error message clarity by explicitly stating the expected byte array lengths (4 or 16 bytes). However, this is a minor enhancement to error handling that doesn't fix a bug or significantly improve functionality.	Low

Previous suggestions

Suggestions up to commit 90defce

Category	Suggestion	Impact
General	Handle null values explicitly When `row.length < fields.size()`, missing cells default to `null` and are passed to `toExprValue`. If `value` is `null` but `type` is `IpType` or `BinaryType`, the method skips the UDT branches and delegates to `fromObjectValue(null)`. Verify that `fromObjectValue` correctly handles `null` for these UDT types, or add an explicit null-check before the `instanceof byte[]` test to ensure consistent null handling. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-135] for (int i = 0; i < fields.size(); i++) { RelDataTypeField field = fields.get(i); Object value = (i < row.length) ? row[i] : null; - valueMap.put(field.getName(), toExprValue(value, field.getType())); + if (value == null) { + valueMap.put(field.getName(), ExprValueUtils.nullValue()); + } else { + valueMap.put(field.getName(), toExprValue(value, field.getType())); + } } Suggestion importance[1-10]: 7 __ Why: Valid suggestion to explicitly handle `null` values before passing them to `toExprValue`. This ensures consistent null handling and avoids potential issues with UDT type checking on null values, though the current code likely works correctly via `fromObjectValue`.	Medium

Suggestions up to commit 90defce

Category	Suggestion	Impact
General	Preserve exception diagnostic information The `UnknownHostException` catch block may mask the actual cause of IP conversion failures. The exception message only reports buffer length but doesn't include the original exception message. Consider including `e.getMessage()` in the error message to preserve diagnostic information about why the conversion failed. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-164] if (value instanceof byte[] bytes) { if (type instanceof IpType) { try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { - throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e); + throw new IllegalStateException( + "invalid IP buffer length: " + bytes.length + ", cause: " + e.getMessage(), e); } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } Suggestion importance[1-10]: 5 __ Why: Including `e.getMessage()` in the error message provides better diagnostics. However, `UnknownHostException` from `InetAddress.getByAddress()` typically only occurs for invalid buffer lengths (not 4 or 16 bytes), so the current message already captures the key information. The improvement is marginal.	Low
General	Validate schema and row length match The loop assumes `fields.size()` matches or exceeds `row.length`, but if `fields.size() >` `row.length`, the code silently fills missing columns with `null`. This could mask data integrity issues. Consider validating that `fields.size() == row.length` and logging a warning or throwing an exception when they don't match to catch potential schema mismatches early. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-135] +if (fields.size() != row.length) { + throw new IllegalStateException( + "Schema mismatch: expected " + fields.size() + " columns but got " + row.length); +} for (int i = 0; i < fields.size(); i++) { RelDataTypeField field = fields.get(i); - Object value = (i < row.length) ? row[i] : null; - valueMap.put(field.getName(), toExprValue(value, field.getType())); + valueMap.put(field.getName(), toExprValue(row[i], field.getType())); } Suggestion importance[1-10]: 4 __ Why: The suggestion to validate `fields.size() == row.length` could catch schema mismatches, but the existing code already handles the case where `row.length < fields.size()` by filling with `null`. Throwing an exception might be too strict if the executor legitimately returns partial rows. The suggestion may introduce breaking changes without clear evidence of a bug.	Low

Suggestions up to commit 0918d6b

Category	Suggestion	Impact
General	Validate IP buffer length explicitly The `UnknownHostException` is thrown only for invalid buffer lengths (not 4 or 16 bytes), but the error message doesn't clarify the expected lengths. Add validation to check buffer length explicitly before calling `InetAddress.getByAddress()` and provide a more descriptive error message indicating valid lengths (4 or 16 bytes). core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [157-170] private static ExprValue toExprValue(Object value, RelDataType type) { if (value instanceof byte[] bytes) { if (type instanceof IpType) { + if (bytes.length != 4 && bytes.length != 16) { + throw new IllegalStateException( + "invalid IP buffer length: " + bytes.length + " (expected 4 or 16 bytes)"); + } try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { - throw new IllegalStateException("invalid IP buffer length: " + bytes.length, e); + throw new IllegalStateException("failed to parse IP address", e); } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } return ExprValueUtils.fromObjectValue(value); } Suggestion importance[1-10]: 7 __ Why: The suggestion improves error handling by explicitly validating IP buffer lengths before processing, providing clearer error messages. However, `InetAddress.getByAddress()` already validates buffer length internally, so this is primarily a defensive programming improvement rather than fixing a critical bug.	Medium
General	Log warning for row-field mismatch When `row.length < fields.size()`, the code defaults missing values to `null`. However, this silently masks potential data corruption or schema mismatches. Consider logging a warning when row length doesn't match field count to aid debugging of upstream data issues. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [131-135] +if (row.length != fields.size()) { + logger.warn("Row length mismatch: expected {} fields but got {} values", + fields.size(), row.length); +} for (int i = 0; i < fields.size(); i++) { RelDataTypeField field = fields.get(i); Object value = (i < row.length) ? row[i] : null; valueMap.put(field.getName(), toExprValue(value, field.getType())); } Suggestion importance[1-10]: 6 __ Why: Adding logging for row-field mismatches would help detect data corruption or schema issues during debugging. However, the existing code already handles this case gracefully by defaulting to `null`, and the suggestion only adds observability without changing behavior or fixing a bug.	Low

Suggestions up to commit b7fd95b

Category	Suggestion	Impact
General	Log invalid IP buffer errors The `UnknownHostException` catch block silently falls through to default handling, which may throw an unclear error. Consider logging the exception with the invalid buffer length to aid debugging when the backend sends malformed IP data. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-167] if (value instanceof byte[] bytes) { if (type instanceof IpType) { try { return ExprValueUtils.stringValue( InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { - // Defensive: backend gave us a buffer that isn't 4 or 16 bytes. Fall through to - // the default handling so the user sees a clear error rather than a malformed - // address string. + // Log the error with buffer length for debugging + logger.warn("Invalid IP buffer length: {} bytes, expected 4 or 16", bytes.length, e); + // Fall through to default handling } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } Suggestion importance[1-10]: 5 __ Why: Adding logging for the `UnknownHostException` would help with debugging malformed IP buffers from the backend. However, the current fallthrough behavior is intentional and documented, and the default handling will already surface an error to the user. The logging is a minor enhancement rather than a critical fix.	Low

Suggestions up to commit 540074b

Category	Suggestion	Impact
General	Log invalid IP byte arrays The `UnknownHostException` catch block silently falls through to default handling, which may throw an unclear error. Consider logging the exception with context (e.g., buffer length) to aid debugging when invalid IP byte arrays are encountered. core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java [158-166] if (value instanceof byte[] bytes) { if (type instanceof IpType) { try { return ExprValueUtils.stringValue(InetAddresses.toAddrString(InetAddress.getByAddress(bytes))); } catch (UnknownHostException e) { - // Defensive: backend gave us a buffer that isn't 4 or 16 bytes. Fall through to - // the default handling so the user sees a clear error rather than a malformed - // address string. + // Log the invalid buffer for debugging + logger.warn("Invalid IP address byte array of length {}: {}", bytes.length, e.getMessage()); + // Fall through to default handling } } else if (type instanceof BinaryType) { return ExprValueUtils.stringValue(Base64.getEncoder().encodeToString(bytes)); } } Suggestion importance[1-10]: 5 __ Why: Adding logging for invalid IP byte arrays would help with debugging, but the current defensive approach of falling through to default handling is already reasonable. The suggestion improves observability without fixing a critical issue. The score reflects a moderate improvement in maintainability and debugging capability.	Low

github-actions · 2026-05-22T18:22:52Z

Persistent review updated to latest commit b7fd95b

github-actions · 2026-05-22T20:23:41Z

Persistent review updated to latest commit 0918d6b

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

github-actions · 2026-05-25T17:58:32Z

Persistent review updated to latest commit 90defce

github-actions · 2026-05-25T18:06:08Z

Persistent review updated to latest commit 90defce

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

github-actions · 2026-05-25T18:17:11Z

Persistent review updated to latest commit c854bbd

vinaykpud requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, joshuali925, mengweieric, noCharger, penghuo, ps48, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners May 22, 2026 17:00

vinaykpud marked this pull request as draft May 22, 2026 17:00

penghuo reviewed May 22, 2026

View reviewed changes

Comment thread core/src/main/java/org/opensearch/sql/executor/analytics/AnalyticsExecutionEngine.java

vinaykpud force-pushed the feature/udt_ip_binary branch from 540074b to b7fd95b Compare May 22, 2026 18:21

vinaykpud mentioned this pull request May 22, 2026

Added UDT for IP and Binary support opensearch-project/OpenSearch#21807

Merged

vinaykpud marked this pull request as ready for review May 22, 2026 20:22

penghuo previously approved these changes May 22, 2026

View reviewed changes

vinaykpud added 2 commits May 25, 2026 17:56

Added UDT for IP and Binary support

1de1cb6

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

refactored cidr and updated comments

90defce

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

vinaykpud force-pushed the feature/udt_ip_binary branch from 0918d6b to 90defce Compare May 25, 2026 17:57

vinaykpud added enhancement New feature or request PPL Piped processing language labels May 25, 2026

vinaykpud closed this May 25, 2026

vinaykpud reopened this May 25, 2026

spotless fix

c854bbd

Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>

vinaykpud dismissed penghuo’s stale review via c854bbd May 25, 2026 18:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added UDT for IP and Binary support#5463

Added UDT for IP and Binary support#5463
vinaykpud wants to merge 3 commits into
opensearch-project:mainfrom
vinaykpud:feature/udt_ip_binary

vinaykpud commented May 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vinaykpud commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Background

Approach

Testing

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit c854bbd)

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vinaykpud commented May 22, 2026 •

edited

Loading

github-actions Bot commented May 22, 2026 •

edited

Loading

(Review updated until commit `c854bbd`)

github-actions Bot commented May 22, 2026 •

edited

Loading