feat(hive): support structure comparison for Hive (#2872)#3322

Merged: Seechi-Yolo merged 18 commits into main from hive-compare-2872 on May 27, 2026

actiontech-bot commented May 27, 2026

User description

Summary

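Implements structure comparison for Hive data sources (issue actiontech/sqle-ee#2872):

  • Add a built-in Hive driver to SQLE (sqle/driver/hive/) that connects to a real HiveServer2 through github.com/beltran/gohive
  • Implement OptionalGetDatabaseObjectDDL (TABLE/VIEW return their DDL; FUNCTION is rejected with ddl-compat-unsupported) and OptionalGetDatabaseDiffModifySQL (ALTER with a DROP/CREATE fallback; the DROP/CREATE branch prepends a -- WARNING: ... comment to the result)
  • Register the DriverTypeHive constant; add Chinese and English i18n messages for the new error code 4009 (DatabaseCompareNotSupported)
  • Schemas runs SHOW DATABASES; Ping/Close go through gohive; Exec/ExecBatch provide the entry point for executing modify SQL
  • Fix several driver-level behaviors: tolerate the row-err on statements that return no result set, such as USE/SET; fall back to automatic discovery via SHOW TABLES when the schema-list is empty; skip FUNCTION objects in a batch instead of aborting
  • Add gohive v1.6.0 and its transitive dependencies (apache/thrift v0.19.0, beltran/gosasl, beltran/gssapi, go-zookeeper/zk v1.0.3), with versions aligned to DMS-EE to avoid conflicts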

Test plan

  • go vet ./... passes
  • go test ./sqle/driver/hive/... passes
  • CI: build / lint / test

Fixes #2872 (actiontech/sqle-ee)
Refs design.md §3.2.3 / §3.5


Description

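  • Add the Hive driver plugin and support for structure comparison

  • Add extensive unit tests covering Hive DDL and DiffModifySQL scenarios

  • Update error codes, i18n messages, and log messages

  • Integrate the Hive driver dependencies (gohive, gosasl, gssapi, zk) into the project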


File Walkthrough

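Relevant files

Tests (2 files)
  • hive_compare_test.go (+1136/-0): Add unit tests for the Hive compare feature, covering multiple scenarios
  • hive_test.go (+848/-0): Add the Hive driver unit test file

Enhancement (5 files)
  • hive.go (+1015/-0): Implement the Hive driver plugin, with object DDL retrieval and diff SQL generation
  • errors.go (+5/-1): Update error codes, unifying structure-compare errors under 4009
  • util.go (+1/-0): Register the Hive driver type in the driver constants
  • sqled.go (+1/-0): Import the Hive driver so it is wired into server startup
  • diff_table.go (+784/-0): Add the Hive table structure diff logic

Documentation (4 files)
  • message_zh.go (+12/-0): Add Chinese i18n messages for Hive driver structure comparison
  • logo.go (+5/-0): Add a placeholder Hive driver logo file
  • active.en.toml (+1/-0): Update English i18n messages with the Hive compare prompt
  • active.zh.toml (+1/-0): Update Chinese i18n messages with the Hive compare prompt

Dependencies (1 file)
  • go.mod (+5/-0): Update dependencies, adding gohive, gosasl, gssapi, and zk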

Add DriverTypeHive = "Hive" to the driver type constants block in
sqle/driver/v2/util.go, matching DMS-EE's DBTypeHive value exactly.
This enables the SQLE plugin system to recognize Hive as a valid
database driver type.

issue: #2859

Add Hive plugin directory with PluginProcessor and HiveDriverImpl
implementing the full Plugin interface. Features:
- init() registration to BuiltInPluginProcessors
- GetDriverMetas with PluginName="Hive", port=10000
- additionalParams: auth (NOSASL/NONE/LDAP/KERBEROS), transport_mode (binary/http)
- Parse with SQL splitting and keyword-prefix classification (DQL/DML/DDL)
- Audit returning empty results (no rules in skeleton phase)
- Stub implementations for unsupported methods

issue: #2859

Map-case style tests covering:
- Plugin registration in BuiltInPluginProcessors
- GetDriverMetas: PluginName, DefaultPort, auth/transport_mode params,
  empty Rules and EnabledOptionalModule
- classifySQL: DQL (SELECT/WITH/SHOW/DESCRIBE/DESC/EXPLAIN),
  DML (INSERT/UPDATE/DELETE/MERGE/LOAD/EXPORT), DDL (CREATE/ALTER/DROP/GRANT)
- splitSQL: single/multiple/trailing semicolon/empty/whitespace
- Audit returns correct-length empty results
- Parse with single/multiple/empty SQL
- Ping with nil DSN returns error
- Open with nil DSN succeeds (offline mode)
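
The keyword-prefix classification and semicolon splitting described in these two commits can be sketched as follows. This is a minimal illustration, not the driver's code: the names classifySQL and splitSQL come from the commit message, but their signatures, the string return values, and the "UNKNOWN" fallback are assumptions. The splitter is deliberately naive and does not handle semicolons inside string literals or comments.

```go
package main

import (
	"fmt"
	"strings"
)

// classifySQL maps a statement to "DQL", "DML" or "DDL" by its first keyword.
// The "UNKNOWN" fallback is an assumption made for this sketch.
func classifySQL(sql string) string {
	fields := strings.Fields(sql)
	if len(fields) == 0 {
		return "UNKNOWN"
	}
	switch strings.ToUpper(fields[0]) {
	case "SELECT", "WITH", "SHOW", "DESCRIBE", "DESC", "EXPLAIN":
		return "DQL"
	case "INSERT", "UPDATE", "DELETE", "MERGE", "LOAD", "EXPORT":
		return "DML"
	case "CREATE", "ALTER", "DROP", "GRANT":
		return "DDL"
	}
	return "UNKNOWN"
}

// splitSQL splits on ';' and drops empty or whitespace-only fragments,
// so a trailing semicolon does not yield an empty statement.
func splitSQL(text string) []string {
	var out []string
	for _, part := range strings.Split(text, ";") {
		if s := strings.TrimSpace(part); s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	input := "SHOW TABLES; insert into t values (1);  ; alter table t add columns (c int);"
	for _, stmt := range splitSQL(input) {
		fmt.Printf("%-8s %s\n", classifySQL(stmt), stmt)
	}
}
```

Classifying by the first whitespace-delimited token keeps the skeleton free of a parser dependency, at the cost of misreading statements such as `SELECT(1)` that have no space after the keyword.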

issue: #2859

Replace placeholder Ping/Close with real gohive connectivity:
- Add newHiveConnection() that creates gohive.Connection from DSN
  parameters (host, port, user, password, database, auth, transport_mode)
  following DMS-EE NewHiveConn pattern
- Update Open() to establish real connection when DSN is provided
- Update Ping() to execute SELECT 1 via gohive cursor
- Update Close() to close gohive connection
- Add HiveDriverImpl.conn field for connection lifecycle
- Add unit tests for nil conn boundary cases

Offline audit mode (nil DSN) continues to work without connection.

The Hive built-in plugin's init() function was never called because
sqle/server/sqled.go only imported the MySQL driver. Adding the blank
import for github.com/actiontech/sqle/sqle/driver/hive ensures the
Hive plugin registers itself into BuiltInPluginProcessors at startup,
making the Hive data source type available in the DMS driver list.
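
The registration problem comes from how Go runs init(): a package's init() executes only if the package is part of the program's import graph. A minimal single-file sketch of init-based registration is shown below. The registry variable, the interface, and the processor type are hypothetical stand-ins for BuiltInPluginProcessors and the Hive plugin, whose real shapes are not shown in this PR page.

```go
package main

import "fmt"

// pluginProcessor is a hypothetical stand-in for SQLE's plugin processor type.
type pluginProcessor interface {
	Name() string
}

// builtInPluginProcessors stands in for BuiltInPluginProcessors; whether the
// real registry is a map keyed by plugin name is an assumption.
var builtInPluginProcessors = map[string]pluginProcessor{}

type hiveProcessor struct{}

func (hiveProcessor) Name() string { return "Hive" }

// In the real layout this init would live in the hive driver package and would
// run only when another package imports it, for example with a blank import:
//   import _ "github.com/actiontech/sqle/sqle/driver/hive"
func init() {
	p := hiveProcessor{}
	builtInPluginProcessors[p.Name()] = p
}

func main() {
	_, ok := builtInPluginProcessors["Hive"]
	fmt.Println("Hive registered:", ok)
}
```

The blank import exists purely for this side effect; without it the linker drops the package and the registry never sees the plugin.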

Fixes #2859

…ASES

The Schemas() method was returning an error "hive plugin does not support
Schemas", causing the /v1/projects/{project}/instances/{instance}/schemas
API to return 500. This blocked the data export workflow as users could
not select a database from the dropdown.

Now executes SHOW DATABASES via gohive cursor and returns the result list.

Fixes BUG-2.1-1

…17, compat-RISK-10) (#2872)

HiveServer2's FetchResults stage returns a non-fatal ROW-ERR
(StatusCode:ERROR_STATUS, InfoMessages:[Server-side error; please
check HS2 logs.]) for statements that produce no result columns:
USE <db>, SET ..., DDL. Prior to this fix,
gohiveQueryRunner.runSingleStringQuery surfaced the ROW-ERR as a hard
error, which broke GetDatabaseObjectDDL / GetDatabaseDiffModifySQL
whenever a USE <schema> was issued and produced TC-HIVE-005 4012
errors in web tests.

Changes:
- Extract isHS2NoResultRowErr classifier matching the canonical
  HS2 ROW-ERR markers (status + info-message substrings).
- Introduce hiveCursor interface + gohiveCursorAdapter so the
  fetch loop can be exercised by unit tests without a live HS2.
- Move the FetchOne / HasMore loop into fetchAllRows which
  tolerates the classified ROW-ERR (break out, return rows so
  far) and propagates every other cursor error unchanged.
- runSingleStringQuery and Schemas now delegate the loop to
  fetchAllRows so both code paths share the contract.

Tests:
- Test_IsHS2NoResultRowErr (5 sub-cases) covers nil, the real
  HS2 payload, a syntax error and two partial-match negatives.
- Test_FetchAllRows_RowErrTolerant (3 sub-cases) verifies USE
  row-err tolerance, SHOW TABLES with trailing row-err, and
  genuine syntax-error propagation.
- Test_RunSingleStringQuery_NilConn guards the early-return.

References: compat-RISK-10, docs/test/case-TC-HIVE-005.md

… (compat-RISK-10) (#2872)

server/compare/database_compare_ee.go::ExecDatabaseCompare forwards
SchemaName but never populates DatabaseObjects on the
DatabaseSchemaInfo it hands to drivers. MySQL handles this with an
auto-discovery branch (mysql_ee.go::GetDatabaseObjectDDL lines 380-389,
calling getAllSchemaObjects). The Hive driver previously did not,
so even after the ROW-ERR tolerance fix the API returned
comparison_result=same / inconsistent_num=0 and database_diff_objects
was empty, so the structure compare tree always showed zero diffs.

Changes:
- listAllSchemaObjects helper enumerates SHOW TABLES + SHOW VIEWS,
  classifies each name as TABLE/VIEW. SHOW VIEWS failures degrade to
  "all TABLE" so the helper still works on Hive < 2.2.
- GetDatabaseObjectDDL detects empty DatabaseObjects and fills it via
  listAllSchemaObjects after the USE <schema> succeeds.
- GetDatabaseDiffModifySQL takes the union of base + compared
  auto-discovery, then re-USEs both sides before the per-object SHOW
  CREATE TABLE so the diff round-trip is correct.

Tests:
- Test_ListAllSchemaObjects (4 sub-cases): tables+views classified,
  SHOW VIEWS failure degrades, SHOW TABLES error propagates, empty
  names skipped.
- Test_GetDatabaseObjectDDL_DefaultDiscovery: caller passes nil
  DatabaseObjects -> driver discovers t_base_only (TABLE) +
  v_user_summary (VIEW) and returns both DDLs.
- Test_GetDatabaseObjectDDL_EmptyObjectList updated to the new
  contract (auto-discovery vs prior "return empty").

References: compat-RISK-10 (secondary fix), task_test_003 episodic,
mysql/mysql_ee.go::GetDatabaseObjectDDL behavioural parity.

… compat-RISK-9) (#2872)

Align GetDatabaseObjectDDL and GetDatabaseDiffModifySQL FUNCTION branch
with the PROCEDURE/TRIGGER/EVENT short-circuit: silently skip the
unsupported FUNCTION object (no Go error, no placeholder DDL entry) and
emit a structured WARN log carrying objectType=FUNCTION. The previous
behaviour returned a hard error from the FUNCTION branch, which dropped
otherwise valid TABLE/VIEW results in the same batch — observed as
TC-HIVE-016 (mixed TABLE+FUNCTION) FAIL during Task-TEST-005.

Driver guarantees post-fix:
  * GetDatabaseObjectDDL: empty results entry when every requested
    object is FUNCTION; mixed batches return the TABLE/VIEW DDLs only.
  * GetDatabaseDiffModifySQL: USE-header preserved; FUNCTION never
    contributes SQL to ModifySQLs; TABLE main path keeps producing
    ALTER / DROP+CREATE statements.
  * sqled.log: logrus.Fields {objectType=FUNCTION, object=<name>} so
    operators can detect the skipped objects without parsing free text.
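
The skip-instead-of-fail contract can be sketched as below. The collectDDLs function, the objectRef type, and the warn callback are hypothetical. The commit logs through logrus fields, which is replaced here by a plain callback so the example depends only on the standard library.

```go
package main

import "fmt"

// objectRef is a hypothetical (name, type) request entry.
type objectRef struct {
	Name string
	Type string
}

// collectDDLs fetches DDL for each object. FUNCTION objects are skipped with
// a warning rather than an error, so they cannot discard TABLE/VIEW results
// from the same batch.
func collectDDLs(objs []objectRef, fetch func(objectRef) (string, error), warn func(objectRef)) ([]string, error) {
	ddls := []string{}
	for _, o := range objs {
		if o.Type == "FUNCTION" {
			warn(o) // the driver emits a structured WARN log here
			continue
		}
		ddl, err := fetch(o)
		if err != nil {
			return nil, err
		}
		ddls = append(ddls, ddl)
	}
	return ddls, nil
}

func main() {
	objs := []objectRef{{"orders", "TABLE"}, {"fake_fn", "FUNCTION"}}
	fetch := func(o objectRef) (string, error) {
		return "CREATE TABLE " + o.Name + " (amt BIGINT)", nil
	}
	warn := func(o objectRef) {
		fmt.Printf("WARN skipped objectType=%s object=%s\n", o.Type, o.Name)
	}
	ddls, err := collectDDLs(objs, fetch, warn)
	fmt.Println(ddls, err)
}
```

An all-FUNCTION batch returns an empty, non-nil slice and a nil error, which mirrors the "empty results entry" guarantee above.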

Unit tests updated to pin the new contract:
  * Test_GetDatabaseObjectDDL_FunctionRejected: results entry has 0
    DDLs, err == nil.
  * Test_GetDatabaseDiffModifySQL_FunctionRejected: only USE header in
    block; no DROP/CREATE/ALTER; err == nil.
  * Test_GetDatabaseDiffModifySQL_MixedFunctionAndTable (new): TABLE
    ALTER CHANGE COLUMN amt amt BIGINT preserved; fake_fn absent.
  * Test_GetDatabaseObjectDDL_MixedFunctionAndTable (new): TABLE DDL
    preserved; FUNCTION absent from DatabaseObjectDDLs.

Aligned with design §3.2.2 line 239 ("FUNCTION skips objInfo; results
do not contain the entry"). compat-RISK-9 state moves implemented -> verified after
TC-HIVE-015 / TC-HIVE-016 regression PASS.

…modify SQL (#2872)

Until now the Hive plugin returned a hard error "hive plugin does not support
Exec" for both Exec and ExecBatch, so once the structure-compare flow
produced modify SQL and the user tried to run it through an SQLE work
order, the workflow execution failed immediately.

Rewrite the Hive driver to actually execute statements against HiveServer2:

- Implement Exec: submit one statement per cursor.Exec, strip the trailing
  ';', skip empty / comment-only statements (the modify-SQL splitter can
  produce "-- WARNING: ..." trailers), tolerate the HS2 no-result-column
  ROW-ERR (same classifier as runSingleStringQuery, compat-RISK-10).
- Implement ExecBatch: iterate Exec per statement, stop on the first
  error and return the partial result set (matches MySQL driver's batch
  contract in sqle/driver/mysql/mysql.go::ExecBatch).
- Return a hiveExecResult that satisfies database/sql/driver.Result by
  surfacing a defensive "not supported by hive plugin" error for
  LastInsertId / RowsAffected; HiveServer2 / gohive do not expose either
  reliably and we don't want to fabricate zeros that downstream auditors
  might trust.
- Add execRunnerFactory injection point + fakeExecRunner test double to
  exercise the contract without a live HS2 cluster.
- Cover the new behaviour with eight test cases:
  Test_StripSQLTerminator, Test_IsAllCommentLines,
  Test_Exec_SingleStatement, Test_Exec_EmptyAndCommentStatementsAreNoOp,
  Test_Exec_PropagatesRunnerError, Test_Exec_NilConnAndNoFactoryFails,
  Test_ExecBatch_AllSucceed, Test_ExecBatch_StopsOnFirstError.

Fixes #2872

Add github.com/beltran/gohive v1.6.0 for real HiveServer2 connectivity
in the SQLE Hive plugin. Transitive deps: apache/thrift v0.19.0,
beltran/gosasl, beltran/gssapi, go-zookeeper/zk v1.0.3.

Version aligned with DMS-EE go.mod to avoid dependency conflicts.

…18n keys for Hive structure compare (compat-RISK-1) (#2872)

Add error code 4009 (DatabaseCompareNotSupported) and bilingual i18n
keys for the controller-layer compatibility check that rejects database
types lacking OptionalGetDatabaseObjectDDL / OptionalGetDatabaseDiffModifySQL
capability. EE controller wires these into the database compare whitelist.

Refs design.md §3.2.3 / §3.5 (EE-11 / EE-12).

github-actions Bot commented May 27, 2026

PR Reviewer Guide 🔍

(Review updated until commit 875ac5c)

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error code conflict

Error code 4009 is used for both TaskActionInvalid and DatabaseCompareNotSupported, which may confuse downstream error handling. Please confirm that this reuse is intentional and that every call site can still tell the two apart.

UserDisabled      ErrorCode = 4005
TaskNotExist      ErrorCode = 4006
TaskActionInvalid ErrorCode = 4009
// DatabaseCompareNotSupported shares 4009: returned when the controller layer
// determines that the dbType of the two sides lacks the
// OptionalGetDatabaseObjectDDL / OptionalGetDatabaseDiffModifySQL capability.
// See design.md §3.5 / §3.2.3 EE-12.
DatabaseCompareNotSupported ErrorCode = 4009
DataExist                   ErrorCode = 4010
DataNotExist      ErrorCode = 4011
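
To make the reviewer's concern concrete, the sketch below shows that two constants sharing a value are indistinguishable to callers. The underlying type of ErrorCode (int) and the describe helper are assumptions for illustration. Note that listing both constants as separate cases in one switch is rejected by the standard Go compiler as a duplicate case.

```go
package main

import "fmt"

// ErrorCode is assumed to be an integer type for this sketch.
type ErrorCode int

const (
	TaskActionInvalid           ErrorCode = 4009
	DatabaseCompareNotSupported ErrorCode = 4009
)

// describe can only branch on the numeric value, so a single case
// necessarily covers both meanings of 4009.
func describe(code ErrorCode) string {
	switch code {
	case TaskActionInvalid: // also matches DatabaseCompareNotSupported
		return "4009: task action invalid OR database compare not supported"
	}
	return "other"
}

func main() {
	fmt.Println(TaskActionInvalid == DatabaseCompareNotSupported)
	fmt.Println(describe(DatabaseCompareNotSupported))
}
```

Callers that need to separate the two cases would have to rely on something other than the code, such as the i18n message key attached to the error.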

@github-actions

Failed to generate code suggestions for PR

…QL pattern

Modified fetchTableDDL to return an empty string with exists=false for non-existent tables, including query errors. This change harmonizes the error handling with the MySQL implementation, improving consistency in the Hive driver.
@github-actions

Persistent review updated to latest commit 875ac5c

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@Seechi-Yolo Seechi-Yolo merged commit c769d39 into main May 27, 2026
4 checks passed

3 participants