[Bug Fix] Allow HiveCatalog to create table with TimestamptzType#585
[Bug Fix] Allow HiveCatalog to create table with TimestamptzType#585HonahX merged 10 commits intoapache:mainfrom
Conversation
pyiceberg/catalog/hive.py
Outdated
| DateType: "date", | ||
| TimeType: "string", | ||
| TimestampType: "timestamp", | ||
| TimestamptzType: "timestamp", |
There was a problem hiding this comment.
In Java, we map TimestamptzType to "timestamp with local time zone" for Hive version >= 3 TimestamptzType.
case TIMESTAMP:
Types.TimestampType timestampType = (Types.TimestampType) type;
if (HiveVersion.min(HiveVersion.HIVE_3) && timestampType.shouldAdjustToUTC()) {
return "timestamp with local time zone";
}
return "timestamp";But in python it seems we do not have a way to check the version. So I put the "timestamp" here to ensure compatibility. Please correct me if I am wrong here. Thanks!
There was a problem hiding this comment.
Can we just set arbitrary strings in there? If so, I think timestamp with local time zone is more accurate. It would be good to validate using an integration test as well, since we have those tests already.
There was a problem hiding this comment.
Theoretically, the content here should be constants in serdeConstants since java use these to test the Hive schema util
Among these types, timestamp with local time zone is a new one added in Hive3: ref.
Interestingly, when doing integration tests (use pyiceberg to write and spark to read) I found that Hive server will raise error for timestampabc but work normally for timestamp arbitrary string. Seems as long as the first word is within serdeConstants there will be no error loading the table. Not sure if timestamp arbitrary string will cause error in other Hive use cases.
How about we default to "timestamp with local time zone" and add a HiveCatalog property (e.g. hive.hive2-compatible-mode) to use "timestamp" here when it set to true?
(Will add the integration test soon)
There was a problem hiding this comment.
Thanks for the context, has been a while since working with the Hive catalog.
How about we default to "timestamp with local time zone" and add a HiveCatalog property (e.g. hive.hive2-compatible-mode) to use "timestamp" here when it set to true?
I think that would be a great solution 👍
Co-authored-by: Fokko Driesprong <fokko@apache.org>
|
@Fokko Thanks for reviewing! |
Fixes: #583
cc @Fokko