Fix: Handle Empty RecordBatch within _task_to_record_batches, fix correctness issue with positional deletes#1026
Conversation
| yield _to_requested_schema( | ||
| projected_schema, file_project_schema, batch, downcast_ns_timestamp_to_us=True, use_large_types=use_large_types | ||
| ) | ||
| current_index += len(batch) |
There was a problem hiding this comment.
When working on fixing #1024 I realized a correctness issue was introduced here because we are using the length of the filtered batch instead of the original one when tracking the current_index. I think it'll be crucial to get this fix in with 0.7.1 as soon as possible to support our MOR users
There was a problem hiding this comment.
Oof, that's a good find. Thanks @vhnguyenae for reporting this!
The order of applying filters also caught me when implementing positional deletes. In the long run, I think it would be good to push this down to Arrow, I created an issue a while ago: apache/arrow#35301 But that hasn't seen much traction.
_task_to_record_batches_task_to_record_batches, fix correctness issue with positional deletes
|
Thanks for fixing this so quickly @sungwy. Let's speed up the release of 0.7.1 and ship this to the users. |
Fixes: #1024
As well as a correctness issue when reading MOR tables with positional deletes