Fix overwrite when filtering all the data #1023
Just wanted to confirm that I tried this out with the table that caused my issue #1020 and it works as expected.
Hi @ndrluis - thanks for testing and fixing this tricky issue.
if len(filtered_df) == 0:
    replaced_files.append((original_file.file, []))
elif len(df) != len(filtered_df):
nit: is it more readable if inlined?
if filtered_df and len(df) != len(filtered_df):
To be honest, I don't see much of a difference.
tbl = _create_table(session_catalog, identifier, data=[data], schema=schema)
tbl.overwrite(data, In("id", ["1", "2", "3"]))

assert len(tbl.scan().to_arrow()) == 3
nit: since all data match the filter, the overwrite operation is a no-op, right? if so, can we assert that in the test? maybe show that the files are the same
It's not a no-op, it's deleting the whole file. The change is in the delete method, not in the overwrite method.
I believe that testing the behavior is enough.
ah, I see. The change is to make delete a no-op.
Sequence of operations:
- pass in `overwrite_filter`, which matches the entire table
- in `delete`, the `overwrite_filter` is inverted into `preserve_row_filter`
- use `preserve_row_filter` on the data files
- if the result is empty, then we don't include this data file in deletion
Previously, we ended up trying to write an empty data file.
Yes, exactly. One thing to note: it would be even more correct to record this in a DELETE snapshot, since the file is not replaced but just dropped. That said, most engines just use OVERWRITE.
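For readers following along, the sequence discussed in this thread can be sketched in plain Python. This is a simplified model and not the PyIceberg implementation: `plan_delete` is a hypothetical helper, data files are stood in for by lists of dict rows, and the filter is an ordinary callable. Only the three-way branch mirrors the diff above.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def plan_delete(
    files: Dict[str, List[Row]], delete_filter: Predicate
) -> Tuple[List[str], Dict[str, List[Row]], List[str]]:
    """Classify each data file as dropped, rewritten, or untouched.

    Hypothetical helper: `files` maps a file name to its rows and stands
    in for real data files.
    """

    # Invert the delete filter to get the rows that must survive.
    def preserve_row_filter(row: Row) -> bool:
        return not delete_filter(row)

    dropped: List[str] = []
    rewritten: Dict[str, List[Row]] = {}
    untouched: List[str] = []

    for name, rows in files.items():
        kept = [row for row in rows if preserve_row_filter(row)]
        if len(kept) == 0:
            # Every row matched the delete filter: drop the file and
            # write no replacement (no empty data file is produced).
            dropped.append(name)
        elif len(kept) != len(rows):
            # Some rows survive: replace the file with a smaller one.
            rewritten[name] = kept
        else:
            # No row matched: leave the file as it is.
            untouched.append(name)
    return dropped, rewritten, untouched


files = {
    "a.parquet": [{"id": "1"}, {"id": "2"}],
    "b.parquet": [{"id": "3"}, {"id": "4"}],
    "c.parquet": [{"id": "5"}],
}
dropped, rewritten, untouched = plan_delete(
    files, lambda row: row["id"] in {"1", "2", "3"}
)
```

In this model, a file whose rows all match the filter ends up in `dropped` with nothing written in its place, which corresponds to the `replaced_files.append((original_file.file, []))` branch in the diff.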
Fixes #1020