[Variant] VariantMetadata is allowed to contain the empty string #7956
alamb merged 6 commits into apache:main on Jul 18, 2025
FYI @codephage2020
alamb approved these changes on Jul 18, 2025 and left a comment:
Thanks @scovich
I also pushed another test to this PR that fails without this change:
```rust
#[test]
fn test_variant_object_empty_fields() {
    let mut builder = VariantBuilder::new();
    builder
        .new_object()
        .with_field("", 42)
        .finish()
        .unwrap();
    let (metadata, value) = builder.finish();
    // Resulting object is valid and has a single empty field
    let variant = Variant::try_new(&metadata, &value).unwrap();
    let variant_obj = variant.as_object().unwrap();
    assert_eq!(variant_obj.len(), 1);
    assert_eq!(variant_obj.get(""), Some(Variant::from(42)));
}
```
alamb reviewed on Jul 18, 2025
```diff
 // Ensure the StructArray has a metadata field of BinaryView
-let Some(metadata_field) = VariantArray::find_metadata_field(&inner) else {
+let Some(metadata_field) = VariantArray::find_metadata_field(inner) else {
```
Clippy was complaining about this locally, so I fixed it.
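A minimal sketch of the pattern Clippy objects to here. It assumes `inner` is already a reference, which the change from `&inner` to `inner` suggests. `Inner` and `find_metadata_field` below are hypothetical stand-ins, not the crate's `VariantArray::find_metadata_field`. The lint involved is most likely `clippy::needless_borrow`: passing `&inner` when `inner: &Inner` creates a `&&Inner` that the compiler immediately auto-dereferences.

```rust
/// Hypothetical stand-in for the struct being searched.
struct Inner {
    fields: Vec<String>,
}

/// Hypothetical stand-in: only assumes the real function takes a reference.
fn find_metadata_field(inner: &Inner) -> Option<&String> {
    inner.fields.iter().find(|name| name.as_str() == "metadata")
}

fn has_metadata_field(inner: &Inner) -> bool {
    // `inner` is already `&Inner`. Writing `find_metadata_field(&inner)` would
    // still compile through auto-deref, but Clippy flags the extra borrow.
    find_metadata_field(inner).is_some()
}
```

Both spellings behave identically at run time. The lint exists because the extra `&` adds noise and can hide whether a value is owned or borrowed.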
There was a gap in CI; I have a PR to fix it here:
```diff
-let mut offsets_iter = map_bytes_to_offsets(offset_bytes, self.header.offset_size);
-let mut current_offset = offsets_iter.next().unwrap_or(0);
+let mut offsets = map_bytes_to_offsets(offset_bytes, self.header.offset_size);
```
A minor point: I named it `*_iter`, and that naming is used in both the metadata and the object code. If you rename it here, please keep the two consistent.
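For context on the diff above, here is a minimal sketch of what decoding the offsets might involve. It assumes that `map_bytes_to_offsets` reads consecutive little-endian unsigned integers, each `offset_size` bytes wide. The version below is a hypothetical stand-in: it takes a plain `usize` rather than the header's offset-size type. It also shows one way to consume the offsets without tracking a separate `current_offset`, which is to pair each offset with its neighbour.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the crate's `map_bytes_to_offsets`: decodes
/// consecutive little-endian unsigned integers, each `offset_size` (> 0) bytes wide.
fn map_bytes_to_offsets(bytes: &[u8], offset_size: usize) -> impl Iterator<Item = usize> + '_ {
    bytes.chunks_exact(offset_size).map(|chunk| {
        // Fold from the most significant (last) byte down to the least significant.
        chunk.iter().rev().fold(0usize, |acc, &b| (acc << 8) | b as usize)
    })
}

/// Turn N + 1 offsets into N half-open byte ranges by pairing neighbours,
/// instead of carrying a separate `current_offset` variable.
fn offset_ranges(bytes: &[u8], offset_size: usize) -> Vec<Range<usize>> {
    let offsets: Vec<usize> = map_bytes_to_offsets(bytes, offset_size).collect();
    offsets.windows(2).map(|w| w[0]..w[1]).collect()
}
```

For example, the 1-byte offsets `[0, 0, 1]` produce the ranges `0..0` and `0..1`. The first range is an empty key, which is perfectly valid.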
alamb added a commit that referenced this pull request on Jul 18, 2025
# Which issue does this PR close?
- Related to #6736

# Rationale for this change
I noticed in #7956 that some Clippy errors were introduced but not caught by CI.

# What changes are included in this PR?
Add `parquet-variant-compute` to the CI for parquet-variant related PRs.

# Are these changes tested?
This change only affects CI (tests).

# Are there any user-facing changes?
No
Which issue does this PR close?
Rationale for this change
An earlier change introduced a minor regression by (accidentally?) forbidding the empty string as a dictionary key. This PR fixes the bug and simplifies the code a bit further while we're at it.
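An empty key is easy to reject by accident. In the metadata dictionary, key `i` occupies `bytes[offsets[i]..offsets[i + 1]]`, so an empty key appears as two equal neighbouring offsets. The sketch below assumes that the regression amounted to requiring strictly increasing offsets; the PR text does not say so explicitly. `encode_dictionary` and `strictly_increasing` are hypothetical helpers.

```rust
/// Hypothetical helper: concatenate keys and record N + 1 offsets, so key i
/// occupies bytes[offsets[i]..offsets[i + 1]].
fn encode_dictionary(keys: &[&str]) -> (Vec<u8>, Vec<usize>) {
    let mut bytes = Vec::new();
    let mut offsets = vec![0];
    for key in keys {
        bytes.extend_from_slice(key.as_bytes());
        offsets.push(bytes.len());
    }
    (bytes, offsets)
}

/// An overly strict offset check: rejects any dictionary that contains "".
fn strictly_increasing(offsets: &[usize]) -> bool {
    offsets.windows(2).all(|w| w[0] < w[1])
}
```

With keys `["", "a"]`, the bytes are `"a"` and the offsets are `[0, 0, 1]`. A strict check rejects these offsets even though the layout is valid.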
What changes are included in this PR?
- Revert the unsorted dictionary check back to what it had been (it just uses `Iterator::is_sorted_by` now, instead of `slice::is_sorted_by`).
- Remove the redundant offset monotonicity check from the ordered dictionary path, relying on the fact that string slice extraction will fail anyway if the offsets are not monotonic.
- Improve the error message now that it does double duty.
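The two validation paths described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the crate's code. It assumes the offsets are already decoded into a `&[usize]` and the key bytes are a `&[u8]`, and the function names are hypothetical. The unsorted path only needs non-decreasing offsets, so an empty key passes. The sorted path skips a separate offset check because `<[u8]>::get(start..end)` returns `None` when `start > end`. `Iterator::is_sorted_by` requires Rust 1.82 or later.

```rust
/// Unsorted path (sketch): offsets only need to be non-decreasing, so equal
/// neighbouring offsets (an empty-string key) are accepted.
fn validate_unsorted(offsets: &[usize]) -> Result<(), String> {
    if offsets.iter().is_sorted_by(|a, b| a <= b) {
        Ok(())
    } else {
        Err("offsets are not monotonically non-decreasing".to_string())
    }
}

/// Sorted path (sketch): no separate offset check. Checked slicing fails
/// for out-of-order or out-of-bounds offsets, so extraction catches them.
fn validate_sorted(bytes: &[u8], offsets: &[usize]) -> Result<(), String> {
    let mut keys: Vec<&str> = Vec::with_capacity(offsets.len().saturating_sub(1));
    for w in offsets.windows(2) {
        let slice = bytes
            .get(w[0]..w[1])
            .ok_or("key offsets out of order or out of bounds")?;
        keys.push(std::str::from_utf8(slice).map_err(|e| e.to_string())?);
    }
    // Strictly increasing keys means sorted and unique; "" sorts first, so it is allowed.
    if keys.iter().is_sorted_by(|a, b| a < b) {
        Ok(())
    } else {
        Err("dictionary keys are not strictly sorted".to_string())
    }
}
```

The design point is that in the sorted path, slicing already validates the offsets. A second pass over them would only repeat that work.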
Are these changes tested?
New unit tests for dictionaries containing the empty string. As a side effect, we now have at least a little coverage for sorted dictionaries; somehow, I couldn't find any existing unit test that creates a sorted dictionary.
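A test for a sorted dictionary that starts with the empty string needs raw metadata bytes. Below is a minimal sketch of a hypothetical helper that builds them, assuming the header layout as I read the Parquet Variant spec. In that layout, the version sits in bits 0-3, `sorted_strings` is bit 4, and `offset_size_minus_one` occupies bits 6-7. The header is followed by `dictionary_size`, then `dictionary_size + 1` offsets, then the concatenated key bytes. The helper uses 1-byte offsets only.

```rust
/// Hypothetical test helper: build Variant metadata bytes for `keys` with
/// 1-byte offsets. Assumes fewer than 256 keys and fewer than 256 total key bytes.
fn build_metadata(keys: &[&str], sorted: bool) -> Vec<u8> {
    const VERSION: u8 = 1;
    const SORTED_STRINGS_BIT: u8 = 1 << 4;
    // offset_size_minus_one = 0 (bits 6-7 left clear), i.e. 1-byte offsets.
    let header = VERSION | if sorted { SORTED_STRINGS_BIT } else { 0 };
    let mut out = vec![header, keys.len() as u8];
    // dictionary_size + 1 offsets, starting at 0.
    let mut offset = 0u8;
    out.push(offset);
    for key in keys {
        offset += key.len() as u8;
        out.push(offset);
    }
    // Concatenated key bytes.
    for key in keys {
        out.extend_from_slice(key.as_bytes());
    }
    out
}
```

For the sorted dictionary `["", "a"]`, this yields `[0x11, 2, 0, 0, 1, b'a']`. The two leading zero offsets are exactly the case the fix must accept.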
Are there any user-facing changes?
No