Closed
Labels
enhancement, good first issue, parquet
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Variant encoding uses different sizes for offsets in nested types to optimize the encoding size
Specifically
- Arrays (VariantList) use 1 byte or 4 bytes for the length, depending on whether the list has at most 255 elements (the largest count a single byte can hold) or more
- Arrays (VariantList) use between 1 and 4 bytes for each offset, depending on how large the total variant payload size is
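As a rough illustration of how these widths might be chosen, here is a minimal, self-contained sketch. It does not use the parquet crate: `ArrayWidths` and `array_widths` are hypothetical names invented for this example. It assumes that the offset width is the smallest of 1 to 4 bytes that can hold the total child data length (matching the field_offset_size_minus_1 cases listed in this issue) and that a 1 byte element count holds at most 255.

```rust
/// Field widths chosen for a Variant array header.
/// (Illustrative type for this sketch, not part of the parquet crate.)
#[derive(Debug, PartialEq)]
struct ArrayWidths {
    /// True when the element count needs the 4 byte form.
    is_large: bool,
    /// Offset width in bytes, minus one (0 through 3).
    field_offset_size_minus_1: u8,
}

/// Pick the widths for a list with `num_elements` children whose encoded
/// values occupy `total_child_bytes` bytes in total.
fn array_widths(num_elements: u64, total_child_bytes: u64) -> ArrayWidths {
    // The last offset equals the total child data length, so every offset
    // must be wide enough to hold that value.
    let offset_size: u8 = match total_child_bytes {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4, // assumed to fit in 32 bits
    };
    ArrayWidths {
        // A one byte count can represent at most 255 elements.
        is_large: num_elements > u64::from(u8::MAX),
        field_offset_size_minus_1: offset_size - 1,
    }
}
```

Under these assumptions, a list with 300 small children has a large element count but can still use 1 byte offsets when the children are tiny, which is why the element count and the payload size need separate test cases.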
Describe the solution you'd like
I would like tests that use the VariantBuilder API and cover the following cases:
- VariantList with more than 256 elements
- VariantList with total child data length between 2^8 and 2^16 (field_offset_size_minus_1 = 1, 2 byte offsets)
- VariantList with total child data length between 2^16 and 2^24 (field_offset_size_minus_1 = 2, 3 byte offsets)
- VariantList with total child data length between 2^24 and 2^32 (field_offset_size_minus_1 = 3, 4 byte offsets)
The required "total child data length" can be reached by adding some large strings as children (for example, by adding 1KB - 1MB Variant::String values via ListBuilder::append)
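To size such test inputs, a small helper can estimate how many strings are needed to cross each threshold. The sketch below is standalone and does not call the VariantBuilder API; the function names are hypothetical. It assumes that a string of up to 63 bytes is encoded with a 1 byte header, and that a longer string is encoded with a 1 byte header plus a 4 byte length.

```rust
/// Encoded size in bytes of one Variant string holding `len` bytes of UTF-8.
/// Assumed layout: short strings (up to 63 bytes) carry a 1 byte header,
/// longer strings carry a 1 byte header plus a 4 byte length.
fn encoded_string_len(len: u64) -> u64 {
    if len <= 63 {
        1 + len
    } else {
        1 + 4 + len
    }
}

/// Total child data length produced by `count` strings of `len` bytes each.
fn total_child_bytes(count: u64, len: u64) -> u64 {
    count * encoded_string_len(len)
}

/// Smallest number of `len` byte strings whose combined encoded size
/// reaches at least `min_total` bytes.
fn strings_needed(len: u64, min_total: u64) -> u64 {
    let per_string = encoded_string_len(len);
    // Ceiling division without overflow concerns at these magnitudes.
    (min_total + per_string - 1) / per_string
}
```

With these assumptions, a single 1 KB string already lands in the 2 byte offset range, 64 strings of 1 KB reach the 3 byte range, and 16 strings of 1 MB reach the 4 byte range. The last case allocates about 16 MB, which may be worth keeping in mind for test runtime.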
Describe alternatives you've considered
Additional context
The tests may not pass until this ticket is implemented:
If they don't pass, we can mark them #[ignore] until it is.