What happened + What you expected to happen
With #53454 we now unify schemas in _bundles_to_block_list when materializing datasets. This exposes a bug recently introduced into unify_schemas_with_validation. RefBundles can have empty schemas, and in those cases the following line (ray/python/ray/data/_internal/arrow_ops/transform_pyarrow.py, line 264 in 36cfce8) fails because an empty schema has no field named col_name:
field_types = [s.field(col_name).type for s in schemas]
On line 234 in this file, the TensorArray handling checks that the column name is present in the schema being iterated over before retrieving its type. The struct handling needs a similar check; without it, this method fails whenever the overall dataset schema contains structs and any of the schemas is empty.
cc: @srinathk10, this seems to be a regression that was cherry-picked into ray2.48.
Versions / Dependencies
ray2.48
Reproduction script
n/a
Issue Severity
High: It blocks me from completing my task.