[Data] unify_schemas() not robust to empty Schemas when processing struct columns. #55960

@anrooo

Description

What happened + What you expected to happen

With #53454 we now unify schemas in _bundles_to_block_list when materializing datasets. This exposes a bug recently introduced into unify_schemas_with_validation: RefBundles can have empty schemas, and in those cases the following line fails because it looks up the column in every schema, including ones that do not contain it:

field_types = [s.field(col_name).type for s in schemas]

Note that on line 234 of this file, for TensorArray types, we check whether the column name is present in the schema being iterated over before retrieving its type. A similar check is needed for structs; otherwise this method fails if the overall dataset schema contains structs.

cc: @srinathk10, this appears to be a regression that was cherry-picked into ray2.48.

Versions / Dependencies

ray2.48

Reproduction script

n/a

Issue Severity

High: It blocks me from completing my task.

Metadata

Labels

P0: Issues that should be fixed in short order
bug: Something that is supposed to be working; but isn't
data: Ray Data-related issues
