Skip to content

order of relationship declaration affecting highly complex self-referential joined eager load when "innerjoin" is mixed in #11449

@b-x

Description

@b-x

Describe the bug

While doing migration to sqlalchemy 2.0, I've found that changing:

    child_nodes = relationship("Node", backref=backref("common_node", remote_side=[id]))

to:

    common_node = relationship("Node", back_populates="child_nodes", remote_side=[id])
    child_nodes = relationship("Node", back_populates="common_node")

leads to invalid results in some specific queries.
Please see attached example.

After some investigation, I noticed different order of joins in generated sql query.

I've found two workarounds:

  1. Use joinedload with innerjoin=True, so that all the joins are inner joins (but I can't use it in all my cases)
  2. Reorder relationship with innerjoin=True field and place it before other fields (btw. I found no mention in documentation that order of relationship fields matters)

Optional link from https://docs.sqlalchemy.org which documents the behavior that is expected

No response

SQLAlchemy Version in Use

1.4.52, 2.0.30

DBAPI (i.e. the database driver)

any

Database Vendor and Major Version

any

Python Version

3.10

Operating system

Linux

To Reproduce

from sqlalchemy import Column, Integer, String, ForeignKey, create_engine, select
from sqlalchemy.orm import declarative_base, relationship, Session, joinedload, backref

Base = declarative_base()


class Kind(Base):
    __tablename__ = "kind"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)


class Node(Base):
    __tablename__ = "node"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    common_node_id = Column(Integer, ForeignKey("node.id"), nullable=True)
    kind_id = Column(Integer, ForeignKey(Kind.id), nullable=False)

    # >>> v1 - valid results
    # child_nodes = relationship("Node", backref=backref("common_node", remote_side=[id]))
    # <<<
    # >>> v2 - invalid results
    common_node = relationship("Node", back_populates="child_nodes", remote_side=[id])
    child_nodes = relationship("Node", back_populates="common_node")
    # <<<
    kind = relationship(Kind, innerjoin=True, lazy="joined")  # order matters!!!


class NodeGroup(Base):
    __tablename__ = "node_group"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    nodes = relationship(Node, secondary="node_group_node")


class NodeGroupNode(Base):
    __tablename__ = "node_group_node"
    node_group_id = Column(Integer, ForeignKey(NodeGroup.id), primary_key=True)
    node_id = Column(Integer, ForeignKey(Node.id), primary_key=True)


def test():
    engine = create_engine("sqlite://", echo=True, future=True)
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        kind_a = Kind(name="a")
        kind_c = Kind(name="c")
        session.add_all([kind_a, kind_c])
        session.flush()
        common_node = Node(name="nc", kind_id=kind_c.id)
        session.add(common_node)
        session.flush()
        node_a = Node(name="na", common_node_id=common_node.id, kind_id=kind_a.id)
        session.add(node_a)
        session.flush()
        node_group = NodeGroup(name="group")
        session.add(node_group)
        session.flush()
        node_group_a = NodeGroupNode(node_group_id=node_group.id, node_id=node_a.id)
        session.add(node_group_a)
        session.commit()

    with Session(engine) as session:
        print("------------------------")
        group = session.scalars(
            select(NodeGroup)
            .where(NodeGroup.name == "group")
            .options(
                joinedload(NodeGroup.nodes)
                .joinedload(Node.common_node)
            )
        ).unique().one_or_none()

        assert group.nodes[0].common_node.kind.name == "c"
        assert group.nodes[0].kind.name == "a"  # wrong result here !!!!!

Error

Invalid query:

SELECT
    node_group.id, node_group.name,
    kind_1.id AS id_1, kind_1.name AS name_1,
    node_1.id AS id_2, node_1.name AS name_2, node_1.common_node_id, node_1.kind_id,
    kind_2.id AS id_3, kind_2.name AS name_3,
    node_2.id AS id_4, node_2.name AS name_4, node_2.common_node_id AS common_node_id_1, node_2.kind_id AS kind_id_1
FROM node_group
LEFT OUTER JOIN (
    node_group_node AS node_group_node_1
    JOIN node AS node_2 ON node_2.id = node_group_node_1.node_id
) ON node_group.id = node_group_node_1.node_group_id
LEFT OUTER JOIN (
    node AS node_1
    JOIN kind AS kind_2 ON kind_2.id = node_1.kind_id
    JOIN kind AS kind_1 ON kind_1.id = node_1.kind_id
) ON node_1.id = node_2.common_node_id
WHERE node_group.name = ?

Valid query:

SELECT
    node_group.id, node_group.name,
    kind_1.id AS id_1, kind_1.name AS name_1,
    node_1.id AS id_2, node_1.name AS name_2, node_1.common_node_id, node_1.kind_id,
    kind_2.id AS id_3, kind_2.name AS name_3,
    node_2.id AS id_4, node_2.name AS name_4, node_2.common_node_id AS common_node_id_1, node_2.kind_id AS kind_id_1
FROM node_group
LEFT OUTER JOIN  (
    node_group_node AS node_group_node_1
    JOIN node AS node_1 ON node_1.id = node_group_node_1.node_id
    JOIN kind AS kind_1 ON kind_1.id = node_1.kind_id
) ON node_group.id = node_group_node_1.node_group_id
LEFT OUTER JOIN (
    node AS node_2
    JOIN kind AS kind_2 ON kind_2.id = node_2.kind_id
) ON node_2.id = node_1.common_node_id
WHERE node_group.name = ?

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    big joba major change that requires a lot of knowledge and workbugSomething isn't workingloader optionsORM options like joinedload(), load_only(), these are complicated and have a lot of issuesormquagmirereally hard to make the issue work "correctly" without lots of complication, risk

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions