Description
Describe the use case
Currently, the selectin_polymorphic loading strategy generates an IN (...) clause with an unbounded number of parameters. This is problematic when working with an Oracle database, because Oracle raises an error (ORA-01795: maximum number of expressions in a list is 1000) when the clause contains more than 1000 parameters.
When querying the joined inheritance model directly, the user can chunk manually and work around this. However, when the objects are eagerly loaded as part of a one-to-many relationship, this leads to data-dependent crashes whenever more than 1000 objects would be loaded.
It would be great to have (optional?) behaviour similar to the selectin loading strategy for relationships, which already chunks in batches of 500.
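For illustration, the requested chunking amounts to something like the following helper, just performed internally by the loader rather than in user code. This is only a sketch; select_in_chunks, its signature and the chunk size are made up for this issue and are not an existing SQLAlchemy API:

```python
from sqlalchemy import select
from sqlalchemy.orm import Session


def select_in_chunks(session: Session, entity, id_column, ids, chunk_size=500):
    """Hypothetical helper: load ``entity`` rows whose ``id_column`` is among
    ``ids``, never putting more than ``chunk_size`` values into a single
    IN (...) list (mirroring the batching the selectin relationship loader
    already does)."""
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        rows.extend(
            session.scalars(select(entity).where(id_column.in_(chunk))).all()
        )
    return rows
```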
Databases / Backends / Drivers targeted
Oracle databases. We are using oracledb, but the limit is enforced by the database itself, so it probably affects all Oracle drivers.
Example Use
Slightly modified example from https://docs.sqlalchemy.org/en/20/orm/queryguide/_inheritance_setup.html
So that it can be run without an Oracle database, it uses SQLite (where the query works); executed against an Oracle database, the same code crashes.
```python
from typing import List

from sqlalchemy import create_engine, ForeignKey, Integer, String, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    mapped_column,
    relationship,
    Session,
)


class Base(DeclarativeBase):
    pass


class Company(Base):
    __tablename__ = "company"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    employees: Mapped[List["Employee"]] = relationship(
        back_populates="company", lazy="selectin"
    )


class Employee(Base):
    __tablename__ = "employee"

    id = mapped_column(Integer, primary_key=True)
    name = mapped_column(String(50))
    type = mapped_column(String(50))
    company_id: Mapped[int] = mapped_column(ForeignKey("company.id"))
    company: Mapped[Company] = relationship(back_populates="employees")

    __mapper_args__ = {"polymorphic_identity": "employee", "polymorphic_on": type}


class Engineer(Employee):
    __tablename__ = "engineer"

    id = mapped_column(Integer, ForeignKey("employee.id"), primary_key=True)
    engineer_info = mapped_column(String(30))

    __mapper_args__ = {
        "polymorphic_load": "selectin",
        "polymorphic_identity": "engineer",
    }


class Manager(Employee):
    __tablename__ = "manager"

    id = mapped_column(Integer, ForeignKey("employee.id"), primary_key=True)
    manager_name = mapped_column(String(30))

    __mapper_args__ = {
        "polymorphic_load": "selectin",
        "polymorphic_identity": "manager",
    }


engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

conn = engine.connect()
session = Session(conn)

managers = [
    Manager(name=f"Manager {i}", manager_name=f"Vice President No. {i}")
    for i in range(0, 1100)
]
engineers = [Engineer(name="SpongeBob", engineer_info="Krabby Patty Master")]
company = Company(name="Krusty Krab Group", employees=managers + engineers)
session.add(company)
session.commit()
session.expunge_all()

query = select(Company)
results = session.execute(query).scalars().all()
```

This will generate the following SQL statement:
```sql
SELECT manager.id AS manager_id, employee.id AS employee_id,
       employee.type AS employee_type, manager.manager_name AS manager_manager_name
FROM employee JOIN manager ON employee.id = manager.id
WHERE employee.id IN (?, ?, ?, [...] ?, ?)
ORDER BY employee.id
```

with the following parameters:
```
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 ... 1000 parameters truncated ... 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100)
```

When querying Employee directly, chunking manually is possible:
```python
import itertools

chunk_size = 500
employee_query = select(Employee)
employee_results = list(
    itertools.chain.from_iterable(
        session.execute(employee_query)
        .yield_per(chunk_size)
        .partitions(chunk_size)
    )
)
```

This then emits 3 SQL statements, each with a subset of the ids. But I don't see a way to achieve the same when querying Company (other than changing the loading strategy).
Additional context
There may be a discussion about whether selectin_polymorphic is the correct loading strategy for such cases; in the example above, with_polymorphic might well perform better. In the real-world case we ran into, however, the model was part of a more complicated inheritance structure that may not perform well with with_polymorphic. Most of the time we do not load that many objects, so the crashes happened only occasionally, but each one required manual clean-up in an automated system.
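For reference, this is roughly what the with_polymorphic alternative would look like for the example mappings above. It is a sketch of the documented of_type() eager-loading pattern only; we have not verified how it behaves with our real, more complicated hierarchy:

```python
# Sketch of the with_polymorphic alternative for the example mappings above:
# the subclass tables are joined into the employees load itself, so no
# per-subclass IN (...) list over employee ids needs to be emitted.
from sqlalchemy.orm import selectinload, with_polymorphic

employee_poly = with_polymorphic(Employee, [Manager, Engineer])
query = select(Company).options(
    selectinload(Company.employees.of_type(employee_poly))
)
results = session.execute(query).scalars().all()
```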
As our relationship is closer to 1:2 on average rather than 1:1000, we worked around the issue by loading the objects in chunks of 200. This works, but it still carries the risk of crashing when the data is not shaped as expected.
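Roughly, that workaround looks like the following (illustrative only, reusing the hypothetical select_in_chunks helper sketched near the top; our production code differs). Loading the parents at most 200 at a time keeps every IN (...) list emitted by the eager selectin / selectin_polymorphic loads bounded by roughly chunk_size times the average number of employees per company:

```python
# Illustrative workaround sketch (not our exact code): chunked loading of the
# parent Company objects, so that the eager loads triggered per chunk stay
# far below Oracle's 1000-expression limit given ~2 employees per company.
all_company_ids = session.scalars(select(Company.id)).all()
companies = select_in_chunks(
    session, Company, Company.id, all_company_ids, chunk_size=200
)
```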