Description
Describe the use case
Currently, the selectin_polymorphic loading strategy generates an IN (...) clause with an unbounded number of parameters. This is problematic when working with an Oracle database, because Oracle raises an error (ORA-01795: maximum number of expressions in a list is 1000) when the clause contains more than 1000 parameters.
When querying the joined inheritance model directly, the user can chunk manually and work around this. However, when the objects are eagerly loaded as part of a one-to-many relationship, this leads to data-dependent crashes whenever more than 1000 objects would be loaded.
It would be great to have (optional?) behaviour similar to the selectin loading strategy for relationships, which already chunks in batches of 500.
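For illustration, the requested chunking amounts to something like the following helper, just performed internally by the loader rather than in user code. This is only a sketch; select_in_chunks, its signature and the chunk size are made up for this issue and are not an existing SQLAlchemy API:

```python
from sqlalchemy import select
from sqlalchemy.orm import Session


def select_in_chunks(session: Session, entity, id_column, ids, chunk_size=500):
    """Hypothetical helper: load ``entity`` rows whose ``id_column`` is among
    ``ids``, never putting more than ``chunk_size`` values into a single
    IN (...) list (mirroring the batching the selectin relationship loader
    already does)."""
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        rows.extend(
            session.scalars(select(entity).where(id_column.in_(chunk))).all()
        )
    return rows
```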
Databases / Backends / Drivers targeted
Oracle databases. We are using oracledb, but the limit is enforced by the database itself, so it probably affects all Oracle drivers.
Example Use
Slightly modified example from https://docs.sqlalchemy.org/en/20/orm/queryguide/_inheritance_setup.html
So that it can be run without an Oracle database, it uses SQLite (where the query works); executed against an Oracle database, the same code crashes.
```python
from typing import List

from sqlalchemy import create_engine, ForeignKey, Integer, String, select
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    mapped_column,
    relationship,
    Session,
)


class Base(DeclarativeBase):
    pass


class Company(Base):
    __tablename__ = "company"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    employees: Mapped[List["Employee"]] = relationship(
        back_populates="company", lazy="selectin"
    )


class Employee(Base):
    __tablename__ = "employee"

    id = mapped_column(Integer, primary_key=True)
    name = mapped_column(String(50))
    type = mapped_column(String(50))
    company_id: Mapped[int] = mapped_column(ForeignKey("company.id"))
    company: Mapped[Company] = relationship(back_populates="employees")

    __mapper_args__ = {"polymorphic_identity": "employee", "polymorphic_on": type}


class Engineer(Employee):
    __tablename__ = "engineer"

    id = mapped_column(Integer, ForeignKey("employee.id"), primary_key=True)
    engineer_info = mapped_column(String(30))

    __mapper_args__ = {
        "polymorphic_load": "selectin",
        "polymorphic_identity": "engineer",
    }


class Manager(Employee):
    __tablename__ = "manager"

    id = mapped_column(Integer, ForeignKey("employee.id"), primary_key=True)
    manager_name = mapped_column(String(30))

    __mapper_args__ = {
        "polymorphic_load": "selectin",
        "polymorphic_identity": "manager",
    }


engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

conn = engine.connect()
session = Session(conn)

managers = [
    Manager(name=f"Manager {i}", manager_name=f"Vice President No. {i}")
    for i in range(0, 1100)
]
engineers = [Engineer(name="SpongeBob", engineer_info="Krabby Patty Master")]
company = Company(name="Krusty Krab Group", employees=managers + engineers)
session.add(company)
session.commit()
session.expunge_all()

query = select(Company)
results = session.execute(query).scalars().all()
```

This will generate the following SQL statement:
```sql
SELECT manager.id AS manager_id, employee.id AS employee_id,
       employee.type AS employee_type, manager.manager_name AS manager_manager_name
FROM employee JOIN manager ON employee.id = manager.id
WHERE employee.id IN (?, ?, ?, [...] ?, ?)
ORDER BY employee.id
```

with the following parameters:
```
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 ... 1000 parameters truncated ... 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100)
```

When querying Employee directly, chunking manually is possible:
```python
import itertools

chunk_size = 500
employee_query = select(Employee)
employee_results = list(
    itertools.chain.from_iterable(
        session.execute(employee_query)
        .yield_per(chunk_size)
        .partitions(chunk_size)
    )
)
```

This then emits 3 SQL statements, each with a subset of the ids. But I don't see a way to achieve the same when querying Company (other than changing the loading strategy).
Additional context
There may be a discussion about whether selectin_polymorphic is the correct loading strategy for such cases; in the example above, with_polymorphic might well perform better. In the real-world case we ran into, however, the model was part of a more complicated inheritance structure that may not perform well with with_polymorphic. Most of the time we do not load that many objects, so the crashes happened only occasionally, but each one required manual clean-up in an automated system.
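For reference, this is roughly what the with_polymorphic alternative would look like for the example mappings above. It is a sketch of the documented of_type() eager-loading pattern only; we have not verified how it behaves with our real, more complicated hierarchy:

```python
# Sketch of the with_polymorphic alternative for the example mappings above:
# the subclass tables are joined into the employees load itself, so no
# per-subclass IN (...) list over employee ids needs to be emitted.
from sqlalchemy.orm import selectinload, with_polymorphic

employee_poly = with_polymorphic(Employee, [Manager, Engineer])
query = select(Company).options(
    selectinload(Company.employees.of_type(employee_poly))
)
results = session.execute(query).scalars().all()
```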
As our relationship is closer to 1:2 on average rather than 1:1000, we worked around the issue by loading the objects in chunks of 200. This works, but it still carries the risk of crashing when the data is not shaped as expected.
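Roughly, that workaround looks like the following (illustrative only, reusing the hypothetical select_in_chunks helper sketched near the top; our production code differs). Loading the parents at most 200 at a time keeps every IN (...) list emitted by the eager selectin / selectin_polymorphic loads bounded by roughly chunk_size times the average number of employees per company:

```python
# Illustrative workaround sketch (not our exact code): chunked loading of the
# parent Company objects, so that the eager loads triggered per chunk stay
# far below Oracle's 1000-expression limit given ~2 employees per company.
all_company_ids = session.scalars(select(Company.id)).all()
companies = select_in_chunks(
    session, Company, Company.id, all_company_ids, chunk_size=200
)
```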