Feature proposal: Relation.update_all but with varying values

Hello hello, I am working on a variation of Relation.update_all that can set varying values per row affected. As I understand it there is no available high-level interface for such a query, the high-level workarounds being to do multiple update queries or use Relation.upsert_all. The interface is similar to Relation.upsert_all but the internals are more like Relation.update_all. current implementation

The update query employs the pattern of joining with a handwritten VALUES table incorporating both conditions and update values in each row. This is supported on all the engines – though certainly not without several annoyances on each one.

More importantly perhaps, checking EXPLAIN on MySQL and PostgreSQL suggests they do run the construct as one would expect, using the correct index for the join condition.

Example construct in PostgreSQL, with a join on the primary key:

UPDATE "books" SET "name" = "data"."column2" FROM (VALUES (1, 'Reword'), (2, 'Peopleware')) AS "data" ("column1", "column2") WHERE "books".id = "data"."column1"; 

It would be generated from this code:

# [[conditions, updates], ...] syntax Book.uniform_update_all([ [{ id: 1 }, { name: "Reword" }], [{ id: 2 }, { name: "Peopleware" }] ]) # shorthand syntax when conditions is the primary key Book.uniform_update_all({ 1 => { name: "Reword" }, 2 => { name: "Peopleware" }, }) 

More advanced example where the conditions are not the primary key:

# one-shot migration to combine two columns into one Book.uniform_update_all [ [{ written: false, published: false }, { status: :proposed }], [{ written: true, published: false }, { status: :written }], [{ written: true, published: true }, { status: :published }] ] 

It is “uniform” in the sense that all entries must use the same keys for conditions and values, just like Relation.insert_all.

Considerably more advanced example from the PostgreSQL docs linked before, which this solution cannot assemble for several reasons (non-trivial SET and JOIN):

UPDATE employees SET salary = salary * v.increase FROM (VALUES(1, 200000, 1.2), (2, 400000, 1.4)) AS v (depno, target, increase) WHERE employees.depno = v.depno AND employees.sales >= v.target; 

Some implementation details: we build the full Arel tree with a custom arel node for the VALUES table. Then we rely on Arel::Crud#compile_update just like Relation.update_all does to assemble the update query.

This approach looks like the most appropriate, as it will inherit where constraints just like update_all, but it will invariably break if the arel visitor later decides it must perform a subselect, as it hides the VALUES table from the SET clause.

This looks like a very general problem, one that other constructs would have as well, so I don’t think it means the implementation is wrong. It does mean that it will support only where and join basically, and none of these: order, limit, offset, group, having.

I am looking for feedback on the idea and implementation etc. Will send a pull request on github once I’ve cleaned the code up a bit more.

2 Likes

Just needed to do this, which brought me here. Glad that at least we have upsert_all that supports this already.

That said, it would be way more intuitive if update_all worked this way too. Weird that ActiveRecord won’t let me construct such a basic SQL query.

The only criticism I have here (without having read the implementation) is that uniform_update_all feels “unrailsy”. Usually Rails doesn’t add these types of adjectives to its methods. I bet it would be more welcome if the new capability was somehow baked into update_all itself. Of course that makes it a slightly more dangerous change, but this feature seems big enough to warrant at least a minor Rails version bump anyway.