Commit e7ef57f
[data] Update streaming_repartition and map_batches_fusion (#59476)
Analysis of the two operator patterns:

## Streaming_repartition → map_batches

| | Number of `map_batches` tasks |
|---|---|
| **Fused** | `num_input_blocks` (which is ≤ number of output blocks of StreamingRepartition) |
| **Not fused** | number of output blocks of StreamingRepartition |

When fused, the number of tasks equals the number of input blocks, which is ≤ the number of output blocks of StreamingRepartition. If StreamingRepartition is supposed to break down blocks to increase parallelism, that won't happen when fused. So we don't fuse.

---

## Map_batches → streaming_repartition

`batch_size % target_num_rows == 0`

| | Number of `map_batches` tasks |
|---|---|
| **Fused** | == total_rows / batch_size |
| **Not fused** | == total_rows / batch_size |

So, the fusion doesn't affect the parallelism.

---

Thus, we currently disable the `Streaming_repartition → map_batches` fusion and enable the fusion when `batch_size % target_num_rows == 0` for `Map_batches → streaming_repartition`.

---------

Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>

1 parent 196c678 commit e7ef57f
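The task-count reasoning in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not Ray's implementation: the helper names (`repartition_output_blocks`, `map_batches_tasks`, `should_fuse_map_batches_then_repartition`) are hypothetical, and the output-block count assumes StreamingRepartition emits blocks of `target_num_rows` rows plus at most one smaller trailing block.

```python
import math


def repartition_output_blocks(total_rows: int, target_num_rows: int) -> int:
    # Assumption (not taken from Ray's code): the repartition emits blocks of
    # target_num_rows rows, plus at most one smaller trailing block.
    return math.ceil(total_rows / target_num_rows)


def map_batches_tasks(num_input_blocks: int, num_output_blocks: int, fused: bool) -> int:
    # Streaming_repartition -> map_batches, per the first table:
    # fused runs one task per input block, unfused one task per output block.
    return num_input_blocks if fused else num_output_blocks


def should_fuse_map_batches_then_repartition(batch_size: int, target_num_rows: int) -> bool:
    # Map_batches -> streaming_repartition, per the commit message:
    # fuse only when batch_size is a multiple of target_num_rows.
    return batch_size % target_num_rows == 0


if __name__ == "__main__":
    # Hypothetical workload: 4 input blocks of 1000 rows each, target of 100 rows per block.
    num_input_blocks = 4
    num_output_blocks = repartition_output_blocks(total_rows=4000, target_num_rows=100)

    print("fused tasks:    ", map_batches_tasks(num_input_blocks, num_output_blocks, fused=True))
    print("not fused tasks:", map_batches_tasks(num_input_blocks, num_output_blocks, fused=False))
    print("fuse 200 % 100? ", should_fuse_map_batches_then_repartition(200, 100))
    print("fuse 150 % 100? ", should_fuse_map_batches_then_repartition(150, 100))
```

With these hypothetical numbers, fusing `Streaming_repartition → map_batches` would leave 4 `map_batches` tasks instead of 40, which is the loss of parallelism the commit avoids by not fusing.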
File tree
3 files changed, +49 -9 lines changed
- python/ray/data
  - _internal/logical/rules
  - tests

Lines changed: 37 additions & 2 deletions
Diff line numbers per changed file (the changed line contents were not captured):

| File | Original file lines | Diff lines | Change |
|---|---|---|---|
| 1 of 3 | | 93-116 | 24 lines added |
| 1 of 3 | | 279-286 | 8 lines added |
| 1 of 3 | 256 | 288-289 | 1 line replaced by 2 |
| 1 of 3 | 279 | 312-314 | 1 line replaced by 3 |
| 2 of 3 | 748 | 748 | 1 line replaced by 1 |
| 2 of 3 | | 751 | 1 line added |
| 3 of 3 | | 236 | 1 line added |
| 3 of 3 | | 242 | 1 line added |
| 3 of 3 | | 249 | 1 line added |
| 3 of 3 | 251 | 254 | 1 line replaced by 1 |
| 3 of 3 | | 276 | 1 line added |
| 3 of 3 | 274-275 | 278-279 | 2 lines replaced by 2 |
| 3 of 3 | 277-278 | 281-282 | 2 lines replaced by 2 |
| 3 of 3 | 289 | 293 | 1 line replaced by 1 |