- Notifications
You must be signed in to change notification settings - Fork 9.4k
Description
Preconditions and environment
- 2.4.5-p3 (but it's older problem and still exists on development branch)
- Indexers in scheduled mode
- Default batch size: 1000
- In indexer change-log more than 1000 unique entity_id's to be indexed and lot of duplicated records (preferable 10 000+)
Steps to reproduce
Run indexer in mview mode, for example by running cron job indexer_update_all_views
.
Example based on catalog_product_price index:
Current version_id = 40000
Last version_id = 66630
Index schedule_status (unique product_ids): idle (1093 in backlog)
Expected result
Only a total of 1093 products should be indexed, each product once.
Actual result
Some products are indexed more than once.
Magento\Framework\Mview\View
uses Magento\Framework\Mview\View\ChangeLogBatchWalker
to iterate through collection
However first batch is taken by select:
SELECT DISTINCT `catalog_product_price_cl`.`entity_id` FROM `catalog_product_price_cl` WHERE (version_id > 40000) AND (version_id <= 66630) GROUP BY `entity_id` LIMIT 1000
which returns 1000 IDs which are passed to indexer and indexed.
Second batch:
SELECT DISTINCT `catalog_product_price_cl`.`entity_id` FROM `catalog_product_price_cl` WHERE (version_id > 41000) AND (version_id <= 66630) GROUP BY `entity_id` LIMIT 1000
which also returns 1000 IDs which are passed to indexer and indexed.
This is because we take unique IDs from whole range, and we increase only vsFrom by batchSize (@see Magento\Framework\Mview\View
)
In my case to index those unique 1093 products it needed to run 27 batches, each batch processed this amount of products:
1000
1000
1000
968
927
885
843
802
760
718
677
635
593
552
510
468
427
385
343
302
260
218
177
135
93
52
12
Which adds up to total of 14742 indexer executions, so the last 12 products were reindexed 27 times in each batch.
Additional information
Magento\Framework\Mview\View
Each batch query:
Current: $vsFrom to $currentVersionId
Should be: $vsFrom to ($vsFrom + $batchSize).
So instead of batch queries:
WHERE (version_id > 40000) AND (version_id <= 66630) WHERE (version_id > 41000) AND (version_id <= 66630) WHERE (version_id > 42000) AND (version_id <= 66630) WHERE (version_id > 43000) AND (version_id <= 66630) WHERE (version_id > 44000) AND (version_id <= 66630) WHERE (version_id > 45000) AND (version_id <= 66630) WHERE (version_id > 46000) AND (version_id <= 66630) WHERE (version_id > 47000) AND (version_id <= 66630) WHERE (version_id > 48000) AND (version_id <= 66630) WHERE (version_id > 49000) AND (version_id <= 66630) WHERE (version_id > 50000) AND (version_id <= 66630) WHERE (version_id > 51000) AND (version_id <= 66630) WHERE (version_id > 52000) AND (version_id <= 66630) WHERE (version_id > 53000) AND (version_id <= 66630) WHERE (version_id > 54000) AND (version_id <= 66630) WHERE (version_id > 55000) AND (version_id <= 66630) WHERE (version_id > 56000) AND (version_id <= 66630) WHERE (version_id > 57000) AND (version_id <= 66630) WHERE (version_id > 58000) AND (version_id <= 66630) WHERE (version_id > 59000) AND (version_id <= 66630) WHERE (version_id > 60000) AND (version_id <= 66630) WHERE (version_id > 61000) AND (version_id <= 66630) WHERE (version_id > 62000) AND (version_id <= 66630) WHERE (version_id > 63000) AND (version_id <= 66630) WHERE (version_id > 64000) AND (version_id <= 66630) WHERE (version_id > 65000) AND (version_id <= 66630) WHERE (version_id > 66000) AND (version_id <= 66630)
Each batch query should look like:
WHERE (version_id > 40000) AND (version_id <= 41000) WHERE (version_id > 41000) AND (version_id <= 42000) WHERE (version_id > 42000) AND (version_id <= 43000) WHERE (version_id > 43000) AND (version_id <= 44000) WHERE (version_id > 44000) AND (version_id <= 45000) WHERE (version_id > 45000) AND (version_id <= 46000) WHERE (version_id > 46000) AND (version_id <= 47000) WHERE (version_id > 47000) AND (version_id <= 48000) WHERE (version_id > 48000) AND (version_id <= 49000) WHERE (version_id > 49000) AND (version_id <= 50000) WHERE (version_id > 50000) AND (version_id <= 51000) WHERE (version_id > 51000) AND (version_id <= 52000) WHERE (version_id > 52000) AND (version_id <= 53000) WHERE (version_id > 53000) AND (version_id <= 54000) WHERE (version_id > 54000) AND (version_id <= 55000) WHERE (version_id > 55000) AND (version_id <= 56000) WHERE (version_id > 56000) AND (version_id <= 57000) WHERE (version_id > 57000) AND (version_id <= 58000) WHERE (version_id > 58000) AND (version_id <= 59000) WHERE (version_id > 59000) AND (version_id <= 60000) WHERE (version_id > 60000) AND (version_id <= 61000) WHERE (version_id > 61000) AND (version_id <= 62000) WHERE (version_id > 62000) AND (version_id <= 63000) WHERE (version_id > 63000) AND (version_id <= 64000) WHERE (version_id > 64000) AND (version_id <= 65000) WHERE (version_id > 65000) AND (version_id <= 66000) WHERE (version_id > 66000) AND (version_id <= 66630)
Release note
No response
Triage and priority
- Severity: S0 - Affects critical data or functionality and leaves users without workaround.
- Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
- Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
- Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
- Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.