@@ -5,48 +5,46 @@
 <titleabbrev>How checkpoints work</titleabbrev>
 ++++

-Each time a {transform} examines the source indices and creates or
-updates the destination index, it generates a _checkpoint_.
+Each time a {transform} examines the source indices and creates or updates the
+destination index, it generates a _checkpoint_.

-If your {transform} runs only once, there is logically only one
-checkpoint. If your {transform} runs continuously, however, it creates
-checkpoints as it ingests and transforms new source data.
+If your {transform} runs only once, there is logically only one checkpoint. If
+your {transform} runs continuously, however, it creates checkpoints as it
+ingests and transforms new source data.

 To create a checkpoint, the {ctransform}:

 . Checks for changes to source indices.
 +
-Using a simple periodic timer, the {transform} checks for changes to
-the source indices. This check is done based on the interval defined in the
-transform's `frequency` property.
+Using a simple periodic timer, the {transform} checks for changes to the source
+indices. This check is done based on the interval defined in the transform's
+`frequency` property.
 +
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.

 . Identifies which entities have changed.
 +
-The {transform} searches to see which entities have changed since the
-last time it checked. The `sync` configuration object in the {transform}
-identifies a time field in the source indices. The {transform} uses the values
-in that field to synchronize the source and destination indices.
+The {transform} searches to see which entities have changed since the last time
+it checked. The `sync` configuration object in the {transform} identifies a time
+field in the source indices. The {transform} uses the values in that field to
+synchronize the source and destination indices.

 . Updates the destination index (the {dataframe}) with the changed entities.
 +
 --
-The {transform} applies changes related to either new or changed
-entities to the destination index. The set of changed entities is paginated. For
-each page, the {transform} performs a composite aggregation using a
-`terms` query. After all the pages of changes have been applied, the checkpoint
-is complete.
+The {transform} applies changes related to either new or changed entities to the
+destination index. The set of changed entities is paginated. For each page, the
+{transform} performs a composite aggregation using a `terms` query. After all
+the pages of changes have been applied, the checkpoint is complete.
 --

 This checkpoint process involves both search and indexing activity on the
 cluster. We have attempted to favor control over performance while developing
-{transforms}. We decided it was preferable for the
-{transform} to take longer to complete, rather than to finish quickly
-and take precedence in resource consumption. That being said, the cluster still
-requires enough resources to support both the composite aggregation search and
-the indexing of its results.
+{transforms}. We decided it was preferable for the {transform} to take longer to
+complete, rather than to finish quickly and take precedence in resource
+consumption. That being said, the cluster still requires enough resources to
+support both the composite aggregation search and the indexing of its results.

 TIP: If the cluster experiences unsuitable performance degradation due to the
 {transform}, stop the {transform} and refer to <<transform-performance>>.
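
Review note on the hunk above: the three checkpoint steps it describes (timer check, change detection via the `sync` time field, paginated update of the destination) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how {transforms} are implemented: `TransformSketch`, `on_timer`, the `entity`/`timestamp`/`value` document fields, the page size, and the use of a sum as the aggregation are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TransformSketch:
    """Hypothetical in-memory model of a continuous transform (illustration only)."""

    page_size: int = 2                 # assumed page size for changed entities
    last_checkpoint_time: int = 0      # highest sync-field value already processed
    checkpoint: int = 0                # number of completed checkpoints
    in_progress: bool = False
    destination: dict = field(default_factory=dict)

    def on_timer(self, source_docs):
        """Called once per `frequency` interval; returns True if a checkpoint completed."""
        # Step 1: check for changes; wait for the next timer if there are none
        # or if a checkpoint is already in progress.
        new_docs = [d for d in source_docs if d["timestamp"] > self.last_checkpoint_time]
        if not new_docs or self.in_progress:
            return False

        self.in_progress = True
        try:
            # Step 2: identify which entities changed since the last check,
            # using the time field as the synchronization marker.
            changed = sorted({d["entity"] for d in new_docs})

            # Step 3: page through the changed entities and recompute each one
            # from the source (a sum stands in for the real aggregation).
            for start in range(0, len(changed), self.page_size):
                for entity in changed[start:start + self.page_size]:
                    self.destination[entity] = sum(
                        d["value"] for d in source_docs if d["entity"] == entity
                    )

            # All pages applied: the checkpoint is complete.
            self.last_checkpoint_time = max(d["timestamp"] for d in new_docs)
            self.checkpoint += 1
            return True
        finally:
            self.in_progress = False
```

In this model, calling `on_timer` twice with the same documents creates a checkpoint only on the first call; the second call finds no documents newer than `last_checkpoint_time` and simply waits for the next timer.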
@@ -63,20 +61,18 @@ persisted periodically.
 Checkpoint failures can be categorized as follows:

 * Temporary failures: The checkpoint is retried. If 10 consecutive failures
-occur, the {transform} has a failed status. For example, this
-situation might occur when there are shard failures and queries return only
-partial results.
-* Irrecoverable failures: The {transform} immediately fails. For
-example, this situation occurs when the source index is not found.
-* Adjustment failures: The {transform} retries with adjusted settings.
-For example, if a parent circuit breaker memory errors occur during the
-composite aggregation, the {transform} receives partial results. The aggregated
-search is retried with a smaller number of buckets. This retry is performed at
-the interval defined in the `frequency` property for the {transform}. If the
-search is retried to the point where it reaches a minimal number of buckets, an
+occur, the {transform} has a failed status. For example, this situation might
+occur when there are shard failures and queries return only partial results.
+* Irrecoverable failures: The {transform} immediately fails. For example, this
+situation occurs when the source index is not found.
+* Adjustment failures: The {transform} retries with adjusted settings. For
+example, if a parent circuit breaker memory errors occur during the composite
+aggregation, the {transform} receives partial results. The aggregated search is
+retried with a smaller number of buckets. This retry is performed at the
+interval defined in the `frequency` property for the {transform}. If the search
+is retried to the point where it reaches a minimal number of buckets, an
 irrecoverable failure occurs.

-If the node running the {transforms} fails, the {transform} restarts
-from the most recent persisted cursor position. This recovery process might
-repeat some of the work the {transform} had already done, but it ensures data
-consistency.
+If the node running the {transforms} fails, the {transform} restarts from the
+most recent persisted cursor position. This recovery process might repeat some
+of the work the {transform} had already done, but it ensures data consistency.
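
Review note on the hunk above: the three failure categories can be read as a small state machine that is stepped once per `frequency` interval. The sketch below is illustrative only. The limit of 10 consecutive failures comes from the text; everything else is an assumption, including the exception and class names, the starting bucket count of 500, the minimum of 10 buckets, and the choice to halve the bucket count on each adjustment.

```python
class TemporaryFailure(Exception):
    """Hypothetical: e.g. shard failures that yield partial results."""


class IrrecoverableFailure(Exception):
    """Hypothetical: e.g. the source index is not found."""


class AdjustmentFailure(Exception):
    """Hypothetical: e.g. a circuit breaker trips during the aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # stated in the documentation text
MIN_BUCKETS = 10               # assumed lower bound, for illustration only


class FailureTracker:
    """Hypothetical model of how checkpoint failures change transform state."""

    def __init__(self, buckets=500):  # assumed starting bucket count
        self.buckets = buckets
        self.consecutive_failures = 0
        self.state = "started"

    def run_checkpoint(self, attempt):
        """Run one attempt per timer tick; `attempt(buckets)` raises on failure."""
        if self.state == "failed":
            return self.state
        try:
            attempt(self.buckets)
            self.consecutive_failures = 0  # a success resets the counter
        except TemporaryFailure:
            # Retried on the next tick, up to the consecutive-failure limit.
            self.consecutive_failures += 1
            if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                self.state = "failed"
        except AdjustmentFailure:
            # Retry with fewer buckets; at the minimum it becomes irrecoverable.
            if self.buckets <= MIN_BUCKETS:
                self.state = "failed"
            else:
                self.buckets = max(MIN_BUCKETS, self.buckets // 2)
        except IrrecoverableFailure:
            self.state = "failed"  # fails immediately, no retry
        return self.state
```

Because each call represents one timer tick, retries in this model naturally happen at the `frequency` interval, which matches the behavior the text describes for adjustment failures.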