
Commit 93c3e1c

[SDP] root storage location of pipeline metadata
1 parent 795225d

9 files changed: +54 -7 lines changed
docs/declarative-pipelines/FlowSystemMetadata.md

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@

# FlowSystemMetadata

## latestCheckpointLocation { #latestCheckpointLocation }

```scala
latestCheckpointLocation: String
```

`latestCheckpointLocation`...FIXME

---

`latestCheckpointLocation` is used when:

* `FlowPlanner` is requested to [plan a StreamingFlow](FlowPlanner.md#plan)
* `State` is requested to [reset a flow](State.md#reset)
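
While `latestCheckpointLocation` itself is still a FIXME above, here is a hedged sketch of one plausible implementation: picking the highest-numbered subdirectory of a flow's checkpoints directory. The helper name, numbering scheme, and layout are assumptions for illustration, not Spark's actual code.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Hypothetical helper (NOT Spark's actual implementation): assumes
// checkpoints are numbered subdirectories (0, 1, 2, ...) under a
// flow's checkpoints directory and returns the highest-numbered one.
def latestCheckpointLocation(spark: SparkSession, flowCheckpointsDir: String): String = {
  val dir = new Path(flowCheckpointsDir)
  val fs = dir.getFileSystem(spark.sparkContext.hadoopConfiguration)
  val versions =
    if (fs.exists(dir))
      fs.listStatus(dir).toSeq
        .map(_.getPath.getName)
        .flatMap(name => scala.util.Try(name.toLong).toOption)
    else Seq.empty[Long]
  val latest = if (versions.isEmpty) 0L else versions.max
  new Path(dir, latest.toString).toString
}
```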

docs/declarative-pipelines/PipelineUpdateContext.md

Lines changed: 14 additions & 2 deletions

@@ -2,9 +2,9 @@

`PipelineUpdateContext` is an [abstraction](#contract) of [pipeline update contexts](#implementations) that can [refreshTables](#refreshTables) (_among other things_).

-## Contract (Subset) { #contract }
+## Contract (Subset)

-### refreshTables { #refreshTables }
+### refreshTables Table Filter { #refreshTables }

```scala
refreshTables: TableFilter
```

@@ -15,6 +15,18 @@ Used when:

* `DatasetManager` is requested to [constructFullRefreshSet](DatasetManager.md#constructFullRefreshSet)
* `PipelineUpdateContext` is requested to [refreshFlows](PipelineUpdateContext.md#refreshFlows)

+### Root Storage Location { #storageRoot }
+
+```scala
+storageRoot: String
+```
+
+The root storage location of pipeline metadata (e.g., checkpoints for streaming flows).
+
+Used when:
+
+* `FlowSystemMetadata` is requested to [flowCheckpointsDirOpt](FlowSystemMetadata.md#flowCheckpointsDirOpt)
+

### Unresolved Dataflow Graph { #unresolvedGraph }
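
To make the `storageRoot`-to-checkpoints relationship concrete, a minimal sketch of deriving a per-flow checkpoints directory from the root storage location follows. The `_checkpoints` segment and the helper name are illustrative assumptions, not the actual layout used by `FlowSystemMetadata`.

```scala
import org.apache.hadoop.fs.Path

// Illustrative only: one plausible layout of pipeline metadata under
// storageRoot (the "_checkpoints" directory name is an assumption).
def flowCheckpointsDir(storageRoot: String, flowName: String): String =
  new Path(new Path(storageRoot, "_checkpoints"), flowName).toString

// flowCheckpointsDir("s3://bucket/pipelines/demo", "events_flow")
// => "s3://bucket/pipelines/demo/_checkpoints/events_flow"
```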

docs/declarative-pipelines/PipelineUpdateContextImpl.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@

* <span id="eventCallback"> `PipelineEvent` Callback (`PipelineEvent => Unit`)
* <span id="refreshTables"> `TableFilter` of the tables to be refreshed (default: `AllTables`)
* <span id="fullRefreshTables"> `TableFilter` of the tables to be fully refreshed (default: `NoTables`)
+* <span id="storageRoot"> [storageRoot](PipelineUpdateContext.md#storageRoot)

`PipelineUpdateContextImpl` is created when:
docs/declarative-pipelines/PipelinesHandler.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ handlePipelinesCommand(

| `DROP_DATAFLOW_GRAPH` | [Drops a pipeline](#DROP_DATAFLOW_GRAPH) ||
| `DEFINE_DATASET` | [Defines a dataset](#DEFINE_DATASET) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_dataset) |
| `DEFINE_FLOW` | [Defines a flow](#DEFINE_FLOW) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_flow) |
-| `START_RUN` | [Starts a pipeline run](#START_RUN) | [pyspark.pipelines.spark_connect_pipeline.start_run](spark_connect_pipeline.md#start_run) |
+| `START_RUN` | [Starts a pipeline run](#START_RUN) | [pyspark.pipelines.spark_connect_pipeline](spark_connect_pipeline.md#start_run) |
| `DEFINE_SQL_GRAPH_ELEMENTS` | [DEFINE_SQL_GRAPH_ELEMENTS](#DEFINE_SQL_GRAPH_ELEMENTS) | [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_sql) |

`handlePipelinesCommand` reports an `UnsupportedOperationException` for incorrect commands:
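
The hunk cuts off above. As a minimal, self-contained sketch of the dispatch-or-reject pattern the table describes — match on the command type, handle the supported ones, throw `UnsupportedOperationException` otherwise — consider the toy model below; the ADT and names are hypothetical, not Spark's actual protobuf types.

```scala
// Toy model of the handlePipelinesCommand dispatch pattern.
sealed trait PipelinesCommand
case object StartRun extends PipelinesCommand
case class DefineDataset(name: String) extends PipelinesCommand
case class DefineFlow(name: String) extends PipelinesCommand
case object SomeFutureCommand extends PipelinesCommand // not handled below

def handlePipelinesCommand(cmd: PipelinesCommand): Unit = cmd match {
  case StartRun            => println("starting a pipeline run")
  case DefineDataset(name) => println(s"registering dataset $name")
  case DefineFlow(name)    => println(s"registering flow $name")
  case other               => throw new UnsupportedOperationException(s"$other not supported")
}

// handlePipelinesCommand(SomeFutureCommand) throws UnsupportedOperationException.
```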
docs/declarative-pipelines/State.md

Lines changed: 3 additions & 0 deletions

@@ -0,0 +1,3 @@

# State

`State` is...FIXME

docs/declarative-pipelines/StreamingFlow.md

Lines changed: 3 additions & 1 deletion
@@ -2,6 +2,8 @@

`StreamingFlow` is a [ResolvedFlow](ResolvedFlow.md) that may or may not be [append](#mustBeAppend).

+`StreamingFlow` represents an [UnresolvedFlow](UnresolvedFlow.md) with a [streaming dataframe](FlowFunctionResult.md#dataFrame) in a dataflow graph.
+
`StreamingFlow` is [planned for execution](FlowPlanner.md#plan) as a [StreamingTableWrite](StreamingTableWrite.md) (assuming that the [Output](DataflowGraph.md#output) of [this flow](#flow)'s [destination](ResolutionCompletedFlow.md#destinationIdentifier) is a [Table](Table.md)).

## Creating Instance

@@ -14,7 +16,7 @@

`StreamingFlow` is created when:

-* `FlowResolver` is requested to [convertResolvedToTypedFlow](FlowResolver.md#convertResolvedToTypedFlow) (for [UnresolvedFlow](UnresolvedFlow.md)s with their results being streaming dataframes)
+* `FlowResolver` is requested to [convertResolvedToTypedFlow](FlowResolver.md#convertResolvedToTypedFlow) (for an [UnresolvedFlow](UnresolvedFlow.md) with a [streaming dataframe](FlowFunctionResult.md#dataFrame))

### mustBeAppend Flag { #mustBeAppend }
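
The streaming-vs-batch distinction that `FlowResolver` draws when typing flows is observable on any `DataFrame` via the standard `isStreaming` flag. A minimal, runnable sketch (the session setup is for illustration only):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("isStreaming-demo").getOrCreate()

// A batch DataFrame: would not become a StreamingFlow.
val batchDF: DataFrame = spark.range(5).toDF("id")
assert(!batchDF.isStreaming)

// A streaming DataFrame: the kind of flow-function result that
// convertResolvedToTypedFlow turns into a StreamingFlow.
val streamingDF: DataFrame = spark.readStream.format("rate").load()
assert(streamingDF.isStreaming)
```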

docs/declarative-pipelines/StreamingFlowExecution.md

Lines changed: 11 additions & 1 deletion
@@ -2,7 +2,17 @@

`StreamingFlowExecution` is an [extension](#contract) of the [FlowExecution](FlowExecution.md) abstraction for [streaming flow executions](#implementations) that process data statefully using [Spark Structured Streaming]({{ book.structured_streaming }}).

-## Contract
+## Contract (Subset)
+
+### Checkpoint Location { #checkpointPath }
+
+```scala
+checkpointPath: String
+```
+
+Used when:
+
+* `StreamingTableWrite` is requested to [start a streaming query](StreamingTableWrite.md#startStream)

### Execute Streaming Query { #startStream }
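
For intuition, a hedged sketch of what starting such a streaming query can look like in plain Structured Streaming: `checkpointLocation` and `toTable` are real public APIs, but the paths, table name, and session setup are made up, and this is not `StreamingTableWrite`'s actual code.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("checkpoint-demo").getOrCreate()

// The checkpoint location pins the query's progress and state so a
// restarted flow resumes where it left off.
val query = spark.readStream.format("rate").load()
  .writeStream
  .option("checkpointLocation", "/tmp/pipelines/demo/_checkpoints/events_flow/0")
  .toTable("events")

query.awaitTermination(5000) // let the query run briefly...
query.stop()                 // ...then shut it down
```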

docs/declarative-pipelines/StreamingTableWrite.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ When [executed](#startStream), `StreamingTableWrite` starts a streaming query to

* <span id="flow"> [ResolvedFlow](StreamingFlowExecution.md#flow)
* <span id="graph"> [DataflowGraph](DataflowGraph.md)
* <span id="updateContext"> [PipelineUpdateContext](FlowExecution.md#updateContext)
-* <span id="checkpointPath"> [Checkpoint Path](StreamingFlowExecution.md#checkpointPath)
+* <span id="checkpointPath"> [Checkpoint Location](StreamingFlowExecution.md#checkpointPath)
* <span id="trigger"> [Streaming Trigger](StreamingFlowExecution.md#trigger)
* <span id="destination"> [Output table](FlowExecution.md#destination)
* <span id="sqlConf"> [SQL Configuration](StreamingFlowExecution.md#sqlConf)

docs/declarative-pipelines/index.md

Lines changed: 4 additions & 1 deletion
@@ -53,13 +53,16 @@ The following fields are supported:

Field Name | Description
-|-
`name` (required) | &nbsp;
+`storage` (required) | The root storage location of pipeline metadata (e.g., checkpoints for streaming flows).<br>[SPARK-53751 Explicit Checkpoint Location]({{ spark.jira }}/SPARK-53751)
`catalog` | The default catalog to register datasets into.<br>Unless specified, [PipelinesHandler](PipelinesHandler.md#createDataflowGraph) falls back to the current catalog.
`database` | The default database to register datasets into.<br>Unless specified, [PipelinesHandler](PipelinesHandler.md#createDataflowGraph) falls back to the current database.
`schema` | Alias of `database`. Used unless `database` is defined.
-`storage` | ⚠️ does not seem to be used
`configuration` | SparkSession configs<br>Spark Pipelines runtime uses the configs to build a new `SparkSession` when executing `run`.<br>[spark.sql.connect.serverStacktrace.enabled]({{ book.spark_connect }}/configuration-properties/#spark.sql.connect.serverStacktrace.enabled) is hardcoded to be always `false`.
`libraries` | `glob`s of `include`s with transformations in [SQL](#sql) and [Python](#python-decorators)

+??? info
+    The pipeline spec is resolved in `pyspark/pipelines/cli.py::unpack_pipeline_spec`.
+

```yaml
name: hello-spark-pipelines
catalog: default_catalog
