To help avoid incurring Google Cloud charges for an inactive cluster, use Dataproc's Cluster Scheduled Deletion feature when you create a cluster. This feature provides the following options for deleting a cluster:
- after a specified cluster idle period
- at a specified future time
- after a specified period that starts from the time of submission of the cluster creation request
Actions that disable scheduled deletion
While a cluster is running, the following actions disable scheduled deletion until the disabling action is reversed:
- Removing the IAM Dataproc Service Agent role from the Dataproc Service Agent service account
- Disabling the Dataproc API in the cluster project
- Enabling Compute Engine VM deletion protection on a scheduled deletion cluster VM
- Enabling VPC Service Controls if the Dataproc Service Agent service account (control plane identity) isn't within the perimeter boundary
Calculate cluster idle time
You can use scheduled deletion to delete a cluster after a specified cluster idle time. Idle time is calculated after the cluster is created and cluster provisioning is complete. The idle time calculation starts when a cluster has no running jobs.
The dataproc:dataproc.cluster-ttl.consider-yarn-activity cluster property affects the calculation of cluster idle time, as follows:
- This property is enabled (set to true) by default.
- When this property is enabled, both YARN and Dataproc Jobs API activity must be idle to start and continue incrementing the cluster idle time calculation.
  - YARN activity includes pending and running YARN applications.
  - Dataproc Jobs API activity includes pending and running jobs submitted to the Dataproc Jobs API.
- When this property is set to false, the cluster idle time calculation starts and continues only when Dataproc Jobs API activity is idle.
The dataproc:dataproc.cluster-ttl.consider-yarn-activity property applies to clusters created with image versions 1.4.64, 1.5.39, 2.0.13, and later. For clusters created with earlier image versions, only Dataproc Jobs API activity is considered in calculating cluster idle time.
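The idle rule described above can be modeled as a small predicate. This is only an illustrative sketch of the documented behavior, not Dataproc's implementation; the function name `cluster_is_idle` and its parameters are hypothetical, and each count is assumed to include both pending and running work.

```python
def cluster_is_idle(yarn_active_apps, jobs_api_active_jobs,
                    consider_yarn_activity=True):
    """Return True when the idle-time clock would be running.

    yarn_active_apps: pending + running YARN applications (assumed input).
    jobs_api_active_jobs: pending + running Dataproc Jobs API jobs (assumed input).
    consider_yarn_activity: mirrors the
        dataproc:dataproc.cluster-ttl.consider-yarn-activity property.
    """
    jobs_api_idle = jobs_api_active_jobs == 0
    if not consider_yarn_activity:
        # Only Dataproc Jobs API activity is taken into account.
        return jobs_api_idle
    # Both sources of activity must be idle.
    return jobs_api_idle and yarn_active_apps == 0


# A YARN application that was not submitted through the Jobs API keeps the
# cluster busy only while the property is enabled.
print(cluster_is_idle(yarn_active_apps=1, jobs_api_active_jobs=0))
print(cluster_is_idle(yarn_active_apps=1, jobs_api_active_jobs=0,
                      consider_yarn_activity=False))
```

In this model, setting the property to false means that work launched directly on YARN no longer postpones idle deletion.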
Use cluster scheduled deletion
You can set scheduled deletion values when you create a cluster using the Google Cloud CLI, Dataproc API, or Google Cloud console. After you create the cluster, you can update the cluster to change or delete scheduled deletion values previously set on the cluster.
gcloud CLI
You can create or update scheduled deletion values on a cluster by passing the flags and values listed in the following table to the gcloud dataproc clusters create or gcloud dataproc clusters update commands.
| gcloud CLI flag | Description | Value granularity | Min value | Max value |
|---|---|---|---|---|
| --delete-max-idle ¹ | Applies to cluster create and cluster update commands. The duration from the time when the cluster becomes idle after the cluster is created or updated and is in a ready-to-use state to the moment when the cluster starts to delete. Provide the duration in IntegerUnit format, where the unit can be "s, m, h, d" (seconds, minutes, hours, days). Example: "30m": 30 minutes from the moment when the cluster becomes idle. | 1 second | 5 minutes | 14 days |
| --no-delete-max-idle | Applies to cluster update command only. Cancels cluster deletion set by a previous delete-max-idle flag. | Not applicable | Not applicable | Not applicable |
| --delete-expiration-time ² | Applies to cluster create and cluster update commands. The time to start deleting the cluster in ISO 8601 datetime format. To generate the datetime in correct format, you can use the Timestamp Generator. For example, "2017-08-22T13:31:48-08:00" specifies an expiration time of 13:31:48 in the UTC -8:00 time zone. | 1 second | 10 minutes from the current time | 14 days from the current time |
| --delete-max-age ² | Applies to cluster create and cluster update commands. The duration from the moment of submitting the cluster create request to the moment when the cluster starts to delete. Provide the duration in IntegerUnit format, where the unit can be "s, m, h, d" (seconds, minutes, hours, days). Examples: "30m": 30 minutes from now; "1d": 1 day from now. | 1 second | 10 minutes | 14 days |
| --no-delete-max-age | Applies to cluster update command only. Cancels cluster auto-deletion set by a previous delete-max-age or delete-expiration-time flag. | Not applicable | Not applicable | Not applicable |
1. You can pass the delete-max-idle flag with either the delete-expiration-time or delete-max-age flag in your cluster create or update request. Whichever condition is met first triggers cluster deletion.
2. You can pass either the delete-expiration-time flag or the delete-max-age flag to the cluster create or update command, but not both.
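The limits and flag combinations listed above can be checked on the client side before a command is run. The following is a minimal sketch under stated assumptions: `parse_duration` and `check_flags` are hypothetical helper names, the checks only mirror the table values, and they are not the validation that gcloud or the Dataproc service performs.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
MAX_TTL = timedelta(days=14)       # max value for all three settings
MIN_IDLE = timedelta(minutes=5)    # min value for --delete-max-idle
MIN_AGE = timedelta(minutes=10)    # min value for the other two settings


def parse_duration(text):
    """Parse an IntegerUnit duration such as '30m' or '1d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"not an IntegerUnit duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def check_flags(max_idle=None, max_age=None, expiration_time=None, now=None):
    """Raise ValueError if the values break a constraint from the table."""
    now = now or datetime.now(timezone.utc)
    if max_age is not None and expiration_time is not None:
        raise ValueError("pass delete-max-age or delete-expiration-time, not both")
    if max_idle is not None and not MIN_IDLE <= parse_duration(max_idle) <= MAX_TTL:
        raise ValueError("delete-max-idle must be between 5 minutes and 14 days")
    if max_age is not None and not MIN_AGE <= parse_duration(max_age) <= MAX_TTL:
        raise ValueError("delete-max-age must be between 10 minutes and 14 days")
    if expiration_time is not None:
        # Assumes the ISO 8601 string carries a UTC offset such as -08:00.
        remaining = datetime.fromisoformat(expiration_time) - now
        if not MIN_AGE <= remaining <= MAX_TTL:
            raise ValueError("delete-expiration-time must be 10 minutes to 14 days from now")


# An idle limit combined with a maximum age is an allowed combination.
check_flags(max_idle="30m", max_age="1d")
```

A call that returns without raising means the values satisfy the documented ranges; the service remains the authority on what it accepts.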
Cluster creation example:
gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --delete-max-idle=DURATION \
    --delete-expiration-time=TIME \
    ... other flags ...
Cluster update example:
gcloud dataproc clusters update CLUSTER_NAME \
    --region=REGION \
    --delete-max-idle=DURATION \
    --no-delete-max-age \
    ... other flags ...
REST API
You can create or update scheduled deletion values on a cluster by setting the Dataproc API ClusterLifecycleConfig fields and values listed in the following table as part of a Dataproc clusters.create or clusters.patch API request.
| API field | Description | Value granularity | Min value | Max value |
|---|---|---|---|---|
| idleDeleteTtl ¹ | Applies to cluster create and cluster update commands. The duration from the time when the cluster becomes idle after the cluster is created or updated and is in a ready-to-use state to the moment when the cluster starts to delete. When updating a cluster with a new value, the new value must be greater than the previously set value. Provide a duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". Submit an empty duration to cancel a previously set idleDeleteTtl value. | 1 second | 5 minutes | 14 days |
| autoDeleteTime ² | Applies to cluster create and cluster update commands. The time to start deleting the cluster. When updating a cluster with a new time, the new time must be later than the previously set time. When updating, submit an empty autoDeleteTime value to cancel the existing auto delete. Provide a timestamp in RFC 3339 UTC "Zulu" format, accurate to nanoseconds. Example: "2014-10-02T15:01:23.045123456Z". | 1 second | 10 minutes from the current time | 14 days from the current time |
| autoDeleteTtl ² | The duration from the moment of submitting the cluster create or update request to the moment when the cluster starts to delete. When updating a cluster, the new scheduled deletion time (time of the update request plus the new duration) must be later than the previously set cluster deletion time. Submit an empty value to cancel a previously set autoDeleteTtl value. Provide a duration in seconds with up to nine fractional digits, terminated by 's'. Example: "3.5s". | 1 second | 10 minutes | 14 days |
1. You can set or update both idleDeleteTtl and either autoDeleteTime or autoDeleteTtl in your cluster create or update request. Whichever condition is met first triggers cluster deletion.
2. You can set or update either autoDeleteTime or autoDeleteTtl in your request, but not both.
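As an illustration of the field formats in the table, the following sketch builds the lifecycle portion of a request body. The helper names (`to_duration`, `to_zulu`, `lifecycle_config`) and the cluster name are hypothetical, and placing the fragment under `config.lifecycleConfig` is an assumption about the cluster resource layout; the sketch only assembles JSON and does not send a request.

```python
import json
from datetime import datetime, timedelta, timezone


def to_duration(delta):
    """Format a timedelta as whole seconds terminated by 's', e.g. '1800s'."""
    # Truncates to whole seconds, matching the 1 second value granularity.
    return f"{int(delta.total_seconds())}s"


def to_zulu(moment):
    """Format a timezone-aware datetime as an RFC 3339 UTC 'Zulu' timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def lifecycle_config(idle_ttl=None, auto_delete_time=None, auto_delete_ttl=None):
    """Build a ClusterLifecycleConfig-style dict from the values provided."""
    if auto_delete_time is not None and auto_delete_ttl is not None:
        raise ValueError("set autoDeleteTime or autoDeleteTtl, not both")
    config = {}
    if idle_ttl is not None:
        config["idleDeleteTtl"] = to_duration(idle_ttl)
    if auto_delete_time is not None:
        config["autoDeleteTime"] = to_zulu(auto_delete_time)
    if auto_delete_ttl is not None:
        config["autoDeleteTtl"] = to_duration(auto_delete_ttl)
    return config


# Idle limit of 30 minutes combined with a fixed deletion time.
body = {
    "clusterName": "example-cluster",  # hypothetical cluster name
    "config": {
        "lifecycleConfig": lifecycle_config(
            idle_ttl=timedelta(minutes=30),
            auto_delete_time=datetime(2030, 1, 1, tzinfo=timezone.utc),
        )
    },
}
print(json.dumps(body, indent=2))
```

Rejecting the autoDeleteTime plus autoDeleteTtl combination locally mirrors the second note above, so the mistake surfaces before the request is made.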
Console
- Open the Dataproc Create a cluster page.
- Select the Customize cluster panel.
- In the Scheduled deletion section, select the options to apply to your cluster.
View Scheduled Deletion cluster settings
gcloud CLI
You can use the gcloud dataproc clusters list command to confirm that a cluster has scheduled deletion enabled.
gcloud dataproc clusters list \
    --region=REGION

...
NAME         WORKER_COUNT ... SCHEDULED_DELETE
CLUSTER_ID   NUMBER       ... enabled
...
You can use the gcloud dataproc clusters describe command to check the cluster LifecycleConfig scheduled deletion settings.
gcloud dataproc clusters describe CLUSTER_NAME \
    --region=REGION

...
lifecycleConfig:
  autoDeleteTime: '2018-11-28T19:33:48.146Z'
  idleDeleteTtl: 1800s
  idleStartTime: '2018-11-28T18:33:48.146Z'
...
The autoDeleteTime and idleDeleteTtl values are the scheduled deletion settings configured on the cluster. Dataproc generates the idleStartTime value, which is the most recent time at which the cluster became idle. Dataproc deletes the cluster if the cluster is still idle at idleStartTime + idleDeleteTtl.
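The arithmetic above can be worked through with the sample values from the describe output. This is a sketch only: `parse_zulu` and `next_deletion` are hypothetical helpers, the dictionary is typed in by hand rather than read from gcloud, and the idle deletion time holds only if the cluster stays idle.

```python
from datetime import datetime, timedelta


def parse_zulu(stamp):
    """Parse a UTC 'Zulu' timestamp with millisecond precision."""
    # Replacing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def next_deletion(lifecycle):
    """Return (time, reason) for the earliest scheduled deletion, or None."""
    candidates = []
    if "idleDeleteTtl" in lifecycle and "idleStartTime" in lifecycle:
        ttl = timedelta(seconds=float(lifecycle["idleDeleteTtl"].rstrip("s")))
        candidates.append((parse_zulu(lifecycle["idleStartTime"]) + ttl, "idle"))
    if "autoDeleteTime" in lifecycle:
        candidates.append((parse_zulu(lifecycle["autoDeleteTime"]), "auto"))
    return min(candidates) if candidates else None


# Values copied from the sample lifecycleConfig output.
sample = {
    "autoDeleteTime": "2018-11-28T19:33:48.146Z",
    "idleDeleteTtl": "1800s",
    "idleStartTime": "2018-11-28T18:33:48.146Z",
}
when, reason = next_deletion(sample)
print(when.isoformat(), reason)  # 2018-11-28T19:03:48.146000+00:00 idle
```

With these values the idle limit (18:33:48 plus 1800 seconds) is reached 30 minutes before autoDeleteTime, so idle deletion would apply first if no new work arrives.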
REST API
You can make a clusters.list request to confirm that a cluster has scheduled deletion enabled.
Console
- You can view cluster scheduled deletion settings by selecting the cluster name from the Dataproc Clusters page in the Google Cloud console.
- From the Cluster details page, select the Configuration tab. Go to the cluster configuration list to view scheduled deletion settings.