# Use the Python Client Library to call Dataproc APIs

Estimated completion time: <walkthrough-tutorial-duration duration="5"></walkthrough-tutorial-duration>

This [Cloud Shell](https://cloud.google.com/shell/docs/) walkthrough leads you
through the steps to use the
[Google Cloud Client Libraries for Python](https://googleapis.github.io/google-cloud-python/latest/dataproc/index.html)
to programmatically interact with [Dataproc](https://cloud.google.com/dataproc/docs/).

As you follow this walkthrough, you run Python code that calls
[Dataproc gRPC APIs](https://cloud.google.com/dataproc/docs/reference/rpc/)
to:

* create a Dataproc cluster
* submit a small PySpark word sort job to run on the cluster
* get job status
* tear down the cluster after job completion
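
These four steps map onto a handful of client-library calls. Below is a minimal sketch of that flow using the `google-cloud-dataproc` package (assuming v2 or later of the library). It is not the walkthrough's `submit_job_to_cluster.py` itself, and the project, region, cluster, and bucket values are placeholders to replace with your own.

```python
# Minimal sketch of the walkthrough flow (not submit_job_to_cluster.py itself).
# Assumes google-cloud-dataproc v2+; all names below are placeholders.
from google.cloud import dataproc_v1

PROJECT = "your-project-id"                       # placeholder
REGION = "us-central1"                            # placeholder
CLUSTER = "example-cluster"                       # placeholder
PYSPARK_URI = "gs://your-bucket/pyspark_sort.py"  # placeholder

# Dataproc clients must target the cluster's regional endpoint.
endpoint = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
clusters = dataproc_v1.ClusterControllerClient(client_options=endpoint)
jobs = dataproc_v1.JobControllerClient(client_options=endpoint)

# 1. Create the cluster; create_cluster returns a long-running operation,
#    and result() blocks until the cluster is ready.
cluster = {
    "project_id": PROJECT,
    "cluster_name": CLUSTER,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}
clusters.create_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster": cluster}
).result()

# 2. & 3. Submit the PySpark job and wait for it; the returned Job message
#         carries the terminal job status.
job = jobs.submit_job_as_operation(
    request={
        "project_id": PROJECT,
        "region": REGION,
        "job": {
            "placement": {"cluster_name": CLUSTER},
            "pyspark_job": {"main_python_file_uri": PYSPARK_URI},
        },
    }
).result()
print(f"Job finished with state: {job.status.state.name}")

# 4. Tear down the cluster after the job completes.
clusters.delete_cluster(
    request={"project_id": PROJECT, "region": REGION, "cluster_name": CLUSTER}
).result()
```

Each call above returns a long-running operation; calling `result()` keeps the sketch simple and sequential.
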
The `submit_job_to_cluster.py` file used in this walkthrough is opened in the
Cloud Shell editor when you launch the walkthrough. You can view
the code as you follow the walkthrough steps.

**For more information**: See [Dataproc&rarr;Use the Python Client Library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example) for
an explanation of how the code works.

**To reload this walkthrough:** Run the `cloudshell launch-tutorial` command
on this file from the Cloud Shell command line.

## Prerequisites (1)

1. Create or select a Google Cloud Platform project to use for this tutorial.
* <walkthrough-project-billing-setup permissions=""></walkthrough-project-billing-setup>

1. Click the link below to enable the Dataproc, Compute Engine, and Cloud Storage APIs
in a separate GCP console tab in your browser.

**Note:** After you select your project and enable the APIs, return to this tutorial by clicking
on the **Cloud Shell** tab in your browser.

* [Enable APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataproc,compute_component,storage-component.googleapis.com&redirect=https://console.cloud.google.com)
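
As an alternative to the console link, the same APIs can be enabled from code through the Service Usage API. This is an optional sketch, not part of the walkthrough; it assumes the `google-api-python-client` package and ambient credentials (as in Cloud Shell), and the project ID is a placeholder.

```python
# Optional sketch: batch-enable the walkthrough's APIs via the Service Usage
# API instead of the console link above. Assumes google-api-python-client is
# installed and you are authenticated (e.g., in Cloud Shell).
from googleapiclient import discovery

PROJECT = "your-project-id"  # placeholder

serviceusage = discovery.build("serviceusage", "v1")
operation = serviceusage.services().batchEnable(
    parent=f"projects/{PROJECT}",
    body={
        "serviceIds": [
            "dataproc.googleapis.com",
            "compute.googleapis.com",
            "storage-component.googleapis.com",
        ]
    },
).execute()
print(operation)  # a long-running operation resource
```
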

## Prerequisites (2)

Job output in Cloud Shell shows cluster creation, job submission,
job completion, and cluster deletion.
### Next Steps:

* **View job details from the Console.** View job details by selecting the
PySpark job from the Dataproc
[Jobs page](https://console.cloud.google.com/dataproc/jobs)
in the Google Cloud Platform Console. (A programmatic alternative is
sketched at the end of this walkthrough.)

* **Delete resources used in the walkthrough.** Run the following command to
  delete the Cloud Storage bucket used by the walkthrough job (a Python
  alternative is sketched at the end of this walkthrough):

  ```
  gsutil rm -r gs://$BUCKET
  ```

* **For more information.** See the [Dataproc documentation](https://cloud.google.com/dataproc/docs/)
for API reference and product feature information.
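
The job-details and clean-up steps above also have programmatic equivalents. Here is a short optional sketch, assuming `google-cloud-dataproc` v2+ and `google-cloud-storage`; the project, region, and bucket names are placeholders.

```python
# Optional sketch: list walkthrough jobs with the Dataproc client and delete
# the job's Cloud Storage bucket, instead of using the console and gsutil.
# Assumes google-cloud-dataproc v2+ and google-cloud-storage; all names are
# placeholders.
from google.cloud import dataproc_v1, storage

PROJECT = "your-project-id"  # placeholder
REGION = "us-central1"       # placeholder
BUCKET = "your-bucket-name"  # placeholder

jobs = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)
for job in jobs.list_jobs(request={"project_id": PROJECT, "region": REGION}):
    print(job.reference.job_id, job.status.state.name)

# Equivalent of `gsutil rm -r gs://$BUCKET`: force=True deletes the bucket's
# objects first (works for buckets with up to 256 objects).
storage.Client(project=PROJECT).bucket(BUCKET).delete(force=True)
```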