You can install additional components like Zeppelin when you create a Dataproc cluster using the Optional components feature. This page describes the Zeppelin component.
The Zeppelin Notebook component is a web-based notebook for interactive data analytics. The Zeppelin web UI is available on port 8080 on the cluster's first master node.
By default, notebooks are saved in Cloud Storage in the Dataproc staging bucket, which is either specified by the user or auto-created when the cluster is created. To change the location, set the zeppelin:zeppelin.notebook.gcs.dir property when you create the cluster.
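As a minimal sketch of overriding the notebook location, the property can be passed through the --properties flag of gcloud dataproc clusters create. The cluster name, region, and bucket path below are hypothetical placeholders, not values from this page, and the leading echo makes this a dry run that only prints the command.

```shell
# Hypothetical placeholder values; replace them with your own.
CLUSTER_NAME="example-cluster"
REGION="us-central1"
NOTEBOOK_DIR="gs://example-bucket/zeppelin/notebooks"

# Cluster property (with the zeppelin: prefix) that overrides the
# default notebook location in the staging bucket.
NOTEBOOK_PROPERTY="zeppelin:zeppelin.notebook.gcs.dir=${NOTEBOOK_DIR}"

# Dry run: echo prints the command for review.
# Remove the leading "echo" to actually create the cluster.
echo gcloud dataproc clusters create "${CLUSTER_NAME}" \
    --optional-components=ZEPPELIN \
    --region="${REGION}" \
    --enable-component-gateway \
    --properties="${NOTEBOOK_PROPERTY}"
```

The property takes effect only at cluster creation time, so it is set in the same command that enables the Zeppelin component.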
Install the component
Install the component when you create a Dataproc cluster. Components can be added to clusters created with Dataproc version 1.3 and later.
See Supported Dataproc versions for the component version included in each Dataproc image release.
gcloud command
To create a Dataproc cluster that includes the Zeppelin component, use the gcloud dataproc clusters create cluster-name command with the --optional-components flag.
gcloud dataproc clusters create cluster-name \
    --optional-components=ZEPPELIN \
    --region=region \
    --enable-component-gateway \
    ... other flags
REST API
The Zeppelin component can be specified through the Dataproc API using SoftwareConfig.Component as part of a clusters.create request.
Console
- Enable the component and component gateway.
  - In the Google Cloud console, open the Dataproc Create a cluster page. The Set up cluster panel is selected.
  - In the Components section:
    - Under Optional components, select Zeppelin and other optional components to install on your cluster.
    - Under Component Gateway, select Enable component gateway (see Viewing and Accessing Component Gateway URLs).
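For the REST API route, a clusters.create request that enables the Zeppelin component and the Component Gateway might be sketched as follows. The project ID, region, and cluster name are hypothetical placeholders, and a real request usually carries more configuration than this minimal body. The script only prints the request; the curl call that would send it is left commented out.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="example-project"
REGION="us-central1"
CLUSTER_NAME="example-cluster"

# Minimal request body: ZEPPELIN in softwareConfig.optionalComponents,
# plus endpointConfig.enableHttpPortAccess to turn on Component Gateway.
REQUEST_BODY=$(cat <<EOF
{
  "clusterName": "${CLUSTER_NAME}",
  "config": {
    "softwareConfig": {
      "optionalComponents": ["ZEPPELIN"]
    },
    "endpointConfig": {
      "enableHttpPortAccess": true
    }
  }
}
EOF
)

# clusters.create endpoint for the chosen project and region.
ENDPOINT="https://dataproc.googleapis.com/v1/projects/${PROJECT_ID}/regions/${REGION}/clusters"

# Dry run: print the request for review.
echo "POST ${ENDPOINT}"
echo "${REQUEST_BODY}"

# To send the request, uncomment the following command:
# curl -X POST \
#     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#     -H "Content-Type: application/json" \
#     -d "${REQUEST_BODY}" \
#     "${ENDPOINT}"
```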
Open the Zeppelin notebook
To open the Zeppelin notebook UI running on the cluster's master node in your local browser, click the Component Gateway links in the Google Cloud console (see Viewing and Accessing Component Gateway URLs).