@@ -57,14 +57,14 @@ The latest version of the connector is publicly available in the following links
57 57
58 58 | version | Link |
59 59 | ----------- | ------------------------------------------------------------ |
60 - | Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.42.0.jar)) |
61 - | Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.42.0.jar)) |
62 - | Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.42.0.jar)) |
63 - | Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.42.0.jar)) |
64 - | Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.42.0.jar)) |
60 + | Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar)) |
61 + | Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.42.1.jar)) |
62 + | Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.42.1.jar)) |
63 + | Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.42.1.jar)) |
64 + | Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.42.1.jar)) |
65 65 | Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.37.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.37.0.jar)) |
66 - | Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.42.0.jar)) |
67 - | Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.0.jar)) |
66 + | Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.42.1.jar)) |
67 + | Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.1.jar)) |
68 68 | Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar)) |
69 69
70 70 The first six versions are Java-based connectors targeting Spark 2.4/3.1/3.2/3.3/3.4/3.5 of all Scala versions, built on the new
@@ -107,14 +107,14 @@ repository. It can be used using the `--packages` option or the
107 107
108 108 | version | Connector Artifact |
109 109 | ----------- | ------------------------------------------------------------ |
110 - | Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.42.0` |
111 - | Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.42.0` |
112 - | Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.42.0` |
113 - | Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.42.0` |
114 - | Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.42.0` |
110 + | Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.42.1` |
111 + | Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.42.1` |
112 + | Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.42.1` |
113 + | Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.42.1` |
114 + | Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.42.1` |
115 115 | Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.37.0` |
116 - | Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.42.0` |
117 - | Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.0` |
116 + | Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.42.1` |
117 + | Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.1` |
118 118 | Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.29.0` |
119 119
120 120 ### Specifying the Spark BigQuery connector version in a Dataproc cluster
@@ -124,8 +124,8 @@ Using the standard `--jars` or `--packages` (or alternatively, the `spark.jars`/
124 124
125 125 To use a version other than the built-in one, please do one of the following:
126 126
127 - * For Dataproc clusters using image 2.1 and above, add the following flag on cluster creation to upgrade the version: `--metadata SPARK_BQ_CONNECTOR_VERSION=0.42.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
128 - * For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.42.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
127 + * For Dataproc clusters using image 2.1 and above, add the following flag on cluster creation to upgrade the version: `--metadata SPARK_BQ_CONNECTOR_VERSION=0.42.1`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.1.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
128 + * For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.42.1`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.42.1.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
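For instance, the two overrides above might be used as in the following sketch. The cluster name, batch script, and region are illustrative placeholders, not values from this document:

```shell
# Hypothetical sketch: create a Dataproc cluster pinned to connector 0.42.1
# ("my-cluster" and "us-central1" are placeholder values).
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=2.1 \
    --metadata SPARK_BQ_CONNECTOR_VERSION=0.42.1

# Hypothetical sketch: submit a serverless batch with the same version override
# ("my_job.py" is a placeholder script name).
gcloud dataproc batches submit pyspark my_job.py \
    --region=us-central1 \
    --properties dataproc.sparkBqConnector.version=0.42.1
```

Both commands only set the connector version at creation time; jobs submitted afterwards pick up the overridden jar automatically.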
129 129
130 130 ## Hello World Example
131 131
@@ -135,7 +135,7 @@ You can run a simple PySpark wordcount against the API without compilation by ru
135 135
136 136 ```
137 137 gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
138 -   --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.0.jar \
138 +   --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.42.1.jar \
139 139   examples/python/shakespeare.py
140 140 ```
141 141
@@ -183,7 +183,6 @@ SELECT query on BigQuery and fetch its results directly to a Spark Dataframe.
183 183 This is easily done as described in the following code sample:
184 184 ```
185 185 spark.conf.set("viewsEnabled","true")
186 - spark.conf.set("materializationDataset","<dataset>")
187 186
188 187 sql = """
189 188   SELECT tag, COUNT(*) c
@@ -230,16 +229,8 @@ efficiently then running joins on Spark or use other BigQuery features such as
230 229 [BigQuery ML](https://cloud.google.com/bigquery-ml/docs)
231 230 and more.
232 231
233 - In order to use this feature the following configurations MUST be set:
234 - * `viewsEnabled` must be set to `true`.
235 - * `materializationDataset` must be set to a dataset where the GCP user has table
236 -   creation permission. `materializationProject` is optional.
237 -
238 - **Note:** As mentioned in the [BigQuery documentation](https://cloud.google.com/bigquery/docs/writing-results#temporary_and_permanent_tables),
239 - the queried tables must be in the same location as the `materializationDataset`.
240 - Also, if the tables in the `SQL statement` are from projects other than the
241 - `parentProject` then use the fully qualified table name i.e.
242 - `[project].[dataset].[table]`.
232 + In order to use this feature, the `viewsEnabled` configuration MUST be set to
233 + `true`. This can also be done globally, as shown in the example above.
243 234
244 235 **Important:** This feature is implemented by running the query on BigQuery and
245 236 saving the result into a temporary table, of which Spark will read the results
@@ -256,17 +247,24 @@ note there are a few caveats:
256 247   read performance, even before running any `collect()` or `count()` action.
257 248 * The materialization process can also incur additional costs to your BigQuery
258 249   bill.
259 - * By default, the materialized views are created in the same project and
260 -   dataset. Those can be configured by the optional `materializationProject`
261 -   and `materializationDataset` options, respectively. These options can also
262 -   be globally set by calling `spark.conf.set(...)` before reading the views.
263 250 * Reading from views is **disabled** by default. In order to enable it,
264 251   either set the `viewsEnabled` option when reading the specific view
265 252   (`.option("viewsEnabled", "true")`) or set it globally by calling
266 253   `spark.conf.set("viewsEnabled", "true")`.
254 +
255 + **Notice:** Before version 0.43.0 of the connector, the following configurations
256 + are required:
257 + * By default, the materialized views are created in the same project and
258 +   dataset. Those can be configured by the optional `materializationProject`
259 +   and `materializationDataset` options, respectively. These options can also
260 +   be globally set by calling `spark.conf.set(...)` before reading the views.
267 261 * As mentioned in the [BigQuery documentation](https://cloud.google.com/bigquery/docs/writing-results#temporary_and_permanent_tables),
268 262   the `materializationDataset` should be in the same location as the view.
269 263
264 + Starting with version 0.43.0, those configurations are **redundant** and are
265 + ignored. It is highly recommended to upgrade to that version or a later one to
266 + enjoy simpler configuration when using views or loading from queries.
267 +
270 268 ### Writing data to BigQuery
271 269
272 270 Writing DataFrames to BigQuery can be done using two methods: Direct and Indirect.
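As a rough sketch of what the two write methods look like in PySpark (the table name, bucket name, and DataFrame contents are illustrative assumptions, and running this requires a live Spark cluster with the connector jar and BigQuery access):

```python
# Hypothetical sketch; assumes a Spark session with the BigQuery connector
# on the classpath and valid GCP credentials.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-write-sketch").getOrCreate()
df = spark.createDataFrame([("hello", 1), ("world", 2)], ["word", "word_count"])

# Direct method: rows are streamed straight to BigQuery via the
# Storage Write API; no staging bucket is needed.
df.write.format("bigquery") \
    .option("writeMethod", "direct") \
    .save("mydataset.wordcount_output")  # placeholder table name

# Indirect method: data is first staged in a GCS bucket and then
# loaded into BigQuery, so a temporary bucket must be provided.
df.write.format("bigquery") \
    .option("writeMethod", "indirect") \
    .option("temporaryGcsBucket", "my-staging-bucket") \
    .save("mydataset.wordcount_output")
```

The direct method avoids the extra GCS hop, while the indirect method relies on BigQuery load jobs.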
@@ -422,35 +420,6 @@ word-break:break-word
422 420   </td>
423 421   <td>Read</td>
424 422 </tr>
425 - <tr valign="top">
426 -   <td><code>materializationProject</code>
427 -   </td>
428 -   <td>The project id where the materialized view is going to be created
429 -   <br/>(Optional. Defaults to view's project id)
430 -   </td>
431 -   <td>Read</td>
432 - </tr>
433 - <tr valign="top">
434 -   <td><code>materializationDataset</code>
435 -   </td>
436 -   <td>The dataset where the materialized view is going to be created. This
437 -   dataset should be in same location as the view or the queried tables.
438 -   <br/>(Optional. Defaults to view's dataset)
439 -   </td>
440 -   <td>Read</td>
441 - </tr>
442 - <tr valign="top">
443 -   <td><code>materializationExpirationTimeInMinutes</code>
444 -   </td>
445 -   <td>The expiration time of the temporary table holding the materialized data
446 -   of a view or a query, in minutes. Notice that the connector may re-use
447 -   the temporary table due to the use of local cache and in order to reduce
448 -   BigQuery computation, so very low values may cause errors. The value must
449 -   be a positive integer.
450 -   <br/>(Optional. Defaults to 1440, or 24 hours)
451 -   </td>
452 -   <td>Read</td>
453 - </tr>
454 423 <tr valign="top">
455 424   <td><code>readDataFormat</code>
456 425   </td>
@@ -1200,7 +1169,7 @@ using the following code:
1200 1169 ```python
1201 1170 from pyspark.sql import SparkSession
1202 1171 spark = SparkSession.builder \
1203 -   .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.0") \
1172 +   .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.1") \
1204 1173   .getOrCreate()
1205 1174 df = spark.read.format("bigquery") \
1206 1175   .load("dataset.table")
@@ -1209,15 +1178,15 @@ df = spark.read.format("bigquery") \
1209 1178 **Scala:**
1210 1179 ```scala
1211 1180 val spark = SparkSession.builder
1212 -   .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.0")
1181 +   .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.42.1")
1213 1182   .getOrCreate()
1214 1183 val df = spark.read.format("bigquery")
1215 1184   .load("dataset.table")
1216 1185 ```
1217 1186
1218 1187 In case the Spark cluster is using Scala 2.12 (it's optional for Spark 2.4.x,
1219 1188 mandatory in 3.0.x), then the relevant package is
1220 - com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.42.0. In
1189 + com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.42.1. In
1221 1190 order to know which Scala version is used, please run the following code:
1222 1191
1223 1192 **Python:**
@@ -1241,14 +1210,14 @@ To include the connector in your project:
1241 1210 <dependency>
1242 1211   <groupId>com.google.cloud.spark</groupId>
1243 1212   <artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
1244 -   <version>0.42.0</version>
1213 +   <version>0.42.1</version>
1245 1214 </dependency>
1246 1215 ```
1247 1216
1248 1217 ### SBT
1249 1218
1250 1219 ```sbt
1251 - libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.42.0"
1220 + libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.42.1"
1252 1221 ```
1253 1222
1254 1223 ### Connector metrics and how to view them
@@ -1293,7 +1262,7 @@ word-break:break-word
1293 1262 </table>
1294 1263
1295 1264
1296 - **Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.42.0.jar` is in the classpath before starting the history-server, and the connector version is `spark-3.2` or above.
1265 + **Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.42.1.jar` is in the classpath before starting the history-server, and the connector version is `spark-3.2` or above.
1297 1266
1298 1267 ## FAQ
1299 1268