
Commit 657270b

SireInsectus authored and committed
Publishing v2.1.0
1 parent 6c6dcd5 commit 657270b

51 files changed (+812, −596 lines)


Apache-Spark-Programming-with-Databricks/ASP 0 - Course Agenda.py

Lines changed: 13 additions & 13 deletions

@@ -17,31 +17,31 @@
 # MAGIC # Day 1
 # MAGIC ## Introductions
 # MAGIC * [ASP 1.1 - Databricks Platform]($./ASP 1 - Introductions/ASP 1.1 - Databricks Platform)
-# MAGIC * [ASP 1.1L - Explore Datasets Lab]($./ASP 1 - Introductions/Labs/ASP 1.1L - Explore Datasets Lab)
+# MAGIC * [ASP 1.1L - Explore Datasets Lab]($./ASP 1 - Introductions/ASP 1.1L - Explore Datasets Lab)
 # MAGIC
 # MAGIC
 # MAGIC ## Spark Core
 # MAGIC * [ASP 2.1 - Spark SQL]($./ASP 2 - Spark Core/ASP 2.1 - Spark SQL)
-# MAGIC * [ASP 2.1L - Spark SQL Lab]($./ASP 2 - Spark Core/Labs/ASP 2.1L - Spark SQL Lab)
+# MAGIC * [ASP 2.1L - Spark SQL Lab]($./ASP 2 - Spark Core/ASP 2.1L - Spark SQL Lab)
 # MAGIC * [ASP 2.2 - Reader & Writer]($./ASP 2 - Spark Core/ASP 2.2 - Reader & Writer)
-# MAGIC * [ASP 2.2L - Ingesting Data Lab]($./ASP 2 - Spark Core/Labs/ASP 2.2L - Ingesting Data Lab)
+# MAGIC * [ASP 2.2L - Ingesting Data Lab]($./ASP 2 - Spark Core/ASP 2.2L - Ingesting Data Lab)
 # MAGIC * [ASP 2.3 - DataFrame & Column]($./ASP 2 - Spark Core/ASP 2.3 - DataFrame & Column)
-# MAGIC * [ASP 2.3L - Purchase Revenues Lab]($./ASP 2 - Spark Core/Labs/ASP 2.3L - Purchase Revenues Lab)
+# MAGIC * [ASP 2.3L - Purchase Revenues Lab]($./ASP 2 - Spark Core/ASP 2.3L - Purchase Revenues Lab)

 # COMMAND ----------

 # MAGIC %md
 # MAGIC # Day 2
 # MAGIC ## Functions
 # MAGIC * [ASP 3.1 - Aggregation]($./ASP 3 - Functions/ASP 3.1 - Aggregation)
-# MAGIC * [ASP 3.1L - Revenue by Traffic Lab]($./ASP 3 - Functions/Labs/ASP 3.1L - Revenue by Traffic Lab)
+# MAGIC * [ASP 3.1L - Revenue by Traffic Lab]($./ASP 3 - Functions/ASP 3.1L - Revenue by Traffic Lab)
 # MAGIC * [ASP 3.2 - Datetimes]($./ASP 3 - Functions/ASP 3.2 - Datetimes)
-# MAGIC * [ASP 3.2L - Active Users Lab]($./ASP 3 - Functions/Labs/ASP 3.2L - Active Users Lab)
+# MAGIC * [ASP 3.2L - Active Users Lab]($./ASP 3 - Functions/ASP 3.2L - Active Users Lab)
 # MAGIC * [ASP 3.3 - Complex Types]($./ASP 3 - Functions/ASP 3.3 - Complex Types)
 # MAGIC * [ASP 3.4 - Additional Functions]($./ASP 3 - Functions/ASP 3.4 - Additional Functions)
-# MAGIC * [ASP 3.4L - Abandoned Carts Lab]($./ASP 3 - Functions/Labs/ASP 3.4L - Abandoned Carts Lab)
+# MAGIC * [ASP 3.4L - Abandoned Carts Lab]($./ASP 3 - Functions/ASP 3.4L - Abandoned Carts Lab)
 # MAGIC * [ASP 3.5 - UDFs]($./ASP 3 - Functions/ASP 3.5 - UDFs)
-# MAGIC * [ASP 3.5L - Sort Day Lab]($./ASP 3 - Functions/Labs/ASP 3.5L - Sort Day Lab)
+# MAGIC * [ASP 3.5L - Sort Day Lab]($./ASP 3 - Functions/ASP 3.5L - Sort Day Lab)
 # MAGIC

 # COMMAND ----------
@@ -51,22 +51,22 @@
 # MAGIC ## Performance
 # MAGIC * [ASP 4.1 - Query Optimization]($./ASP 4 - Performance/ASP 4.1 - Query Optimization)
 # MAGIC * [ASP 4.2 - Partitioning]($./ASP 4 - Performance/ASP 4.2 - Partitioning)
-# MAGIC * [ASP 4.3L - De-Duping Data Lab]($./ASP 4 - Performance/Labs/ASP 4.3L - De-Duping Data Lab)
+# MAGIC * [ASP 4.2L - De-Duping Data Lab]($./ASP 4 - Performance/ASP 4.2L - De-Duping Data Lab)

 # COMMAND ----------

 # MAGIC %md
 # MAGIC # Day 4
 # MAGIC ## Streaming
 # MAGIC * [ASP 5.1 - Streaming Query]($./ASP 5 - Streaming/ASP 5.1 - Streaming Query)
-# MAGIC * [ASP 5.1L - Coupon Sales Lab]($./ASP 5 - Streaming/Labs/ASP 5.1L - Coupon Sales Lab)
-# MAGIC * [ASP 5.2L - Hourly Activity by Traffic Lab]($./ASP 5 - Streaming/Labs/ASP 5.2L - Hourly Activity by Traffic Lab)
-# MAGIC * [ASP 5.3L - Activity by Traffic Lab]($./ASP 5 - Streaming/Labs/ASP 5.3L - Activity by Traffic Lab)
+# MAGIC * [ASP 5.1aL - Coupon Sales Lab]($./ASP 5 - Streaming/ASP 5.1aL - Coupon Sales Lab)
+# MAGIC * [ASP 5.1bL - Hourly Activity by Traffic Lab]($./ASP 5 - Streaming/ASP 5.1bL - Hourly Activity by Traffic Lab)
+# MAGIC * [ASP 5.1cL - Activity by Traffic Lab]($./ASP 5 - Streaming/ASP 5.1cL - Activity by Traffic Lab)
 # MAGIC
 # MAGIC
 # MAGIC ## Delta Lake
 # MAGIC * [ASP 6.1 - Delta Lake]($./ASP 6 - Delta Lake/ASP 6.1 - Delta Lake)
-# MAGIC * [ASP 6.1L - Delta Lake Lab]($./ASP 6 - Delta Lake/Labs/ASP 6.1L - Delta Lake Lab)
+# MAGIC * [ASP 6.1L - Delta Lake Lab]($./ASP 6 - Delta Lake/ASP 6.1L - Delta Lake Lab)

 # COMMAND ----------

Apache-Spark-Programming-with-Databricks/ASP 1 - Introductions/ASP 1.1 - Databricks Platform.py

Lines changed: 7 additions & 7 deletions

@@ -7,7 +7,7 @@

 # COMMAND ----------

-# MAGIC %md
+# MAGIC %md
 # MAGIC # Databricks Platform
 # MAGIC
 # MAGIC Demonstrate basic functionality and identify terms related to working in the Databricks workspace.
@@ -24,13 +24,13 @@
 # MAGIC
 # MAGIC ##### Databricks Notebook Utilities
 # MAGIC - <a href="https://docs.databricks.com/notebooks/notebooks-use.html#language-magic" target="_blank">Magic commands</a>: **`%python`**, **`%scala`**, **`%sql`**, **`%r`**, **`%sh`**, **`%md`**
-# MAGIC - <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a>: **`dbutils.fs`** **(`%fs`)**, **`dbutils.notebooks`** **(`%run`)**, **`dbutils.widgets`**
+# MAGIC - <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a>: **`dbutils.fs`** (**`%fs`**), **`dbutils.notebooks`** (**`%run`**), **`dbutils.widgets`**
 # MAGIC - <a href="https://docs.databricks.com/notebooks/visualizations/index.html" target="_blank">Visualization</a>: **`display`**, **`displayHTML`**

 # COMMAND ----------

 # MAGIC %md ### Setup
-# MAGIC Run classroom setup to [mount](https://docs.databricks.com/data/databricks-file-system.html#mount-storage) Databricks training datasets and create your own database for BedBricks.
+# MAGIC Run classroom setup to <a href="https://docs.databricks.com/data/databricks-file-system.html#mount-storage" target="_blank">mount</a> Databricks training datasets and create your own database for BedBricks.
 # MAGIC
 # MAGIC Use the **`%run`** magic command to run another notebook within a notebook

@@ -92,7 +92,7 @@

 # MAGIC %md
 # MAGIC ## Create documentation cells
-# MAGIC Render cell as <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">Markdown</a> using the magic command: **`%md`**
+# MAGIC Render cell as <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">Markdown</a> using the magic command: **`%md`**
 # MAGIC
 # MAGIC Below are some examples of how you can use Markdown to format documentation. Click this cell and press **`Enter`** to view the underlying Markdown syntax.
 # MAGIC
@@ -107,7 +107,7 @@
 # MAGIC
 # MAGIC ---
 # MAGIC
-# MAGIC - [link](https://www.markdownguide.org/cheat-sheet/)
+# MAGIC - <a href="https://www.markdownguide.org/cheat-sheet/" target="_blank">link</a>
 # MAGIC - `code`
 # MAGIC
 # MAGIC ```
@@ -169,7 +169,7 @@

 # COMMAND ----------

-# MAGIC %md
+# MAGIC %md
 # MAGIC Run file system commands on DBFS using DBUtils directly

 # COMMAND ----------
@@ -219,7 +219,7 @@
 # COMMAND ----------

 # MAGIC %sql
-# MAGIC CREATE TABLE IF NOT EXISTS events USING delta OPTIONS (path "${c.events_path}");
+# MAGIC CREATE TABLE IF NOT EXISTS events USING DELTA OPTIONS (path "${c.events_path}");

 # COMMAND ----------

Lines changed: 4 additions & 4 deletions

@@ -24,14 +24,14 @@

 # COMMAND ----------

-# MAGIC %run ../../Includes/Classroom-Setup
+# MAGIC %run ../Includes/Classroom-Setup

 # COMMAND ----------

 # MAGIC %md ### 1. List data files in DBFS using magic commands
 # MAGIC Use a magic command to display files located in the DBFS directory: **`dbfs:/databricks-datasets`**
 # MAGIC
-# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see four items: **`events`**, **`products`**, **`sales`**, **`users`**
+# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see several datasets that come pre-installed in Databricks such as: **`COVID`**, **`adult`**, and **`airlines`**.

 # COMMAND ----------

@@ -44,7 +44,7 @@
 # MAGIC - Use **`dbutils`** to get the files at the directory above and save it to the variable **`files`**
 # MAGIC - Use the Databricks display() function to display the contents in **`files`**
 # MAGIC
-# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see four items: **`events`**, **`items`**, **`sales`**, **`users`**
+# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> You should see several datasets that come pre-installed in Databricks such as: **`COVID`**, **`adult`**, and **`airlines`**.

 # COMMAND ----------

@@ -122,7 +122,7 @@
 # MAGIC
 # MAGIC Execute a SQL query that computes the average **`purchase_revenue_in_usd`** from the **`sales`** table.
 # MAGIC
-# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> The result should be `1042.79`.
+# MAGIC <img src="https://files.training.databricks.com/images/icon_hint_32.png" alt="Hint"> The result should be **`1042.79`**.

 # COMMAND ----------

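The lab's final exercise above is a standard SQL aggregate. As a minimal stand-in sketch of the same AVG pattern — using Python's built-in sqlite3 rather than Databricks SQL, with invented revenue rows chosen to land on the hinted result:

```python
import sqlite3

# In-memory stand-in for the lab's sales table (not Databricks; rows invented)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (purchase_revenue_in_usd REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(1000.00,), (1085.58,)])

# Same aggregate shape the lab asks for: AVG over the revenue column
avg = conn.execute(
    "SELECT ROUND(AVG(purchase_revenue_in_usd), 2) FROM sales"
).fetchone()[0]
print(avg)  # 1042.79 with these invented rows
conn.close()
```

In the lab itself the query runs in a `%sql` cell against the real **`sales`** table rather than this toy database.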
Apache-Spark-Programming-with-Databricks/ASP 2 - Spark Core/ASP 2.1 - Spark SQL.py

Lines changed: 27 additions & 27 deletions

@@ -33,7 +33,7 @@
 # COMMAND ----------

 # MAGIC %md ## Multiple Interfaces
-# MAGIC Spark SQL is a module for structured data processing with multiple interfaces.
+# MAGIC Spark SQL is a module for structured data processing with multiple interfaces.
 # MAGIC
 # MAGIC We can interact with Spark SQL in two ways:
 # MAGIC 1. Executing SQL queries
@@ -42,7 +42,7 @@
 # COMMAND ----------

 # MAGIC %md
-# MAGIC **Method 1: Executing SQL queries**
+# MAGIC **Method 1: Executing SQL queries**
 # MAGIC
 # MAGIC This is how we interacted with Spark SQL in the previous lesson.

@@ -58,7 +58,7 @@

 # MAGIC %md **Method 2: Working with the DataFrame API**
 # MAGIC
-# MAGIC We can also express Spark SQL queries using the DataFrame API.
+# MAGIC We can also express Spark SQL queries using the DataFrame API.
 # MAGIC The following cell returns a DataFrame containing the same results as those retrieved above.

 # COMMAND ----------
@@ -72,7 +72,7 @@

 # COMMAND ----------

-# MAGIC %md We'll go over the syntax for the DataFrame API later in the lesson, but you can see this builder design pattern allows us to chain a sequence of operations very similar to those we find in SQL.
+# MAGIC %md We'll go over the syntax for the DataFrame API later in the lesson, but you can see this builder design pattern allows us to chain a sequence of operations very similar to those we find in SQL.

 # COMMAND ----------

@@ -87,21 +87,21 @@

 # MAGIC %md ## Spark API Documentation
 # MAGIC
-# MAGIC To learn how we work with DataFrames in Spark SQL, let's first look at the Spark API documentation.
-# MAGIC The main Spark [documentation](https://spark.apache.org/docs/latest/) page includes links to API docs and helpful guides for each version of Spark.
+# MAGIC To learn how we work with DataFrames in Spark SQL, let's first look at the Spark API documentation.
+# MAGIC The main Spark <a href="https://spark.apache.org/docs/latest/" target="_blank">documentation</a> page includes links to API docs and helpful guides for each version of Spark.
 # MAGIC
-# MAGIC The [Scala API](https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html) and [Python API](https://spark.apache.org/docs/latest/api/python/index.html) are most commonly used, and it's often helpful to reference the documentation for both languages.
+# MAGIC The <a href="https://spark.apache.org/docs/latest/api/scala/org/apache/spark/index.html" target="_blank">Scala API</a> and <a href="https://spark.apache.org/docs/latest/api/python/index.html" target="_blank">Python API</a> are most commonly used, and it's often helpful to reference the documentation for both languages.
 # MAGIC Scala docs tend to be more comprehensive, and Python docs tend to have more code examples.
 # MAGIC
 # MAGIC #### Navigating Docs for the Spark SQL Module
-# MAGIC Find the Spark SQL module by navigating to **`org.apache.spark.sql`** in the Scala API or **`pyspark.sql`** in the Python API.
+# MAGIC Find the Spark SQL module by navigating to **`org.apache.spark.sql`** in the Scala API or **`pyspark.sql`** in the Python API.
 # MAGIC The first class we'll explore in this module is the **`SparkSession`** class. You can find this by entering "SparkSession" in the search bar.

 # COMMAND ----------

 # MAGIC %md
 # MAGIC ## SparkSession
-# MAGIC The **`SparkSession`** class is the single entry point to all functionality in Spark using the DataFrame API.
+# MAGIC The **`SparkSession`** class is the single entry point to all functionality in Spark using the DataFrame API.
 # MAGIC
 # MAGIC In Databricks notebooks, the SparkSession is created for you, stored in a variable called **`spark`**.

@@ -119,13 +119,13 @@

 # COMMAND ----------

-# MAGIC %md
-# MAGIC Below are several additional methods we can use to create DataFrames. All of these can be found in the <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.SparkSession.html" target="_blank">documentation</a> for `SparkSession`.
+# MAGIC %md
+# MAGIC Below are several additional methods we can use to create DataFrames. All of these can be found in the <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.SparkSession.html" target="_blank">documentation</a> for **`SparkSession`**.
 # MAGIC
-# MAGIC #### `SparkSession` Methods
+# MAGIC #### **`SparkSession`** Methods
 # MAGIC | Method | Description |
 # MAGIC | --- | --- |
-# MAGIC | sql | Returns a DataFrame representing the result of the given query |
+# MAGIC | sql | Returns a DataFrame representing the result of the given query |
 # MAGIC | table | Returns the specified table as a DataFrame |
 # MAGIC | read | Returns a DataFrameReader that can be used to read data in as a DataFrame |
 # MAGIC | range | Create a DataFrame with a column containing elements in a range from start to end (exclusive) with step value and number of partitions |
@@ -191,16 +191,16 @@
 # COMMAND ----------

 # MAGIC %md ## Transformations
-# MAGIC When we created **`budget_df`**, we used a series of DataFrame transformation methods e.g. **`select`**, **`where`**, **`orderBy`**.
+# MAGIC When we created **`budget_df`**, we used a series of DataFrame transformation methods e.g. **`select`**, **`where`**, **`orderBy`**.
 # MAGIC
-# MAGIC ```
-# MAGIC products_df
-# MAGIC .select("name", "price")
-# MAGIC .where("price < 200")
-# MAGIC .orderBy("price")
-# MAGIC ```
-# MAGIC Transformations operate on and return DataFrames, allowing us to chain transformation methods together to construct new DataFrames.
-# MAGIC However, these operations can't execute on their own, as transformation methods are **lazily evaluated**.
+# MAGIC <strong><code>products_df
+# MAGIC &nbsp; .select("name", "price")
+# MAGIC &nbsp; .where("price < 200")
+# MAGIC &nbsp; .orderBy("price")
+# MAGIC </code></strong>
+# MAGIC
+# MAGIC Transformations operate on and return DataFrames, allowing us to chain transformation methods together to construct new DataFrames.
+# MAGIC However, these operations can't execute on their own, as transformation methods are **lazily evaluated**.
 # MAGIC
 # MAGIC Running the following cell does not trigger any computation.

@@ -214,8 +214,8 @@
 # COMMAND ----------

 # MAGIC %md ## Actions
-# MAGIC Conversely, DataFrame actions are methods that **trigger computation**.
-# MAGIC Actions are needed to trigger the execution of any DataFrame transformations.
+# MAGIC Conversely, DataFrame actions are methods that **trigger computation**.
+# MAGIC Actions are needed to trigger the execution of any DataFrame transformations.
 # MAGIC
 # MAGIC The **`show`** action causes the following cell to execute transformations.

@@ -243,7 +243,7 @@

 # COMMAND ----------

-# MAGIC %md
+# MAGIC %md
 # MAGIC **`count`** returns the number of records in a DataFrame.

 # COMMAND ----------
@@ -252,12 +252,12 @@

 # COMMAND ----------

-# MAGIC %md
+# MAGIC %md
 # MAGIC **`collect`** returns an array of all rows in a DataFrame.

 # COMMAND ----------

-budget_df.collect()
+budget_df.collect()

 # COMMAND ----------

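The lazy-transformation versus eager-action split this lesson describes can be sketched in plain Python with generators. This is a conceptual stand-in, not Spark: the `select` and `where` helpers below are invented for illustration, and real DataFrames add optimization and distribution, but the execute-only-on-demand behavior is the same idea.

```python
# Invented helpers mimicking lazy transformations: each returns a generator,
# so building the chain does no work by itself.
def select(rows, *cols):
    return ({c: r[c] for c in cols} for r in rows)

def where(rows, pred):
    return (r for r in rows if pred(r))

products = [
    {"name": "pillow", "price": 50},
    {"name": "bed", "price": 800},
    {"name": "sheet", "price": 150},
]

# Chain the "transformations" -- nothing has iterated over products yet
pipeline = where(select(products, "name", "price"), lambda r: r["price"] < 200)

# The "action": materializing the generator is what triggers the computation,
# like show()/count()/collect() on a DataFrame
result = list(pipeline)
print(result)  # only the rows priced under 200 survive
```

In Spark the equivalent shape is the chained `products_df.select(...).where(...)` shown in the lesson, followed by an action such as **`show`**, **`count`**, or **`collect`**.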
Lines changed: 5 additions & 3 deletions

@@ -20,16 +20,16 @@
 # MAGIC ##### Methods
 # MAGIC - <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.SparkSession.html?highlight=sparksession" target="_blank">SparkSession</a>: **`sql`**, **`table`**
 # MAGIC - <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html" target="_blank">DataFrame</a> transformations: **`select`**, **`where`**, **`orderBy`**
-# MAGIC - <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html" target="_blank">DataFrame</a> actions: `select`, `count`, `take`
+# MAGIC - <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html" target="_blank">DataFrame</a> actions: **`select`**, **`count`**, **`take`**
 # MAGIC - Other <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.html" target="_blank">DataFrame</a> methods: **`printSchema`**, **`schema`**, **`createOrReplaceTempView`**

 # COMMAND ----------

-# MAGIC %run ../../Includes/Classroom-Setup-SQL
+# MAGIC %run ../Includes/Classroom-Setup-SQL

 # COMMAND ----------

-# MAGIC %md ### 1. Create a DataFrame from the `events` table
+# MAGIC %md ### 1. Create a DataFrame from the **`events`** table
 # MAGIC - Use SparkSession to create a DataFrame from the **`events`** table

 # COMMAND ----------
@@ -83,6 +83,7 @@
 assert(num_rows == 1938215)
 assert(len(rows) == 5)
 assert(type(rows[0]) == Row)
+print("All test pass")

 # COMMAND ----------

@@ -109,6 +110,7 @@
 assert (mac_sql_df.select("device").distinct().count() == 1 and len(verify_rows) == 5 and verify_rows[0]['device'] == "macOS"), "Incorrect filter condition"
 assert (verify_rows[4]['event_timestamp'] == 1592539226602157), "Incorrect sorting"
 del verify_rows
+print("All test pass")

 # COMMAND ----------

Apache-Spark-Programming-with-Databricks/ASP 2 - Spark Core/ASP 2.2 - Reader & Writer.py

Lines changed: 8 additions & 11 deletions

@@ -5,10 +5,6 @@
 # MAGIC <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
 # MAGIC </div>

-# COMMAND ----------
-
-
-
 # COMMAND ----------

 # MAGIC %md # Reader & Writer
@@ -204,7 +200,7 @@
 # MAGIC // Step 2 - pull the value from the config (or copy & paste it)
 # MAGIC val eventsJsonPath = spark.conf.get("com.whatever.your_scope.events_path")
 # MAGIC
-# MAGIC // Step 3 - Read in the JSON, but let it infer the scmea
+# MAGIC // Step 3 - Read in the JSON, but let it infer the schema
 # MAGIC val eventsSchema = spark.read
 # MAGIC .option("inferSchema", true)
 # MAGIC .json(eventsJsonPath)
@@ -243,13 +239,14 @@
 # MAGIC %md ## DataFrameWriter
 # MAGIC Interface used to write a DataFrame to external storage systems
 # MAGIC
-# MAGIC ```
-# MAGIC (df.write
-# MAGIC .option("compression", "snappy")
-# MAGIC .mode("overwrite")
-# MAGIC .parquet(output_dir)
+# MAGIC <strong><code>
+# MAGIC (df
+# MAGIC &nbsp; .write
+# MAGIC &nbsp; .option("compression", "snappy")
+# MAGIC &nbsp; .mode("overwrite")
+# MAGIC &nbsp; .parquet(output_dir)
 # MAGIC )
-# MAGIC ```
+# MAGIC </code></strong>

 # MAGIC DataFrameWriter is accessible through the SparkSession attribute **`write`**. This class includes methods to write DataFrames to different external storage systems.

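The DataFrameWriter snippet in this file uses the fluent option/mode chaining that the Spark SQL lesson calls a "builder design pattern". A toy sketch of that pattern in plain Python — `ToyWriter` is invented for illustration, not Spark's API; the whole trick is that each configuration method returns `self`:

```python
# Invented toy builder illustrating DataFrameWriter-style chaining;
# it records settings instead of writing real files.
class ToyWriter:
    def __init__(self, rows):
        self.rows = rows
        self.opts = {}
        self.save_mode = "errorifexists"  # a conservative default

    def option(self, key, value):
        self.opts[key] = value
        return self  # returning self is what makes chaining work

    def mode(self, save_mode):
        self.save_mode = save_mode
        return self

    def save(self):
        # A real writer would serialize to storage; this just reports the plan
        return {"rows": len(self.rows), "mode": self.save_mode, **self.opts}

plan = (ToyWriter([{"x": 1}, {"x": 2}])
        .option("compression", "snappy")
        .mode("overwrite")
        .save())
print(plan)  # {'rows': 2, 'mode': 'overwrite', 'compression': 'snappy'}
```

The chained call reads exactly like the `df.write.option(...).mode(...).parquet(...)` example in the diff above, which is the point of the pattern.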