
Commit fb0019f

SireInsectus authored and committed
Publishing v2.1.1
1 parent 657270b commit fb0019f

23 files changed (+54, -68 lines)

Apache-Spark-Programming-with-Databricks/ASP 0 - Course Agenda.py

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
 # MAGIC * [ASP 3.2 - Datetimes]($./ASP 3 - Functions/ASP 3.2 - Datetimes)
 # MAGIC * [ASP 3.2L - Active Users Lab]($./ASP 3 - Functions/ASP 3.2L - Active Users Lab)
 # MAGIC * [ASP 3.3 - Complex Types]($./ASP 3 - Functions/ASP 3.3 - Complex Types)
+# MAGIC * [ASP 3.3L - Users]($./ASP 3 - Functions/ASP 3.3L - Users)
 # MAGIC * [ASP 3.4 - Additional Functions]($./ASP 3 - Functions/ASP 3.4 - Additional Functions)
 # MAGIC * [ASP 3.4L - Abandoned Carts Lab]($./ASP 3 - Functions/ASP 3.4L - Abandoned Carts Lab)
 # MAGIC * [ASP 3.5 - UDFs]($./ASP 3 - Functions/ASP 3.5 - UDFs)

Apache-Spark-Programming-with-Databricks/ASP 1 - Introductions/ASP 1.1 - Databricks Platform.py

Lines changed: 4 additions & 2 deletions
@@ -161,7 +161,7 @@
 
 # COMMAND ----------
 
-# MAGIC %md `%fs` is shorthand for the <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a> module: **`dbutils.fs`**
+# MAGIC %md **`%fs`** is shorthand for the <a href="https://docs.databricks.com/dev-tools/databricks-utils.html" target="_blank">DBUtils</a> module: **`dbutils.fs`**
 
 # COMMAND ----------
 
@@ -219,7 +219,9 @@
 # COMMAND ----------
 
 # MAGIC %sql
-# MAGIC CREATE TABLE IF NOT EXISTS events USING DELTA OPTIONS (path "${c.events_path}");
+# MAGIC CREATE TABLE IF NOT EXISTS events
+# MAGIC USING DELTA
+# MAGIC OPTIONS (path = "${c.events_path}");
 
 # COMMAND ----------
 

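For context on the two cells touched above, here is a minimal sketch of what the reworded magic command and the reformatted SQL do, assuming a Databricks notebook where dbutils and the spark session are predefined; the paths below are placeholders, not the course's ${c.events_path}.

# %fs ls /databricks-datasets is shorthand for the equivalent DBUtils call:
files = dbutils.fs.ls("/databricks-datasets")   # returns a list of FileInfo objects
display(files)

# The reformatted cell registers an external Delta table over an existing path:
spark.sql("""
    CREATE TABLE IF NOT EXISTS events
    USING DELTA
    OPTIONS (path = "/tmp/placeholder/events")
""")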
Apache-Spark-Programming-with-Databricks/ASP 2 - Spark Core/ASP 2.2 - Reader & Writer.py

Lines changed: 1 addition & 1 deletion
@@ -320,7 +320,7 @@
 # MAGIC
 # MAGIC #### Delta Lake's Key Features
 # MAGIC - ACID transactions
-# MAGIC - Scalable metadata handline
+# MAGIC - Scalable metadata handling
 # MAGIC - Unified streaming and batch processing
 # MAGIC - Time travel (data versioning)
 # MAGIC - Schema enforcement and evolution

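As a hedged illustration of two of the listed features (batch reads and time travel), assuming a Delta table already exists at a placeholder path:

delta_path = "/tmp/placeholder/events_delta"    # placeholder, not a course dataset path

latest_df  = spark.read.format("delta").load(delta_path)        # current version of the table
initial_df = (spark.read.format("delta")
                   .option("versionAsOf", 0)                    # time travel to the first version
                   .load(delta_path))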
Apache-Spark-Programming-with-Databricks/ASP 2 - Spark Core/ASP 2.3 - DataFrame & Column.py

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@
 # MAGIC %md #### **`dropDuplicates()`**
 # MAGIC Returns a new DataFrame with duplicate rows removed, optionally considering only a subset of columns.
 # MAGIC
-# MAGIC ##### Alias: `distinct`
+# MAGIC ##### Alias: **`distinct`**
 
 # COMMAND ----------
 

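A small sketch of the method and its alias described in that cell; the DataFrame and column names are invented for illustration:

df = spark.createDataFrame(
    [("a@example.com", 1), ("a@example.com", 1), ("a@example.com", 2)],
    ["email", "item_id"])

df.distinct().show()                   # alias of dropDuplicates(): removes fully duplicate rows
df.dropDuplicates(["email"]).show()    # considers only the "email" column when de-duplicating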
Apache-Spark-Programming-with-Databricks/ASP 3 - Functions/ASP 3.1 - Aggregation.py

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@
 # MAGIC %md ## Built-In Functions
 # MAGIC In addition to DataFrame and Column transformation methods, there are a ton of helpful functions in Spark's built-in <a href="https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-functions-builtin.html" target="_blank">SQL functions</a> module.
 # MAGIC
-# MAGIC In Scala, this is <a href="https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html" target="_bank">**`org.apache.spark.sql.functions`**</a>, and <a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#functions" target="_blank">**`pyspark.sql.functions`**</a> in Python. Functions from this module must be imported into your code.
+# MAGIC In Scala, this is <a href="https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html" target="_blank">**`org.apache.spark.sql.functions`**</a>, and <a href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql.html#functions" target="_blank">**`pyspark.sql.functions`**</a> in Python. Functions from this module must be imported into your code.
 
 # COMMAND ----------
 

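As that cell notes, the built-in functions must be imported in Python; a minimal sketch with an invented DataFrame:

from pyspark.sql.functions import avg, approx_count_distinct

df = spark.createDataFrame([("a", 1), ("b", 2), ("b", 2)], ["item", "user_id"])

df.agg(avg("user_id"), approx_count_distinct("user_id")).show()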
Apache-Spark-Programming-with-Databricks/ASP 3 - Functions/ASP 3.2 - Datetimes.py

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@
 # MAGIC %md #### **`year`**
 # MAGIC Extracts the year as an integer from a given date/timestamp/string.
 # MAGIC
-# MAGIC ##### Similar methods: `month`, `dayofweek`, `minute`, `second`, etc.
+# MAGIC ##### Similar methods: **`month`**, **`dayofweek`**, **`minute`**, **`second`**, etc.
 
 # COMMAND ----------
 

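A short sketch of year and the similar methods named in that cell, using a hypothetical timestamp column:

from pyspark.sql.functions import col, year, month, dayofweek, minute, second

df = (spark.createDataFrame([("2020-06-19 10:30:05",)], ["ts_string"])
           .withColumn("ts", col("ts_string").cast("timestamp")))

df.select(year("ts"), month("ts"), dayofweek("ts"), minute("ts"), second("ts")).show()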
Apache-Spark-Programming-with-Databricks/ASP 3 - Functions/ASP 3.3L - Users.py

Lines changed: 0 additions & 9 deletions
@@ -20,15 +20,6 @@
 
 # COMMAND ----------
 
-details_df = (df
-              .withColumn("items", explode("items"))
-              .select("email", "items.item_name")
-              .withColumn("details", split(col("item_name"), " "))
-             )
-display(details_df)
-
-# COMMAND ----------
-
 # MAGIC %md ### 1. Extract item details from purchases
 # MAGIC
 # MAGIC - Explode the **`items`** field in **`df`** with the results replacing the existing **`items`** field

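For reference, a self-contained sketch of the explode/split pattern used in the removed cell, with an invented one-row dataset:

from pyspark.sql import Row
from pyspark.sql.functions import col, explode, split

df = spark.createDataFrame(
    [Row(email="a@example.com", items=[Row(item_name="Standard Queen Mattress")])])

details_df = (df
              .withColumn("items", explode("items"))
              .select("email", "items.item_name")
              .withColumn("details", split(col("item_name"), " ")))
details_df.show(truncate=False)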
Apache-Spark-Programming-with-Databricks/ASP 3 - Functions/ASP 3.4 - Additional Functions.py

Lines changed: 11 additions & 11 deletions
@@ -81,21 +81,21 @@
 # COMMAND ----------
 
 # MAGIC %md ### Joining DataFrames
-# MAGIC The DataFrame <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.join.html?highlight=join#pyspark.sql.DataFrame.join" target="_blank">**`join`**</a> method joins two DataFrames based on a given join expression. Several different types of joins are supported. For example:
+# MAGIC The DataFrame <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.join.html?highlight=join#pyspark.sql.DataFrame.join" target="_blank">**`join`**</a> method joins two DataFrames based on a given join expression.
 # MAGIC
-# MAGIC ```
-# MAGIC # Inner join based on equal values of a shared column called "name" (i.e., an equi join)
-# MAGIC df1.join(df2, "name")
+# MAGIC Several different types of joins are supported:
 # MAGIC
-# MAGIC # Inner join based on equal values of the shared columns called "name" and "age"
-# MAGIC df1.join(df2, ["name", "age"])
+# MAGIC Inner join based on equal values of a shared column called "name" (i.e., an equi join)<br/>
+# MAGIC **`df1.join(df2, "name")`**
 # MAGIC
-# MAGIC # Full outer join based on equal values of a shared column called "name"
-# MAGIC df1.join(df2, "name", "outer")
+# MAGIC Inner join based on equal values of the shared columns called "name" and "age"<br/>
+# MAGIC **`df1.join(df2, ["name", "age"])`**
 # MAGIC
-# MAGIC # Left outer join based on an explicit column expression
-# MAGIC df1.join(df2, df1["customer_name"] == df2["account_name"], "left_outer")
-# MAGIC ```
+# MAGIC Full outer join based on equal values of a shared column called "name"<br/>
+# MAGIC **`df1.join(df2, "name", "outer")`**
+# MAGIC
+# MAGIC Left outer join based on an explicit column expression<br/>
+# MAGIC **`df1.join(df2, df1["customer_name"] == df2["account_name"], "left_outer")`**
 
 # COMMAND ----------
 

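The rewritten cell keeps the same join calls, just without the fenced block; a runnable sketch with two tiny invented DataFrames:

df1 = spark.createDataFrame([("Ada", 36), ("Grace", 40)], ["name", "age"])
df2 = spark.createDataFrame([("Ada", "ada@example.com"), ("Linus", "linus@example.com")], ["name", "email"])

df1.join(df2, "name").show()                                      # inner equi join on "name"
df1.join(df2, "name", "outer").show()                             # full outer join on "name"
df1.join(df2, df1["name"] == df2["name"], "left_outer").show()    # explicit column expression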
Apache-Spark-Programming-with-Databricks/ASP 4 - Performance/ASP 4.2 - Partitioning.py

Lines changed: 2 additions & 2 deletions
@@ -62,7 +62,7 @@
 
 # COMMAND ----------
 
-# MAGIC %md #### `repartition`
+# MAGIC %md #### **`repartition`**
 # MAGIC Returns a new DataFrame that has exactly **`n`** partitions.
 # MAGIC
 # MAGIC - Wide transformation
@@ -79,7 +79,7 @@
 
 # COMMAND ----------
 
-# MAGIC %md #### `coalesce`
+# MAGIC %md #### **`coalesce`**
 # MAGIC Returns a new DataFrame that has exactly **`n`** partitions, when fewer partitions are requested.
 # MAGIC
 # MAGIC If a larger number of partitions is requested, it will stay at the current number of partitions.

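A brief sketch contrasting the two methods documented in those cells; the partition counts are arbitrary:

df = spark.range(1_000_000)

repartitioned_df = df.repartition(8)   # wide transformation; can increase or decrease partition count
coalesced_df = df.coalesce(2)          # narrow transformation; only reduces the partition count

print(repartitioned_df.rdd.getNumPartitions())   # 8
print(coalesced_df.rdd.getNumPartitions())       # at most 2, never more than the current count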
Apache-Spark-Programming-with-Databricks/ASP 4 - Performance/ASP 4.2L - De-Duping Data Lab.py

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@
 # COMMAND ----------
 
 # MAGIC %md
-# MAGIC It's helpful to look at the file first, so you can check the format with `dbutils.fs.head()`.
+# MAGIC It's helpful to look at the file first, so you can check the format with **`dbutils.fs.head()`**.
 
 # COMMAND ----------
 

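A minimal sketch of the suggested check, assuming a Databricks notebook where dbutils is available; the path is a placeholder:

# Peek at the first bytes of a file to confirm its format before reading it with Spark.
print(dbutils.fs.head("/databricks-datasets/README.md", 500))   # first 500 bytes of the file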