Commit 6ca787d

committed
🔥
1 parent 0aca822 commit 6ca787d

1 file changed (+19, -25 lines changed)

README.md

Lines changed: 19 additions & 25 deletions
@@ -1,27 +1,21 @@
-# PySpark Cheat Sheet
-
-🐍 📄 A quick reference guide to the most commonly used patterns and functions in PySpark SQL.
-
-## Table of Contents
-
-- [PySpark Cheat Sheet](#pyspark-cheat-sheet)
-- [Table of Contents](#table-of-contents)
-- [Common Patterns](#common-patterns)
-- [Importing Functions & Types](#importing-functions--types)
-- [Filtering](#filtering)
-- [Joins](#joins)
-- [Creating New Columns](#creating-new-columns)
-- [Coalescing Values](#coalescing-values)
-- [Casting, Nulls & Duplicates](#casting-nulls--duplicates)
-- [Column Operations](#column-operations)
-- [String Operations](#string-operations)
-- [String Filters](#string-filters)
-- [String Functions](#string-functions)
-- [Number Operations](#number-operations)
-- [Array Operations](#array-operations)
-- [Aggregation Operations](#aggregation-operations)
-- [Repartitioning](#repartitioning)
-- [UDFs (User Defined Functions)](#udfs-user-defined-functions)
+#### Table of Contents
+
+- [Common Patterns](#common-patterns)
+- [Importing Functions & Types](#importing-functions--types)
+- [Filtering](#filtering)
+- [Joins](#joins)
+- [Creating New Columns](#creating-new-columns)
+- [Coalescing Values](#coalescing-values)
+- [Casting, Nulls & Duplicates](#casting-nulls--duplicates)
+- [Column Operations](#column-operations)
+- [String Operations](#string-operations)
+- [String Filters](#string-filters)
+- [String Functions](#string-functions)
+- [Number Operations](#number-operations)
+- [Array Operations](#array-operations)
+- [Aggregation Operations](#aggregation-operations)
+- [Repartitioning](#repartitioning)
+- [UDFs (User Defined Functions)](#udfs-user-defined-functions)
 
 If you can't find what you're looking for, check out the [PySpark Official Documentation](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html) and add it here!
 
@@ -226,7 +220,7 @@ df = df.groupBy('gender').agg(F.max('age').alias('max_age_by_gender'))
 df = df.groupBy('age').agg(F.collect_set('name').alias('person_names'))
 ```
 
-#### Repartitioning
+## Repartitioning
 
 ```python
 # Repartition – df.repartition(num_output_partitions)
