1 | | -# PySpark Cheat Sheet |
2 | | - |
3 | | -🐍 📄 A quick reference guide to the most commonly used patterns and functions in PySpark SQL. |
4 | | - |
5 | | -## Table of Contents |
6 | | - |
7 | | -- [PySpark Cheat Sheet](#pyspark-cheat-sheet) |
8 | | - - [Table of Contents](#table-of-contents) |
9 | | - - [Common Patterns](#common-patterns) |
10 | | - - [Importing Functions & Types](#importing-functions--types) |
11 | | - - [Filtering](#filtering) |
12 | | - - [Joins](#joins) |
13 | | - - [Creating New Columns](#creating-new-columns) |
14 | | - - [Coalescing Values](#coalescing-values) |
15 | | - - [Casting, Nulls & Duplicates](#casting-nulls--duplicates) |
16 | | - - [Column Operations](#column-operations) |
17 | | - - [String Operations](#string-operations) |
18 | | - - [String Filters](#string-filters) |
19 | | - - [String Functions](#string-functions) |
20 | | - - [Number Operations](#number-operations) |
21 | | - - [Array Operations](#array-operations) |
22 | | - - [Aggregation Operations](#aggregation-operations) |
23 | | - - [Repartitioning](#repartitioning) |
24 | | - - [UDFs (User Defined Functions)](#udfs-user-defined-functions) |
| 1 | +#### Table of Contents |
| 2 | + |
| 3 | +- [Common Patterns](#common-patterns) |
| 4 | + - [Importing Functions & Types](#importing-functions--types) |
| 5 | + - [Filtering](#filtering) |
| 6 | + - [Joins](#joins) |
| 7 | + - [Creating New Columns](#creating-new-columns) |
| 8 | + - [Coalescing Values](#coalescing-values) |
| 9 | + - [Casting, Nulls & Duplicates](#casting-nulls--duplicates) |
| 10 | +- [Column Operations](#column-operations) |
| 11 | +- [String Operations](#string-operations) |
| 12 | + - [String Filters](#string-filters) |
| 13 | + - [String Functions](#string-functions) |
| 14 | +- [Number Operations](#number-operations) |
| 15 | +- [Array Operations](#array-operations) |
| 16 | +- [Aggregation Operations](#aggregation-operations) |
| 17 | +- [Repartitioning](#repartitioning) |
| 18 | +- [UDFs (User Defined Functions)](#udfs-user-defined-functions) |
25 | 19 |
26 | 20 | If you can't find what you're looking for, check out the [PySpark Official Documentation](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html) and add it here! |
27 | 21 |
@@ -226,7 +220,7 @@ df = df.groupBy('gender').agg(F.max('age').alias('max_age_by_gender')) |
226 | 220 | df = df.groupBy('age').agg(F.collect_set('name').alias('person_names')) |
227 | 221 | ``` |
228 | 222 |
229 | | -#### Repartitioning |
| 223 | +## Repartitioning |
230 | 224 |
231 | 225 | ```python |
232 | 226 | # Repartition – df.repartition(num_output_partitions) |