Add a column with the literal value in PySpark DataFrame


To add a column with a literal value to a DataFrame in PySpark, you can use the withColumn method in combination with the lit function from the pyspark.sql.functions module.

Here's an example of how to add a new column with a literal value to an existing DataFrame:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit

    # Initialize Spark Session
    spark = SparkSession.builder \
        .appName("AddLiteralColumn") \
        .getOrCreate()

    # Create a DataFrame
    data = [('Alice', 1), ('Bob', 2), ('Cathy', 3)]
    df = spark.createDataFrame(data, ["name", "id"])

    # Add a new column with a literal value
    df = df.withColumn('new_column', lit('literal_value'))

    # Show the resulting DataFrame
    df.show()

This would produce output like:

    +-----+---+-------------+
    | name| id|   new_column|
    +-----+---+-------------+
    |Alice|  1|literal_value|
    |  Bob|  2|literal_value|
    |Cathy|  3|literal_value|
    +-----+---+-------------+

In this snippet, lit('literal_value') creates a column in which every row holds the string 'literal_value'. The lit function is not limited to strings: it also accepts numbers, boolean values, and other literal types.

