In PySpark, you can create a new column in a DataFrame by mapping the values of an existing column through a Python dictionary. A common and efficient way to do this is to turn the dictionary into a literal map expression with create_map (together with lit and col from pyspark.sql.functions) and look each row's value up in that map inside withColumn. For small dictionaries a chain of when conditions (or an equivalent expr string) works as well, but the create_map lookup stays concise no matter how many keys the dictionary has.
Here's a step-by-step example to show you how to do this:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, create_map, lit
from itertools import chain

# Initialize a SparkSession
spark = SparkSession.builder.appName("example").getOrCreate()

# Suppose we have the following DataFrame
data = [("A",), ("B",), ("C",), ("D",)]
df = spark.createDataFrame(data, ["letter"])

# And a dictionary for mapping
mapping_dict = {"A": "Apple", "B": "Banana", "C": "Cherry"}

# Convert the dictionary to a column of key-value pairs
mapping_expr = create_map([lit(x) for x in chain(*mapping_dict.items())])

# Add a new column with the mapped values
df = df.withColumn("fruit", mapping_expr.getItem(col("letter")))

# Show the resulting DataFrame
df.show()

# Stop the SparkSession
spark.stop()
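If the mapping behaves as described, df.show() should print something close to the following (the exact rendering of missing values varies by Spark version, e.g. null vs NULL):

+------+------+
|letter| fruit|
+------+------+
|     A| Apple|
|     B|Banana|
|     C|Cherry|
|     D|  null|
+------+------+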
In this code snippet:

- chain(*mapping_dict.items()) flattens the dictionary into a list of alternating keys and values.
- create_map builds a map expression from that flattened list.
- getItem(col("letter")) looks up each value of the 'letter' column in the map.
- withColumn adds a new 'fruit' column to the DataFrame holding the mapped values.

Please note that if a value in the 'letter' column has no matching key in mapping_dict, the corresponding entry in the 'fruit' column will be null. If you want a default for unmapped keys, you can wrap the lookup in coalesce to substitute a fallback value for those null results, as sketched below.
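A minimal sketch of that fallback, continuing from the snippet above (run before spark.stop(); the default string "Unknown" is just an illustrative choice):

from pyspark.sql.functions import coalesce, lit

# Replace null lookup results with a fallback value ("Unknown" here is arbitrary)
df = df.withColumn(
    "fruit",
    coalesce(mapping_expr.getItem(col("letter")), lit("Unknown")),
)
df.show()

With this in place, the 'D' row would show "Unknown" instead of null.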