How to get name of dataframe column in PySpark?

In PySpark, the columns attribute of a DataFrame returns the names of all its columns as a Python list of strings, in the same order as the DataFrame's schema.

Here's how you can do it:

    from pyspark.sql import SparkSession

    # Create a Spark session
    spark = SparkSession.builder.appName("exampleApp").getOrCreate()

    # Sample dataframe
    data = [("Alice", 34), ("Bob", 45), ("Catherine", 29)]
    df = spark.createDataFrame(data, ["name", "age"])

    # Get column names
    column_names = df.columns
    print(column_names)  # Output: ['name', 'age']
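Because df.columns is an ordinary Python list of strings, the usual list operations apply to it. The sketch below illustrates a few of them. It uses a hard-coded list as a stand-in for df.columns from the example above (an assumption made so that it runs without a Spark session); with a real DataFrame, substitute df.columns.

```python
# Stand-in for df.columns (assumption: same two columns as the sample dataframe).
column_names = ["name", "age"]

# Check whether a column exists before using it.
has_age = "age" in column_names

# Find the zero-based position of a column by its name.
age_position = column_names.index("age")

# Count the columns.
column_count = len(column_names)

print(has_age, age_position, column_count)
```

Note that list.index raises a ValueError if the name is not present, so the membership check is worth doing first when the column may be missing.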

If you want to get the name of a specific column by its position, you can access it using indexing:

    first_column_name = df.columns[0]
    print(first_column_name)  # Output: name
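Indexing past the end of the list raises an IndexError, so a small guard can be useful when the position comes from user input. The following is a minimal sketch, not part of PySpark: column_name_at is a hypothetical helper, and the list is a stand-in for df.columns so the snippet runs without a Spark session.

```python
from typing import List, Optional


def column_name_at(columns: List[str], position: int) -> Optional[str]:
    """Return the column name at `position`, or None if it is out of range.

    Hypothetical helper for illustration; negative positions count from the end.
    """
    if -len(columns) <= position < len(columns):
        return columns[position]
    return None


columns = ["name", "age"]  # stand-in for df.columns (assumption)

last_column_name = column_name_at(columns, -1)  # last column
missing = column_name_at(columns, 5)            # out of range, so None

print(last_column_name, missing)
```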

This way, you can retrieve the names of DataFrame columns in PySpark.

