The simplest way to select one or more columns in a pandas DataFrame is bracket notation: place the column name (or a list of names) inside square brackets. Consider the following example:
```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# Select the 'Age' column with bracket notation
age_column = df['Age']
print(age_column)
```
Output
```
0    25
1    30
2    22
3    35
4    28
Name: Age, dtype: int64
```
A single column name inside single brackets returns that column as a pandas Series. To select multiple columns, pass a list of column names instead, which gives the double-bracket form; the result is a DataFrame.
```python
# Select both 'Age' and 'Salary' columns
subset_columns = df[['Age', 'Salary']]
print(subset_columns)
```
Output
```
   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000
```
This approach lets you select and work with several columns at once.
Besides bracket notation, there are several other ways to select columns in a pandas DataFrame:
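The difference in return type between single and double brackets is easy to miss. The minimal sketch below, reusing the sample data (the variable names are only illustrative), checks it directly: a single label gives a Series, while a list containing one label gives a one-column DataFrame.

```python
import pandas as pd

# Same sample data as the example above
data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# A single label in single brackets returns a Series
age_series = df['Age']
# A list holding one label (double brackets) returns a one-column DataFrame
age_frame = df[['Age']]

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_frame.shape)            # (5, 1)
```

Use the double-bracket form when later code expects a DataFrame, even for a single column.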
1. Selecting Columns with loc
The loc[] indexer selects rows and columns by label. To select specific columns by name, pass : to keep all rows, followed by a list of the column labels you want.
```python
# Keep all rows (:) and only the 'Name' and 'Gender' columns
selected_columns = df.loc[:, ['Name', 'Gender']]
print(selected_columns)
```
Output
```
      Name  Gender
0     John    Male
1    Alice  Female
2      Bob    Male
3      Eve  Female
4  Charlie    Male
```
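Because loc works with labels on both axes, two further patterns follow naturally. The sketch below reuses the sample data (variable names are illustrative): a slice between two column names, which with loc includes both endpoints, and a boolean row condition combined with a list of columns.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# Label-based slice: with loc, both 'Name' and 'Gender' are included
name_to_gender = df.loc[:, 'Name':'Gender']
print(list(name_to_gender.columns))  # ['Name', 'Age', 'Gender']

# Rows where Age is above 28, restricted to two columns
older = df.loc[df['Age'] > 28, ['Name', 'Salary']]
print(older)
```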
2. Selecting Columns Using iloc
The iloc[] indexer selects rows and columns by their integer positions, starting at 0. This is helpful when you know where the columns sit rather than what they are named.
```python
# Keep all rows and the columns at positions 0 and 1
selected_with_iloc = df.iloc[:, [0, 1]]
print(selected_with_iloc)
```
Output
```
      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28
```
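Positional slicing behaves differently from label slicing. A short sketch, again on the sample data with illustrative variable names: an iloc slice excludes its end position, negative positions count from the end, and Index.get_loc() converts a column name into the position that iloc expects.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# Positions 1 and 2 only: the end position 3 is excluded
middle = df.iloc[:, 1:3]
print(list(middle.columns))  # ['Age', 'Gender']

# The last column, selected by a negative position (returns a Series)
last = df.iloc[:, -1]
print(last.name)  # Salary

# Translate a column name into its integer position
pos = df.columns.get_loc('Salary')
print(pos)  # 3
```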
3. Selecting Columns Using filter
The filter() method selects columns by their labels, not by the values they contain. It keeps columns whose names contain a substring (like=), match a regular expression (regex=), or appear in an explicit list (items=).
```python
# Select columns whose names contain the substring 'Age'
filtered_columns = df.filter(like='Age')
print(filtered_columns)
```
Output
```
   Age
0   25
1   30
2   22
3   35
4   28
```
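To match more than one name at once, the regex= or items= arguments can be used instead of like=. The sketch below assumes the same sample data; 'Bonus' is a deliberately nonexistent label, included to show that items= silently ignores labels that are not present.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# regex= keeps columns whose names match the pattern
age_or_salary = df.filter(regex='Age|Salary')
print(list(age_or_salary.columns))  # ['Age', 'Salary']

# items= keeps only exact labels; 'Bonus' is not a column and is skipped
exact = df.filter(items=['Name', 'Salary', 'Bonus'])
print(list(exact.columns))  # ['Name', 'Salary']
```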
4. Selecting Columns by Data Type
If you want to select columns based on their data types (e.g., selecting only numeric columns), use the select_dtypes() method.
```python
# Keep only columns with a numeric dtype (here 'Age' and 'Salary')
numeric_columns = df.select_dtypes(include=['number'])
print(numeric_columns)
```
Output
```
   Age  Salary
0   25   50000
1   30   55000
2   22   40000
3   35   70000
4   28   48000
```
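select_dtypes() also accepts an exclude= argument, and its result's column names can drive further work. A minimal sketch on the sample data (variable names are illustrative): excluding numeric dtypes leaves the text columns, and the numeric column names are reused to compute per-column means.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
        'Age': [25, 30, 22, 35, 28],
        'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
        'Salary': [50000, 55000, 40000, 70000, 48000]}
df = pd.DataFrame(data)

# Everything that is not numeric
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))  # ['Name', 'Gender']

# Names of the numeric columns, then their means
numeric_cols = df.select_dtypes(include='number').columns.tolist()
means = df[numeric_cols].mean()
print(numeric_cols)  # ['Age', 'Salary']
print(means)
```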
Here are some key takeaways:
- Use bracket notation (df['column_name']) for selecting a single column.
- Use double square brackets (df[['column1', 'column2']]) for selecting multiple columns.
- Explore loc[], iloc[], filter(), and select_dtypes() for more advanced selection by label, position, name pattern, or data type.