Sorting is a fundamental operation in data manipulation and analysis that involves arranging data in a specific order.
Sorting is crucial for tasks such as organizing data for better readability, identifying patterns, making comparisons, and facilitating further analysis.
Sort DataFrame in Pandas
In Pandas, we can use the sort_values() method to sort a DataFrame. For example,
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

# sort DataFrame by Age in ascending order
sorted_df = df.sort_values(by='Age')

print(sorted_df.to_string(index=False))
Output
   Name  Age
    Bob   22
Charlie   25
  Alice   28
In the above example, df.sort_values(by='Age') sorts the df DataFrame by the values in the Age column in ascending order. The result is stored in the sorted_df variable.
To sort values in descending order, we use the ascending parameter as:
sorted_df = df.sort_values(by='Age', ascending=False)
The output would be:
   Name  Age
  Alice   28
Charlie   25
    Bob   22
Note: .to_string(index=False) is used to display the values without the index.
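Because the examples above hide the index, one detail is easy to miss: sort_values() returns a new DataFrame, leaves df unchanged, and the sorted rows keep their original index labels. The sketch below reuses the same data to show this. The ignore_index=True option (available in pandas 1.0 and later) relabels the result from 0, and the variable name renumbered_df is only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

# sort_values() returns a new DataFrame; df itself is not modified
sorted_df = df.sort_values(by='Age')
print(df['Age'].tolist())        # [28, 22, 25]
print(sorted_df.index.tolist())  # [1, 2, 0] -> rows keep their original labels

# ignore_index=True relabels the sorted rows 0, 1, 2
renumbered_df = df.sort_values(by='Age', ignore_index=True)
print(renumbered_df.index.tolist())  # [0, 1, 2]
```

If the original labels are not needed after sorting, ignore_index=True is usually the simpler choice.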
Sort Pandas DataFrame by Multiple Columns
We can also sort a DataFrame by multiple columns in Pandas. Rows are sorted by the first column listed; rows with equal values in that column are then ordered by the next column, and so on.
To sort by multiple columns in Pandas, we can pass the desired columns as a list to the by parameter of the sort_values() method. Here's how we do it.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 22, 30, 22],
        'Score': [85, 90, 75, 80]}
df = pd.DataFrame(data)

# 1. Sort DataFrame by 'Age' and then by 'Score' (both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])

print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))

print()

# 2. Sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))
Output
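Sorting by 'Age' (ascending) and then by 'Score' (ascending):

   Name  Age  Score
  David   22     80
    Bob   22     90
  Alice   25     85
Charlie   30     75

Sorting by 'Age' (ascending) and then by 'Score' (descending):

   Name  Age  Score
    Bob   22     90
  David   22     80
  Alice   25     85
Charlie   30     75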
Here,
- df1 shows the default sorting behavior (both columns in ascending order).
- df2 shows custom sorting, where Age is in ascending and Score is in descending order.
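One way to see what "priority" means here is to reproduce the multi-column sort in two passes: sort by the secondary column first, then by the primary column with a stable sort, so that ties in Age keep the Score order from the first pass. This is only an illustrative sketch, not how Pandas implements it; the names multi and two_pass are made up for the example, and kind='mergesort' is used because it is a stable algorithm.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 22, 30, 22],
        'Score': [85, 90, 75, 80]}
df = pd.DataFrame(data)

# one call: Age ascending, ties broken by Score descending
multi = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

# two passes: secondary key first, then the primary key with a stable sort
two_pass = (df.sort_values(by='Score', ascending=False)
              .sort_values(by='Age', kind='mergesort'))

print(multi['Name'].tolist())     # ['Bob', 'David', 'Alice', 'Charlie']
print(two_pass['Name'].tolist())  # same order as above
```

In practice, the single call with a list of columns is clearer and is the form to prefer.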
Sort Pandas Series
In Pandas, we can use the sort_values() method to sort a Series. For example,
import pandas as pd

ages = pd.Series([28, 22, 25], name='Age')

# sort Series in ascending order
sorted_ages = ages.sort_values()

print(sorted_ages.to_string(index=False))
Output
22
25
28
Here, ages.sort_values() sorts the ages Series in ascending order. The sorted result is assigned to the sorted_ages variable.
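The ascending parameter works on a Series just as it does on a DataFrame. The short sketch below sorts the same ages Series in descending order and also prints the index, which shows that each value keeps its original label after sorting. The variable name desc_ages is only illustrative.

```python
import pandas as pd

ages = pd.Series([28, 22, 25], name='Age')

# sort Series in descending order
desc_ages = ages.sort_values(ascending=False)

print(desc_ages.tolist())        # [28, 25, 22]
print(desc_ages.index.tolist())  # [0, 2, 1] -> original labels travel with the values
```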
Sort Pandas DataFrame Using sort_index()
We can also sort a DataFrame by its index in Pandas using the sort_index() method.
The sort_index() method sorts a DataFrame or Series by its index. This is useful for organizing data in a logical order, improving query performance, and ensuring consistent data representation.
Let's look at an example.
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}

# create a DataFrame with a non-sequential index
df = pd.DataFrame(data, index=[2, 0, 1])

print("Original DataFrame:")
print(df.to_string(index=True))
print("\n")

# sort DataFrame by index in ascending order
sorted_df = df.sort_index()

print("Sorted DataFrame by index:")
print(sorted_df.to_string(index=True))
Output
Original DataFrame:
      Name  Age
2    Alice   28
0      Bob   22
1  Charlie   25


Sorted DataFrame by index:
      Name  Age
0      Bob   22
1  Charlie   25
2    Alice   28
In the above example, we have created the df DataFrame with a non-sequential index from the data dictionary.
The index parameter is specified as [2, 0, 1], meaning that the rows do not get the default sequential index (0, 1, 2) but the provided non-sequential one.
Then we sorted the df DataFrame by its index in ascending order using the sort_index() method.
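Like sort_values(), sort_index() accepts ascending=False, and it can be called on a Series as well as a DataFrame. A minimal sketch, reusing the same data and index; the variable names desc_df and sorted_ages are only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data, index=[2, 0, 1])

# sort DataFrame by index in descending order
desc_df = df.sort_index(ascending=False)
print(desc_df.to_string(index=True))

# sort_index() works the same way on a Series
ages = df['Age']
sorted_ages = ages.sort_index()
print(sorted_ages.to_string())
```

Here desc_df lists the rows with index 2, 1, 0, and sorted_ages lists the ages in index order 0, 1, 2.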