Append Pandas DataFrames Using for Loop
Last Updated : 20 Dec, 2024
When dealing with large datasets, we often need to combine many DataFrames into a single DataFrame. The usual pattern is to collect the DataFrames in a list, using a for loop or a list comprehension, and then call concat() once on that list. Let us consider an example:
Python
import pandas as pd
import numpy as np

# Create some example DataFrames
dataframes = [pd.DataFrame(np.random.rand(10, 5)) for _ in range(100)]

# Efficient way: collect in a list and concatenate once
combined_df = pd.concat(dataframes, ignore_index=True)

# Display the result
print(combined_df)
Here we are generating 100 DataFrames, each with 10 rows and 5 columns of random values. The list comprehension builds the list, and a single concat() call stacks them into one DataFrame with 1000 rows and 5 columns. Because ignore_index=True is passed, the result gets a fresh index running from 0 to 999. Calling concat() once is much more memory efficient than calling it inside the loop, since every concat() call allocates a new DataFrame.
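The same pattern can be written with an explicit for loop instead of a list comprehension. The following is a minimal sketch; the seeded generator `rng` and the list name `frames` are assumptions added here so that the run is reproducible, and they are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded random generator (an assumption, used only for reproducibility)
rng = np.random.default_rng(0)

# Collect each DataFrame in a plain Python list inside the loop
frames = []
for _ in range(100):
    frames.append(pd.DataFrame(rng.random((10, 5))))

# Concatenate once, after the loop has finished
combined_df = pd.concat(frames, ignore_index=True)

print(combined_df.shape)  # (1000, 5)
```

Appending to a Python list is cheap, so the only large allocation happens in the final concat() call.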
Let us consider another example: here 10 DataFrames are created and collected in a list with the help of a list comprehension. Then concat() combines all the DataFrames in one call.
Python
import pandas as pd

# Example DataFrames (creating 10 DataFrames with simple values)
dfs = [pd.DataFrame({'A': [i, i+1], 'B': [i+2, i+3]}) for i in range(0, 10)]

# Concatenate all DataFrames in the list (the list is named dfs)
result = pd.concat(dfs, ignore_index=False)
print(result)
The result has 20 rows, with the DataFrames stacked one over the other. Because ignore_index=False is used, each DataFrame keeps its original row labels, so the index repeats 0, 1 for every block. This technique suits large datasets because no intermediate combined DataFrame is created in each iteration; only the final concat() allocates the full result. Hence it is much more memory efficient.
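The effect of the ignore_index parameter is easy to check on a smaller list. The sketch below uses three DataFrames instead of ten to keep the output short; the variable names `kept` and `renumbered` are illustrative.

```python
import pandas as pd

# Three small DataFrames, built the same way as in the example above
dfs = [pd.DataFrame({'A': [i, i + 1], 'B': [i + 2, i + 3]}) for i in range(3)]

# Keep each DataFrame's own row labels
kept = pd.concat(dfs, ignore_index=False)

# Discard the old labels and number the rows from 0
renumbered = pd.concat(dfs, ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

With repeated labels, a lookup such as `kept.loc[0]` returns one row per original DataFrame, so ignore_index=True is usually the safer choice when the old labels carry no meaning.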
Appending dataframes but with different columns
There can be scenarios where we need to append DataFrames that have different column names. concat() aligns columns by name and fills the gaps with NaN by default, but preprocessing the columns first gives explicit control over which columns appear and in what order. We can do this with a for loop, reindex() and concat().
Let us consider a scenario. Here we have three DataFrames, each with different column names. We first collect all the column names, then use reindex() in the for loop so that every DataFrame has all the columns, and append each reindexed DataFrame to a list. Finally, we use concat() to combine all the DataFrames.
Python
import pandas as pd

# Creating 3 DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

# List of DataFrames
dfs = [df1, df2, df3]

# List to store DataFrames for concatenation
df_list = []

# Get all columns across the DataFrames
# (sorted, because a plain set has no guaranteed order)
all_columns = sorted(set().union(*(df.columns for df in dfs)))

# For loop to append DataFrames, reindexing them to the same column set
for df in dfs:
    df = df.reindex(columns=all_columns)  # Reindex with all columns
    df_list.append(df)

# Concatenate all DataFrames
result = pd.concat(df_list, ignore_index=True)
print(result)
The result contains the columns A, B, C and D. Wherever a DataFrame did not have a particular column, the corresponding rows hold NaN.
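For comparison, concat() can also align the columns on its own through its join parameter, which defaults to 'outer'. The sketch below reuses the three DataFrames from this section; the names `outer` and `inner` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'D': [11, 12]})

# Default join='outer': keep the union of all columns, fill gaps with NaN
outer = pd.concat([df1, df2, df3], ignore_index=True)

# join='inner': keep only the columns shared by every DataFrame
inner = pd.concat([df1, df2, df3], ignore_index=True, join='inner')

print(outer.columns.tolist())  # ['A', 'B', 'C', 'D']
print(inner.columns.tolist())  # ['A']
```

Reindexing in a loop remains useful when the final column set or order has to be fixed in advance, for example when some expected columns may be missing from every input.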
Append Pandas DataFrames Using for Loop - Examples
Example 1: Let us consider that we have a list of DataFrames. We will iterate over the list and, in each iteration, use concat() to add the current DataFrame to the running result.
Python
import pandas as pd

# Create sample DataFrames with different columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# List of DataFrames to concatenate
dfs = [df1, df2]

# Initialize an empty DataFrame to concatenate into
result = pd.DataFrame()

# For loop to concatenate DataFrames
for df in dfs:
    result = pd.concat([result, df], ignore_index=True, sort=False)

print(result)
All the columns (A, B and C) are present in the final DataFrame. Values that do not exist in a particular column are assigned NaN. This method is suitable only for small datasets, since concat() creates a new DataFrame in every iteration and copies all the rows accumulated so far, which consumes much more memory and time. For larger inputs, we can collect the DataFrames in a list (optionally using reindex() to preprocess them) and call concat() once.
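Both approaches are expected to give the same data; only the amount of copying differs. The sketch below checks this on three small DataFrames. The third DataFrame `df3` and the names `looped` and `once` are assumptions added for illustration, and the loop starts from the first DataFrame rather than from an empty one.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'C': [11, 12]})  # hypothetical extra frame
dfs = [df1, df2, df3]

# One concat() per iteration: every step copies all rows gathered so far
looped = dfs[0]
for df in dfs[1:]:
    looped = pd.concat([looped, df], ignore_index=True)

# One concat() for the whole list: rows are copied a single time
once = pd.concat(dfs, ignore_index=True)

# Compare values, column order and dtypes
print(looped.equals(once))
```

With n DataFrames of similar size, the looped version copies on the order of n² / 2 blocks of rows in total, while the single call copies n, which is why the gap widens quickly as the list grows.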
Example 2: Here we have three DataFrames with the same columns. We will iterate over them and append each one to a list. Lastly, we will use concat() to combine all the DataFrames in the list.
Python
import pandas as pd

# Create sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

# Append DataFrames to a list
# (iterate over the objects directly instead of looking up names with eval)
df_list = []
for df in (df1, df2, df3):
    df_list.append(df)

# Concatenate all DataFrames in the list
result = pd.concat(df_list, ignore_index=True)
print(result)
So here we have added all the DataFrames to a list using the list's append() method and then used concat() to combine them into a single DataFrame with 6 rows and a fresh index from 0 to 5.
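When the DataFrames need to be looked up by name, one option is to keep them in a dictionary and pass that to concat(). This is a sketch of an alternative, not part of the original example; the dictionary `named_frames` and its keys are illustrative. concat() accepts a mapping and uses its keys as an outer index level.

```python
import pandas as pd

# Build the DataFrames in a loop and store them under descriptive keys
named_frames = {}
for i, start in enumerate([1, 5, 9], start=1):
    named_frames[f'df{i}'] = pd.DataFrame({'A': [start, start + 1],
                                           'B': [start + 2, start + 3]})

# Passing a dict makes its keys the outer level of a MultiIndex
result = pd.concat(named_frames)

# Rows that came from one particular source
print(result.loc['df2'])
```

The outer index level records which DataFrame each row came from, which can help when tracing rows back to their source after concatenation.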