Get the data type of a column in Pandas - Python



Pandas is a popular and powerful Python library commonly used for data analysis and manipulation. Its two core data structures, the Series and the DataFrame, are designed for working with tabular and time-series data.

A Pandas DataFrame is a two-dimensional tabular data structure in which each column can hold a different data type. We often need to know the data type of a column, for example before doing arithmetic on it or converting it to another type. In this article, we'll go through various methods for determining a column's data type in Pandas.

Before moving forward, let's create a sample DataFrame whose column data types we will inspect.

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})
print(df)

Output

This Python script prints the DataFrame that we have created.

   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000

The task can be completed using any of the following approaches.

Approaches

  • Using the dtypes attribute

  • Using select_dtypes()

  • Using the info() method

  • Using the describe() function

Now let's discuss each approach and how it can be used to get the data type of a column in Pandas.

Method 1: Using the dtypes attribute

The dtypes attribute returns a Series that holds the data type of each column in the DataFrame, indexed by column name. The syntax is:

Syntax

df.dtypes

Return Type: a Series containing the data type of each column present in the DataFrame.

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using the pd.DataFrame() function and pass the sample data as a dictionary.

  • Use the dtypes attribute to get the data types of each column in the DataFrame.

  • Print the result to check the data types of each column.

Example 1

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:")
print(df)

# get the data types of each column
print("\nData types of each column:")
print(df.dtypes)

Output

DataFrame:
   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000

Data types of each column:
Vehicle name    object
price            int64
dtype: object

Example 2

In this example, we get the data type of a single column of the DataFrame by indexing the Series returned by dtypes with the column name.

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:")
print(df)

# get the data type of the column named price
print("\nData type of column named price:")
print(df.dtypes['price'])

Output

DataFrame:
   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000

Data type of column named price:
int64
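Because a single column is itself a Series, its type can also be read from the dtype attribute of that column, and the helper functions in pandas.api.types can test whether a column belongs to a family of types. The following is a minimal sketch of both ideas; the variable names are only illustrative.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

df = pd.DataFrame({
    'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
    'price': [5000000, 600000, 7000000],
})

# a column is a Series, so it exposes its own dtype attribute
price_dtype = df['price'].dtype
print(price_dtype)

# test for a family of types instead of comparing against one exact name
price_is_integer = is_integer_dtype(df['price'])
name_is_numeric = is_numeric_dtype(df['Vehicle name'])
print(price_is_integer, name_is_numeric)

# turn the dtypes Series into a plain dict of column name -> dtype name
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(dtype_names)
```

Checking with is_integer_dtype() or is_numeric_dtype() tends to be more robust than comparing against a string such as 'int64', since the exact integer width can vary between platforms and Pandas versions.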

Method 2: Using select_dtypes()

The select_dtypes() method returns the subset of a DataFrame's columns whose data types match the ones passed to its include parameter (or do not match the ones passed to exclude). We can use it to pick out the columns of a given type and then read the data type of each selected column.

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using pd.DataFrame() function and pass the given data as a dictionary.

  • Print the DataFrame to check the created data.

  • Use the select_dtypes() method to select all the numeric columns from the DataFrame. Pass the list of data types that we want to select using the include parameter.

  • Loop over the selected columns and print the data type of each one.

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:")
print(df)

# select the numeric columns
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns

# get the data type of each numeric column
for col in numeric_cols:
    print("Data Type of column", col, "is", df[col].dtype)

Output

DataFrame:
   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000
Data Type of column price is int64
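Listing exact type names such as 'float64' and 'int64' misses numeric columns stored with other widths. As a hedged alternative, the sketch below passes the generic 'number' selector to include, and uses exclude to get the remaining columns. The mileage column is a hypothetical addition made only to show a second numeric type.

```python
import pandas as pd

df = pd.DataFrame({
    'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
    'price': [5000000, 600000, 7000000],
    'mileage': [12.5, 18.0, 7.2],  # hypothetical float column for illustration
})

# 'number' matches every numeric dtype, whatever its width
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# exclude inverts the filter and keeps only the non-numeric columns
non_numeric_cols = df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric_cols)
```

Here numeric_cols contains price and mileage, while non_numeric_cols contains only Vehicle name.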

Method 3: Using the info() method

We can also use the info() method for our task. The info() method prints a concise summary of a DataFrame, including the data type and non-null count of each column. The syntax is:

Syntax

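DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

Return Value: None. The summary is printed to the console, or written to buf if one is given. (Older Pandas versions named the last parameter null_counts; it was replaced by show_counts.)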

Algorithm

  • Import the Pandas library.

  • Create a DataFrame using the pd.DataFrame() function and pass the above data as a dictionary.

  • Print the DataFrame to check the created data.

  • Call the info() method to get information about the DataFrame.

  • Note that info() prints its summary directly and returns None, so it does not need to be wrapped in print().

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:")
print(df)

# use the info() method to show the data type of each column
df.info()

Output

DataFrame:
   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Vehicle name  3 non-null      object
 1   price         3 non-null      int64
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
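Since info() prints its summary and returns None, the text cannot be stored by assigning the return value. One way around this, sketched below, is to pass a writable buffer through the buf parameter and read the text back from it. The names buffer and summary are only illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
    'price': [5000000, 600000, 7000000],
})

# info() writes into the buffer instead of printing to the console
buffer = io.StringIO()
result = df.info(buf=buffer)

# the summary is now available as an ordinary string
summary = buffer.getvalue()
print(summary)

# the return value itself is still None
print(result)
```

The captured string can then be logged or written to a file. For programmatic checks on column types, the dtypes attribute is usually the more convenient choice.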

Method 4: Using the describe() function

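The describe() method generates descriptive statistics of a DataFrame and returns them as a new DataFrame. It does not report data types itself, but the dtypes attribute of its result shows the type of each summary column. These types can differ from those of the original columns: in the example below, price is int64 in the original DataFrame but float64 in the summary, because statistics such as the mean are floating-point values. To get the original types, use df.dtypes instead.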

Algorithm

  • Import the Pandas library using the import statement.

  • Create a DataFrame using the pd.DataFrame() function and pass the given data as a dictionary.

  • Print the DataFrame to check the created data.

  • Use the describe() method to get the descriptive statistics of the DataFrame.

  • Set the include parameter of the describe() method to 'all' so that every column, not only the numeric ones, appears in the descriptive statistics.

  • Get the data type of each column of the resulting statistics DataFrame using the dtypes attribute.

  • Print the data type of each column.

Example

# import the Pandas library
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
                   'price': [5000000, 600000, 7000000]})

# print the dataframe
print("DataFrame:")
print(df)

# use the describe() method to get the descriptive statistics of the dataframe
desc_stats = df.describe(include='all')

# get the data type of each column of the statistics
dtypes = desc_stats.dtypes

# print the data type of each column
print("Data type of each column in the descriptive statistics:")
print(dtypes)

Output

DataFrame:
   Vehicle name    price
0         Supra  5000000
1         Honda   600000
2   Lamborghini  7000000
Data type of each column in the descriptive statistics:
Vehicle name     object
price           float64
dtype: object
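Notice that price is reported as float64 here, although the original column holds integers. The short sketch below places the two sets of types side by side to make the difference visible; the comparison DataFrame and its column labels are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Vehicle name': ['Supra', 'Honda', 'Lamborghini'],
    'price': [5000000, 600000, 7000000],
})

# types of the original columns
original_dtypes = df.dtypes

# types of the columns in the statistics returned by describe()
summary_dtypes = df.describe(include='all').dtypes

# put both side by side, one row per column name
comparison = pd.DataFrame({
    'original': original_dtypes,
    'after describe()': summary_dtypes,
})
print(comparison)

# True when describe() changed the type of the price column
price_changed = original_dtypes['price'] != summary_dtypes['price']
print(price_changed)
```

For this reason, describe() is better suited to summarising values than to checking types; df.dtypes remains the reliable source for the type of each original column.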

Conclusion

Knowing the data type of each column helps us carry out data manipulation and analysis correctly. The dtypes attribute is the most direct option, select_dtypes() is useful when we need only the columns of certain types, info() prints an overview of the types alongside the non-null counts, and describe() returns summary statistics whose types may differ from those of the original columns. Choose whichever method fits the task at hand.

Updated on: 2023-05-29T12:33:09+05:30
