Pandas read_csv()

The read_csv() function in Pandas reads a CSV file into a DataFrame.

Example

Let's suppose that sample_data.csv contains the following content:

    Employee ID,First Name,Last Name,Department,Position,Salary
    101,John,Doe,Marketing,Manager,50000
    102,Jane,Smith,Sales,Associate,35000
    103,Michael,Johnson,Finance,Analyst,45000
    104,Emily,Williams,HR,Coordinator,40000

Now, let's write code to read the above CSV file using read_csv().

    import pandas as pd

    # load data from a CSV file
    df = pd.read_csv('sample_data.csv')

    print(df)

    '''
    Output

       Employee ID  First Name  Last Name  Department     Position  Salary
    0          101        John        Doe   Marketing      Manager   50000
    1          102        Jane      Smith       Sales    Associate   35000
    2          103     Michael    Johnson     Finance      Analyst   45000
    3          104       Emily   Williams          HR  Coordinator   40000
    '''

read_csv() Syntax

The syntax for the read_csv() function in Pandas is:

 pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, dtype=None, skiprows=None, nrows=None, na_values=None, parse_dates=False)

read_csv() Arguments

The read_csv() function takes the following common arguments:

  1. filepath_or_buffer: the path to the file or a file-like object
  2. sep or delimiter (optional): the delimiter to use
  3. header (optional): row number to use as column names
  4. names (optional): list of column names to use
  5. index_col (optional): column(s) to set as index
  6. usecols (optional): return a subset of the columns
  7. dtype (optional): type for data or column(s)
  8. skiprows (optional): line numbers to skip (0-indexed) or number of lines to skip at the start of the file
  9. nrows (optional): number of rows of file to read
  10. na_values (optional): additional strings to recognize as NaN
  11. parse_dates (optional): columns to parse as dates; accepts a boolean, a list of column positions or names, a list of lists, or a dictionary
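To see how a few of these arguments work together, here is a minimal sketch. It assumes a small in-memory CSV (the csv_text string and its Joined column are made up for illustration and are not part of sample_data.csv), passed to read_csv() through io.StringIO as a file-like object.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in memory so the sketch runs without a file on disk
csv_text = """Employee ID,First Name,Department,Salary,Joined
101,John,Marketing,50000,2020-01-15
102,Jane,Sales,missing,2021-03-01
103,Michael,Finance,45000,2019-07-22
"""

df = pd.read_csv(
    io.StringIO(csv_text),    # filepath_or_buffer also accepts a file-like object
    index_col="Employee ID",  # use this column as the row index
    nrows=2,                  # read only the first two data rows
    na_values=["missing"],    # treat the string "missing" as NaN
    parse_dates=["Joined"],   # parse this column as dates
)

print(df)
print(df.dtypes)
```

Here, only the rows for employees 101 and 102 are read, the missing salary becomes NaN, and the Joined column is parsed as a datetime column.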

read_csv() Return Value

The read_csv() function returns a DataFrame containing the data read from the CSV file.


Example 1: Basic CSV Reading

Let's suppose that sample_data.csv contains the following content:

    Employee ID,First Name,Last Name,Department,Position,Salary
    101,John,Doe,Marketing,Manager,50000
    102,Jane,Smith,Sales,Associate,35000
    103,Michael,Johnson,Finance,Analyst,45000
    104,Emily,Williams,HR,Coordinator,40000

Now, let's write code to read the above CSV file using read_csv().

    import pandas as pd

    # load data from a CSV file
    df = pd.read_csv('sample_data.csv')

    print(df)

Output

       Employee ID  First Name  Last Name  Department     Position  Salary
    0          101        John        Doe   Marketing      Manager   50000
    1          102        Jane      Smith       Sales    Associate   35000
    2          103     Michael    Johnson     Finance      Analyst   45000
    3          104       Emily   Williams          HR  Coordinator   40000

In this example, we read data from sample_data.csv and print the DataFrame.


Example 2: Skipping Rows and Setting Index Column

For this example, let's use the same CSV file as in the first example (with a comma as the delimiter).

    import pandas as pd

    # skip the first row and set the first column as the index
    df = pd.read_csv('sample_data.csv', skiprows=1, index_col=0)

    print(df)

Output

            John       Doe  Marketing      Manager  50000
    101
    102     Jane     Smith      Sales    Associate  35000
    103  Michael   Johnson    Finance      Analyst  45000
    104    Emily  Williams         HR  Coordinator  40000

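Here, skiprows=1 skips the first line of the file, which is the original header. As a result, the next line (the row for employee 101) is inferred to be the header, so its values become the column names. We also set the first column as the index using index_col=0, which is why 101 appears as the index name.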
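If the goal is to keep the original header and skip a data row instead, skiprows also accepts a list of 0-indexed line numbers. The following is a minimal sketch of that variation. It embeds the sample content in a string (csv_text, a name used only for this illustration) so it runs without a file on disk.

```python
import io

import pandas as pd

# Same content as sample_data.csv, kept in memory for this sketch
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

# skiprows=[1] skips only line 1 (the first data row); line 0 remains the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1], index_col=0)

print(df)
```

With the list form, the column names stay First Name, Last Name and so on, the index is named Employee ID, and the rows for employees 102 to 104 are kept.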


Example 3: Reading Selected Columns with Data Types

For this example, let's use the same file sample_data.csv.

    import pandas as pd

    # read specific columns and set their data types
    df = pd.read_csv(
        'sample_data.csv',
        usecols=['First Name', 'Salary'],
        dtype={'First Name': str, 'Salary': float}
    )

    print(df)

Output

      First Name   Salary
    0       John  50000.0
    1       Jane  35000.0
    2    Michael  45000.0
    3      Emily  40000.0

This example reads only the First Name and Salary columns from the file and sets the data type for each column.
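One way to confirm that the requested types were applied is to inspect the dtypes attribute of the result. This is a small sketch that assumes the same sample content, embedded in a string (csv_text, a name chosen for this illustration) so that it is self-contained.

```python
import io

import pandas as pd

# Same content as sample_data.csv, kept in memory for this sketch
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["First Name", "Salary"],
    dtype={"First Name": str, "Salary": float},
)

# show the data type of each column that was read
print(df.dtypes)
```

The Salary column is reported as float64, which is why its values are printed with a decimal point.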

Note: When working with large CSV files, consider the chunksize parameter, which reads the file in chunks of a given number of rows, or iterator=True, which returns a reader object for reading the file piece by piece.
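As a rough sketch of chunked reading, the code below passes chunksize=2 so that read_csv() yields DataFrames of up to two rows each instead of one large DataFrame. The tiny in-memory file (csv_text) and the chunk size are assumptions made for illustration; a real large file would use a much bigger chunk size.

```python
import io

import pandas as pd

# Same content as sample_data.csv, standing in for a much larger file
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

rows_seen = 0
total_salary = 0

# with chunksize set, read_csv() returns an iterator of DataFrames
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    rows_seen += len(chunk)
    total_salary += chunk["Salary"].sum()

print(rows_seen, total_salary)  # 4 170000
```

Since only one chunk is held in memory at a time, this pattern suits aggregations such as sums or counts over files that do not fit in memory.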


Example 4: Specifying Delimiter and Column Names

For this example, let's suppose that sample_data.csv has the following content:

    Employee ID;First Name;Last Name;Department;Position;Salary
    101;John;Doe;Marketing;Manager;50000
    102;Jane;Smith;Sales;Associate;35000
    103;Michael;Johnson;Finance;Analyst;45000
    104;Emily;Williams;HR;Coordinator;40000

Notice the use of ; as the delimiter. Now, let's read this semicolon-delimited file.

    import pandas as pd

    # specify a delimiter and column names
    df = pd.read_csv(
        'sample_data.csv',
        delimiter=';',
        names=['ID', 'Name', 'Surname', 'Dept', 'Position', 'Salary'],
        header=0
    )

    print(df)

Output

        ID     Name   Surname       Dept     Position  Salary
    0  101     John       Doe  Marketing      Manager   50000
    1  102     Jane     Smith      Sales    Associate   35000
    2  103  Michael   Johnson    Finance      Analyst   45000
    3  104    Emily  Williams         HR  Coordinator   40000

In this example, we specified the delimiter to be ;. We also specified the column names manually using the names argument.

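Here, header=0 tells read_csv() that row 0 of the file is a header row. Because names is also given, that row is discarded and replaced with the names we supplied. Without header=0, the original header row would be read as the first row of data.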
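The effect of header=0 is easiest to see by reading the same content with and without it. The sketch below uses a shortened, in-memory version of the semicolon-delimited file (csv_text and new_names are names chosen for this illustration only).

```python
import io

import pandas as pd

# Shortened semicolon-delimited content, kept in memory for this sketch
csv_text = """Employee ID;First Name;Salary
101;John;50000
102;Jane;35000
"""

new_names = ["ID", "Name", "Salary"]

# header=0: the file's own header row is replaced by new_names
replaced = pd.read_csv(io.StringIO(csv_text), sep=";", names=new_names, header=0)

# no header argument: the file's header row is read as a row of data
kept = pd.read_csv(io.StringIO(csv_text), sep=";", names=new_names)

print(len(replaced))          # 2
print(len(kept))              # 3
print(kept.iloc[0].tolist())  # ['Employee ID', 'First Name', 'Salary']
```

In the second call, the old header text ends up inside the data, which also forces every column to be read as text rather than numbers.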
