0% found this document useful (0 votes)
63 views7 pages

Importing - Exporting Data Between CSV Files and Data Frames

The document provides detailed instructions on how to import and export data between CSV files and DataFrames using Python's Pandas library. It includes examples of reading CSV files into DataFrames, saving DataFrames to CSV files, and various parameters that can be used with the read_csv() and to_csv() functions. Additionally, it covers common questions and coding exercises related to handling CSV files in Python.

Uploaded by

hrshvrdhnpradhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
63 views7 pages

Importing - Exporting Data Between CSV Files and Data Frames

The document provides detailed instructions on how to import and export data between CSV files and DataFrames using Python's Pandas library. It includes examples of reading CSV files into DataFrames, saving DataFrames to CSV files, and various parameters that can be used with the read_csv() and to_csv() functions. Additionally, it covers common questions and coding exercises related to handling CSV files in Python.

Uploaded by

hrshvrdhnpradhan
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Importing/Exporting Data between CSV files and Data Frames.

Importing and Exporting Data between CSV Files and DataFrames


We can create a DataFrame by importing data from CSV files where values are separated by commas. Similarly,
we can also store or export data in a DataFrame as a .csv file.

Importing a CSV file to a DataFrame


Let us assume that we have the following data in a csv file named ResultData.csv stored in the folder C:/NCERT.
In order to practice the code while we progress, you are suggested to create this csv file using a spreadsheet and
save in your computer.
RollNo Name Eco Maths
1 Arnab 18 57
2 Kritika 23 45
3 Divyam 51 37
4 Vivaan 40 60
5 Aaroosh 18 27
We can load the data from the ResultData.csv file into a DataFrame, say marks using Pandas read_csv() function
as shown below:
>>> marks = pd.read_csv("C:/NCERT/ResultData.csv",sep =",", header=0)
>>> marks
RollNo Name Eco Maths
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27
• The first parameter to the read_csv() is the name of the comma separated data file along with its path.
• The parameter sep specifies whether the values are separated by comma, semicolon, tab, or any other character.
The default value for sepis a space.
• The parameter header specifies the number of the row whose values are to be used as the column names. It also
marks the start of the data to be fetched. header=0 implies that column names are inferred from the first line of the
file. By default, header=0.
We can exclusively specify column names using the parameter names while creating the DataFrame using the
read_csv() function. For example, in the following statement, names parameter is used to specify the labels for
columns of the DataFrame marks1:

>>> marks1 = pd.read_csv("C:/NCERT/ResultData1.csv",sep=",",names=['RNo','StudentName', 'Sub1','Sub2'])


>>> marks1
RNo StudentName Sub1 Sub2
0 1 Arnab 18 57
1 2 Kritika 23 45
2 3 Divyam 51 37
3 4 Vivaan 40 60
4 5 Aaroosh 18 27

Exporting a DataFrame to a CSV file


We can use the to_csv() function to save a DataFrame to a text or csv file. For example, to save the DataFrame
ResultDF created in the previous section; we can use the following statement:

>>> ResultDF
Arnab Ramit Samridhi Riya Mallika
Maths 90 92 89 81 94
Science 91 81 91 71 95
Hindi 97 96 88 67 99

>>> ResultDF.to_csv(path_or_buf='C:/NCERT/ resultout.csv', sep=',')

This creates a file by the name resultout.csv in the folder C:/NCERT on the hard disk. When we open this file in
any text editor or a spreadsheet, we will find the above data along with the row labels and the column headers,
separated by comma. In case we do not want the column names to be saved to the file we may use the parameter
header=False. Another parameter index=False is used when we do not want the row labels to be written to the file
on disk. For example:

>>> ResultDF.to_csv( 'C:/NCERT/resultonly.txt',sep = '@', header = False, index= False)

If we open the file resultonly.txt, we will find the following contents:


90@92@89@81@94
91@81@91@71@95
97@96@88@67@99

Questions:
Question 1
Are the following two statements same ? Why/Why not?
(i) pd.read_csv('zoo.csv', sep = ',')
(ii) pd.read_csv('zoo.csv')
Answer
Yes, the two statements are same. The reason is that when we don't explicitly specify the sep parameter
in pd.read_csv(), pandas assumes the default separator to be a comma (,). So, both statements are telling pandas to
read the CSV file "zoo.csv" with comma-separated values.
Question 2
How are following two codes similar or different? What output will they produce?
(i)
df = pd.read_csv("data.csv", nrows = 5)
print(df)

(ii)
df = pd.read_csv("data.csv")
print(df)

Answer
The two codes are similar in that they both use pd.read_csv() to read a CSV file named 'data.csv' into a pandas
DataFrame df. However, they differ in their usage of the nrows parameter. The first code uses the nrows parameter
with a value of 5, indicating that it reads only the first 5 rows of the CSV file. On the other hand, the second code
does not have the nrows parameter, so it reads the entire CSV file.
For code (i), the output will be a DataFrame containing the first 5 rows of the 'data.csv' file. For code (ii), the
output will be a DataFrame containing all the rows of the 'data.csv' file.
Question 3
Write Python statement to export the DataFrame to a CSV file named data.csv stored at D: drive.
Answer
DataFrame.to_csv('D:\\data.csv')
Question 4
What is the difference between following two statements ?
(i)
df.to_sql('houses', con = conn, if_exists = 'replace')
(ii)
df.to_sql('houses', con = conn, if_exists = 'replace', index = False)
Answer
The difference between the two statements is whether the DataFrame's index is included as a separate column in
the resulting SQL table. By default, when we use to_sql() without specifying the index parameter, index = True is
assumed, meaning that the DataFrame's index will be included in the SQL table, as in the first statement.
Setting index = False explicitly excludes the DataFrame's index from being saved as a separate column in the SQL
table, as in the second statement.

Question 5
Write a program to read details such as Item, Sales made in a DataFrame and then store this data in a CSV file.
Solution
import pandas as pd
data = {'Item': ['Apple', 'Banana', 'Orange', 'Grapes'],
'Sales': [100, 150, 80, 120]}
df = pd.DataFrame(data)
df.to_csv('one.csv', index = False)
Output
Item,Sales
Apple,100
Banana,150
Orange,80
Grapes,120
Question 6
Write a program to read data from a CSV file where separator character is '@'. Make sure that :
the top row is used as data, not as column headers.
only 20 rows are read into DataFrame.
Answer
Let the contents of the file bike.csv be the following:
Brand@Price
Honda@2500
Yamaha@2800
Suzuki@2100
Kawasaki@3200
Ducati@3500
BMW@4500
Harley-Davidson@5500
KTM@4000
Triumph@5300
Aprilia@4800
Indian@5700
Royal Enfield@3000
MV Agusta@4200
Moto Guzzi@4900
Victory@5600
Benelli@3900
Husqvarna@4800
Zero Motorcycles@6500
Energica@7500
Norton@5900
The program to read data from a CSV file is as follows:
import pandas as pd
d = pd.read_csv('one.csv', sep = '@', header = None, nrows = 20)
print(d)
Output
0 1
0 Brand Price
1 Honda 2500
2 Yamaha 2800
3 Suzuki 2100
4 Kawasaki 3200
5 Ducati 3500
6 BMW 4500
7 Harley-Davidson 5500
8 KTM 4000
9 Triumph 5300
10 Aprilia 4800
11 Indian 5700
12 Royal Enfield 3000
13 MV Agusta 4200
14 Moto Guzzi 4900
15 Victory 5600
16 Benelli 3900
17 Husqvarna 4800
18 Zero Motorcycles 6500
19 Energica 7500

Question 7
# To create and open a data frame using ‘Student_result.csv’ file using Pandas.
# To display row labels, column labels data types of each column and the dimensions
# To display the shape (number of rows and columns) of the CSV file.

import pandas as pd
import csv
#Reading the Data
df = pd.read_csv("student_result.csv")
# Display Name of Columns
print(df.columns)
# Display no of rows and column
print(df.shape)
# Display Column Names and their types
print(df.info())

7.IMPORTING/EXPORTING DATA FROM CSV FILE MULTIPLE CHOICE QUESTIONS-


CSV file can’t be open in which software-
Calc b. Excel c. Notepad d. Word
Ans - Word
Which function is used to write dataframe’s data into the csv file?
read_csv() b. to_csv() c. from_csv() d. None of these Ans- to_csv()
Which function is used to read data from CSV file and store in a dataframe?
read_csv() b. to_csv() c. from_csv() d. None of these Ans- read_csv()
Using read_csv(), if we want to avoid the first row as header,we use
Noheader=True b. skipheader=True c. header=None d. None of these Ans- header=None
To skip rows of a CSV file, which argument will be given in read_csv() ?
Skiprows b. skip_rows c. skip d. noread Ans – skiprows
Very Short Answered Questions-
Which argument is used to read specific number of rows from a csv file using read_csv()? Ans - nrows
Which argument is used to specify with read_csv() for separator character? Ans- sep
Write full form of csv ?
Ans – Comma Separated value
Which function is used to load the last row of a dataframe df? Ans – tail()
Which argument is used in read_csv() to specify your own headings? Ans - names
Can we skip more than one row of a csv file while loading into dataframe? Ans – Yes, by providing the row
numbers in a list eg. Skiprows=[3,5,7]
Short Answered Questions-
What are the advantages of using CSV files? Ans –
Simple and compact format
Easily import and export mong all spreadsheets and databases.
Common format for interchange.
Which softwares are used to open a CSV file? Ans – MS Excel, Notepad, Calc etc.
Give the syntax of using read_csv() function?
Ans - <Dataframe_name>=pandas.read_csv(<path of csv file>)

Give the syntax of using to_csv() function?


Ans - <Dataframe_name>.to_csv(<path of csv file>, sep=’,’)

How can we use the column of a csv file as index label for the dataframe? Ans – By using the index_col argument
in read_csv()
For eg. Df1=pd.read_csv(“employee.csv”,index_col=”id”)

Assertion(A) and Reasoning(R) based Question-


Mark the correct choice as

Both A and R are true and R is the correct explanation of A


Both A and R are true and R is not the correct explanation of A
A is true and R is False
A is false and R is True

Assertion(A): CSV files are mainly used in importing/exporting data for dataframe. Reason(R) : CSV files have
common format for data interchange.
Ans - Both A and R are true and R is the correct explanation of A

Assertion: While displaying the contents of the CSV file, the header argument can be used to avoid the header
line.
Reason: The value of the header argument will be “NaN” . Ans - A is true and R is False

Asssertion : we need to import pandas for using to_csv() in our program .to_csv() function is used to transfer data
from dataframe to csv file.
Reason: to_csv() function is used to transfer data from dataframe to csv file.
Ans - Both A and R are true and R is the correct explanation of A
Case Based Question
Create a CSV file named as “student.csv” and do the following tasks given below- a.) Read the csv file and create
a dataframe for it.
b.) Display the contents of the dataframe.
c.) Create a dataframe from the above CSV file and it should display the column header as 0,1,2,… rather than
file’s column header.
Coding
import pandas as pd

Df1=pd.read_csv(“student.csv”) print(Df1)
df2=pd.read_csv(“student.csv”,header=None)

Create a CSV file named as “staff.csv” and do the following tasks given below- a.) Read the csv file and create a
dataframe for it.

b.) Create a dataframe from the above CSV file and it should display the column headings as S_id,
S_name,S_dept,S_salary rather than file’s column header.
c.) Display the highest salary from it.

Coding import pandas as pd


Df1=pd.read_csv(“staff.csv”) print(Df1)
df2=pd.read_csv(“staff.csv”,names=[“S_id”,”S_name”,”S_salary”],skiprows=1) print(df2)
print(“highest salary::”,df2.S_salary.max())
Create a CSV file named as “Emp.csv” that contains 10 rows and do the following tasks given below- a.) Read the
csv file and create a dataframe for it.

b.) Create a dataframe from the above CSV file and it should display the first 5 rows of the file.
c.) Display the 3rd ,4th and 5th lines of the csv file.
Coding
import pandas as pd
Df1=pd.read_csv(“Emp.csv”) print(Df1)
df2=pd.read_csv(“student.csv”,nrows=5) print(df2)
print(df2.tail(3))

Long Answer based Questions-


Write a python program that will read a CSV file named as “Exam.csv” containing the name and marks of 5
subjects of students as (M1,M2,M3,M4,M5)and store in a dataframe and add column as “Total_marks” and
“Avg_marks” in dataframe that will display the sum and average of marks for each student and display the
dataframe after adding these additional column.
Coding
import pandas as pd
Df1=pd.read_csv(“Exam.csv”,names=[‘Name’,’M1’,’M2’,’M3’,’M4’,’M5’]) print(Df1)
df1[‘Total_marks’]=df1[‘M1’]+ df1[‘M2’]+df1[‘M3’]+ df1[‘M4’]+ df1[‘M5]
df1[‘Avg_marks’]=df1[‘Total_marks’]/5
print(“dataframe after adding the columns”)
print(df1)

Write a python program that will read a CSV file named as “Student.csv” containing the Rollno, Name and
Percentage of students. Create a dataframe that will load the contents of csv file and display it. Create another
dataframe that will store the first 5 rows of the csv file and display it.
Coding
import pandas as pd
Df1=pd.read_csv(“Student.csv”, names=[‘Rollno’,‘Name’,’Percentage’])
print(Df1)
df2=pd.read_csv(“Student.csv”, nrows=5) print(“First five rows of the csv file are”)
print(df2)
Write a python code that will read a CSV file named as “Employee.csv” ,
containing the Empid, Name and Salary . Create a dataframe that will load the contents of csv file and display it.
Create another dataframe that will store the last 5 rows of the csv file and display it.
Coding
import pandas as pd
Df1=pd.read_csv(“Employee.csv”, names=[‘Empid’,‘Name’,’Salary’])
print(Df1)
df2=pd.read_csv(“Student.csv”, sep=’,’,skiprows=1) print(“Last five rows of the csv file are”)
print(df2.tail(5))

You might also like