DOC: floating point precision on writing/reading to csv #13159

@FBartlett

Code Sample

# Python 2 syntax, matching the reported environments (2.7.5 and 2.7.11)
import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({'One': x0}, index=["bignum"])
df1.to_csv('repr_test.csv')

# Read the file back three ways: from_csv, read_csv, and plain float() parsing
df2 = pd.DataFrame.from_csv('repr_test.csv')
df3 = pd.read_csv('repr_test.csv')
x1 = df1['One'][0]
x2 = df2['One'][0]
x3 = df3['One'][0]
fh = open('repr_test.csv', 'rb')
ll = fh.readlines()
fh.close()
x4 = float(ll[1].split(',')[1].split()[0])

print "x0 = %f; x1 = %f; Are they equal? %s" % (x0, x1, (x0 == x1))
print "x0 = %f; x2 = %f; Are they equal? %s" % (x0, x2, (x0 == x2))
print "x0 = %f; x3 = %f; Are they equal? %s" % (x0, x3, (x0 == x3))
print "x0 = %f; x4 = %f; Are they equal? %s" % (x0, x4, (x0 == x4))

Expected Output

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x3 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

output of pd.show_versions()

(Note that there are two, presented side-by-side, with results underneath)

INSTALLED VERSIONS                        INSTALLED VERSIONS
------------------                        ------------------
commit: None                              commit: None
python: 2.7.5.final.0                     python: 2.7.11.final.0
python-bits: 64                           python-bits: 64
OS: Linux                                 OS: Linux
OS-release: 2.6.32-431.56.1.el6.x86_64    OS-release: 2.6.32-431.56.1.el6.x86_64
machine: x86_64                           machine: x86_64
processor: x86_64                         processor: x86_64
byteorder: little                         byteorder: little
LC_ALL: None                              LC_ALL: None
LANG: en_US.UTF-8                         LANG: en_US.UTF-8

pandas: 0.15.1                            pandas: 0.18.0
nose: 1.3.4                               nose: 1.3.7
Cython: 0.21.2                            Cython: 0.23.4
numpy: 1.9.1                              numpy: 1.10.4
scipy: 0.14.0                             scipy: 0.17.0
statsmodels: 0.6.0                        statsmodels: 0.6.1
IPython: 2.3.0                            IPython: 4.1.2
sphinx: 1.2.3                             sphinx: 1.3.5
patsy: 0.3.0                              patsy: 0.4.0
dateutil: 2.2                             dateutil: 2.5.1
pytz: 2014.9                              pytz: 2016.2
bottleneck: None                          bottleneck: 1.0.0
tables: 3.1.1                             tables: 3.2.2
numexpr: 2.4                              numexpr: 2.5
matplotlib: 1.4.2                         matplotlib: 1.5.1
openpyxl: None                            openpyxl: 2.3.2
xlrd: 0.9.3                               xlrd: 0.9.4
xlwt: 0.7.5                               xlwt: 1.0.0
xlsxwriter: 0.6.3                         xlsxwriter: 0.8.4
lxml: 3.3.3                               lxml: 3.6.0
bs4: 4.3.2                                bs4: 4.4.1
html5lib: None                            html5lib: None
httplib2: None                            httplib2: None
apiclient: None                           apiclient: None
rpy2: None
sqlalchemy: None                          sqlalchemy: 1.0.12
pymysql: None                             pymysql: None
psycopg2: None                            psycopg2: None
                                          pip: 8.1.1
                                          xarray: None
                                          setuptools: 20.3
                                          blosc: None
                                          jinja2: 2.8
                                          boto: 2.39.0

Results from left setup (0.15.1):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

Results from right setup (0.18.0):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.799999; Are they equal? False
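
The x4 check parses the text in the file with Python's own float(), so it separates what was written from how pandas reads it. In the 0.15.1 run x4 matches x0, which suggests the file text was complete and the difference arose while reading. In the 0.18.0 run x4 is already wrong, which suggests the text in the file had been shortened. The sketch below (Python 3 syntax, standard library only) reproduces that second pattern. The 12-significant-digit format is a hypothetical stand-in chosen for illustration, not a statement about what to_csv does internally.

```python
# Illustration only: compare a full-precision string with a shortened one.
x0 = 18292498239.824

full_text = repr(x0)        # shortest string that parses back to exactly x0
short_text = "%.12g" % x0   # hypothetical writer keeping 12 significant digits

from_full = float(full_text)
from_short = float(short_text)

print(full_text, from_full == x0)    # full text survives the round trip
print(short_text, from_short == x0)  # shortened text does not
print("%f" % from_short)             # same digits as x4 in the 0.18.0 run
```

Once the digits are dropped in the file, no reader can recover them, which is why x2, x3 and x4 all agree with each other in the 0.18.0 results while disagreeing with x0.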

Expectations

I expect to be able to write a DataFrame to a CSV file and later read it into a new DataFrame such that the two DataFrames are identical. The older version (0.15.1) behaves considerably better than the newer one: I can recover the expected value either by rounding to three decimal places or by reading from a file handle instead of using from_csv() or read_csv(). The newer version (0.18.0) loses information, which is not acceptable.
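
For reference, the round trip described here can be sketched with the standard library alone. The helper name roundtrip_floats is hypothetical and the code uses Python 3 syntax. It relies on repr() producing a string that float() parses back to the same value, and says nothing about how pandas formats or parses numbers.

```python
import csv
import io


def roundtrip_floats(values):
    """Write floats as one CSV row using repr(), then parse them back."""
    buf = io.StringIO()
    csv.writer(buf).writerow([repr(v) for v in values])
    buf.seek(0)
    row = next(csv.reader(buf))
    return [float(field) for field in row]


# The value from the report plus a few other doubles.
values = [18292498239.824, 0.1, 1e-300, 2.0 ** 53 + 2]
restored = roundtrip_floats(values)
print(restored == values)
```

This is the behaviour the expectation above asks for: every value read back compares equal to the value that was written.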

Note that the documentation at http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.from_csv.html reads

It is preferable to use the more powerful pandas.read_csv() for most general purposes, but from_csv makes for an easy roundtrip to and from a file (the exact counterpart of to_csv), especially with a DataFrame of time series data.

But this does not describe what actually happens, as demonstrated above.
