DOC: floating point precision on writing/reading to csv #13159

Code Sample

    import pandas as pd

    # Python 2.7 syntax, matching the two environments reported below.
    x0 = 18292498239.824
    df1 = pd.DataFrame({'One': x0}, index=["bignum"])
    df1.to_csv('repr_test.csv')

    # Read the file back three ways: from_csv, read_csv, and plain float().
    df2 = pd.DataFrame.from_csv('repr_test.csv')
    df3 = pd.read_csv('repr_test.csv')

    x1 = df1['One'][0]
    x2 = df2['One'][0]
    x3 = df3['One'][0]

    # Parse the value straight from the file text, bypassing the pandas readers.
    fh = open('repr_test.csv', 'rb')
    ll = fh.readlines()
    fh.close()
    x4 = float(ll[1].split(',')[1].split()[0])

    print "x0 = %f; x1 = %f; Are they equal? %s" % (x0, x1, (x0 == x1))
    print "x0 = %f; x2 = %f; Are they equal? %s" % (x0, x2, (x0 == x2))
    print "x0 = %f; x3 = %f; Are they equal? %s" % (x0, x3, (x0 == x3))
    print "x0 = %f; x4 = %f; Are they equal? %s" % (x0, x4, (x0 == x4))

Expected Output

    x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
    x0 = 18292498239.824001; x2 = 18292498239.824001; Are they equal? True
    x0 = 18292498239.824001; x3 = 18292498239.824001; Are they equal? True
    x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

output of `pd.show_versions()`

(Note that there are two, presented side-by-side, with results underneath)

    INSTALLED VERSIONS                          INSTALLED VERSIONS
    ------------------                          ------------------
    commit: None                                commit: None
    python: 2.7.5.final.0                       python: 2.7.11.final.0
    python-bits: 64                             python-bits: 64
    OS: Linux                                   OS: Linux
    OS-release: 2.6.32-431.56.1.el6.x86_64      OS-release: 2.6.32-431.56.1.el6.x86_64
    machine: x86_64                             machine: x86_64
    processor: x86_64                           processor: x86_64
    byteorder: little                           byteorder: little
    LC_ALL: None                                LC_ALL: None
    LANG: en_US.UTF-8                           LANG: en_US.UTF-8
    pandas: 0.15.1                              pandas: 0.18.0
    nose: 1.3.4                                 nose: 1.3.7
    Cython: 0.21.2                              Cython: 0.23.4
    numpy: 1.9.1                                numpy: 1.10.4
    scipy: 0.14.0                               scipy: 0.17.0
    statsmodels: 0.6.0                          statsmodels: 0.6.1
    IPython: 2.3.0                              IPython: 4.1.2
    sphinx: 1.2.3                               sphinx: 1.3.5
    patsy: 0.3.0                                patsy: 0.4.0
    dateutil: 2.2                               dateutil: 2.5.1
    pytz: 2014.9                                pytz: 2016.2
    bottleneck: None                            bottleneck: 1.0.0
    tables: 3.1.1                               tables: 3.2.2
    numexpr: 2.4                                numexpr: 2.5
    matplotlib: 1.4.2                           matplotlib: 1.5.1
    openpyxl: None                              openpyxl: 2.3.2
    xlrd: 0.9.3                                 xlrd: 0.9.4
    xlwt: 0.7.5                                 xlwt: 1.0.0
    xlsxwriter: 0.6.3                           xlsxwriter: 0.8.4
    lxml: 3.3.3                                 lxml: 3.6.0
    bs4: 4.3.2                                  bs4: 4.4.1
    html5lib: None                              html5lib: None
    httplib2: None                              httplib2: None
    apiclient: None                             apiclient: None
    rpy2: None
    sqlalchemy: None                            sqlalchemy: 1.0.12
    pymysql: None                               pymysql: None
    psycopg2: None                              psycopg2: None
                                                pip: 8.1.1
                                                xarray: None
                                                setuptools: 20.3
                                                blosc: None
                                                jinja2: 2.8
                                                boto: 2.39.0

Results from left setup (0.15.1):

    x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
    x0 = 18292498239.824001; x2 = 18292498239.823997; Are they equal? False
    x0 = 18292498239.824001; x3 = 18292498239.823997; Are they equal? False
    x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

Results from right setup (0.18.0):

    x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
    x0 = 18292498239.824001; x2 = 18292498239.799999; Are they equal? False
    x0 = 18292498239.824001; x3 = 18292498239.799999; Are they equal? False
    x0 = 18292498239.824001; x4 = 18292498239.799999; Are they equal? False
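
The 0.18.0 results are consistent with the value having been written with roughly 12 significant digits, which is what Python 2's `str()` produces for a float. That explanation is an inference from the output above, not something the report confirms. A minimal standard-library sketch (Python 3 syntax, no pandas involved) of how the number of digits written affects the round trip:

```python
x0 = 18292498239.824

# 12 significant digits drop everything after the first decimal place.
short_text = "%.12g" % x0
# 17 significant digits are enough to identify any IEEE 754 double uniquely.
full_text = "%.17g" % x0

# Each line shows the text that would be written and whether it parses back to x0.
print(short_text, float(short_text) == x0)
print(full_text, float(full_text) == x0)
print(repr(x0), float(repr(x0)) == x0)
```

Only the 12-digit text fails to parse back to the original value. The 17-digit text and `repr()` both round-trip.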

Expectations

I expect to be able to write a DataFrame to a CSV file and later read it back into a new DataFrame such that the two DataFrames are identical. The older version (0.15.1) behaves quite a bit better than the newer one: I can round to three decimal places to get the expected result, or read from a filehandle instead of using from_csv() or read_csv(). The newer version (0.18.0) loses information, which is not acceptable. Since x4 is parsed directly from the text of the file, the 0.18.0 result suggests the value is already truncated when to_csv() writes it.

Note that the documentation at http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.from_csv.html reads:

"It is preferable to use the more powerful pandas.read_csv() for most general purposes, but from_csv makes for an easy roundtrip to and from a file (the exact counterpart of to_csv), especially with a DataFrame of time series data."

But this does not describe what actually happens, as demonstrated above.
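
For comparison, here is a hedged sketch of one way to request a lossless round trip explicitly, assuming a reasonably recent pandas on Python 3. `float_format` on `to_csv` and `float_precision` on `read_csv` are existing pandas parameters, but the `'%.17g'` format and the in-memory buffer are illustrative choices, and this has not been checked against the 0.15.1 or 0.18.0 setups above.

```python
import io

import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({"One": [x0]}, index=["bignum"])

# Write with 17 significant digits so the text identifies the double exactly.
buf = io.StringIO()
df1.to_csv(buf, float_format="%.17g")
buf.seek(0)

# Ask the C parser for its round-trip float converter when reading back.
df2 = pd.read_csv(buf, index_col=0, float_precision="round_trip")

print(df2.loc["bignum", "One"] == x0)
```

Whether the defaults should behave this way without extra arguments is the question this issue raises. The sketch only shows that the round trip is achievable when both sides are told to preserve full precision.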
