Description
Failing test
    def test_pandas():
        import tempfile
        import csv
        import pandas as pd
        import numpy as np

        df = pd.DataFrame.from_dict({'column': [1.0, 2.0]})
        assert df['column'].dtype == np.dtype('float')
        with tempfile.TemporaryFile() as f:
            df.to_csv(f, quoting=csv.QUOTE_NONNUMERIC, index=False)
            f.seek(0)
            lines = f.read().splitlines()
        assert lines[0] == '"column"'
        assert not lines[1].startswith('"')  # <--- THIS FAILS
        assert [1, 2] == map(float, lines[1:])

The issue is that the floats are being output wrapped with quotes, even though I requested QUOTE_NONNUMERIC.
The problem is that pandas.core.internals.FloatBlock.to_native_types (and by extension pandas.formats.format.FloatArrayFormatter.get_result_as_array) unconditionally formats the float array to a str array, which is then passed unchanged to the csv module and hence will be wrapped in quotes by that code.
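To illustrate the mechanism outside pandas, here is a minimal sketch using only the csv module. It is written for Python 3 (unlike the Python 2.7 code in this report), and the helper name `write_row` is made up for the example. It shows that QUOTE_NONNUMERIC leaves a real float unquoted but quotes the same value once it has been converted to a string.

```python
import csv
import io


def write_row(row):
    """Write one row with QUOTE_NONNUMERIC and return the resulting CSV line.

    Hypothetical helper for this illustration; not part of pandas or csv.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(row)
    return buf.getvalue().rstrip("\n")


# A real float reaches the csv writer as a number and is left unquoted.
float_line = write_row([1.0])

# The same value pre-formatted as a string is treated as non-numeric and quoted.
str_line = write_row(["1.0"])

print(float_line)  # 1.0
print(str_line)    # "1.0"
```

The second case is what the csv writer sees after the float column has been turned into a str array.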
I'm not 100% sure but the fix may be to have FloatBlock.to_native_types check if quoting is set, and if so to skip using the FloatArrayFormatter? I say this because pandas.indexes.base.Index._format_native_types already has a special case along these lines. This does seem a bit dirty though!
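The idea can be sketched without touching pandas internals. The code below is only a stand-in, written for Python 3: `to_native_types_sketch` and `to_csv_lines` are hypothetical names, and the real `FloatBlock.to_native_types` has a different signature and works on NumPy arrays. The sketch just shows the proposed branch: when quoting is set, hand the raw floats to the csv writer instead of pre-formatted strings.

```python
import csv
import io


def to_native_types_sketch(values, quoting=None, float_format="%.1f"):
    """Hypothetical stand-in for the float-to-native conversion step.

    If quoting is set (truthy), return the floats untouched so that the
    csv writer can decide whether they need quotes. Otherwise format
    them as strings, as the current code always does.
    """
    if quoting:
        return list(values)
    return [float_format % v for v in values]


def to_csv_lines(values, quoting):
    """Write one value per row and return the resulting CSV lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    for value in to_native_types_sketch(values, quoting=quoting):
        writer.writerow([value])
    return buf.getvalue().splitlines()


# With QUOTE_NONNUMERIC the floats stay numeric, so no quotes are added.
nonnumeric_lines = to_csv_lines([1.0, 2.0], csv.QUOTE_NONNUMERIC)
print(nonnumeric_lines)  # ['1.0', '2.0']
```

One consequence of this approach, visible in the sketch, is that any float formatting is skipped whenever quoting is set, which is part of why it feels like a special case rather than a clean fix.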
Here is an awful monkeypatch that works around the problem:
    orig_to_native_types = pd.core.internals.FloatBlock.to_native_types

    def to_native_types(self, *args, **kwargs):
        if kwargs.get('quoting'):
            values = self.values
            slicer = kwargs.get('slicer')
            if slicer is not None:
                values = values[:, slicer]
            return values
        res = orig_to_native_types(self, *args, **kwargs)
        print 'FloatBlock.to_native_types', args, kwargs, '=', res
        return res

    pd.core.internals.FloatBlock.to_native_types = to_native_types

output of pd.show_versions()
    commit: None
    python: 2.7.9.final.0
    python-bits: 64
    OS: Windows
    OS-release: 7
    machine: AMD64
    processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
    byteorder: little
    LC_ALL: None
    LANG: None

    pandas: 0.18.0
    nose: None
    pip: 8.1.1
    setuptools: 7.0
    Cython: 0.20.1
    numpy: 1.11.0
    scipy: 0.13.3
    statsmodels: None
    xarray: None
    IPython: 3.2.1
    sphinx: None
    patsy: 0.3.0
    dateutil: 2.5.2
    pytz: 2016.3
    blosc: None
    bottleneck: 1.0.0
    tables: None
    numexpr: None
    matplotlib: 1.3.1
    openpyxl: 2.0.4
    xlrd: 0.9.2
    xlwt: None
    xlsxwriter: None
    lxml: 3.3.2
    bs4: 4.2.0
    html5lib: None
    httplib2: None
    apiclient: None
    sqlalchemy: 1.0.11
    pymysql: None
    psycopg2: None
    jinja2: 2.7.2
    boto: None