
@stephenrauch
Contributor

Fix for GH15376

In `io/parsers/_try_convert_dates()`, when selecting columns by column index
from a set of columns with multi-level names, the column `name` was
converted to a string. This appears to be a bug, since the `name` was a
tuple before the conversion. It causes problems downstream: a later
attempt to look up a column by this name fails, because the desired
column is keyed by the tuple, not by its string representation.
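A minimal, self-contained sketch of the failure mode described above, assuming the column-resolution step works roughly like the hypothetical `resolve_colnames` helper below (it is a simplified stand-in, not the pandas source). With a multi-level header every label is a tuple, so passing the label through `str()` produces a key that the parsed data is not stored under.

```python
# Hypothetical helper: a simplified stand-in for the column-resolution
# step described above, not the actual pandas implementation.
def resolve_colnames(colspec, columns, stringify=False):
    """Map each entry of colspec (a column label or an integer position)
    to a column label. stringify=True mimics the buggy behaviour."""
    colset = set(columns)
    colnames = []
    for c in colspec:
        if c in colset:
            colnames.append(c)  # already a valid label
        elif isinstance(c, int):
            label = columns[c]  # positional lookup
            colnames.append(str(label) if stringify else label)
        else:
            colnames.append(c)
    return colnames


# Multi-level header: every column label is a tuple.
columns = [("D", "date"), ("T", "time"), ("A", "a"), ("B", "b")]
data_dict = {label: [] for label in columns}  # parsed data keyed by label

buggy = resolve_colnames([0, 1], columns, stringify=True)
fixed = resolve_colnames([0, 1], columns)

print([name in data_dict for name in buggy])  # [False, False]
print([name in data_dict for name in fixed])  # [True, True]
```

Keeping the label exactly as it appears in `columns` means it stays usable as a key, whatever its type.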


def test_parse_date_time_multi_level_column_name(self):
    # GH 15376
    result = conv.parse_date_time(self.dates, self.times)

Contributor

not sure what these 2 lines are doing, remove.

2001-01-05, 00:00:00, 1., 11.
"""
datecols = {'date_time': [0, 1]}
df = read_table(StringIO(data), sep=',', header=[0, 1],

Contributor

use self.read_csv, this tests on all parsers (c/python)
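For readers unfamiliar with the pattern: a shared test method calls `self.read_csv`, and each subclass binds that call to one engine, so the same test body runs once per parser. The class and method names below are hypothetical and much simpler than the actual pandas test classes; only `pd.read_csv` and its `engine` argument are real API.

```python
import io

import pandas as pd


class ParserTests:
    """Shared parser tests (hypothetical); subclasses choose the engine."""

    engine = None

    def read_csv(self, *args, **kwargs):
        # Route every call through the engine chosen by the subclass.
        kwargs["engine"] = self.engine
        return pd.read_csv(*args, **kwargs)

    def test_multi_level_header(self):
        data = "A,B\na,b\n1,2\n"
        df = self.read_csv(io.StringIO(data), header=[0, 1])
        # Multi-level headers should yield tuple column labels on every engine.
        assert list(df.columns) == [("A", "a"), ("B", "b")]


class CParserTests(ParserTests):
    engine = "c"


class PythonParserTests(ParserTests):
    engine = "python"


# The same test body runs once per parser.
for cls in (CParserTests, PythonParserTests):
    cls().test_multi_level_header()
```

Calling the module-level `read_csv` directly would bypass this routing and exercise only the default engine.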

datecols = {'date_time': [0, 1]}
df = read_table(StringIO(data), sep=',', header=[0, 1],
                parse_dates=datecols, date_parser=conv.parse_date_time)
self.assertIn('date_time', df)

Contributor

construct an expected frame, and use assert_frame_equal
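A minimal sketch of the suggested style, using the public `pandas.testing.assert_frame_equal`. The sample data is hypothetical and deliberately avoids `parse_dates`/`date_parser`, so the sketch does not depend on version-specific date-combining behaviour; it only shows the expected-frame pattern, which checks values, dtypes and labels rather than just the presence of one column.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical sample with a two-row header.
data = "A,B\na,b\n1,2.5\n3,4.5\n"
result = pd.read_csv(io.StringIO(data), header=[0, 1])

# Build the expected frame explicitly; tuple keys give MultiIndex columns.
expected = pd.DataFrame({("A", "a"): [1, 3], ("B", "b"): [2.5, 4.5]})

# Raises AssertionError with a detailed message if values, dtypes,
# index or column labels differ.
assert_frame_equal(result, expected)
```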

- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``.read_csv()`` which caused ``parse_dates={'datetime': [0, 1]}`` to fail with multiline headers (:issue:`15376`)

Contributor

don't put this as the last line; instead put it in one of the empty slots, otherwise you will get merge conflicts.

Bug in .read_csv() where parse_dates with a list-of-integers specified would fail with multiline headers
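To see the input shape this entry refers to: with `header=[0, 1]`, `read_csv` returns MultiIndex columns, so integer positions such as the `[0, 1]` used in `parse_dates` resolve to tuple labels. The sketch below uses hypothetical sample data and does not pass `parse_dates` itself; it only shows that the tuple is a usable key while its string form is not.

```python
import io

import pandas as pd

# Hypothetical sample with a two-row header, shaped like the GH15376 case.
data = """D,T,A,B
date,time,a,b
2001-01-05,09:00:00,0.0,10.
2001-01-06,00:00:00,1.0,11.
"""
df = pd.read_csv(io.StringIO(data), header=[0, 1])

# With header=[0, 1] the columns form a MultiIndex, so positions 0 and 1
# resolve to tuple labels.
date_label, time_label = df.columns[0], df.columns[1]
print(date_label, time_label)  # ('D', 'date') ('T', 'time')

# The tuple is a valid key; its string form is not.
print(df[date_label].tolist())        # ['2001-01-05', '2001-01-06']
print(str(date_label) in df.columns)  # False
```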

@jreback added the Bug and IO CSV (read_csv, to_csv) labels on Feb 12, 2017
@codecov-io

codecov-io commented Feb 12, 2017

Codecov Report

Merging #15378 into master will decrease coverage by -0.01%.
The diff coverage is 100%.

    @@            Coverage Diff             @@
    ##           master   #15378      +/-   ##
    ==========================================
    - Coverage   90.37%   90.37%   -0.01%
    ==========================================
      Files         135      135
      Lines       49440    49454      +14
    ==========================================
    + Hits        44681    44693      +12
    - Misses       4759     4761       +2

    Impacted Files               Coverage Δ
    pandas/io/parsers.py         95.51% <100%> (ø)
    pandas/core/common.py        91.02% <ø> (-0.34%)
    pandas/core/frame.py         97.82% <ø> (-0.05%)
    pandas/tools/concat.py       97.62% <ø> (ø)
    pandas/core/generic.py       96.33% <ø> (ø)
    pandas/io/excel.py           79.64% <ø> (+0.24%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a8883b...030f5ec.

@stephenrauch force-pushed the fix_read_csv_merge_datetime branch from 3ed8551 to 030f5ec on February 16, 2017 16:16
2001-01-06, 00:00:00, 1.0, 11.
"""
datecols = {'date_time': [0, 1]}
result = read_csv(StringIO(data), sep=',', header=[0, 1],

Contributor

should be self.read_csv, but I can fix on the merge

Contributor Author

Thanks. A few more of these and hopefully I'll get it.

Contributor

@jreback Feb 16, 2017

haha np. parser tests are a little tricky to understand because of this actually.

@jreback
Contributor

jreback commented Feb 16, 2017

ok ping on green.

@jreback added this to the 0.20.0 milestone on Feb 16, 2017
@jreback
Contributor

jreback commented Feb 23, 2017

can you update

@stephenrauch
Contributor Author

@jreback, you asked for an update four days ago, but I thought this was OK. If you still need something, please let me know what.

@jreback closed this in fb7dc7d on Feb 27, 2017
@jreback
Contributor

jreback commented Feb 27, 2017

closed via: fb7dc7d

thanks @stephenrauch

this test was in the wrong place (I had made a comment above, but not sure if you saw it).

In fact, I think all of the tests in pandas/tests/io/test_date_converters are in the wrong place and should simply live in pandas/tests/io/parsers/parse_dates.py (or equivalent), so that they run under each parser. My guess is that this is an older file.

I'll create an issue about this.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
In `io/parsers/_try_convert_dates()` when selecting columns based on a column index from a set of columns with multi-level names, the column `name` was converted to a string. This appears to be a bug since the `name` was a tuple before the conversion. This causes problems downstream when there is an attempt to use this name to lookup a column, and that lookup fails because the desired column is keyed from the tuple, not its string representation.

closes pandas-dev#15376

Author: Stephen Rauch <stephen.rauch+github@gmail.com>

Closes pandas-dev#15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits:

030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers

Labels: Bug, IO CSV (read_csv, to_csv)

3 participants