Pandas indexing bug raises TypeError when slicing with categorical IntervalIndex #21068

@antipisa

Pandas indexing should not rely on the behavior of subnormal floats inside categorical data. Please bit-cast floats to integers when computing categorical labels:

def _get_next_label(label):
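
One way to read the bit-cast suggestion is to step to the adjacent double by reinterpreting its bits as a 64-bit integer and adding or subtracting one, so that no floating-point arithmetic is involved. The sketch below is only an illustration under that reading: `next_float_up` is a hypothetical helper, not the body of pandas' `_get_next_label`, and it assumes IEEE 754 doubles.

```python
import struct

_ABS_MASK = 0x7FFFFFFFFFFFFFFF   # all bits except the sign bit
_POS_INF = 0x7FF0000000000000    # bit pattern of +inf


def next_float_up(x: float) -> float:
    """Return the smallest double strictly greater than x (hypothetical helper).

    Only integer operations are applied to the bit pattern, so the result
    should not depend on flush-to-zero / denormals-are-zero settings.
    """
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    magnitude = bits & _ABS_MASK

    if magnitude > _POS_INF or bits == _POS_INF:
        return x                  # NaN or +inf: nothing larger to return
    if magnitude == 0:
        bits = 1                  # +0.0 or -0.0 -> smallest positive subnormal
    elif bits > 0:
        bits += 1                 # positive: larger magnitude is larger value
    else:
        bits -= 1                 # negative: smaller magnitude is larger value

    # Reinterpret the integer as a double again.
    return struct.unpack("<d", struct.pack("<q", bits))[0]
```

Note that the value returned for `0.0` is itself subnormal, so any later floating-point comparison against it may still be affected when subnormals are flushed; comparing the integer bit patterns instead would avoid that.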

The following is an example of label-based indexing with floating-point interval endpoints that should return the first row of the table:

import pandas as pd
import numpy as np

t = pd.DataFrame(dict(sym=np.arange(2), y=1., z=-1.))
t.loc[:, 'x'] = pd.Series([pd.Interval(-1., 0.0, closed='right'), pd.Interval(0.0, 1, closed='right')])
t.set_index('x', inplace=True)
t.index = pd.Categorical(t.index)
t.loc[t.index.categories[0], :]

Out:
sym    0.0
y      1.0
z     -1.0
Name: (-1.0, 0.0], dtype: float64

However, this fails:

import daz
daz.set_ftz()
daz.set_daz()

t = pd.DataFrame(dict(sym=np.arange(2), y=1., z=-1.))
t.loc[:, 'x'] = pd.Series([pd.Interval(-1., 0.0, closed='right'), pd.Interval(0.0, 1, closed='right')])
t.set_index('x', inplace=True)
t.index = pd.Categorical(t.index)
t.loc[t.index.categories[0], :]

TypeError                                 Traceback (most recent call last)
<ipython-input-3-3a8fe3a302cf> in <module>()
      8 t.set_index('x', inplace=True)
      9 t.index = pd.Categorical(t.index)
---> 10 t.loc[t.index.categories[0], :]

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1365             except (KeyError, IndexError):
   1366                 pass
-> 1367             return self._getitem_tuple(key)
   1368         else:
   1369             # we by definition only have the 0th axis

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    856     def _getitem_tuple(self, tup):
    857         try:
--> 858             return self._getitem_lowerdim(tup)
    859         except IndexingError:
    860             pass

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_lowerdim(self, tup)
    989         for i, key in enumerate(tup):
    990             if is_label_like(key) or isinstance(key, tuple):
--> 991                 section = self._getitem_axis(key, axis=i)
    992
    993                 # we have yielded a scalar ?

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1625         # fall thru to straight lookup
   1626         self._has_valid_type(key, axis)
-> 1627         return self._get_label(key, axis=axis)
   1628
   1629

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
    143             raise IndexingError('no slices here, handle elsewhere')
    144
--> 145         return self.obj._xs(label, axis=axis)
    146
    147     def _get_loc(self, key, axis=None):

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in xs(self, key, axis, level, drop_level)
   2342                                       drop_level=drop_level)
   2343         else:
-> 2344             loc = self.index.get_loc(key)
   2345
   2346             if isinstance(loc, np.ndarray):

/Users/bohun/anaconda2/lib/python2.7/site-packages/pandas/core/indexes/category.pyc in get_loc(self, key, method)
    410         if (codes == -1):
    411             raise KeyError(key)
--> 412         return self._engine.get_loc(codes)
    413
    414     def get_value(self, series, key):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

TypeError: 'slice(0, 2, None)' is an invalid key

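The lookup fails because the default behavior for floating-point endpoints forces the interval index to be cast into an integer slice (here `slice(0, 2, None)`), which `IndexEngine.get_loc` rejects as an invalid key. This is not ideal.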
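
When trying to reproduce this, it can help to confirm that subnormals are actually being flushed in the running process. The check below is a rough sketch rather than anything pandas or `daz` provides: `subnormals_are_flushed` is a hypothetical helper, and it assumes IEEE 754 doubles and that Python's float comparison and multiplication run on the floating-point unit whose mode was changed.

```python
import struct
import sys


def subnormals_are_flushed() -> bool:
    """Best-effort check for FTZ/DAZ-style flushing (hypothetical helper)."""
    # Build the smallest positive subnormal from its bit pattern, so that
    # constructing it involves no floating-point arithmetic.
    tiny = struct.unpack("<d", struct.pack("<q", 1))[0]

    # With denormals-are-zero enabled, a subnormal operand is expected to
    # compare equal to 0.0.
    daz_like = tiny == 0.0

    # With flush-to-zero enabled, halving the smallest normal double is
    # expected to produce 0.0 instead of a subnormal result.
    ftz_like = sys.float_info.min * 0.5 == 0.0

    return daz_like or ftz_like


print(subnormals_are_flushed())
```

Calling it once before and once after `daz.set_ftz()` and `daz.set_daz()` should show whether the mode change took effect in the interpreter.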


Labels: Bug, Categorical (Categorical Data Type), Indexing (Related to indexing on series/frames, not to indexes themselves), Interval (Interval data type)
