
Conversation

@alanbato
Contributor

alanbato commented Sep 17, 2017

verify_integrity now also checks if any level has non-unique values and raises ValueError if one does.

However, this new behaviour broke some existing tests.
I'd like to know what I should do in this case: should I add verify_integrity=False to those tests, change them, or do something else?

Also, how should I state this in the whatsnew file and where?

Thank you for your time! 🐼 🐍
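For reference, the check this PR proposes did eventually land in pandas (via the PR that superseded this one), so in a modern pandas the constructor itself rejects non-unique level values. A minimal sketch; note that in later pandas versions the `labels` keyword used at the time became `codes`:

```python
import pandas as pd

# Level 0 here contains a duplicate value ([0, 0, 1]).
# With verify_integrity=True (the default), construction should raise ValueError.
try:
    pd.MultiIndex(
        levels=[[0, 0, 1], [0, 1]],
        codes=[[0, 1, 2, 0], [0, 0, 1, 1]],
    )
except ValueError as exc:
    print(exc)  # mentions that level values must be unique
```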

for i, level in enumerate(levels):
    if len(level) != len(set(level)):
        raise ValueError("Level values must be unique: %s "
                         "on level %d" % ([value for value in level], i))
Member

Can you use .format syntax instead of %? I realize that other places within this file use % formatting, but there's an ongoing effort to transition all % formatting in the pandas codebase to .format, so might as well minimize the number of changes that will need to be made.
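The two styles produce identical messages here; the values below are illustrative, not from the patch:

```python
# The same error message rendered with %-formatting and with str.format,
# the style the pandas codebase was migrating toward at the time.
values, i = [1, 1, 2], 0

old = "Level values must be unique: %s on level %d" % (values, i)
new = "Level values must be unique: {0} on level {1}".format(values, i)
assert old == new
print(new)  # Level values must be unique: [1, 1, 2] on level 0
```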

Contributor Author

Of course! I do prefer .format but decided to stick with % because of the other tests. Thank you for telling me, I'll keep it in mind in future contributions :)

for i, level in enumerate(levels):
    if len(level) != len(set(level)):
        raise ValueError("Level values must be unique: {0}"
                         " on level {1}".format([value for value in level], i))
Contributor

Your check is not checking the values.

I would be surprised if you have any failures, though it is possible.

Contributor Author

I'm checking if all the values inside a level are unique by comparing its length with the length of the set containing all the unique values. Should I be doing it in some other way? Maybe I misunderstood what needs to be checked.
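The length-versus-set-length idea can be sketched in isolation (`has_unique_values` is a hypothetical helper, not pandas API); note that it requires the values to be hashable and materializes a throwaway set:

```python
def has_unique_values(seq):
    # A collection is duplicate-free iff converting it to a set
    # loses no elements.
    return len(seq) == len(set(seq))

assert has_unique_values([0, 1, 2])
assert not has_unique_values([1, 1, 2])
assert not has_unique_values(['A', 'B', 'A', 'A', 'B'])
```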

Here are the tests that failed with the new check:

=================================== FAILURES ===================================
__________________________ TestMultiIndex.test_is_ _____________________________

    def test_is_(self):
        mi = MultiIndex.from_tuples(lzip(range(10), range(10)))
        assert mi.is_(mi)
        assert mi.is_(mi.view())
        assert mi.is_(mi.view().view().view().view())
        mi2 = mi.view()
        # names are metadata, they don't change id
        mi2.names = ["A", "B"]
        assert mi2.is_(mi)
        assert mi.is_(mi2)
        assert mi.is_(mi.set_names(["C", "D"]))
        mi2 = mi.view()
        mi2.set_names(["E", "F"], inplace=True)
        assert mi.is_(mi2)
        # levels are inherent properties, they change identity
        mi3 = mi2.set_levels([lrange(10), lrange(10)])
        assert not mi3.is_(mi2)
        # shouldn't change
        assert mi2.is_(mi)
        mi4 = mi3.view()
>       mi4.set_levels([[1 for _ in range(10)], lrange(10)], inplace=True)

pandas/tests/indexes/test_multi.py:1584:
pandas/core/indexes/multi.py:254: in set_levels
    verify_integrity=verify_integrity)
pandas/core/indexes/multi.py:183: in _set_levels
    self._verify_integrity(levels=new_levels)

self = MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
                   labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
                   names=['E', 'F'])
levels = FrozenList([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

    def _verify_integrity(self, labels=None, levels=None):
        ...
        for i, level in enumerate(levels):
            if len(level) != len(set(level)):
>               raise ValueError("Level values must be unique: {0}"
                                 " on level {1}".format([value for value in level], i))
E       ValueError: Level values must be unique: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] on level 0

pandas/core/indexes/multi.py:154: ValueError

_______________ TestMultiIndex.test_level_setting_resets_attributes ____________

    def test_level_setting_resets_attributes(self):
        ind = MultiIndex.from_arrays([
            ['A', 'A', 'B', 'B', 'B'],
            [1, 2, 1, 2, 3]
        ])
        assert ind.is_monotonic
>       ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
                       inplace=True)

pandas/tests/indexes/test_multi.py:2387:
pandas/core/indexes/multi.py:254: in set_levels
    verify_integrity=verify_integrity)
pandas/core/indexes/multi.py:183: in _set_levels
    self._verify_integrity(levels=new_levels)

self = MultiIndex(levels=[['A', 'B'], [1, 2, 3]], labels=[[0, 0, 1, 1, 1], [0, 1, 0, 1, 2]])
levels = FrozenList([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]])

E       ValueError: Level values must be unique: ['A', 'B', 'A', 'A', 'B'] on level 0

pandas/core/indexes/multi.py:154: ValueError

================ 2 failed, 189 passed, 2 skipped, 1 xfailed in 18.21 seconds ===============
Contributor

The very first example is wrong:

In [17]: mi
Out[17]:
MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=['E', 'F'])

In [18]: mi.levels[0]
Out[18]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='E')

In [19]: mi.levels[1]
Out[19]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name='F')

In [20]: [set(level) for i, level in enumerate(mi.levels)]
Out[20]: [{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}]

In [22]: list(map(lambda x: x.is_unique, mi.levels))
Out[22]: [True, True]

You can probably do something like this (or rather iterate to show exactly where the error is).
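A sketch of that suggestion, iterating over the levels and using pandas' `Index.is_unique` to report exactly which level is at fault (`check_unique_levels` is a hypothetical helper, not pandas API):

```python
import pandas as pd

def check_unique_levels(levels):
    # levels: an iterable of pandas Index objects, as in MultiIndex.levels.
    for i, level in enumerate(levels):
        if not level.is_unique:
            raise ValueError("Level values must be unique: {0}"
                             " on level {1}".format(list(level), i))

check_unique_levels([pd.Index([0, 1, 2]), pd.Index(['a', 'b'])])  # passes silently
```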

Contributor

The example from the issue:

In [31]: idx0 = range(2)
    ...: idx1 = np.repeat(range(2), 2)
    ...:
    ...: midx = pd.MultiIndex(
    ...:     levels=[idx0, idx1],
    ...:     labels=[
    ...:         np.repeat(range(len(idx0)), len(idx1)),
    ...:         np.tile(range(len(idx1)), len(idx0))
    ...:     ],
    ...:     names=['idx0', 'idx1']
    ...: )

In [32]: list(map(lambda x: x.is_unique, midx.levels))
Out[32]: [True, False]
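The reason level 1 is the anomaly here: in a well-formed MultiIndex, per-row repetition is meant to live in the labels (renamed `codes` in later pandas), while each level holds only the distinct values. A small sketch with a modern pandas:

```python
import pandas as pd

# Building from arrays factorizes each column: the levels end up unique,
# and the per-row repetition is encoded in the codes.
midx = pd.MultiIndex.from_arrays([[0, 0, 1, 1], [0, 1, 0, 1]],
                                 names=['idx0', 'idx1'])
print(list(midx.levels[0]))              # distinct values only
print([int(c) for c in midx.codes[0]])   # repetition lives here
assert all(level.is_unique for level in midx.levels)
```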
Contributor Author

Isn't the first example throwing an error because it's replacing those levels with these?
levels = FrozenList([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

And the is_unique method is indeed a clearer way to do it, thanks!

Also, I'm having trouble running the performance checks with asv 😞 Some weird I/O shutil error while trying to do pip wheel.

Contributor Author

@jreback I don't know if the conversation got lost due to the changes on that piece of code, so I'm pinging in case you thought I didn't reply. If you're just busy, sorry to bother you!

@TomAugspurger if you have time could you look at it? :)

Thanks, both of you!

@jreback
Contributor

jreback commented Sep 17, 2017

needs performance checking

whatsnew is in the other api changes section

@jreback jreback added Error Reporting Incorrect or improved errors from pandas MultiIndex labels Sep 17, 2017
@TomAugspurger
Contributor

from #17557 (comment)

Isn't the first example throwing out an error because it's replacing those levels with these?
levels = FrozenList([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
And the is_unique method is indeed a more clear way to do it, thanks!
Also, I'm having trouble running the performance checks with asv 😞 Some weird I/O shutil error while trying to do pip wheel.

For the asv, are you using conda? That's typically easiest.

@alanbato
Contributor Author

alanbato commented Sep 20, 2017

@TomAugspurger I ran the command again and got it to work, but I'm seeing really weird results in the performance test: everything is either above or below 10%. Is this normal?

       before           after         ratio  benchmark
+     96.5±0.2ms        367±5ms        3.80  frame_methods.Reindex.time_reindex_axis1
+     40.4±0.08ms       122±5ms        3.02  frame_methods.Reindex.time_reindex_both_axes
+     40.9±0.2ms        111±0.3ms      2.72  frame_methods.Reindex.time_reindex_both_axes_ix
+     93.2±0.1ms        243±4ms        2.61  frame_methods.Dropna.time_count_level_axis0_multi
+     29.1±2ms          74.5±7ms       2.56  frame_methods.Shift.time_shift_axis0
+     29.9±0.2ms        74.8±0.3ms     2.50  frame_methods.Shift.time_shift_axis_1
+     84.3±0.5ms        190±3ms        2.25  frame_methods.Dropna.time_count_level_axis1_multi
+     96.0±0.05ms       213±2ms        2.21  frame_methods.Dropna.time_count_level_axis1_mixed_dtypes_multi
+    109±0.08ms         231ms          2.13  frame_methods.Dropna.time_count_level_axis0_mixed_dtypes_multi
+    105±0.1ms          219±0.7ms      2.10  frame_methods.Dropna.time_dropna_axis1_all
+    104±0.03ms         198ms          1.90  frame_methods.Dropna.time_dropna_axis0_all
+     33.9±0.9ms        58.5±2ms       1.73  binary_ops.TimeseriesTZ.time_timestamp_ops_diff1
+    452±2ms            731ms          1.62  frame_methods.Dropna.time_dropna_axis0_all_mixed_dtypes
+    996±7ms            1.55±0s        1.56  gil.NoGilGroupby.time_groups_2
[... roughly 150 further slowdown rows with ratios between 1.10 and 1.51 elided ...]
-     26.8μs            24.2μs         0.90  index_object.Float64.time_slice_indexer_even
-      7.60ms           6.81ms         0.90  inference.DtypeInfer.time_float64
[... roughly 20 further speedup rows with ratios between 0.82 and 0.90 elided ...]
-     17.2±0.3ms        13.7±0.3ms     0.80  binary_ops.Ops2.time_frame_int_mod
-     25.8±0.4ms        17.5±0.7ms     0.68  binary_ops.Ops.time_frame_add(True, 'default')
-     17.4±0.3ms        9.88±0.5ms     0.57  binary_ops.Ops2.time_frame_float_mod
@TomAugspurger
Contributor

The -f 1.1 limits it to only benchmarks that changed by more than 10%. The results can be noisy, depending on how much load your machine is under while running the benchmarks. Typically anything larger than 1.3-1.5 is significant.

I'll take a look later.
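In other words, -f is just a ratio filter. A toy version of how such a filter works (benchmark names and timings below are made up for illustration, not real asv internals):

```python
# asv's -f FACTOR option hides benchmarks whose after/before ratio stayed
# within [1/FACTOR, FACTOR]; only larger swings are reported.
results = {
    "frame_methods.Reindex.time_reindex_axis1": (96.5, 367.0),  # big slowdown
    "inference.DtypeInfer.time_float64": (7.60, 6.81),          # small speedup
    "some.stable.benchmark": (10.0, 10.2),                      # within noise
}

def changed(before, after, factor=1.1):
    ratio = after / before
    return ratio > factor or ratio < 1 / factor

flagged = sorted(name for name, (b, a) in results.items() if changed(b, a))
print(flagged)  # the stable benchmark is filtered out
```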

@alanbato
Contributor Author

Alright, ping me when you've gone through it. Thanks @TomAugspurger :)

@TomAugspurger
Contributor

Perf looked fine, though the CI failures look relevant.

@alanbato
Contributor Author

Yes, I think I'll need to change those tests because they rely on this faulty behaviour. Any advice on that?

@jreback
Contributor

jreback commented Oct 29, 2017

This is being closed by #17971. Thanks for the effort here. There are many other issues if you'd like to take a look!

@jreback jreback closed this Oct 29, 2017

Labels

Error Reporting Incorrect or improved errors from pandas MultiIndex

4 participants