
A DataFrame containing time zone-aware Timestamps fails in groupby().agg() and raises errors #23683

@propella


Code Sample, a copy-pastable example if possible

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01', tz='UTC'),
        pd.Timestamp('2018-01-02', tz='UTC')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})
```

Problem description

The code above raises the following chain of exceptions:

```
Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2670, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2689, in _aggregate_series_fast
    dummy)
  File "pandas/_libs/reduction.pyx", line 334, in pandas._libs.reduction.SeriesGrouper.__init__
  File "pandas/_libs/reduction.pyx", line 347, in pandas._libs.reduction.SeriesGrouper._check_dummy
ValueError: Dummy array must be same dtype

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3495, in aggregate
    return self._python_agg_general(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1068, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2672, in agg_series
    return self._aggregate_series_pure_python(obj, func)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2706, in _aggregate_series_pure_python
    raise ValueError('Function does not reduce')
ValueError: Function does not reduce

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4656, in aggregate
    return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 4087, in aggregate
    result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 490, in _aggregate
    result = _agg(arg, _agg_1dim)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 441, in _agg
    result[fname] = func(fname, agg_how)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/base.py", line 424, in _agg_1dim
    return colg.aggregate(how, _level=(_level or 0) + 1)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3497, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "~/.pyenv/versions/anaconda3-5.3.0/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 3627, in _aggregate_named
    raise Exception('Must produce aggregated value')
Exception: Must produce aggregated value
```
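Reading the traceback, the fast Cython path first rejects the tz-aware column (`Dummy array must be same dtype`). The pure-Python fallback then rejects the lambda's return value (`Function does not reduce`, then `Must produce aggregated value`). One plausible reading is that `e.head(1)` hands back a length-1 Series rather than a scalar. The minimal sketch below, which assumes a recent pandas rather than 0.23.4, shows what the lambda receives and returns for the single group:

```python
import pandas as pd

# Same frame as in the report: two rows in one group, tz-aware timestamps.
df = pd.DataFrame({
    'tag': [1, 1],
    'date': [pd.Timestamp('2018-01-01', tz='UTC'),
             pd.Timestamp('2018-01-02', tz='UTC')],
})

# The 'date' values for tag == 1, i.e. what agg() passes to the lambda.
group = df.groupby('tag')['date'].get_group(1)

head_result = group.head(1)  # a length-1 Series, not a scalar
iloc_result = group.iloc[0]  # a single tz-aware Timestamp

print(type(head_result).__name__)
print(type(iloc_result).__name__)
```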

It works if I remove `tz` from the timestamps, as below, so I suspect this is a bug.

```python
df = pd.DataFrame({
    'tag': [1, 1],
    'date': [
        pd.Timestamp('2018-01-01'),
        pd.Timestamp('2018-01-02')]
})
df.groupby('tag').agg({'date': lambda e: e.head(1)})
```
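If the goal is simply the first timestamp in each group, a possible workaround is to return a scalar from the lambda or to use the built-in `'first'` aggregation by name. This is a sketch that assumes a recent pandas; it has not been verified against 0.23.4.

```python
import pandas as pd

df = pd.DataFrame({
    'tag': [1, 1],
    'date': [pd.Timestamp('2018-01-01', tz='UTC'),
             pd.Timestamp('2018-01-02', tz='UTC')],
})

# Reduce each group to one scalar Timestamp instead of a length-1 Series.
via_lambda = df.groupby('tag').agg({'date': lambda e: e.iloc[0]})

# Same result through the named 'first' aggregation (first non-null value).
via_first = df.groupby('tag').agg({'date': 'first'})

print(via_first)
```

Both variants reduce each group to a single `Timestamp`, which is what the error messages above ask for. Whether they also get past the `Dummy array must be same dtype` check on 0.23.4 is an assumption.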

Expected Output

```
                          date
tag                           
1    2018-01-01 00:00:00+00:00
```

Output of pd.show_versions()

```
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```
