Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) #17348

@daltschu

Description

pandas behaves unexpectedly when one tries to create a Series by applying
tuple (or list) to two columns of a DataFrame, one of which contains timestamps:

import pandas as pd
import numpy as np

d = pd.DataFrame({'a': pd.Series(np.random.randn(4)),
                  'b': ['a', 'list', 'of', 'words'],
                  'ts': pd.date_range('2016-10-01', periods=4, freq='H')})
d
a b ts
0 0.200813 a 2016-10-01 00:00:00
1 0.316971 list 2016-10-01 01:00:00
2 -0.186392 of 2016-10-01 02:00:00
3 -0.565593 words 2016-10-01 03:00:00

Let's first try with columns 'a' and 'b':

d[['a', 'b']].apply(tuple, axis=1)
0 (0.2008128669491346, a)
1 (0.3169711841447721, list)
2 (-0.1863916899789735, of)
3 (-0.5655926199699992, words)
dtype: object

So far, everything is fine. Now let's do it with 'a' and 'ts':

d[['a', 'ts']].apply(tuple, axis=1)
a ts
0 0.200813 2016-10-01 00:00:00
1 0.316971 2016-10-01 01:00:00
2 -0.186392 2016-10-01 02:00:00
3 -0.565593 2016-10-01 03:00:00

Oops.
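To check whether a given pandas installation shows this behaviour, one can look at the type of the result. This is only a minimal sketch: the frame is rebuilt with explicit timestamps rather than a frequency alias, and `result_kind` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one above, restricted to the two problematic columns.
d = pd.DataFrame({
    'a': np.random.randn(4),
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

result = d[['a', 'ts']].apply(tuple, axis=1)

# A Series of tuples is what one would expect here;
# a DataFrame indicates the behaviour reported in this issue.
result_kind = type(result).__name__
print(pd.__version__, result_kind)
```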

It's easy to work around this by "coating" each timestamp (wrapping it in a function) before apply and "uncoating" it (calling the function) afterwards:

def coating(t):
    return lambda: t

def uncoating(x, f):
    return x, f()

d['coated_ts'] = d['ts'].apply(coating)
d[['a', 'coated_ts']].apply(tuple, axis=1).apply(lambda t: uncoating(*t))
0 (0.2008128669491346, 2016-10-01 00:00:00)
1 (0.3169711841447721, 2016-10-01 01:00:00)
2 (-0.1863916899789735, 2016-10-01 02:00:00)
3 (-0.5655926199699992, 2016-10-01 03:00:00)
dtype: object
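A possibly simpler workaround is to avoid DataFrame.apply altogether and pair the columns directly. The sketch below is not taken from the report; it assumes a frame shaped like `d`, and the names `pairs` and `pairs_alt` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Small frame mirroring the one in the report (explicit timestamps, no frequency alias).
d = pd.DataFrame({
    'a': np.random.randn(4),
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# Pair the two columns element by element and keep the original index.
pairs = pd.Series(list(zip(d['a'], d['ts'])), index=d.index)

# Alternative: itertuples with name=None yields plain tuples, one per row.
pairs_alt = pd.Series(
    list(d[['a', 'ts']].itertuples(index=False, name=None)),
    index=d.index,
)

print(pairs)
```

Both variants build the tuples in plain Python, so the result does not depend on how apply decides to shape its output.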

It would be nice if this strange behaviour was corrected.

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Metadata

Labels: Apply (Apply, Aggregate, Transform, Map), Bug
