Closed
pandas behaves unexpectedly when one tries to create a Series by applying
`tuple` (or `list`) row-wise to two columns of a DataFrame, one of which consists of timestamps:
    import pandas as pd
    import numpy as np

    d = pd.DataFrame({'a': pd.Series(np.random.randn(4)),
                      'b': ['a', 'list', 'of', 'words'],
                      'ts': pd.date_range('2016-10-01', periods=4, freq='H')})
    d

|   | a | b | ts |
|---|---|---|---|
| 0 | 0.200813 | a | 2016-10-01 00:00:00 |
| 1 | 0.316971 | list | 2016-10-01 01:00:00 |
| 2 | -0.186392 | of | 2016-10-01 02:00:00 |
| 3 | -0.565593 | words | 2016-10-01 03:00:00 |
Let's first try with columns 'a' and 'b':

    d[['a', 'b']].apply(tuple, axis=1)

    0       (0.2008128669491346, a)
    1    (0.3169711841447721, list)
    2     (-0.1863916899789735, of)
    3  (-0.5655926199699992, words)
    dtype: object

So far, everything is fine. Now let's do it with 'a' and 'ts':

    d[['a', 'ts']].apply(tuple, axis=1)

|   | a | ts |
|---|---|---|
| 0 | 0.200813 | 2016-10-01 00:00:00 |
| 1 | 0.316971 | 2016-10-01 01:00:00 |
| 2 | -0.186392 | 2016-10-01 02:00:00 |
| 3 | -0.565593 | 2016-10-01 03:00:00 |
Oops: the result is a DataFrame, not a Series of tuples.
It's easy to work around this by "coating" (wrapping) the timestamps before the apply and "uncoating" them afterwards:

    def coating(t):
        return lambda: t

    def uncoating(x, f):
        return x, f()

    d['coated_ts'] = d['ts'].apply(coating)
    d[['a', 'coated_ts']].apply(tuple, axis=1).apply(lambda t: uncoating(*t))

    0    (0.2008128669491346, 2016-10-01 00:00:00)
    1    (0.3169711841447721, 2016-10-01 01:00:00)
    2   (-0.1863916899789735, 2016-10-01 02:00:00)
    3   (-0.5655926199699992, 2016-10-01 03:00:00)
    dtype: object

It would be nice if this strange behaviour was corrected.
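
For comparison, here is a minimal sketch of another possible workaround that avoids `DataFrame.apply` altogether by zipping the two columns directly. It is only an illustration, not a fix for the behaviour reported here. The name `pairs` is made up for this example, and the timestamps are written out with `pd.to_datetime` rather than `pd.date_range` purely to keep the snippet independent of frequency aliases.

```python
import numpy as np
import pandas as pd

# Same kind of data as in the report; the values in 'a' are random.
d = pd.DataFrame({
    'a': np.random.randn(4),
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# Zip the two columns row by row instead of calling DataFrame.apply.
# Iterating a datetime64 Series yields pd.Timestamp objects, so each
# element of the result is a (float, Timestamp) tuple.
pairs = pd.Series(list(zip(d['a'], d['ts'])), index=d.index)
print(pairs)
```

This should give an object-dtype Series of tuples indexed like `d`, without the extra `coated_ts` column.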

    pd.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.6.1.final.0
    python-bits: 64
    OS: Windows
    OS-release: 10
    machine: AMD64
    processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
    byteorder: little
    LC_ALL: None
    LANG: None
    LOCALE: None.None

    pandas: 0.20.1
    pytest: 3.0.7
    pip: 9.0.1
    setuptools: 27.2.0
    Cython: 0.25.2
    numpy: 1.12.1
    scipy: 0.19.0
    xarray: None
    IPython: 5.3.0
    sphinx: 1.5.6
    patsy: 0.4.1
    dateutil: 2.6.0
    pytz: 2017.2
    blosc: None
    bottleneck: 1.2.1
    tables: 3.2.2
    numexpr: 2.6.2
    feather: None
    matplotlib: 2.0.2
    openpyxl: 2.4.7
    xlrd: 1.0.0
    xlwt: 1.2.0
    xlsxwriter: 0.9.6
    lxml: 3.7.3
    bs4: 4.6.0
    html5lib: 0.999
    sqlalchemy: 1.1.9
    pymysql: None
    psycopg2: None
    jinja2: 2.9.6
    s3fs: None
    pandas_gbq: None
    pandas_datareader: None