-
- Notifications
You must be signed in to change notification settings - Fork 19.1k
Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd idx1 = pd.date_range("2024-01-01", periods=24*12, freq="5min", unit="us") idx2 = pd.date_range("2024-01-02", periods=24*12, freq="5min", unit="us") idx3 = pd.date_range("2024-01-03", periods=24*12, freq="5min", unit="us") ts1 = pd.Series(range(len(idx1)), index=idx1) ts2 = pd.Series(range(len(idx2)), index=idx2) ts3 = pd.Series(range(len(idx3)), index=idx3) df = pd.concat([ts1, ts2, ts3], axis=1) print(ts1.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00', # '2024-01-01 00:10:00', '2024-01-01 00:15:00', # '2024-01-01 00:20:00', '2024-01-01 00:25:00', # '2024-01-01 00:30:00', '2024-01-01 00:35:00', # '2024-01-01 00:40:00', '2024-01-01 00:45:00', # ... # '2024-01-01 23:10:00', '2024-01-01 23:15:00', # '2024-01-01 23:20:00', '2024-01-01 23:25:00', # '2024-01-01 23:30:00', '2024-01-01 23:35:00', # '2024-01-01 23:40:00', '2024-01-01 23:45:00', # '2024-01-01 23:50:00', '2024-01-01 23:55:00'], # dtype='datetime64[us]', length=288, freq='5min') print(ts2.index) # DatetimeIndex(['2024-01-02 00:00:00', '2024-01-02 00:05:00', # '2024-01-02 00:10:00', '2024-01-02 00:15:00', # '2024-01-02 00:20:00', '2024-01-02 00:25:00', # '2024-01-02 00:30:00', '2024-01-02 00:35:00', # '2024-01-02 00:40:00', '2024-01-02 00:45:00', # ... # '2024-01-02 23:10:00', '2024-01-02 23:15:00', # '2024-01-02 23:20:00', '2024-01-02 23:25:00', # '2024-01-02 23:30:00', '2024-01-02 23:35:00', # '2024-01-02 23:40:00', '2024-01-02 23:45:00', # '2024-01-02 23:50:00', '2024-01-02 23:55:00'], # dtype='datetime64[us]', length=288, freq='5min') print(ts3.index) # DatetimeIndex(['2024-01-03 00:00:00', '2024-01-03 00:05:00', # '2024-01-03 00:10:00', '2024-01-03 00:15:00', # '2024-01-03 00:20:00', '2024-01-03 00:25:00', # '2024-01-03 00:30:00', '2024-01-03 00:35:00', # '2024-01-03 00:40:00', '2024-01-03 00:45:00', # ... # '2024-01-03 23:10:00', '2024-01-03 23:15:00', # '2024-01-03 23:20:00', '2024-01-03 23:25:00', # '2024-01-03 23:30:00', '2024-01-03 23:35:00', # '2024-01-03 23:40:00', '2024-01-03 23:45:00', # '2024-01-03 23:50:00', '2024-01-03 23:55:00'], # dtype='datetime64[us]', length=288, freq='5min') print(df.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00', # '2024-01-03 00:00:00', '2024-01-03 00:05:00', # '2024-01-03 00:10:00', '2024-01-03 00:15:00', # '2024-01-03 00:20:00', '2024-01-03 00:25:00', # '2024-01-03 00:30:00', '2024-01-03 00:35:00', # ... # '2024-01-03 23:20:00', '2024-01-03 23:25:00', # '2024-01-03 23:30:00', '2024-01-03 23:35:00', # '2024-01-03 23:40:00', '2024-01-03 23:45:00', # '2024-01-03 23:50:00', '2024-01-03 23:55:00', # '2024-01-04 11:20:00', '2024-01-05 11:20:00'], # dtype='datetime64[us]', length=292, freq=None) print(df.to_string()) # 0 1 2 # 2024-01-01 00:00:00 0.0 NaN NaN # 2024-01-02 00:00:00 NaN 0.0 NaN # 2024-01-03 00:00:00 NaN NaN 0.0 # 2024-01-03 00:05:00 NaN NaN 1.0 # 2024-01-03 00:10:00 NaN NaN 2.0 # 2024-01-03 00:15:00 NaN NaN 3.0 # 2024-01-03 00:20:00 NaN NaN 4.0 # 2024-01-03 00:25:00 NaN NaN 5.0 # 2024-01-03 00:30:00 NaN NaN 6.0 # 2024-01-03 00:35:00 NaN NaN 7.0 # 2024-01-03 00:40:00 NaN NaN 8.0 # 2024-01-03 00:45:00 NaN NaN 9.0 # 2024-01-03 00:50:00 NaN NaN 10.0 # ... # 2024-01-03 22:50:00 NaN NaN 274.0 # 2024-01-03 22:55:00 NaN NaN 275.0 # 2024-01-03 23:00:00 NaN NaN 276.0 # 2024-01-03 23:05:00 NaN NaN 277.0 # 2024-01-03 23:10:00 NaN NaN 278.0 # 2024-01-03 23:15:00 NaN NaN 279.0 # 2024-01-03 23:20:00 NaN NaN 280.0 # 2024-01-03 23:25:00 NaN NaN 281.0 # 2024-01-03 23:30:00 NaN NaN 282.0 # 2024-01-03 23:35:00 NaN NaN 283.0 # 2024-01-03 23:40:00 NaN NaN 284.0 # 2024-01-03 23:45:00 NaN NaN 285.0 # 2024-01-03 23:50:00 NaN NaN 286.0 # 2024-01-03 23:55:00 NaN NaN 287.0 # 2024-01-04 11:20:00 NaN NaN NaN # 2024-01-05 11:20:00 NaN NaN NaN df2 = pd.concat([ts1, ts2], axis=1) print(df2.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00', # '2024-01-04 11:20:00', '2024-01-05 11:20:00'], # dtype='datetime64[us]', freq=None) print(df2.to_string()) # 0 1 # 2024-01-01 00:00:00 0.0 NaN # 2024-01-02 00:00:00 NaN 0.0 # 2024-01-04 11:20:00 NaN NaN # 2024-01-05 11:20:00 NaN NaN
Issue Description
When trying to concatenate by column few non overlapping timeseries dataframes, if the units of the original dataframes are not 'ns' then the resulting dataframe will have missing data (and lose it's frequency value, in case it's relevant).
The example given has 3 dataframes and for some reason the result has missed most of the data from the first and the second dataframe. In case of concatenating 2 dataframes we end up with almost no data at all.
If we set the units to 'ns' everything works as expected, the resulting df has all the data and kept its frequency='5min'. Every other unit I tried failed with similar results than the example.
Expected Behavior
import pandas as pd idx1 = pd.date_range("2024-01-01", periods=24*12, freq="5min", unit="ns") idx2 = pd.date_range("2024-01-02", periods=24*12, freq="5min", unit="ns") idx3 = pd.date_range("2024-01-03", periods=24*12, freq="5min", unit="ns") ts1 = pd.Series(range(len(idx1)), index=idx1) ts2 = pd.Series(range(len(idx2)), index=idx2) ts3 = pd.Series(range(len(idx3)), index=idx3) df = pd.concat([ts1, ts2, ts3], axis=1) print(ts1.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00', # '2024-01-01 00:10:00', '2024-01-01 00:15:00', # '2024-01-01 00:20:00', '2024-01-01 00:25:00', # '2024-01-01 00:30:00', '2024-01-01 00:35:00', # '2024-01-01 00:40:00', '2024-01-01 00:45:00', # ... # '2024-01-01 23:10:00', '2024-01-01 23:15:00', # '2024-01-01 23:20:00', '2024-01-01 23:25:00', # '2024-01-01 23:30:00', '2024-01-01 23:35:00', # '2024-01-01 23:40:00', '2024-01-01 23:45:00', # '2024-01-01 23:50:00', '2024-01-01 23:55:00'], # dtype='datetime64[ns]', length=288, freq='5min') print(ts2.index) # DatetimeIndex(['2024-01-02 00:00:00', '2024-01-02 00:05:00', # '2024-01-02 00:10:00', '2024-01-02 00:15:00', # '2024-01-02 00:20:00', '2024-01-02 00:25:00', # '2024-01-02 00:30:00', '2024-01-02 00:35:00', # '2024-01-02 00:40:00', '2024-01-02 00:45:00', # ... # '2024-01-02 23:10:00', '2024-01-02 23:15:00', # '2024-01-02 23:20:00', '2024-01-02 23:25:00', # '2024-01-02 23:30:00', '2024-01-02 23:35:00', # '2024-01-02 23:40:00', '2024-01-02 23:45:00', # '2024-01-02 23:50:00', '2024-01-02 23:55:00'], # dtype='datetime64[ns]', length=288, freq='5min') print(ts3.index) # DatetimeIndex(['2024-01-03 00:00:00', '2024-01-03 00:05:00', # '2024-01-03 00:10:00', '2024-01-03 00:15:00', # '2024-01-03 00:20:00', '2024-01-03 00:25:00', # '2024-01-03 00:30:00', '2024-01-03 00:35:00', # '2024-01-03 00:40:00', '2024-01-03 00:45:00', # ... # '2024-01-03 23:10:00', '2024-01-03 23:15:00', # '2024-01-03 23:20:00', '2024-01-03 23:25:00', # '2024-01-03 23:30:00', '2024-01-03 23:35:00', # '2024-01-03 23:40:00', '2024-01-03 23:45:00', # '2024-01-03 23:50:00', '2024-01-03 23:55:00'], # dtype='datetime64[ns]', length=288, freq='5min') print(df.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00', # '2024-01-01 00:10:00', '2024-01-01 00:15:00', # '2024-01-01 00:20:00', '2024-01-01 00:25:00', # '2024-01-01 00:30:00', '2024-01-01 00:35:00', # '2024-01-01 00:40:00', '2024-01-01 00:45:00', # ... # '2024-01-03 23:10:00', '2024-01-03 23:15:00', # '2024-01-03 23:20:00', '2024-01-03 23:25:00', # '2024-01-03 23:30:00', '2024-01-03 23:35:00', # '2024-01-03 23:40:00', '2024-01-03 23:45:00', # '2024-01-03 23:50:00', '2024-01-03 23:55:00'], # dtype='datetime64[ns]', length=864, freq='5min') print(df.to_string()) # 0 1 2 # 2024-01-01 00:00:00 0.0 NaN NaN # 2024-01-01 00:05:00 1.0 NaN NaN # 2024-01-01 00:10:00 2.0 NaN NaN # 2024-01-01 00:15:00 3.0 NaN NaN # 2024-01-01 00:20:00 4.0 NaN NaN # 2024-01-01 00:25:00 5.0 NaN NaN # 2024-01-01 00:30:00 6.0 NaN NaN # 2024-01-01 00:35:00 7.0 NaN NaN # ... # 2024-01-01 23:40:00 284.0 NaN NaN # 2024-01-01 23:45:00 285.0 NaN NaN # 2024-01-01 23:50:00 286.0 NaN NaN # 2024-01-01 23:55:00 287.0 NaN NaN # 2024-01-02 00:00:00 NaN 0.0 NaN # 2024-01-02 00:05:00 NaN 1.0 NaN # 2024-01-02 00:10:00 NaN 2.0 NaN # 2024-01-02 00:15:00 NaN 3.0 NaN # ... # 2024-01-02 23:35:00 NaN 283.0 NaN # 2024-01-02 23:40:00 NaN 284.0 NaN # 2024-01-02 23:45:00 NaN 285.0 NaN # 2024-01-02 23:50:00 NaN 286.0 NaN # 2024-01-02 23:55:00 NaN 287.0 NaN # 2024-01-03 00:00:00 NaN NaN 0.0 # 2024-01-03 00:05:00 NaN NaN 1.0 # 2024-01-03 00:10:00 NaN NaN 2.0 # 2024-01-03 00:15:00 NaN NaN 3.0 # 2024-01-03 00:20:00 NaN NaN 4.0 # ... # 2024-01-03 23:25:00 NaN NaN 281.0 # 2024-01-03 23:30:00 NaN NaN 282.0 # 2024-01-03 23:35:00 NaN NaN 283.0 # 2024-01-03 23:40:00 NaN NaN 284.0 # 2024-01-03 23:45:00 NaN NaN 285.0 # 2024-01-03 23:50:00 NaN NaN 286.0 # 2024-01-03 23:55:00 NaN NaN 287.0 df2 = pd.concat([ts1, ts2], axis=1) print(df2.index) # DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00', # '2024-01-01 00:10:00', '2024-01-01 00:15:00', # '2024-01-01 00:20:00', '2024-01-01 00:25:00', # '2024-01-01 00:30:00', '2024-01-01 00:35:00', # '2024-01-01 00:40:00', '2024-01-01 00:45:00', # ... # '2024-01-02 23:10:00', '2024-01-02 23:15:00', # '2024-01-02 23:20:00', '2024-01-02 23:25:00', # '2024-01-02 23:30:00', '2024-01-02 23:35:00', # '2024-01-02 23:40:00', '2024-01-02 23:45:00', # '2024-01-02 23:50:00', '2024-01-02 23:55:00'], # dtype='datetime64[ns]', length=576, freq='5min') print(df2.to_string()) # 0 1 # 2024-01-01 00:00:00 0.0 NaN # 2024-01-01 00:05:00 1.0 NaN # 2024-01-01 00:10:00 2.0 NaN # 2024-01-01 00:15:00 3.0 NaN # 2024-01-01 00:20:00 4.0 NaN # 2024-01-01 00:25:00 5.0 NaN # 2024-01-01 00:30:00 6.0 NaN # ... # 2024-01-01 23:25:00 281.0 NaN # 2024-01-01 23:30:00 282.0 NaN # 2024-01-01 23:35:00 283.0 NaN # 2024-01-01 23:40:00 284.0 NaN # 2024-01-01 23:45:00 285.0 NaN # 2024-01-01 23:50:00 286.0 NaN # 2024-01-01 23:55:00 287.0 NaN # 2024-01-02 00:00:00 NaN 0.0 # 2024-01-02 00:05:00 NaN 1.0 # 2024-01-02 00:10:00 NaN 2.0 # 2024-01-02 00:15:00 NaN 3.0 # 2024-01-02 00:20:00 NaN 4.0 # 2024-01-02 00:25:00 NaN 5.0 # ... # 2024-01-02 23:15:00 NaN 279.0 # 2024-01-02 23:20:00 NaN 280.0 # 2024-01-02 23:25:00 NaN 281.0 # 2024-01-02 23:30:00 NaN 282.0 # 2024-01-02 23:35:00 NaN 283.0 # 2024-01-02 23:40:00 NaN 284.0 # 2024-01-02 23:45:00 NaN 285.0 # 2024-01-02 23:50:00 NaN 286.0 # 2024-01-02 23:55:00 NaN 287.0 ### Installed Versions <details> INSTALLED VERSIONS ------------------ commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140 python : 3.10.13.final.0 python-bits : 64 OS : Linux OS-release : 4.18.0-372.9.1.el8.x86_64 Version : #1 SMP Fri Apr 15 22:12:19 EDT 2022 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : C LOCALE : en_US.UTF-8 pandas : 2.2.2 numpy : 1.26.4 pytz : 2024.1 dateutil : 2.9.0.post0 setuptools : 67.7.2 pip : 23.1.2 Cython : None pytest : 8.1.1 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 3.1.3 IPython : 8.23.0 pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.3 bottleneck : None dataframe-api-compat : None fastparquet : None fsspec : None gcsfs : None matplotlib : 3.8.4 numba : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : 16.0.0 pyreadstat : None python-calamine : None pyxlsb : None s3fs : None scipy : 1.13.0 sqlalchemy : None tables : None tabulate : 0.9.0 xarray : None xlrd : None zstandard : None tzdata : 2024.1 qtpy : None pyqt5 : None </details>