Description
```python
import pandas as pd
import pandas._testing as tm
from io import StringIO

df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2013-01-01", "2013-01-02"]).as_unit("s"),
        "B": [3.5, 3.5],
    }
)
written = df.to_json(orient="split")

>>> written
'{"A":{"0":1356,"1":1357},"B":{"0":3.5,"1":3.5}}'

result = pd.read_json(StringIO(written), orient="split", convert_dates=["A"])

>>> result
      A    B
0  1356  3.5
1  1357  3.5

tm.assert_frame_equal(result, df)  # <- fails
```

The example here is based on `test_frame_non_unique_columns`, altered by 1) making the columns `["A", "B"]` and 2) changing the dtype for the first column from `M8[ns]` to `M8[s]`.
This goes through a check in `_try_convert_to_date`:

```python
# ignore numbers that are out of range
if issubclass(new_data.dtype.type, np.number):
    in_range = (
        isna(new_data._values)
        | (new_data > self.min_stamp)
        | (new_data._values == iNaT)
    )
    if not in_range.all():
        return data, False
```

When the JSON is produced from `M8[s]` (or `M8[ms]`) data, these values are all under `self.min_stamp`, so this check causes us to short-circuit and skip the `pd.to_datetime` conversion that comes just after (which itself looks sketchy, but that can wait for another day).
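To make the short-circuit concrete, here is a minimal standalone sketch of that range check applied to the values written above. The `31536000` cutoff (one year past the epoch, in seconds) is my reading of pandas' `_MIN_STAMPS` table for `date_unit="s"` and is an assumption about the current source, not something stated in this issue:

```python
import pandas as pd
from pandas import isna
from pandas._libs.tslibs import iNaT

# Assumed from pandas source: _MIN_STAMPS["s"] == 31536000 (1971-01-01 in seconds).
min_stamp = 31536000
new_data = pd.Series([1356, 1357])  # the values to_json wrote for the M8[s] column

# Same three-way check as _try_convert_to_date's in_range computation.
in_range = (
    isna(new_data._values)
    | (new_data > min_stamp)
    | (new_data._values == iNaT)
)
print(in_range.all())  # False: the parser bails out before reaching pd.to_datetime
```

Both stamps are far below any unit's cutoff, so `in_range.all()` is `False` and the column is returned unconverted.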
cc @WillAyd. My best guess is that there is nothing we can do at the reading stage, and we should either convert non-nano to nano at the writing stage, or just warn users that they are doing something that doesn't round-trip.
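For reference on the writing-stage option: casting the non-nano column up to nanoseconds before serializing does round-trip today. This is a user-level sketch of that workaround, not a proposed implementation:

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2013-01-01", "2013-01-02"]).as_unit("s"),
        "B": [3.5, 3.5],
    }
)

# Cast M8[s] up to M8[ns] before writing; the serialized stamps are then
# large enough to pass read_json's min_stamp heuristic and get converted.
written = df.astype({"A": "M8[ns]"}).to_json(orient="split")
result = pd.read_json(StringIO(written), orient="split", convert_dates=["A"])
print(result["A"].dtype)  # datetime64[ns], with values matching df["A"]
```

The round-tripped column comes back as `M8[ns]` rather than the original `M8[s]`, which is why doing this silently inside `to_json` may still merit a warning.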
Surfaced while implementing #55564 (which will cause users to get non-nano in many cases where they currently get nano).