Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use the newly added support for datetime64[ms] directly in pandas when opening a file.
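For context, the limits behind this request can be worked out with the standard library alone. The sketch below assumes the usual representation of a datetime64 value as a signed 64-bit count of units since 1970-01-01, with the minimum int64 value reserved for NaT; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1

# Nanosecond resolution: the largest offset is INT64_MAX nanoseconds.
# datetime has no nanosecond field, so truncate to whole microseconds.
max_offset = timedelta(microseconds=INT64_MAX // 1_000)
ns_upper = EPOCH + max_offset
ns_lower = EPOCH - max_offset

print(ns_lower.year, ns_upper.year)

# Millisecond resolution: the same int64 now counts milliseconds, so the
# span is a million times longer. It exceeds what datetime can hold,
# so express it as an approximate number of years instead.
SECONDS_PER_YEAR = 365.2425 * 86_400
ms_span_years = (INT64_MAX / 1_000) / SECONDS_PER_YEAR

print(f"{ms_span_years:.3g} years on either side of 1970")
```

Under these assumptions, nanosecond resolution covers only a few centuries around 1970, while millisecond resolution covers far more than any historical or projected date in a typical dataset.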
Feature Description
    import pandas as pd
    import io

    data = """\
    index,date
    A,"0004-04-04T12:30"
    B,"2004-04-04T12:30"
    C,"3004-04-04T12:30"
    D,""
    """
    df = pd.read_csv(io.StringIO(data), parse_dates=['date'])
    df

which would return:

      index                 date
    0     A     4-04-04 12:30:00
    1     B  2004-04-04 12:30:00
    2     C  3004-04-04 12:30:00
    3     D                  NaT

Or alternatively, it could also be more explicit:

    df = pd.read_csv(io.StringIO(data))
    df['date'] = pd.to_datetime(df['date'], reso='ms')

Alternative Solutions
Currently, the best way I have found to read in a CSV that has entries outside the 1677-2262 range is:
    df = pd.read_csv(io.StringIO(data))
    df['date'] = df['date'].fillna("").to_numpy().astype('datetime64[ms]')

Thus, I am letting numpy do the actual date parsing. fillna is needed because the empty cell in data gets translated to np.nan by read_csv, and numpy can't cast that to a datetime64. (I expected a NaT, but I guess that's another issue anyway.)
This solution requires that the dates be in ISO8601 format, which is much stricter than to_datetime.
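The same workaround can be sketched without pandas to isolate what numpy is doing. The helper name `to_datetime64_ms` is hypothetical and not part of numpy or pandas; it assumes the inputs are ISO 8601 strings, with `None` or NaN marking missing entries.

```python
import numpy as np

def to_datetime64_ms(values):
    """Hypothetical helper: cast ISO 8601 strings to datetime64[ms].

    Missing entries (None or NaN) are replaced by "" first, because
    numpy parses an empty string as NaT.
    """
    # v != v is True only for NaN, which is how read_csv marks empty cells.
    cleaned = ["" if v is None or v != v else v for v in values]
    return np.array(cleaned, dtype=object).astype("datetime64[ms]")

parsed = to_datetime64_ms(
    ["0004-04-04T12:30", "2004-04-04T12:30", "3004-04-04T12:30", float("nan")]
)
print(parsed.dtype)
print(parsed)
```

As with the pandas version, this only accepts ISO 8601 input; anything looser would still need a more flexible parser.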
Additional Context
See my Stack Overflow question for more alternative solutions: https://stackoverflow.com/questions/76608166/how-do-i-parse-a-list-of-datetimes-with-a-s-resolution-in-pandas-2
Thanks to ignoring-gravity for the answer, which I reused here.