Context
When dealing with dates, I’ve sometimes problems because the source is not clean, or not all the rows have the same format. Additionally, dates can be simple (year-month-day) or really complicate like a timestamp with timezone.
Here are some code snippets for several use cases.
How to read dates with python and pandas
# import pandas import pandas as pd # df is a dataframe with a column 'column_with_date' with a date like this '19.01.2023 16:45:46 +01:00' # convert date string to datetime with pd.to_datetime pd.to_datetime(df['column_with_date']) # sometimes the format is a bit weird and pandas cannot recognize it. In this case we give the date format as argument. # for this format 19.01.2023 16:45:46 +01:00 we can use: pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z') # if your string has timezone, use utr=True pd.to_datetime(df['column_with_date'], format='%Y-%m-%d %H:%M:%S%z', utc=True) # sometimes your columns are as objects (strings) and numbers (floats, ints) and to_datetime cannot process it. # You can force the type string to the whole column before giving it to to_datetime. pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True) # pd.to_datetime does not modify the column values in place, so you have to assign it to the same column. df['column_with_date'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True) # or save it to another (new) column if you want to save the original value df['column_with_date_2'] = pd.to_datetime(df['column_with_date'].astype(str), format='%Y-%m-%d %H:%M:%S%z', utc=True)
How to save dates with UTC (tiemzone) to an Excel-File with Pandas
Saving datetime
columns with timezones in pandas is not supported, and you will get the following error if you try:
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.
To remove the timezone from a datefield
(column dtype
datetime64[ns, UTC]
) you can use .dt.date()
# remove timezones for excel df['column_with_date'] = df['column_with_date'].apply(lambda d: pd.to_datetime(d).date()) # save te file as usual df.to_excel('filename.xlsx')
References
- https://stackoverflow.com/a/63238008/624088
- https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes -> format codes for datetime strings
- try and error :)
Top comments (0)