DEV Community

Cover image for Python: Setting data types when using to_sql
Luca Liu
Luca Liu

Posted on • Edited on • Originally published at blog.luca-liu.com

Python: Setting data types when using to_sql

Issue: column encoding is altered when saving a DataFrame from Python to Oracle

When dealing with databases, it's important to pay attention to data types to ensure that the table structure is preserved when saving a DataFrame from Python. The following is a basic code snippet to save a DataFrame to an Oracle database using SQLAlchemy and pandas:

import pandas as pd from sqlalchemy import create_engine # Define the table structure data = { 'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie'], 'hire_date': ['2020-01-15', '2019-05-20', '2021-02-10'], 'insert_datetime': ['2021-09-15 10:00:00', '2021-09-16 11:30:00', '2021-09-17 09:45:00'] } df = pd.DataFrame(data) # Display the DataFrame and Data Types print(df) print(df.dtypes) # Create SQLAlchemy engine engine = create_engine('oracle://username:password@hostname:port/service_name') # Use pd.to_sql() to replace/update the table in Oracle df.to_sql('employee', con=engine, if_exists='replace', index=False) 
Enter fullscreen mode Exit fullscreen mode
id name hire_date insert_datetime 0 1 Alice 2020-01-15 2021-09-15 10:00:00 1 2 Bob 2019-05-20 2021-09-16 11:30:00 2 3 Charlie 2021-02-10 2021-09-17 09:45:00 id int64 name object hire_date object insert_datetime object dtype: object 
Enter fullscreen mode Exit fullscreen mode

However, running this code may reveal a bug where the column encoding is altered in the Oracle database table. Upon inspection, it becomes apparent that only the first column id retains its integer data format, while the remaining columns name, hire_date, and insert_datetime are changed to CLOB (Character Large Object) encoding in Oracle.

Solution: using the dtype parameter when invoking to_sql()

To ensure that the correct data types are preserved when writing a DataFrame to an Oracle database, one approach is to explicitly define the data types for each column when using to_sql(). This can be achieved by creating an SQLAlchemy Table object with specified data types and then using the dtype parameter when invoking to_sql().

import pandas as pd from sqlalchemy import create_engine, types # Define the table structure data = { 'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie'], 'hire_date': ['2020-01-15', '2019-05-20', '2021-02-10'], 'insert_datetime': ['2021-09-15 10:00:00', '2021-09-16 11:30:00', '2021-09-17 09:45:00'] } df = pd.DataFrame(data) # convert data type to datetime df['hire_date'] = pd.to_datetime(df['hire_date'], format='%Y-%m-%d') df['insert_datetime'] = pd.to_datetime(df['insert_datetime'], format='%Y-%m-%d %H:%M:%S') # Create SQLAlchemy engine engine = create_engine('oracle://username:password@hostname:port/service_name') # Set data types dtype_dic = {'id': types.INTEGER(), 'name': types.NVARCHAR(length=50),'hire_date': types.DATE(),'insert_datetime': types.DateTime()} # Use pd.to_sql() to replace/update the table in Oracle df.to_sql('employee', con=engine, schema='TMP01', if_exists='replace', index=False, dtype=dtype_dic) 
Enter fullscreen mode Exit fullscreen mode

Ensure that the data type has been converted to datetime before using types.DATE().


Explore more

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.

🚀 Connect with me on LinkedIn

Top comments (0)