Inconsistent changes of dtype on assignment to multiindexed columns #18415

@da-woods

Description

I've found some odd behaviour when assigning to columns with a multiindex. I'm trying to use an array with a float32 dtype, but it's being converted to a float64 dtype under some circumstances. For large arrays this is accompanied by a significant slowdown.

>>> import sys; sys.version
'3.6.3 (default, Oct 11 2017, 14:49:33) [GCC]'
>>> import pandas as pd
>>> pd.__version__
'0.21.0'
>>> import numpy as np
>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A
     1                        2
     0    1    2    3    4    0    1    2    3    4
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
>>> A.loc[:,(1,1)] = np.ones((6,),dtype=np.float32) # index a single column - doesn't change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[:,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # index multiple columns - changes dtypes
>>> (A.dtypes==np.float32).all()
False

So indexing a single column keeps the dtype as float32 (as I would expect), but indexing multiple columns changes it to float64. The behaviour is also different if you write to part of a column (doesn't change) vs a whole column (does change):

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.loc[2:3,(1,slice(2,3))] = np.ones((2,2),dtype=np.float32) # index a section of multiple columns - doesn't change dtypes
>>> (A.dtypes==np.float32).all()
True
>>> A.loc[0:5,(1,slice(2,3))] = np.ones((6,2),dtype=np.float32) # but indexing a complete section does change dtypes
>>> (A.dtypes==np.float32).all()
False
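The checks above only say whether any dtype changed. To see which columns are affected, something like the following sketch may help. The helper names `make_frame` and `changed_dtypes` are made up for illustration and are not part of pandas; the helper assumes the assignment does not add or remove columns. No output is shown because the result depends on the pandas version.

```python
import numpy as np
import pandas as pd


def make_frame(rows=6, cols=5):
    """Build the float32 frame with two-level columns used in this report."""
    base = pd.DataFrame(np.zeros((rows, cols), dtype=np.float32))
    return pd.concat([base, base], axis=1, keys=[1, 2])


def changed_dtypes(df, assign):
    """Run assign(df) in place and report the columns whose dtype changed.

    Returns a dict mapping column label -> (dtype before, dtype after).
    Hypothetical helper; assumes assign() keeps the set of columns intact.
    """
    before = list(df.dtypes)
    assign(df)
    after = list(df.dtypes)
    return {
        col: (old, new)
        for col, old, new in zip(df.columns, before, after)
        if old != new
    }


def assign_two_whole_columns(df):
    # The multi-column assignment from the report.
    df.loc[:, (1, slice(2, 3))] = np.ones((len(df), 2), dtype=np.float32)


if __name__ == "__main__":
    A = make_frame()
    print(changed_dtypes(A, assign_two_whole_columns))
```

An empty dict means every column kept its dtype; on an affected version I would expect the entries to show float32 turning into float64.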

If the multiindex is on axis 0 rather than axis 1, then the dtypes do not change:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A = A.T
>>> A.loc[(1,slice(2,3)),:] = np.ones((6,2),dtype=np.float32).T # doesn't change any dtypes
>>> (A.dtypes==np.float32).all()
True

This odd behaviour only applies to multiindexes:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32))
>>> A.loc[:,2:3] = np.ones((6,2),dtype=np.float32) # does not change dtypes
>>> (A.dtypes==np.float32).all()
True

Finally, it applies to iloc as well as loc:

>>> A = pd.DataFrame(np.zeros((6,5),dtype=np.float32)); A = pd.concat([A,A],axis=1,keys=[1,2])
>>> A.iloc[:,2:4] = np.ones((6,2),dtype=np.float32) # changes dtypes
>>> (A.dtypes==np.float32).all()
False
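The slowdown for large arrays could be measured with a rough sketch along these lines. The function names and the row count are assumptions chosen for illustration, not taken from a real benchmark, and no timings are shown because they depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd


def make_frame(rows, cols=5):
    """Build a float32 frame with two-level columns, as in this report."""
    base = pd.DataFrame(np.zeros((rows, cols), dtype=np.float32))
    return pd.concat([base, base], axis=1, keys=[1, 2])


def best_time(assign, rows, repeats):
    """Return the fastest wall-clock time of assign(frame) over fresh frames."""
    timings = []
    for _ in range(repeats):
        # Rebuild the frame each time so every run starts as all-float32.
        frame = make_frame(rows)
        start = time.perf_counter()
        assign(frame)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_assignments(rows=1_000_000, repeats=3):
    """Time the single-column and multi-column assignments from the report."""
    one = np.ones((rows,), dtype=np.float32)
    two = np.ones((rows, 2), dtype=np.float32)

    def single(frame):
        frame.loc[:, (1, 1)] = one

    def multiple(frame):
        frame.loc[:, (1, slice(2, 3))] = two

    return {
        "single column": best_time(single, rows, repeats),
        "two columns": best_time(multiple, rows, repeats),
    }


if __name__ == "__main__":
    for label, seconds in compare_assignments().items():
        print(f"{label}: {seconds:.4f} s")
```

The two-column case writes twice as much data, so a factor of about two would be unremarkable; a much larger gap would be consistent with the float64 conversion described above.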

Metadata

Assignees

No one assigned

    Labels

    32bit (32-bit systems), Bug, Dtype Conversions (Unexpected or buggy dtype conversions), Indexing (Related to indexing on series/frames, not to indexes themselves), MultiIndex
