Description
In the sample code below, `rolling_apply` takes an argument `ix`, which is a NumPy array of dtype `int64`. By the time this array reaches the `get_type()` function, its dtype has changed to `float64`. I can make an explicit call in `get_type()` to change it back (`ix = ix.astype('int64')`), but I was curious why it gets changed.
Example below. I'm on version '0.17.0':
```python
import numpy as np
import pandas as pd


def get_type(ix, df, hours):
    # invoked by rolling_apply to illustrate the problem
    # of rolling_apply changing the dtype of 'ix' array from
    # int64 to float64
    print ix.dtype

    # need to convert index dtype back to int64
    #ix = ix.astype('int64')
    ixv = ix[ix > -1]
    print ixv.dtype

    # the data in ix must be int64 else following fails with
    # IndexError: arrays used as indices must be of integer (or boolean) type
    h = hours[ixv] - hours[ixv[0]]
    df.iloc[ix[-1]] = h[0]
    return 0.0


# we start out with ix.dtype = int64 but rolling_apply changes this to float64
ix = np.arange(0, 10)
hours = np.random.randint(0, 10, len(ix))
df = pd.DataFrame(np.random.randn(10, 1), columns=['h'])
pd.rolling_apply(ix, window=3, func=get_type, args=(df, hours,))
```

I also stepped through the code and believe I've identified the source of the problem. I thought I'd report it and see if others see this as an issue before trying to fix. Doing an explicit type change inside the get_type function as in this example also works.
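For anyone who wants the cast in one place rather than inside each callback, here is a minimal sketch of the workaround as a wrapper. `as_int_window` and `first_hour_delta` are hypothetical names, not part of pandas, and the plain float array below only stands in for the window that `rolling_apply` hands to the callback. It assumes the window holds whole numbers and refuses to cast if NaN or inf is present.

```python
import functools

import numpy as np


def as_int_window(func):
    """Hypothetical helper: cast the window back to int64 before calling func."""
    @functools.wraps(func)
    def wrapper(window, *args, **kwargs):
        window = np.asarray(window)
        # NaN/inf have no integer representation, so fail loudly instead of
        # silently producing garbage indices.
        if not np.all(np.isfinite(window)):
            raise ValueError("window contains NaN or inf; cannot cast to int64")
        return func(window.astype("int64"), *args, **kwargs)
    return wrapper


@as_int_window
def first_hour_delta(ix, hours):
    # ix is int64 again here, so it is valid as an index array
    return float(hours[ix[-1]] - hours[ix[0]])


hours = np.array([5, 7, 9, 4])
float_window = np.array([0.0, 1.0, 2.0])  # stand-in for the float64 window
result = first_hour_delta(float_window, hours)
```

The wrapped function can then be passed as `func` in the same way as `get_type` above.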
The _process_data_structure() function turns this into a float.
Here's the logic that explicitly changes the dtype to a float the first time. The conversion could be omitted and the check updated to include 'float':
```python
if kill_inf and values.dtype == float:
    values = values.copy()
    values[np.isinf(values)] = np.NaN
```

However, the Cython code that I assume does the rolling window also expects a float64. In this case, maybe an option is to update the dtype after the call_cython function.
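One likely reason the float conversion is hard to drop entirely: rolling results use NaN for windows with too few observations, and an integer array cannot store NaN. The snippet below is a NumPy-only illustration of that constraint and of casting back afterwards. It is not the pandas internals, and the variable names are made up for the example.

```python
import numpy as np

ints = np.arange(3, dtype="int64")
floats = ints.astype("float64")

# A float array can hold the NaN marker used for incomplete windows.
floats[0] = np.nan

# An integer array cannot: NumPy raises ValueError on the assignment.
try:
    ints[0] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False

# Casting back is only safe once no NaN/inf values remain.
clean = np.array([0.0, 1.0, 2.0])
restored = clean.astype("int64")
```

So restoring the dtype after the computation would only be possible when the result contains no NaN.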