-
- Notifications
You must be signed in to change notification settings - Fork 19.3k
REF: Make PeriodArray an ExtensionArray #22862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
eaadcbc a05928a 3c0d9ee 63fc3fa 7d5d71c e5caac6 c194407 eb4506b 1b9fd7a 0fa0ed1 3247ea8 c162cdd 611d378 fb2ff82 25a380f 1b2c4ec d04293e eacad39 70cd3b8 9b22889 87d289a 6369c7f 01551f0 0437940 42ab137 298390f 23e5cfc 9d17fd2 959cd72 b66f617 5669675 2c0311c 012be1c c3a96d0 67faabc ff7c06c c2d57bd fbde770 1c4bbe7 b395c90 d68a5c5 0c7b704 d26d3d2 e4babea 7f6c144 b4aa4ca 6a70131 9aa077c 411738c 8e0fb69 6d98e85 6d9e150 4899479 634def1 1f18452 2f92b22 23f232c dd3b8cd 1a7c360 87ecb64 0bde329 2d85a82 438e6b5 a9456fd ac05365 36ed547 4652ca7 a047a1b a4a30d7 1441ae6 8003808 5f43753 1c13d0f bae6b3d f422cf0 0229d74 aa40cf4 29085e1 00ffddf e81fa9c 0c8925f 96204a1 82930f7 8d24582 1fc7744 6cd428c 21693e0 b65ffad 1f438e3 f3928fb 089f8ab 700650a 452c229 e3e0e57 78751c2 203d561 eb1c67d e08aa79 c1ee04b 827e563 ca4a7fd ed185c0 b3407ac a4011eb fc1ca3c 1b1841f b3b315a 3ab4176 8102475 8c329eb 78d4960 4e3d914 5e4aaa7 7f77563 f88d6f7 7aa78ba 2d737f8 833899a 236b49c 8230347 bf33a57 738acfe 032ec02 77e389a 61031d7 a094b3d ace4856 fc6a1c7 900afcf 0baa3e9 f95106e e57e24a ce1c970 a7e1216 2548d6a 02e3863 1997cff af2d1de 64f5778 4151510 ac9bd41 5462bd7 c1c6428 7ab2736 bd6f966 8068daf 5691506 575d61a 4065bdb File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -153,8 +153,8 @@ def __init__(self, values, freq=None, copy=False): | |
| values = values.values | ||
| | ||
| if isinstance(values, type(self)): | ||
| if freq is not None: | ||
| raise TypeError("Cannot pass 'freq' and a 'PeriodArray'.") | ||
| if freq is not None and freq != values.freq: | ||
| raise TypeError("freq does not match") | ||
| values, freq = values._data, values.freq | ||
| | ||
| values = np.array(values, dtype='int64', copy=copy) | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we always need to copy here, e.g. if you pass in an ndarray with copy=False this makes the internals mutable which is not great Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should certainly have the option to not copy. Can you explain a bit more in detail what you are thinking about, where you see a problem in the internals? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (deleted an incorrect comment). I think I was following Series, Index, and IntegerArray, IntervalArray, SparseArray, which don't copy (index is special because it's immutable). We should be internally consistent among the array classes. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The default of numpy arrays of Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is very tricky. If there are outside refernces here the I think we need to copy. You are right technically we don't need to and for Series we do have this behavior. But this is an outside leakage. I guess Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So a default of Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine with doing that ( Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine with that ( Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am fine with doing that (copy=True the default), but that will need some updates in most places where we are using the constructor I think. | ||
| | @@ -164,100 +164,6 @@ def __init__(self, values, freq=None, copy=False): | |
| freq = Period._maybe_convert_freq(freq) | ||
| self._dtype = PeriodDtype(freq) | ||
| | ||
| @classmethod | ||
| def _complex_new(cls, data=None, ordinal=None, freq=None, start=None, | ||
| end=None, periods=None, tz=None, dtype=None, copy=False, | ||
| **fields): | ||
| from pandas import PeriodIndex, DatetimeIndex, Int64Index | ||
| | ||
| # copy-pase from PeriodIndex.__new__ with slight adjustments. | ||
| # | ||
| # - removed all uses of name | ||
| | ||
| # TODO: move fields validation to range init | ||
| valid_field_set = {'year', 'month', 'day', 'quarter', | ||
| 'hour', 'minute', 'second'} | ||
| | ||
| if not set(fields).issubset(valid_field_set): | ||
| raise TypeError('__new__() got an unexpected keyword argument {}'. | ||
| format(list(set(fields) - valid_field_set)[0])) | ||
| | ||
| if periods is not None: | ||
| if is_float(periods): | ||
| periods = int(periods) | ||
| elif not is_integer(periods): | ||
| msg = 'periods must be a number, got {periods}' | ||
| raise TypeError(msg.format(periods=periods)) | ||
| | ||
| periods = dtl.validate_periods(periods) | ||
| | ||
| if dtype is not None: | ||
| dtype = pandas_dtype(dtype) | ||
| if not is_period_dtype(dtype): | ||
| raise ValueError('dtype must be PeriodDtype') | ||
| if freq is None: | ||
| freq = dtype.freq | ||
| elif freq != dtype.freq: | ||
| msg = 'specified freq and dtype are different' | ||
| raise IncompatibleFrequency(msg) | ||
| | ||
| # coerce freq to freq object, otherwise it can be coerced elementwise | ||
| # which is slow | ||
| if freq: | ||
| freq = Period._maybe_convert_freq(freq) | ||
| | ||
| if data is None: | ||
| if ordinal is not None: | ||
| data = np.asarray(ordinal, dtype=np.int64) | ||
| else: | ||
| data, freq = cls._generate_range(start, end, periods, | ||
| freq, fields) | ||
| return cls(data, freq=freq) | ||
| | ||
| if isinstance(data, (cls, PeriodIndex)): | ||
| if freq is None or freq == data.freq: # no freq change | ||
| freq = data.freq | ||
| data = data._ndarray_values | ||
| else: | ||
| base1, _ = _gfc(data.freq) | ||
| base2, _ = _gfc(freq) | ||
| data = libperiod.period_asfreq_arr(data._ndarray_values, | ||
| base1, base2, 1) | ||
| if copy: | ||
| data = data.copy() | ||
| return cls(data, freq=freq) | ||
| | ||
| # not array / index | ||
| if not isinstance(data, (np.ndarray, PeriodIndex, | ||
| DatetimeIndex, Int64Index)): | ||
| if is_scalar(data): | ||
| raise TypeError('{0}(...) must be called with a ' | ||
| 'collection of some ' | ||
| 'kind, {1} was passed'.format(cls.__name__, | ||
| repr(data))) | ||
| | ||
| # other iterable of some kind | ||
| if not isinstance(data, (list, tuple)): | ||
| data = list(data) | ||
| | ||
| data = np.asarray(data) | ||
| | ||
| # datetime other than period | ||
| if is_datetime64_dtype(data.dtype): | ||
| data = dt64arr_to_periodarr(data, freq, tz) | ||
| return cls(data, freq=freq) | ||
| | ||
| # check not floats | ||
| if lib.infer_dtype(data) == 'floating' and len(data) > 0: | ||
| raise TypeError("PeriodIndex does not allow " | ||
| "floating point in construction") | ||
| | ||
| # anything else, likely an array of strings or periods | ||
| data = ensure_object(data) | ||
| if dtype is None and freq: | ||
| dtype = PeriodDtype(freq) | ||
| return cls._from_sequence(data, dtype=dtype) | ||
| | ||
| @classmethod | ||
| def _simple_new(cls, values, freq=None, **kwargs): | ||
| # alias from PeriodArray.__init__ | ||
| | @@ -408,7 +314,8 @@ def __setitem__(self, key, value): | |
| if len(key) == 0: | ||
| return | ||
| | ||
| value = type(self)._complex_new(value) | ||
| value = period_array(value) | ||
| | ||
| if self.freqstr != value.freqstr: | ||
| msg = DIFFERENT_FREQ_INDEX.format(self.freqstr, value.freqstr) | ||
| raise IncompatibleFrequency(msg) | ||
| | @@ -618,7 +525,7 @@ def asfreq(self, freq=None, how='E'): | |
| if self.hasnans: | ||
| new_data[self._isnan] = iNaT | ||
| | ||
| return self._shallow_copy(new_data, freq=freq) | ||
| return type(self)(new_data, freq=freq) | ||
| | ||
| def to_timestamp(self, freq=None, how='start'): | ||
| """ | ||
| | @@ -703,7 +610,9 @@ def _add_delta_td(self, other): | |
| # Note: when calling parent class's _add_delta_td, it will call | ||
| # delta_to_nanoseconds(delta). Because delta here is an integer, | ||
| # delta_to_nanoseconds will return it unchanged. | ||
| return type(self)._add_delta_td(self, delta) | ||
| ordinals = super(PeriodArray, self)._add_delta_td(delta) | ||
| return type(self)(ordinals, self.freq) | ||
| | ||
| | ||
| def _add_delta_tdi(self, other): | ||
| assert isinstance(self.freq, Tick) # checked by calling function | ||
| | @@ -736,11 +645,11 @@ def _add_delta(self, other): | |
| # i8 view or _shallow_copy | ||
| if isinstance(other, (Tick, timedelta, np.timedelta64)): | ||
| new_values = self._add_delta_td(other) | ||
| return self._shallow_copy(new_values) | ||
| return type(self)(new_values) | ||
TomAugspurger marked this conversation as resolved. Outdated Show resolved Hide resolved | ||
| elif is_timedelta64_dtype(other): | ||
| # ndarray[timedelta64] or TimedeltaArray/index | ||
| new_values = self._add_delta_tdi(other) | ||
| return self._shallow_copy(new_values) | ||
| return type(self)(new_values) | ||
TomAugspurger marked this conversation as resolved. Outdated Show resolved Hide resolved | ||
| else: # pragma: no cover | ||
| raise TypeError(type(other).__name__) | ||
| | ||
| | @@ -924,7 +833,7 @@ def item(self): | |
| # ------------------------------------------------------------------- | ||
| # Constructor Helpers | ||
| | ||
| def period_array(data, freq=None): | ||
| def period_array(data, freq=None, ordinal=None, copy=False): | ||
| # type: (Sequence[Optional[Period]], Optional[Tick]) -> PeriodArray | ||
| """ | ||
| Construct a new PeriodArray from a sequence of Period scalars. | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we also accept strings here? Or have a separate function for that (something like Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should prob have a separate function (there is already an issue for this). as you get into cases where you can have formatting, eg.. Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. though would not be averse to actually calling Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In any case, adding the ability of parsing strings is out of scope for this PR I would say, so we can discuss later where to add it. Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This already works. I'll add an example. I'm not sure what's required for Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, cool, didn't know that :) | ||
| | @@ -938,6 +847,8 @@ def period_array(data, freq=None): | |
| freq : str, Tick, or Offset | ||
| The frequency of every element of the array. This can be specified | ||
| to avoid inferring the `freq` from `data`. | ||
| copy : bool, default False | ||
| Whether to ensure a copy of the data is made. | ||
| | ||
| Returns | ||
| ------- | ||
| | @@ -963,10 +874,37 @@ def period_array(data, freq=None): | |
| ['2017', '2018', 'NaT'] | ||
| Length: 3, dtype: period[A-DEC] | ||
| """ | ||
| | ||
| if data is None and ordinal is None: | ||
| raise ValueError("range!") | ||
TomAugspurger marked this conversation as resolved. Outdated Show resolved Hide resolved | ||
| elif data is None: | ||
| ||
| data = np.asarray(ordinal, dtype=np.int64) | ||
| if copy: | ||
| data = data.copy() | ||
| return PeriodArray(data, freq=freq) | ||
| else: | ||
| if isinstance(data, (ABCPeriodIndex, ABCSeries, PeriodArray)): | ||
| return PeriodArray(data, freq) | ||
TomAugspurger marked this conversation as resolved. Outdated Show resolved Hide resolved | ||
| elif is_datetime64_dtype(data): | ||
| return PeriodArray._from_datetime64(data, freq) | ||
| | ||
| # other iterable of some kind | ||
| if not isinstance(data, (np.ndarray, list, tuple)): | ||
| data = list(data) | ||
| | ||
| if freq: | ||
| dtype = PeriodDtype(freq) | ||
| else: | ||
| dtype = None | ||
| | ||
| if lib.infer_dtype(data) == 'floating' and len(data) > 0: | ||
| ||
| # Can we avoid infer_dtype? Why pay that tax every time? | ||
| raise TypeError("PeriodIndex does not allow " | ||
| "floating point in construction") | ||
| | ||
| data = ensure_object(data) | ||
| | ||
| | ||
| return PeriodArray._from_sequence(data, dtype=dtype) | ||
| | ||
| | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -28,8 +28,9 @@ | |
| from pandas._libs.tslibs import resolution | ||
| | ||
| from pandas.core.algorithms import unique1d | ||
| from pandas.core.dtypes.dtypes import PeriodDtype | ||
| from pandas.core.dtypes.generic import ABCIndexClass | ||
| from pandas.core.arrays.period import PeriodArray | ||
| from pandas.core.arrays.period import PeriodArray, period_array | ||
| from pandas.core.base import _shared_docs | ||
| from pandas.core.indexes.base import _index_shared_docs, ensure_index | ||
| | ||
| | @@ -190,13 +191,54 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None, | |
| periods=None, tz=None, dtype=None, copy=False, name=None, | ||
| **fields): | ||
| | ||
| valid_field_set = {'year', 'month', 'day', 'quarter', | ||
| 'hour', 'minute', 'second'} | ||
| | ||
| if not set(fields).issubset(valid_field_set): | ||
| raise TypeError('__new__() got an unexpected keyword argument {}'. | ||
| format(list(set(fields) - valid_field_set)[0])) | ||
| | ||
| if name is None and hasattr(data, 'name'): | ||
| name = data.name | ||
| | ||
| data = PeriodArray._complex_new(data=data, ordinal=ordinal, freq=freq, | ||
| start=start, end=end, periods=periods, | ||
| tz=tz, dtype=dtype, copy=copy, | ||
| **fields) | ||
| if data is None and ordinal is None: | ||
| # range-based. | ||
| if periods is not None: | ||
| if is_float(periods): | ||
| periods = int(periods) | ||
| | ||
| elif not is_integer(periods): | ||
| msg = 'periods must be a number, got {periods}' | ||
| raise TypeError(msg.format(periods=periods)) | ||
| | ||
| data, freq = PeriodArray._generate_range(start, end, periods, | ||
| freq, fields) | ||
| data = PeriodArray(data, freq=freq) | ||
| else: | ||
| if freq is None and dtype is not None: | ||
| freq = PeriodDtype(dtype).freq | ||
| elif freq and dtype: | ||
| freq = PeriodDtype(freq).freq | ||
| dtype = PeriodDtype(dtype).freq | ||
| | ||
| if freq != dtype: | ||
| msg = "specified freq and dtype are different" | ||
| raise IncompatibleFrequency(msg) | ||
| | ||
| # PeriodIndex allow PeriodIndex(period_index, freq=different) | ||
| # Let's not encourage that kind of behavior in PeriodArray. | ||
| | ||
| if freq and isinstance(data, cls) and data.freq != freq: | ||
| # TODO: We can do some of these with no-copy / coercion? | ||
| # e.g. D -> 2D seems to be OK | ||
| data = data.asfreq(freq) | ||
| | ||
| data = period_array(data=data, ordinal=ordinal, freq=freq, | ||
| copy=copy) | ||
| | ||
| if copy: | ||
| data = data.copy() | ||
| | ||
| return cls._simple_new(data, name=name) | ||
| | ||
| # ------------------------------------------------------------------------ | ||
| | @@ -268,18 +310,22 @@ def _shallow_copy(self, values=None, **kwargs): | |
| # TODO: simplify, figure out type of values | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So from some printing in the tests, some exploration on what is passed here:
Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is basically what motivated #23095. Even if the solution is unwanted there, I think it identifies all the extant places where unwanted types are currently passed to _shallow_copy | ||
| if values is None: | ||
| values = self._values | ||
TomAugspurger marked this conversation as resolved. Outdated Show resolved Hide resolved | ||
| | ||
| if isinstance(values, type(self)): | ||
| values = values.values | ||
| | ||
| if not isinstance(values, PeriodArray): | ||
| if (isinstance(values, np.ndarray) and | ||
| is_integer_dtype(values.dtype)): | ||
| values = PeriodArray(values, freq=self.freq) | ||
| else: | ||
| # in particular, I would like to avoid complex_new here. | ||
| # in particular, I would like to avoid period_array here. | ||
| # Some people seem to be calling use with unexpected types | ||
| # Index.difference -> ndarray[Period] | ||
| # DatetimelikeIndexOpsMixin.repeat -> ndarray[ordinal] | ||
| # I think that once all of Datetime* are EAs, we can simplify | ||
| # this quite a bit. | ||
| values = PeriodArray._complex_new(values, freq=self.freq) | ||
| values = period_array(values, freq=self.freq) | ||
| | ||
| # I don't like overloading shallow_copy with freq changes. | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yah, I'm not wild about this. I still think the best option is to nail down the constructors before doing the whole PeriodArray changeover. | ||
| # See if it's used anywhere outside of test_resample_empty_dataframe | ||
| | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the call we discussed avoiding
_data. Did that get un-done?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we talked about two things
._datafor the attribute storing the actual values (conflict with use in blocks, and elsewhere)I've punted on 1 since I don't know what a better name would be, and that would require additional changes in DatetimeLikeArrayMixin, which uses
._data, in addition to.valuesand._ndarray_values.PeriodArray should just use
.asi8for places it needs an integer array (one more push coming fixing a few ndarray_values I missed).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the call we discussed setting
_ndarray_valuesdirectly in__init__. Don't worry about this too much; if I'm the only one with a strong opinion I can push to change it in a follow-on PR.