149 commits
eaadcbc
WIP: PeriodArray
TomAugspurger Sep 26, 2018
a05928a
WIP
TomAugspurger Sep 27, 2018
3c0d9ee
Just moves
TomAugspurger Sep 27, 2018
63fc3fa
PeriodArray.shift definition
TomAugspurger Sep 27, 2018
7d5d71c
_data type
TomAugspurger Sep 27, 2018
e5caac6
clean
TomAugspurger Sep 27, 2018
c194407
accessor wip
TomAugspurger Sep 27, 2018
eb4506b
some more wip
TomAugspurger Sep 27, 2018
1b9fd7a
tshift, shift
TomAugspurger Sep 28, 2018
0fa0ed1
Arithmetic
TomAugspurger Sep 28, 2018
3247ea8
repr changes
TomAugspurger Sep 28, 2018
c162cdd
wip
TomAugspurger Sep 28, 2018
611d378
freq setter
TomAugspurger Sep 28, 2018
fb2ff82
Added disabled ops
TomAugspurger Sep 28, 2018
25a380f
copy
TomAugspurger Sep 28, 2018
1b2c4ec
Support concat
TomAugspurger Sep 28, 2018
d04293e
object ctor
TomAugspurger Sep 28, 2018
eacad39
Updates
TomAugspurger Sep 28, 2018
70cd3b8
lint
TomAugspurger Sep 28, 2018
9b22889
lint
TomAugspurger Sep 28, 2018
87d289a
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 1, 2018
6369c7f
wip
TomAugspurger Oct 1, 2018
01551f0
more wip
TomAugspurger Oct 1, 2018
0437940
array-setitem
TomAugspurger Oct 1, 2018
42ab137
wip
TomAugspurger Oct 1, 2018
298390f
wip
TomAugspurger Oct 1, 2018
23e5cfc
Use ._tshift internally for datetimelike ops
TomAugspurger Oct 2, 2018
9d17fd2
deep
TomAugspurger Oct 2, 2018
959cd72
Squashed commit of the following:
TomAugspurger Oct 2, 2018
b66f617
Squashed commit of the following:
TomAugspurger Oct 2, 2018
5669675
fixup
TomAugspurger Oct 2, 2018
2c0311c
The rest of the EA tests
TomAugspurger Oct 2, 2018
012be1c
docs
TomAugspurger Oct 2, 2018
c3a96d0
Merge remote-tracking branch 'upstream/master' into datetimelike-tshift
TomAugspurger Oct 3, 2018
67faabc
rename to time_shift
TomAugspurger Oct 3, 2018
ff7c06c
Squashed commit of the following:
TomAugspurger Oct 3, 2018
c2d57bd
Squashed commit of the following:
TomAugspurger Oct 3, 2018
fbde770
Squashed commit of the following:
TomAugspurger Oct 3, 2018
1c4bbe7
Squashed commit of the following:
TomAugspurger Oct 3, 2018
b395c90
fixed merge conflict
TomAugspurger Oct 3, 2018
d68a5c5
Handle divmod test
TomAugspurger Oct 3, 2018
0c7b704
extension tests passing
TomAugspurger Oct 3, 2018
d26d3d2
Squashed commit of the following:
TomAugspurger Oct 4, 2018
e4babea
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 4, 2018
7f6c144
merge conflict
TomAugspurger Oct 4, 2018
b4aa4ca
wip
TomAugspurger Oct 4, 2018
6a70131
indexes passing
TomAugspurger Oct 4, 2018
9aa077c
op names
TomAugspurger Oct 4, 2018
411738c
extension, arrays passing
TomAugspurger Oct 4, 2018
8e0fb69
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 9, 2018
6d98e85
fixup
TomAugspurger Oct 9, 2018
6d9e150
lint
TomAugspurger Oct 9, 2018
4899479
Fixed to_timestamp
TomAugspurger Oct 9, 2018
634def1
Same error message for index, series
TomAugspurger Oct 9, 2018
1f18452
Fix freq handling in to_timestamp
TomAugspurger Oct 9, 2018
2f92b22
dtype update
TomAugspurger Oct 9, 2018
23f232c
accept kwargs
TomAugspurger Oct 9, 2018
dd3b8cd
fixups
TomAugspurger Oct 9, 2018
1a7c360
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 9, 2018
87ecb64
updates
TomAugspurger Oct 9, 2018
0bde329
explicit
TomAugspurger Oct 9, 2018
2d85a82
add to assert
TomAugspurger Oct 9, 2018
438e6b5
wip period_array
TomAugspurger Oct 10, 2018
a9456fd
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 10, 2018
ac05365
wip period_array
TomAugspurger Oct 10, 2018
36ed547
order
TomAugspurger Oct 10, 2018
4652ca7
sort order
TomAugspurger Oct 10, 2018
a047a1b
test for hashing
TomAugspurger Oct 10, 2018
a4a30d7
update
TomAugspurger Oct 10, 2018
1441ae6
lint
TomAugspurger Oct 10, 2018
8003808
boxing
TomAugspurger Oct 10, 2018
5f43753
fix fixtures
TomAugspurger Oct 10, 2018
1c13d0f
infer
TomAugspurger Oct 10, 2018
bae6b3d
Remove seemingly unreachable code
TomAugspurger Oct 10, 2018
f422cf0
lint
TomAugspurger Oct 10, 2018
0229d74
wip
TomAugspurger Oct 12, 2018
aa40cf4
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 12, 2018
29085e1
Updates for master
TomAugspurger Oct 12, 2018
00ffddf
simplify
TomAugspurger Oct 12, 2018
e81fa9c
wip
TomAugspurger Oct 12, 2018
0c8925f
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 15, 2018
96204a1
remove view
TomAugspurger Oct 15, 2018
82930f7
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 17, 2018
8d24582
simplify
TomAugspurger Oct 17, 2018
1fc7744
lint
TomAugspurger Oct 17, 2018
6cd428c
Removed add_comparison_methods
TomAugspurger Oct 17, 2018
21693e0
xfail op
TomAugspurger Oct 17, 2018
b65ffad
remove some
TomAugspurger Oct 17, 2018
1f438e3
constructors
TomAugspurger Oct 17, 2018
f3928fb
Constructor cleanup
TomAugspurger Oct 17, 2018
089f8ab
misc fixups
TomAugspurger Oct 17, 2018
700650a
more xfails
TomAugspurger Oct 17, 2018
452c229
typo
TomAugspurger Oct 17, 2018
e3e0e57
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
78751c2
Added asi8
TomAugspurger Oct 18, 2018
203d561
Allow setting nan
TomAugspurger Oct 18, 2018
eb1c67d
revert breaking docs
TomAugspurger Oct 18, 2018
e08aa79
Override _add_sub_int_array
TomAugspurger Oct 18, 2018
c1ee04b
lint
TomAugspurger Oct 18, 2018
827e563
Update PeriodIndex._simple_new
TomAugspurger Oct 18, 2018
ca4a7fd
Clean up uses of .values, ._values, ._ndarray_values, ._data
TomAugspurger Oct 18, 2018
ed185c0
one more values
TomAugspurger Oct 18, 2018
b3407ac
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
a4011eb
remove xfails
TomAugspurger Oct 18, 2018
fc1ca3c
Fixed freq handling in _shallow_copy with a freq
TomAugspurger Oct 18, 2018
1b1841f
test updates
TomAugspurger Oct 18, 2018
b3b315a
API: Keep PeriodIndex.values an ndarray
TomAugspurger Oct 18, 2018
3ab4176
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 18, 2018
8102475
BUG: Raise for non-equal freq in take
TomAugspurger Oct 18, 2018
8c329eb
Punt on DataFrame.replace specializing
TomAugspurger Oct 18, 2018
78d4960
lint
TomAugspurger Oct 18, 2018
4e3d914
fixed xfail message
TomAugspurger Oct 18, 2018
5e4aaa7
TST: _from_datetime64
TomAugspurger Oct 19, 2018
7f77563
Fixups
TomAugspurger Oct 19, 2018
f88d6f7
escape
TomAugspurger Oct 19, 2018
7aa78ba
dtype
TomAugspurger Oct 19, 2018
2d737f8
revert and unxfail values
TomAugspurger Oct 19, 2018
833899a
error catching
TomAugspurger Oct 19, 2018
236b49c
isort
TomAugspurger Oct 19, 2018
8230347
Avoid PeriodArray.values
TomAugspurger Oct 19, 2018
bf33a57
clarify _box_func usage
TomAugspurger Oct 19, 2018
738acfe
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 19, 2018
032ec02
TST: unxfail ops tests
TomAugspurger Oct 19, 2018
77e389a
Avoid use of .values
jorisvandenbossche Oct 19, 2018
61031d7
__setitem__ type
TomAugspurger Oct 19, 2018
a094b3d
Misc cleanups
TomAugspurger Oct 19, 2018
ace4856
lint
TomAugspurger Oct 19, 2018
fc6a1c7
API: remove ordinal from period_array
TomAugspurger Oct 19, 2018
900afcf
catch exception
TomAugspurger Oct 19, 2018
0baa3e9
misc cleanup
TomAugspurger Oct 19, 2018
f95106e
Handle astype integer size
TomAugspurger Oct 19, 2018
e57e24a
Bump test coverage
TomAugspurger Oct 19, 2018
ce1c970
remove partial test
TomAugspurger Oct 19, 2018
a7e1216
close bracket
TomAugspurger Oct 19, 2018
2548d6a
change the test
TomAugspurger Oct 19, 2018
02e3863
isort
TomAugspurger Oct 19, 2018
1997cff
consistent _data
TomAugspurger Oct 19, 2018
af2d1de
lint
TomAugspurger Oct 19, 2018
64f5778
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 20, 2018
4151510
ndarray_values -> asi8
TomAugspurger Oct 20, 2018
ac9bd41
colocate ops
TomAugspurger Oct 20, 2018
5462bd7
refactor PeriodIndex.item
TomAugspurger Oct 20, 2018
c1c6428
return NotImplemented for Series / Index
TomAugspurger Oct 20, 2018
7ab2736
remove xpass
TomAugspurger Oct 20, 2018
bd6f966
release note
TomAugspurger Oct 22, 2018
8068daf
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 23, 2018
5691506
types, use data
TomAugspurger Oct 23, 2018
575d61a
remove ufunc xpass
TomAugspurger Oct 24, 2018
4065bdb
Merge remote-tracking branch 'upstream/master' into ea-period
TomAugspurger Oct 25, 2018
Constructor cleanup
TomAugspurger committed Oct 17, 2018
commit f3928fbccc88f0462d5fb0b4f8c025dbe96929ba
142 changes: 40 additions & 102 deletions pandas/core/arrays/period.py
@@ -153,8 +153,8 @@ def __init__(self, values, freq=None, copy=False):
             values = values.values

         if isinstance(values, type(self)):
-            if freq is not None:
-                raise TypeError("Cannot pass 'freq' and a 'PeriodArray'.")
+            if freq is not None and freq != values.freq:
+                raise TypeError("freq does not match")
             values, freq = values._data, values.freq
Member

On the call we discussed avoiding _data. Did that get un-done?

Contributor Author

IIRC, we talked about two things

  1. Avoiding the name ._data for the attribute storing the actual values (conflict with use in blocks, and elsewhere)
  2. Standardize the name(s) used by the array

I've punted on 1 since I don't know what a better name would be, and that would require additional changes in DatetimeLikeArrayMixin, which uses ._data, in addition to .values and ._ndarray_values.

PeriodArray should just use .asi8 for places it needs an integer array (one more push coming fixing a few ndarray_values I missed).

Member

I've punted on 1 since I don't know what a better name would be

On the call we discussed setting _ndarray_values directly in __init__. Don't worry about this too much; if I'm the only one with a strong opinion I can push to change it in a follow-on PR.
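
To make the naming discussion concrete, here is a minimal sketch of an array that keeps its integer ordinals in one backing attribute and exposes them only through `asi8`. `OrdinalBackedArray` is a hypothetical stand-in, not the pandas `PeriodArray`, and `freq` is a plain string here for brevity. The constructor mirrors the relaxed check in the hunk above: an explicit `freq` is accepted as long as it agrees with the input's own `freq`.

```python
import numpy as np


class OrdinalBackedArray:
    """Hypothetical stand-in for PeriodArray; not the pandas implementation."""

    def __init__(self, values, freq=None):
        if isinstance(values, OrdinalBackedArray):
            # Mirrors the check in the diff: an explicit freq must agree
            # with the freq already carried by the input array.
            if freq is not None and freq != values.freq:
                raise TypeError("freq does not match")
            values, freq = values._data, values.freq
        self._data = np.asarray(values, dtype="int64")
        self.freq = freq

    @property
    def asi8(self):
        # One accessor for the integer ordinals, so callers never need to
        # know the name of the backing attribute.
        return self._data


arr = OrdinalBackedArray([0, 1, 2], freq="D")
same = OrdinalBackedArray(arr, freq="D")   # matching freq is accepted

try:
    OrdinalBackedArray(arr, freq="M")      # mismatched freq is rejected
    error = None
except TypeError as exc:
    error = str(exc)
```

If the backing attribute were later renamed, only the class body would need to change, since callers go through `asi8`.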


values = np.array(values, dtype='int64', copy=copy)
Contributor

I think we always need to copy here. E.g. if you pass in an ndarray with copy=False, this makes the internals mutable, which is not great.

Member

I think we should certainly have the option to not copy.
Consider things like take or fillna: there we already create a new ndarray, which does not need to be copied again.

Can you explain a bit more in detail what you are thinking about, where you see a problem in the internals?

Contributor Author

(deleted an incorrect comment).

I think I was following Series, Index, and IntegerArray, IntervalArray, SparseArray, which don't copy (index is special because it's immutable).

We should be internally consistent among the array classes. Categorical doesn't have a copy parameter.

Member

NumPy's default of copy=True is of course the safer default, but we should indeed try to be consistent. IntegerArray we can still change. For SparseArray, I suppose the no-copy default only applies in practice if you already pass sparse values plus a sparse index. Categorical.from_codes (the equivalent fast constructor) has no copy argument and does not necessarily copy, but that can still depend on whether the codes are, e.g., converted from int64 to int8 depending on the number of categories.

Contributor

This is very tricky. If there are outside references here, then I think we need to copy. You are right that technically we don't need to, and for Series we do have this behavior. But this is an outside leakage. I guess copy=True as a default just works here then.

Contributor Author

So a default of copy=True for now?

Member

I am fine with doing that (copy=True the default), but that will need some updates in most places where we are using the constructor I think.
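
The "outside leakage" concern in this thread can be reproduced with plain NumPy. The sketch below uses a hypothetical `OrdinalArray` class (not the pandas constructor) and `np.asarray`, which returns the input object unchanged when the dtype already matches. With `copy=False` the caller's later mutation shows up inside the array; with `copy=True` it does not.

```python
import numpy as np


class OrdinalArray:
    """Hypothetical ordinal-backed array used only to illustrate copy semantics."""

    def __init__(self, values, copy=True):
        values = np.asarray(values, dtype="int64")  # no copy if already int64
        if copy:
            values = values.copy()
        self._data = values


raw = np.array([10, 11, 12], dtype="int64")
shared = OrdinalArray(raw, copy=False)  # keeps a reference to the caller's buffer
owned = OrdinalArray(raw, copy=True)    # owns an independent buffer

raw[0] = -1  # the caller mutates its own array afterwards
```

Methods such as `take` that already allocate a fresh buffer could pass `copy=False` without this risk, which is the case made above for keeping the option.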

@@ -164,100 +164,6 @@ def __init__(self, values, freq=None, copy=False):
freq = Period._maybe_convert_freq(freq)
self._dtype = PeriodDtype(freq)

@classmethod
def _complex_new(cls, data=None, ordinal=None, freq=None, start=None,
end=None, periods=None, tz=None, dtype=None, copy=False,
**fields):
from pandas import PeriodIndex, DatetimeIndex, Int64Index

# copy-pase from PeriodIndex.__new__ with slight adjustments.
#
# - removed all uses of name

# TODO: move fields validation to range init
valid_field_set = {'year', 'month', 'day', 'quarter',
'hour', 'minute', 'second'}

if not set(fields).issubset(valid_field_set):
raise TypeError('__new__() got an unexpected keyword argument {}'.
format(list(set(fields) - valid_field_set)[0]))

if periods is not None:
if is_float(periods):
periods = int(periods)
elif not is_integer(periods):
msg = 'periods must be a number, got {periods}'
raise TypeError(msg.format(periods=periods))

periods = dtl.validate_periods(periods)

if dtype is not None:
dtype = pandas_dtype(dtype)
if not is_period_dtype(dtype):
raise ValueError('dtype must be PeriodDtype')
if freq is None:
freq = dtype.freq
elif freq != dtype.freq:
msg = 'specified freq and dtype are different'
raise IncompatibleFrequency(msg)

# coerce freq to freq object, otherwise it can be coerced elementwise
# which is slow
if freq:
freq = Period._maybe_convert_freq(freq)

if data is None:
if ordinal is not None:
data = np.asarray(ordinal, dtype=np.int64)
else:
data, freq = cls._generate_range(start, end, periods,
freq, fields)
return cls(data, freq=freq)

if isinstance(data, (cls, PeriodIndex)):
if freq is None or freq == data.freq: # no freq change
freq = data.freq
data = data._ndarray_values
else:
base1, _ = _gfc(data.freq)
base2, _ = _gfc(freq)
data = libperiod.period_asfreq_arr(data._ndarray_values,
base1, base2, 1)
if copy:
data = data.copy()
return cls(data, freq=freq)

# not array / index
if not isinstance(data, (np.ndarray, PeriodIndex,
DatetimeIndex, Int64Index)):
if is_scalar(data):
raise TypeError('{0}(...) must be called with a '
'collection of some '
'kind, {1} was passed'.format(cls.__name__,
repr(data)))

# other iterable of some kind
if not isinstance(data, (list, tuple)):
data = list(data)

data = np.asarray(data)

# datetime other than period
if is_datetime64_dtype(data.dtype):
data = dt64arr_to_periodarr(data, freq, tz)
return cls(data, freq=freq)

# check not floats
if lib.infer_dtype(data) == 'floating' and len(data) > 0:
raise TypeError("PeriodIndex does not allow "
"floating point in construction")

# anything else, likely an array of strings or periods
data = ensure_object(data)
if dtype is None and freq:
dtype = PeriodDtype(freq)
return cls._from_sequence(data, dtype=dtype)

@classmethod
def _simple_new(cls, values, freq=None, **kwargs):
# alias from PeriodArray.__init__
@@ -408,7 +314,8 @@ def __setitem__(self, key, value):
if len(key) == 0:
return

value = type(self)._complex_new(value)
value = period_array(value)

if self.freqstr != value.freqstr:
msg = DIFFERENT_FREQ_INDEX.format(self.freqstr, value.freqstr)
raise IncompatibleFrequency(msg)
@@ -618,7 +525,7 @@ def asfreq(self, freq=None, how='E'):
         if self.hasnans:
             new_data[self._isnan] = iNaT

-        return self._shallow_copy(new_data, freq=freq)
+        return type(self)(new_data, freq=freq)

     def to_timestamp(self, freq=None, how='start'):
         """
@@ -703,7 +610,9 @@ def _add_delta_td(self, other):
         # Note: when calling parent class's _add_delta_td, it will call
         # delta_to_nanoseconds(delta). Because delta here is an integer,
         # delta_to_nanoseconds will return it unchanged.
-        return type(self)._add_delta_td(self, delta)
+        ordinals = super(PeriodArray, self)._add_delta_td(delta)
+        return type(self)(ordinals, self.freq)
+

     def _add_delta_tdi(self, other):
         assert isinstance(self.freq, Tick)  # checked by calling function
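
One plausible reading of the `_add_delta_td` change: looking a method up through `type(self)` from inside that same method resolves to the override itself, while `super()` reaches the parent implementation, whose raw result then has to be re-wrapped. The sketch below uses hypothetical classes with list-backed data; none of it is the pandas code.

```python
class DatetimeLikeSketch:
    def _add_delta_td(self, delta):
        # Parent works on raw integers and returns raw integers.
        return [value + delta for value in self._data]


class PeriodSketch(DatetimeLikeSketch):
    def __init__(self, data, freq):
        self._data = list(data)
        self.freq = freq

    def _add_delta_td(self, delta):
        # super() reaches the parent implementation; the raw result is
        # wrapped back into the array type together with the freq.
        ordinals = super()._add_delta_td(delta)
        return type(self)(ordinals, self.freq)


class SelfCallingSketch(PeriodSketch):
    def _add_delta_td(self, delta):
        # type(self) resolves to this very method, so it calls itself.
        return type(self)._add_delta_td(self, delta)


shifted = PeriodSketch([10, 11], "D")._add_delta_td(3)

try:
    SelfCallingSketch([10, 11], "D")._add_delta_td(3)
    recursed = False
except RecursionError:
    recursed = True
```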
@@ -736,11 +645,11 @@ def _add_delta(self, other):
         # i8 view or _shallow_copy
         if isinstance(other, (Tick, timedelta, np.timedelta64)):
             new_values = self._add_delta_td(other)
-            return self._shallow_copy(new_values)
+            return type(self)(new_values)
         elif is_timedelta64_dtype(other):
             # ndarray[timedelta64] or TimedeltaArray/index
             new_values = self._add_delta_tdi(other)
-            return self._shallow_copy(new_values)
+            return type(self)(new_values)
         else:  # pragma: no cover
             raise TypeError(type(other).__name__)

@@ -924,7 +833,7 @@ def item(self):
 # -------------------------------------------------------------------
 # Constructor Helpers

-def period_array(data, freq=None):
+def period_array(data, freq=None, ordinal=None, copy=False):
     # type: (Sequence[Optional[Period]], Optional[Tick]) -> PeriodArray
     """
     Construct a new PeriodArray from a sequence of Period scalars.
Member

Should we also accept strings here? Or have a separate function for that (something like to_period similar as we have to_datetime)

Contributor

We should probably have a separate function (there is already an issue for this), as you get into cases where formatting matters, e.g. 2012Q1, and you need to provide format strings.

Contributor

Though I would not be averse to actually calling to_period here with a default format on an inferred string type.

Member

In any case, adding the ability of parsing strings is out of scope for this PR I would say, so we can discuss later where to add it.

Contributor Author (TomAugspurger, Oct 19, 2018)

This already works. I'll add an example.

In [6]: pd.core.arrays.period_array(['2012Q1', '2013Q1'], freq='Q')
Out[6]:
<PeriodArray>
['2012Q1', '2013Q1']
Length: 2, dtype: period[Q-DEC]

I'm not sure what's required for freq to be inferred correctly, but libperiod.extract_freq and libperiod.extract_ordinals are doing all the heavy lifting here.

Member

Ah, cool, didn't know that :)
(it's of course limited to standard formatted strings I suppose?)
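
As a rough illustration of what "standard formatted strings" means for the quarterly case, here is a self-contained sketch that turns strings like `2012Q1` into integer ordinals and rejects anything else. The function `quarter_ordinal` and the regex are hypothetical, and the ordinal convention (quarters counted from 1970Q1 as 0) is an assumption for this sketch; the real work in the PR is done by `libperiod.extract_ordinals`, which this does not reproduce.

```python
import re

# Only the plain "YYYYQn" spelling is accepted in this sketch.
_QUARTER = re.compile(r"^(\d{4})Q([1-4])$")


def quarter_ordinal(text):
    """Hypothetical parser: 'YYYYQn' -> quarters since 1970Q1 (assumed epoch)."""
    match = _QUARTER.match(text)
    if match is None:
        raise ValueError("not a standard quarterly string: {!r}".format(text))
    year, quarter = int(match.group(1)), int(match.group(2))
    return (year - 1970) * 4 + (quarter - 1)


ordinals = [quarter_ordinal(s) for s in ["2012Q1", "2013Q1"]]

try:
    quarter_ordinal("Q1 2012")  # non-standard layout would need a format string
    rejected = False
except ValueError:
    rejected = True
```

Supporting layouts such as `Q1 2012` is where a separate function taking format strings, as suggested earlier in the thread, would come in.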

@@ -938,6 +847,8 @@ def period_array(data, freq=None):
     freq : str, Tick, or Offset
         The frequency of every element of the array. This can be specified
         to avoid inferring the `freq` from `data`.
+    copy : bool, default False
+        Whether to ensure a copy of the data is made.

     Returns
     -------
@@ -963,10 +874,37 @@ def period_array(data, freq=None):
['2017', '2018', 'NaT']
Length: 3, dtype: period[A-DEC]
"""

if data is None and ordinal is None:
raise ValueError("range!")
elif data is None:
Contributor

What is the purpose of ordinal? It's not in the docstring.

Contributor Author

Passing ordinal triggers the same behavior as PeriodArray.__new__(ordinal=...).

I'll see if this is actually used anywhere.

Contributor Author

Based on a brief survey, it seems like it's just used in tests. The only callers are PeriodIndex(ordinal=...).

pandas/tests/indexes/period/test_period.py
426: idx1 = PeriodIndex(ordinal=[-1, 0, 1], freq='A')
427: idx2 = PeriodIndex(ordinal=np.array([-1, 0, 1]), freq='A')

pandas/tests/indexes/common.py
318: result = index_type(ordinal=index.asi8, copy=False,

So the question is: do we want to give users the ability to construct a PeriodIndex / PeriodArray from an array of integer ordinals + a freq? I don't think this is likely to occur, so I'd recommend:

  1. removing the ordinal argument from period_array
  2. deprecating ordinal in the PeriodIndex constructor, in favor of passing a PeriodArray.

Contributor

yep
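
For readers unfamiliar with what "an array of integer ordinals + a freq" encodes, here is a tiny sketch for the annual case used in the test above. It assumes annual ordinals count years from 1970; `annual_labels` is a hypothetical helper, not a pandas function.

```python
def annual_labels(ordinals, epoch_year=1970):
    """Hypothetical helper: map annual ordinals to year labels.

    Assumes ordinal 0 corresponds to the epoch year.
    """
    return [str(epoch_year + int(ordinal)) for ordinal in ordinals]


labels = annual_labels([-1, 0, 1])
```

The same integers would mean something different under another freq, which is why the ordinal path only makes sense together with an explicit freq.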

data = np.asarray(ordinal, dtype=np.int64)
if copy:
data = data.copy()
return PeriodArray(data, freq=freq)
else:
if isinstance(data, (ABCPeriodIndex, ABCSeries, PeriodArray)):
return PeriodArray(data, freq)
elif is_datetime64_dtype(data):
return PeriodArray._from_datetime64(data, freq)

# other iterable of some kind
if not isinstance(data, (np.ndarray, list, tuple)):
data = list(data)

if freq:
dtype = PeriodDtype(freq)
else:
dtype = None

if lib.infer_dtype(data) == 'floating' and len(data) > 0:
Contributor

this is quite fast for the vast majority of things, only object dtype is fully inferred

Contributor Author

Yeah, we're going to have objects here typically though.

For a length 1,000 array of period objects we spend 70us in lib.infer_dtype and 330us in the actual constructor, so about 20% of the time is just to raise this error message. I refactored it a bit to just check for is_float_dtype.
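
A minimal sketch of the cheaper check described here, using only NumPy: looking at the array's dtype is a constant-time test, whereas inferring the type of an object array means scanning its elements. `has_float_dtype` is a hypothetical helper, not the pandas `is_float_dtype`.

```python
import numpy as np


def has_float_dtype(data):
    """Hypothetical helper: True only when the array's dtype itself is floating."""
    return np.asarray(data).dtype.kind == "f"


float_input = has_float_dtype([1.0, 2.0])
int_input = has_float_dtype([1, 2])
# Floats hidden inside an object array are not flagged by a dtype-only check.
object_input = has_float_dtype(np.array([1.0, 2.0], dtype=object))
```

The trade-off is visible in the last case: a dtype-only check no longer catches floats stored in an object array, so it is faster but slightly less strict than full inference.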

# Can we avoid infer_dtype? Why pay that tax every time?
raise TypeError("PeriodIndex does not allow "
"floating point in construction")

data = ensure_object(data)


return PeriodArray._from_sequence(data, dtype=dtype)


60 changes: 53 additions & 7 deletions pandas/core/indexes/period.py
@@ -28,8 +28,9 @@
 from pandas._libs.tslibs import resolution

 from pandas.core.algorithms import unique1d
+from pandas.core.dtypes.dtypes import PeriodDtype
 from pandas.core.dtypes.generic import ABCIndexClass
-from pandas.core.arrays.period import PeriodArray
+from pandas.core.arrays.period import PeriodArray, period_array
 from pandas.core.base import _shared_docs
 from pandas.core.indexes.base import _index_shared_docs, ensure_index

@@ -190,13 +191,54 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None,
periods=None, tz=None, dtype=None, copy=False, name=None,
**fields):

valid_field_set = {'year', 'month', 'day', 'quarter',
'hour', 'minute', 'second'}

if not set(fields).issubset(valid_field_set):
raise TypeError('__new__() got an unexpected keyword argument {}'.
format(list(set(fields) - valid_field_set)[0]))

if name is None and hasattr(data, 'name'):
name = data.name

data = PeriodArray._complex_new(data=data, ordinal=ordinal, freq=freq,
start=start, end=end, periods=periods,
tz=tz, dtype=dtype, copy=copy,
**fields)
if data is None and ordinal is None:
# range-based.
if periods is not None:
if is_float(periods):
periods = int(periods)

elif not is_integer(periods):
msg = 'periods must be a number, got {periods}'
raise TypeError(msg.format(periods=periods))

data, freq = PeriodArray._generate_range(start, end, periods,
freq, fields)
data = PeriodArray(data, freq=freq)
else:
if freq is None and dtype is not None:
freq = PeriodDtype(dtype).freq
elif freq and dtype:
freq = PeriodDtype(freq).freq
dtype = PeriodDtype(dtype).freq

if freq != dtype:
msg = "specified freq and dtype are different"
raise IncompatibleFrequency(msg)

# PeriodIndex allow PeriodIndex(period_index, freq=different)
# Let's not encourage that kind of behavior in PeriodArray.

if freq and isinstance(data, cls) and data.freq != freq:
# TODO: We can do some of these with no-copy / coercion?
# e.g. D -> 2D seems to be OK
data = data.asfreq(freq)

data = period_array(data=data, ordinal=ordinal, freq=freq,
copy=copy)

if copy:
data = data.copy()

return cls._simple_new(data, name=name)

# ------------------------------------------------------------------------
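
The freq/dtype reconciliation in this hunk boils down to a small rule: take the freq from the dtype when none is given, and refuse when both are given but disagree. A minimal sketch, assuming freqs are plain strings and using `ValueError` as a stand-in for `IncompatibleFrequency`; `resolve_freq` is a hypothetical name.

```python
def resolve_freq(freq=None, dtype_freq=None):
    """Hypothetical helper reconciling an explicit freq with a dtype's freq."""
    if freq is None:
        # Nothing explicit: fall back to whatever the dtype carries (may be None).
        return dtype_freq
    if dtype_freq is not None and freq != dtype_freq:
        # ValueError stands in for IncompatibleFrequency in this sketch.
        raise ValueError("specified freq and dtype are different")
    return freq


from_dtype = resolve_freq(None, "M")
agreeing = resolve_freq("M", "M")

try:
    resolve_freq("D", "M")
    conflict = False
except ValueError:
    conflict = True
```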
@@ -268,18 +310,22 @@ def _shallow_copy(self, values=None, **kwargs):
# TODO: simplify, figure out type of values
Member

So from some printing in the tests, some exploration on what is passed here:

  • PeriodArray and int64 ndarray ordinals. I think both are fine; it will probably be hard to avoid mixing them. Or do we want a separate path for ordinals?
  • object array of Periods.
    • One example of this is PeriodIndex.difference (from the base Index implementation). This base implementation basically works, except that there is a sorting.safe_sort call on the resulting PeriodArray, which destroys the PeriodArray. But this is of course solvable in sorting.safe_sort, by making it EA-aware.
    • So I think eventually we could try to solve all the cases where object is passed. But I would say, let's leave that for follow-ups?
  • None -> this is from plain self.shallow_copy() calls without arguments. This is fine I think.
Member

This is basically what motivated #23095. Even if the solution is unwanted there, I think it identifies all the extant places where unwanted types are currently passed to _shallow_copy
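
The input cases enumerated in this thread can be sketched as a small normalization step. Everything here is hypothetical (`OrdinalArraySketch`, `normalize`), and raising on object arrays is a choice made for this sketch to surface unexpected callers; the PR itself falls back to `period_array` for that case.

```python
import numpy as np


class OrdinalArraySketch:
    """Hypothetical stand-in for PeriodArray, holding ordinals and a freq."""

    def __init__(self, ordinals, freq):
        self.ordinals = np.asarray(ordinals, dtype="int64")
        self.freq = freq


def normalize(values, own):
    """Hypothetical: map the inputs seen by _shallow_copy onto one array type."""
    if values is None:
        return own                      # plain shallow copy without arguments
    if isinstance(values, OrdinalArraySketch):
        return values                   # already the right type
    values = np.asarray(values)
    if values.dtype.kind in "iu":
        # Integer ordinals: wrap them with the freq of the existing array.
        return OrdinalArraySketch(values, own.freq)
    # Sketch-only choice: fail loudly instead of converting object arrays.
    raise TypeError("unexpected input dtype: {}".format(values.dtype))


own = OrdinalArraySketch([0, 1], "D")
wrapped = normalize(np.array([3, 4]), own)

try:
    normalize(np.array(["2018-01-01"], dtype=object), own)
    flagged = False
except TypeError:
    flagged = True
```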

if values is None:
values = self._values

if isinstance(values, type(self)):
values = values.values

if not isinstance(values, PeriodArray):
if (isinstance(values, np.ndarray) and
is_integer_dtype(values.dtype)):
values = PeriodArray(values, freq=self.freq)
else:
# in particular, I would like to avoid complex_new here.
# in particular, I would like to avoid period_array here.
# Some people seem to be calling use with unexpected types
# Index.difference -> ndarray[Period]
# DatetimelikeIndexOpsMixin.repeat -> ndarray[ordinal]
# I think that once all of Datetime* are EAs, we can simplify
# this quite a bit.
values = PeriodArray._complex_new(values, freq=self.freq)
values = period_array(values, freq=self.freq)

# I don't like overloading shallow_copy with freq changes.
Member

Yah, I'm not wild about this. I still think the best option is to nail down the constructors before doing the whole PeriodArray changeover.

# See if it's used anywhere outside of test_resample_empty_dataframe