Skip to content

Conversation

@jorisvandenbossche
Copy link
Member

While doing some unrelated profiling, I noticed typing overhead in DatetimeArray.__getitem__. Isolating that here, there is a typing cast(..) call in that method, and removing that as done in the PR, I see the following impact:

arr = pd.date_range("2012", periods=3).array In [4]: %timeit -r 20 arr[0] 4.76 µs ± 36 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each) # <-- master 3.44 µs ± 34.4 ns per loop (mean ± std. dev. of 20 runs, 100000 loops each) # <-- PR

(I did this timing several times back and forth on a computer with no other activity at that moment, and the difference was stable)

Personally, I wouldn't have expected such a relative big impact of a typing cast. Because if that's the case, we should be very careful with using that in "hot paths".

cc @jbrockmendel @simonjayhawkins @Dr-Irv

@jbrockmendel
Copy link
Member

wow. ive been assuming that the casts are negligible. if that wrong, then there are probably a lot of small improvements to be made by avoiding them across the codebase.

On the other hand, most of this is in the Union:

In [1]: from pandas._typing import * In [2]: from pandas.core.arrays.datetimelike import * In [3]: %timeit Union[DatetimeLikeArrayT, DTScalarOrNaT] 879 ns ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) In [4]: %timeit cast(Union[DatetimeLikeArrayT, DTScalarOrNaT], NaT) 920 ns ± 8.15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) 

So at least in this case we can get most of the benefit by just doing the Union once at the module level

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Nov 26, 2021

So at least in this case we can get most of the benefit by just doing the Union once at the module level

Another (maybe easier) solution is to just put the type in double quotes:

 result = cast( "Union[DatetimeLikeArrayT, DTScalarOrNaT]", super().__getitem__(key) )

I did a grep exercise, and that is what is done elsewhere in the code. The only other place I could find (via grep) that we need to fix this is in pandas/core/arrays/categorical.py, line 517

@jreback jreback added Performance Memory or execution speed performance Typing type annotations, mypy/pyright type checking labels Nov 26, 2021
@jbrockmendel
Copy link
Member

@jorisvandenbossche want to change the annotation to a string and call this a win?

@jbrockmendel jbrockmendel mentioned this pull request Dec 2, 2021
4 tasks
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review December 3, 2021 14:08
@Dr-Irv
Copy link
Contributor

Dr-Irv commented Dec 3, 2021

@jorisvandenbossche can you also make a similar change in pandas/core/arrays/categorical.py:astype() so we are consistent in terms of doing casts with Union

@jorisvandenbossche
Copy link
Member Author

Yeah, I had actually searched for other casts in the /arrays submodule, and this categorical one is the only other cast with a composite type (or that is not predefined). Since that's not typically a hot path, I didn't change it, but of course also can't harm, so pushed that change now anyway.

@Dr-Irv Dr-Irv merged commit 4f3b32b into pandas-dev:master Dec 3, 2021
@jorisvandenbossche jorisvandenbossche deleted the perf-datetime-getitem-typing branch December 3, 2021 16:03
jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Memory or execution speed performance Typing type annotations, mypy/pyright type checking

4 participants