Commit fcf70b8

Merge remote-tracking branch 'upstream/master' into redirects
2 parents c5d3e9e + 6d3565a commit fcf70b8

64 files changed, 1158 additions and 660 deletions

ci/code_checks.sh

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
 
     # Imports - Check formatting using isort see setup.cfg for settings
     MSG='Check import format using isort ' ; echo $MSG
-    isort --recursive --check-only pandas
+    isort --recursive --check-only pandas asv_bench
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
 fi

doc/source/api/general_utility_functions.rst

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ Dtype introspection
    api.types.is_datetime64_ns_dtype
    api.types.is_datetime64tz_dtype
    api.types.is_extension_type
+   api.types.is_extension_array_dtype
    api.types.is_float_dtype
    api.types.is_int64_dtype
    api.types.is_integer_dtype
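Here is a short sketch of how the helper that this commit adds to the API listing might be used. It assumes pandas 0.24 or later, which provides `pd.array` and the nullable `Int64` dtype. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype

# Extension-array-backed data: a Categorical and a nullable integer array.
cat = pd.Categorical(["a", "b", "a"])
nullable = pd.array([1, None, 3], dtype="Int64")

# Plain NumPy-backed data wrapped in a Series.
plain = pd.Series(np.array([1, 2, 3]))

# The check accepts array-likes as well as dtype objects.
print(is_extension_array_dtype(cat))        # True
print(is_extension_array_dtype(nullable))   # True
print(is_extension_array_dtype(cat.dtype))  # True
print(is_extension_array_dtype(plain))      # False
```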

doc/source/whatsnew/v0.24.0.rst

Lines changed: 58 additions & 9 deletions
@@ -149,7 +149,7 @@ These dtypes can be merged & reshaped & casted.
    pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes
    df['A'].astype(float)
 
-Reduction and groupby operations such as 'sum' work.
+Reduction and groupby operations such as ``sum`` work.
 
 .. ipython:: python
 
@@ -1128,7 +1128,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
 - :meth:`~Series.shift` now dispatches to :meth:`ExtensionArray.shift` (:issue:`22386`)
 - :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`)
 - :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`)
-- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185:`).
+- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185`).
 - Slicing a single row of a ``DataFrame`` with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
 - Bug when concatenating multiple ``Series`` with different extension dtypes not casting to object dtype (:issue:`22994`)
 - Series backed by an ``ExtensionArray`` now work with :func:`util.hash_pandas_object` (:issue:`23066`)
@@ -1235,7 +1235,6 @@ Datetimelike API Changes
 - :class:`PeriodIndex` subtraction of another ``PeriodIndex`` will now return an object-dtype :class:`Index` of :class:`DateOffset` objects instead of raising a ``TypeError`` (:issue:`20049`)
 - :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)
 - :meth:`DatetimeIndex.to_period` and :meth:`Timestamp.to_period` will issue a warning when timezone information will be lost (:issue:`21333`)
-- :class:`DatetimeIndex` now accepts :class:`Int64Index` arguments as epoch timestamps (:issue:`20997`)
 - :meth:`PeriodIndex.tz_convert` and :meth:`PeriodIndex.tz_localize` have been removed (:issue:`21781`)
 
 .. _whatsnew_0240.api.other:
@@ -1262,7 +1261,7 @@ Other API Changes
 - The order of the arguments of :func:`DataFrame.to_html` and :func:`DataFrame.to_string` is rearranged to be consistent with each other. (:issue:`23614`)
 - :meth:`CategoricalIndex.reindex` now raises a ``ValueError`` if the target index is non-unique and not equal to the current index. It previously only raised if the target index was not of a categorical dtype (:issue:`23963`).
 - :func:`Series.to_list` and :func:`Index.to_list` are now aliases of ``Series.tolist`` respectively ``Index.tolist`` (:issue:`8826`)
-- The result of ``SparseSeries.unstack`` is now a :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (issue:`24372`).
+- The result of ``SparseSeries.unstack`` is now a :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`24372`).
 
 .. _whatsnew_0240.deprecations:
 
@@ -1301,14 +1300,15 @@ Deprecations
 - The ``keep_tz=False`` option (the default) of the ``keep_tz`` keyword of
   :meth:`DatetimeIndex.to_series` is deprecated (:issue:`17832`).
 - Timezone converting a tz-aware ``datetime.datetime`` or :class:`Timestamp` with :class:`Timestamp` and the ``tz`` argument is now deprecated. Instead, use :meth:`Timestamp.tz_convert` (:issue:`23579`)
-- :func:`pandas.api.types.is_period` is deprecated in favor of `pandas.api.types.is_period_dtype` (:issue:`23917`)
-- :func:`pandas.api.types.is_datetimetz` is deprecated in favor of `pandas.api.types.is_datetime64tz` (:issue:`23917`)
+- :func:`pandas.api.types.is_period` is deprecated in favor of ``pandas.api.types.is_period_dtype`` (:issue:`23917`)
+- :func:`pandas.api.types.is_datetimetz` is deprecated in favor of ``pandas.api.types.is_datetime64tz`` (:issue:`23917`)
 - Creating a :class:`TimedeltaIndex`, :class:`DatetimeIndex`, or :class:`PeriodIndex` by passing range arguments `start`, `end`, and `periods` is deprecated in favor of :func:`timedelta_range`, :func:`date_range`, or :func:`period_range` (:issue:`23919`)
 - Passing a string alias like ``'datetime64[ns, UTC]'`` as the ``unit`` parameter to :class:`DatetimeTZDtype` is deprecated. Use :class:`DatetimeTZDtype.construct_from_string` instead (:issue:`23990`).
 - The ``skipna`` parameter of :meth:`~pandas.api.types.infer_dtype` will switch to ``True`` by default in a future version of pandas (:issue:`17066`, :issue:`24050`)
 - In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`).
 - :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`)
 - :meth:`Series.nonzero` is deprecated and will be removed in a future version (:issue:`18262`)
+- Passing an integer to :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtypes is deprecated, will raise ``TypeError`` in a future version. Use ``obj.fillna(pd.Timedelta(...))`` instead (:issue:`24694`)
 
 .. _whatsnew_0240.deprecations.datetimelike_int_ops:
 
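The new `fillna` deprecation entry can be illustrated with a minimal sketch. It assumes a pandas version where a Series built from a `Timedelta` and `NaT` gets dtype `timedelta64[ns]`. The sketch uses the recommended form, an explicit `Timedelta` fill value, instead of the deprecated integer.

```python
import pandas as pd

# A timedelta64[ns] Series with one missing value.
s = pd.Series([pd.Timedelta(seconds=1), pd.NaT])

# Deprecated in 0.24 (slated to raise TypeError later): s.fillna(5)
# Recommended: pass an explicit Timedelta so the unit is unambiguous.
filled = s.fillna(pd.Timedelta(seconds=5))
print(filled.tolist())
```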
@@ -1352,6 +1352,52 @@ the object's ``freq`` attribute (:issue:`21939`, :issue:`23878`).
    dti + pd.Index([1 * dti.freq, 2 * dti.freq])
 
 
+.. _whatsnew_0240.deprecations.integer_tz:
+
+Passing Integer data and a timezone to DatetimeIndex
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The behavior of :class:`DatetimeIndex` when passed integer data and
+a timezone is changing in a future version of pandas. Previously, these
+were interpreted as wall times in the desired timezone. In the future,
+these will be interpreted as wall times in UTC, which are then converted
+to the desired timezone (:issue:`24559`).
+
+The default behavior remains the same, but issues a warning:
+
+.. code-block:: ipython
+
+   In [3]: pd.DatetimeIndex([946684800000000000], tz="US/Central")
+   /bin/ipython:1: FutureWarning:
+       Passing integer-dtype data and a timezone to DatetimeIndex. Integer values
+       will be interpreted differently in a future version of pandas. Previously,
+       these were viewed as datetime64[ns] values representing the wall time
+       *in the specified timezone*. In the future, these will be viewed as
+       datetime64[ns] values representing the wall time *in UTC*. This is similar
+       to a nanosecond-precision UNIX epoch. To accept the future behavior, use
+
+           pd.to_datetime(integer_data, utc=True).tz_convert(tz)
+
+       To keep the previous behavior, use
+
+           pd.to_datetime(integer_data).tz_localize(tz)
+
+   #!/bin/python3
+   Out[3]: DatetimeIndex(['2000-01-01 00:00:00-06:00'], dtype='datetime64[ns, US/Central]', freq=None)
+
+As the warning message explains, opt in to the future behavior by specifying that
+the integer values are UTC, and then converting to the final timezone:
+
+.. ipython:: python
+
+   pd.to_datetime([946684800000000000], utc=True).tz_convert('US/Central')
+
+The old behavior can be retained with by localizing directly to the final timezone:
+
+.. ipython:: python
+
+   pd.to_datetime([946684800000000000]).tz_localize('US/Central')
+
 .. _whatsnew_0240.deprecations.tz_aware_array:
 
 Converting Timezone-Aware Series and Index to NumPy Arrays
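The difference between the two interpretations of `946684800000000000` can be reproduced with the standard library alone. This is a sketch under two assumptions:

- A fixed UTC-6 offset stands in for US/Central in January (no DST).
- The helper names are hypothetical. They only mimic `tz_localize`-style handling versus `utc=True` followed by `tz_convert`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset stands in for US/Central in January (no DST).
CENTRAL_JAN = timezone(timedelta(hours=-6), "CST")
EPOCH_NAIVE = datetime(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_wall_time_in_tz(ns, tz):
    """Old behavior (hypothetical helper): the integer encodes the wall time in tz."""
    # stdlib datetime has microsecond resolution, so drop the last 3 digits.
    naive = EPOCH_NAIVE + timedelta(microseconds=ns // 1000)
    return naive.replace(tzinfo=tz)


def as_utc_then_convert(ns, tz):
    """Future behavior (hypothetical helper): the integer is a UTC epoch, then converted."""
    utc = EPOCH_UTC + timedelta(microseconds=ns // 1000)
    return utc.astimezone(tz)


value = 946684800000000000  # nanoseconds since the UNIX epoch
old = as_wall_time_in_tz(value, CENTRAL_JAN)
new = as_utc_then_convert(value, CENTRAL_JAN)
print(old.isoformat())  # 2000-01-01T00:00:00-06:00
print(new.isoformat())  # 1999-12-31T18:00:00-06:00
print(old - new)        # 6:00:00
```

The old interpretation matches the `Out[3]` shown in the whatsnew text. The future one lands six hours earlier on the wall clock.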
@@ -1479,7 +1525,7 @@ Performance Improvements
 - Improved performance of :meth:`~DataFrame.where` for Categorical data (:issue:`24077`)
 - Improved performance of iterating over a :class:`Series`. Using :meth:`DataFrame.itertuples` now creates iterators
   without internally allocating lists of all elements (:issue:`20783`)
-- Improved performance of :class:`Period` constructor, additionally benefitting ``PeriodArray`` and ``PeriodIndex`` creation (:issue:`24084` and :issue:`24118`)
+- Improved performance of :class:`Period` constructor, additionally benefitting ``PeriodArray`` and ``PeriodIndex`` creation (:issue:`24084`, :issue:`24118`)
 - Improved performance of tz-aware :class:`DatetimeArray` binary operations (:issue:`24491`)
 
 .. _whatsnew_0240.bug_fixes:
@@ -1568,6 +1614,7 @@ Timedelta
 - Bug in :class:`Timedelta` and :func:`to_timedelta()` have inconsistencies in supported unit string (:issue:`21762`)
 - Bug in :class:`TimedeltaIndex` division where dividing by another :class:`TimedeltaIndex` raised ``TypeError`` instead of returning a :class:`Float64Index` (:issue:`23829`, :issue:`22631`)
 - Bug in :class:`TimedeltaIndex` comparison operations where comparing against non-``Timedelta``-like objects would raise ``TypeError`` instead of returning all-``False`` for ``__eq__`` and all-``True`` for ``__ne__`` (:issue:`24056`)
+- Bug in :class:`Timedelta` comparisons when comparing with a ``Tick`` object incorrectly raising ``TypeError`` (:issue:`24710`)
 
 Timezones
 ^^^^^^^^^
@@ -1625,7 +1672,7 @@ Numeric
 - Bug in :class:`DataFrame` with ``timedelta64[ns]`` dtype arithmetic operations with ``ndarray`` with integer dtype incorrectly treating the narray as ``timedelta64[ns]`` dtype (:issue:`23114`)
 - Bug in :meth:`Series.rpow` with object dtype ``NaN`` for ``1 ** NA`` instead of ``1`` (:issue:`22922`).
 - :meth:`Series.agg` can now handle numpy NaN-aware methods like :func:`numpy.nansum` (:issue:`19629`)
-- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``pct=True`` and more than 2:sup:`24` rows are present resulted in percentages greater than 1.0 (:issue:`18271`)
+- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``pct=True`` and more than 2\ :sup:`24` rows are present resulted in percentages greater than 1.0 (:issue:`18271`)
 - Calls such as :meth:`DataFrame.round` with a non-unique :meth:`CategoricalIndex` now return expected data. Previously, data would be improperly duplicated (:issue:`21809`).
 - Added ``log10``, `floor` and `ceil` to the list of supported functions in :meth:`DataFrame.eval` (:issue:`24139`, :issue:`24353`)
 - Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
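The rank entry above cites 2**24 rows as the threshold. That number is where IEEE-754 single precision (float32) stops representing every integer exactly: 2**24 survives a float32 round trip, but 2**24 + 1 rounds back down to 2**24.

This standard-library sketch shows that fact. Its second half is a hypothetical illustration of how a float32 row count could push a percentile rank above 1.0. It is not a claim about the exact code path pandas used.

```python
import struct


def to_float32(x):
    """Round-trip a number through IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# Every integer up to 2**24 survives the round trip...
print(to_float32(2**24) == 2**24)  # True
# ...but 2**24 + 1 does not: it rounds to 2**24.
print(to_float32(2**24 + 1))       # 16777216.0

# Hypothetical illustration: an exact rank divided by a float32 row count.
n_rows = 2**24 + 1
top_rank = float(n_rows)           # rank of the largest value, exact in float64
pct = top_rank / to_float32(n_rows)
print(pct > 1.0)                   # True
```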
@@ -1638,6 +1685,7 @@ Conversion
 - Bug in :meth:`DataFrame.combine_first` in which column types were unexpectedly converted to float (:issue:`20699`)
 - Bug in :meth:`DataFrame.clip` in which column types are not preserved and casted to float (:issue:`24162`)
 - Bug in :meth:`DataFrame.clip` when order of columns of dataframes doesn't match, result observed is wrong in numeric values (:issue:`20911`)
+- Bug in :meth:`DataFrame.astype` where converting to an extension dtype when duplicate column names are present causes a ``RecursionError`` (:issue:`24704`)
 
 Strings
 ^^^^^^^
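Here is a minimal reproduction of the scenario in the new Conversion entry (GH 24704): a frame with duplicate column labels is converted to the nullable `Int64` extension dtype. It assumes pandas 0.24 or later. On a fixed version the call should return one `Int64` column per label instead of recursing.

```python
import pandas as pd

# Two columns sharing the label "a".
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])

# Convert every column to the nullable Int64 extension dtype.
result = df.astype("Int64")
print([str(dtype) for dtype in result.dtypes])  # ['Int64', 'Int64']
```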
@@ -1711,7 +1759,7 @@ I/O
 - Bug in :meth:`read_excel()` when ``parse_cols`` is specified with an empty dataset (:issue:`9208`)
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
-- :func:`read_csv()` and func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)
+- :func:`read_csv()` and :func:`read_table()` will throw ``UnicodeError`` and not coredump on badly encoded strings (:issue:`22748`)
 - :func:`read_csv()` will correctly parse timezone-aware datetimes (:issue:`22256`)
 - Bug in :func:`read_csv()` in which memory management was prematurely optimized for the C engine when the data was being read in chunks (:issue:`23509`)
 - Bug in :func:`read_csv()` in unnamed columns were being improperly identified when extracting a multi-index (:issue:`23687`)
@@ -1742,6 +1790,7 @@ I/O
 - Bug in :meth:`DataFrame.to_dict` when the resulting dict contains non-Python scalars in the case of numeric data (:issue:`23753`)
 - :func:`DataFrame.to_string()`, :func:`DataFrame.to_html()`, :func:`DataFrame.to_latex()` will correctly format output when a string is passed as the ``float_format`` argument (:issue:`21625`, :issue:`22270`)
 - Bug in :func:`read_csv` that caused it to raise ``OverflowError`` when trying to use 'inf' as ``na_value`` with integer index column (:issue:`17128`)
+- Bug in :func:`read_csv` that caused the C engine on Python 3.6+ on Windows to improperly read CSV filenames with accented or special characters (:issue:`15086`)
 - Bug in :func:`read_fwf` in which the compression type of a file was not being properly inferred (:issue:`22199`)
 - Bug in :func:`pandas.io.json.json_normalize` that caused it to raise ``TypeError`` when two consecutive elements of ``record_path`` are dicts (:issue:`22706`)
 - Bug in :meth:`DataFrame.to_stata`, :class:`pandas.io.stata.StataWriter` and :class:`pandas.io.stata.StataWriter117` where a exception would leave a partially written and invalid dta file (:issue:`23573`)

environment.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ dependencies:
   - cython>=0.28.2
   - flake8
   - flake8-comprehensions
-  - flake8-rst>=0.6.0
+  - flake8-rst>=0.6.0,<=0.7.0
   - gitpython
   - hypothesis>=3.82
   - isort

pandas/_libs/parsers.pyx

Lines changed: 7 additions & 1 deletion
@@ -677,7 +677,13 @@ cdef class TextReader:
 
         if isinstance(source, basestring):
             if not isinstance(source, bytes):
-                source = source.encode(sys.getfilesystemencoding() or 'utf-8')
+                if compat.PY36 and compat.is_platform_windows():
+                    # see gh-15086.
+                    encoding = "mbcs"
+                else:
+                    encoding = sys.getfilesystemencoding() or "utf-8"
+
+                source = source.encode(encoding)
 
         if self.memory_map:
             ptr = new_mmap(source)
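The branch added to `TextReader` can be mirrored in plain Python to make the decision explicit. The helper name `choose_path_encoding` is hypothetical.

The rationale is stated here as an assumption: since Python 3.6 (PEP 529), the reported filesystem encoding on Windows is UTF-8. The C-level file opening used by the parser presumably still expects bytes in the ANSI code page, which Python exposes as the "mbcs" codec.

```python
import sys


def choose_path_encoding(py36_or_later, on_windows, fs_encoding):
    """Hypothetical mirror of the encoding branch added to TextReader (gh-15086)."""
    if py36_or_later and on_windows:
        # Windows + Python 3.6+: encode the path with the ANSI code page.
        return "mbcs"
    # Everywhere else: the filesystem encoding, falling back to UTF-8.
    return fs_encoding or "utf-8"


encoding = choose_path_encoding(
    sys.version_info >= (3, 6),
    sys.platform == "win32",
    sys.getfilesystemencoding(),
)
print(encoding)
```

Passing the flags in as arguments keeps the decision testable on any platform. Note that the "mbcs" codec itself only exists on Windows.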

pandas/_libs/tslibs/offsets.pyx

Lines changed: 33 additions & 3 deletions
@@ -5,6 +5,7 @@ import cython
 import time
 from cpython.datetime cimport (PyDateTime_IMPORT,
                                PyDateTime_Check,
+                               PyDelta_Check,
                                datetime, timedelta,
                                time as dt_time)
 PyDateTime_IMPORT
@@ -28,6 +29,9 @@ from pandas._libs.tslibs.np_datetime cimport (
     npy_datetimestruct, dtstruct_to_dt64, dt64_to_dtstruct)
 from pandas._libs.tslibs.timezones import UTC
 
+
+PY2 = bytes == str
+
 # ---------------------------------------------------------------------
 # Constants
 
@@ -126,6 +130,26 @@ def apply_index_wraps(func):
     return wrapper
 
 
+cdef _wrap_timedelta_result(result):
+    """
+    Tick operations dispatch to their Timedelta counterparts. Wrap the result
+    of these operations in a Tick if possible.
+
+    Parameters
+    ----------
+    result : object
+
+    Returns
+    -------
+    object
+    """
+    if PyDelta_Check(result):
+        # convert Timedelta back to a Tick
+        from pandas.tseries.offsets import _delta_to_tick
+        return _delta_to_tick(result)
+
+    return result
+
 # ---------------------------------------------------------------------
 # Business Helpers
 
@@ -388,12 +412,12 @@ class _BaseOffset(object):
                            **self.kwds)
 
     def __neg__(self):
-        # Note: we are defering directly to __mul__ instead of __rmul__, as
+        # Note: we are deferring directly to __mul__ instead of __rmul__, as
         # that allows us to use methods that can go in a `cdef class`
         return self * -1
 
     def copy(self):
-        # Note: we are defering directly to __mul__ instead of __rmul__, as
+        # Note: we are deferring directly to __mul__ instead of __rmul__, as
         # that allows us to use methods that can go in a `cdef class`
         return self * 1
 
@@ -508,7 +532,13 @@ class _Tick(object):
     dummy class to mix into tseries.offsets.Tick so that in tslibs.period we
     can do isinstance checks on _Tick and avoid importing tseries.offsets
     """
-    pass
+
+    def __truediv__(self, other):
+        result = self.delta.__truediv__(other)
+        return _wrap_timedelta_result(result)
+
+    if PY2:
+        __div__ = __truediv__
 
 
 # ----------------------------------------------------------------------
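The pattern introduced in offsets.pyx can be sketched with a toy class: Tick division dispatches to the underlying timedelta, then re-wraps a timedelta result. `Tick` and `wrap_timedelta_result` below are simplified, hypothetical stand-ins for pandas' `_Tick` and `_wrap_timedelta_result`. In particular, the real code converts back through `_delta_to_tick`, which yields a concrete offset such as Hour or Minute rather than a generic wrapper.

```python
from datetime import timedelta


class Tick:
    """Toy stand-in for a fixed-length offset such as Hour or Minute (hypothetical)."""

    def __init__(self, delta):
        self.delta = delta

    def __repr__(self):
        return f"Tick({self.delta})"

    def __truediv__(self, other):
        # Dispatch to the timedelta implementation, then re-wrap if possible.
        result = self.delta.__truediv__(other)
        return wrap_timedelta_result(result)


def wrap_timedelta_result(result):
    """Wrap a timedelta result back into a Tick; pass anything else through."""
    if isinstance(result, timedelta):
        return Tick(result)
    return result


hour = Tick(timedelta(hours=1))
half = hour / 2                       # timedelta / int -> timedelta -> Tick
ratio = hour / timedelta(minutes=15)  # timedelta / timedelta -> float, unchanged
print(half, ratio)                    # Tick(0:30:00) 4.0
```

Returning non-timedelta results unchanged also lets `NotImplemented` pass through, so Python's usual operator fallback still applies.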

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 2 additions & 1 deletion
@@ -36,6 +36,7 @@ from pandas._libs.tslibs.nattype import nat_strings
 from pandas._libs.tslibs.nattype cimport (
     checknull_with_nat, NPY_NAT, c_NaT as NaT)
 from pandas._libs.tslibs.offsets cimport to_offset
+from pandas._libs.tslibs.offsets import _Tick as Tick
 
 # ----------------------------------------------------------------------
 # Constants
@@ -757,7 +758,7 @@ cdef class _Timedelta(timedelta):
 
         if isinstance(other, _Timedelta):
             ots = other
-        elif PyDelta_Check(other):
+        elif PyDelta_Check(other) or isinstance(other, Tick):
             ots = Timedelta(other)
         else:
             ndim = getattr(other, "ndim", -1)
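The user-visible effect of treating a Tick like a timedelta in `_Timedelta`'s comparison code is sketched below, assuming pandas 0.24 or later. The Timedelta whatsnew entry in this commit (GH 24710) notes that such comparisons previously raised `TypeError`.

```python
import pandas as pd

hour = pd.offsets.Hour(1)

# With the fix, a Tick on the right-hand side is converted to a Timedelta
# before comparing, instead of raising TypeError.
print(pd.Timedelta(hours=1) == hour)
print(pd.Timedelta(minutes=30) < hour)
print(pd.Timedelta(hours=2) >= hour)
```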

pandas/core/arrays/datetimelike.py

Lines changed: 1 addition & 1 deletion
@@ -606,7 +606,7 @@ def _concat_same_type(cls, to_concat):
 
     def copy(self, deep=False):
         values = self.asi8.copy()
-        return type(self)(values, dtype=self.dtype, freq=self.freq)
+        return type(self)._simple_new(values, dtype=self.dtype, freq=self.freq)
 
     def _values_for_factorize(self):
         return self.asi8, iNaT
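Switching `copy()` to `_simple_new` presumably avoids re-running the public constructor's validation and inference on data that is already known to be valid. The toy class below illustrates that split between a validating constructor and a trusted fast path. `TimeArray`, its `validations` counter and its `freq` check are all hypothetical.

```python
class TimeArray:
    """Hypothetical array type showing a validating vs. fast-path constructor."""

    validations = 0  # counts how often the validating constructor runs

    def __init__(self, values, freq=None):
        # Public constructor: validate and normalize inputs.
        TimeArray.validations += 1
        values = [int(v) for v in values]
        if freq is not None and any(b - a != freq for a, b in zip(values, values[1:])):
            raise ValueError("values do not conform to freq")
        self._data = values
        self.freq = freq

    @classmethod
    def _simple_new(cls, values, freq=None):
        # Fast path: trust already-validated internals and skip __init__.
        obj = object.__new__(cls)
        obj._data = values
        obj.freq = freq
        return obj

    def copy(self):
        # The data is already valid, so copying need not re-run validation.
        return type(self)._simple_new(list(self._data), freq=self.freq)


arr = TimeArray([0, 10, 20], freq=10)
dup = arr.copy()
print(TimeArray.validations)  # 1
```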
