Commit bb6e2b5

Merge branch 'main' into #29049_make_holiday_support_offsets_of_offsets
2 parents d18187c + 4f145b3 commit bb6e2b5

40 files changed, with 271 additions and 384 deletions

asv_bench/benchmarks/join_merge.py

Lines changed: 17 additions & 0 deletions
@@ -328,6 +328,23 @@ def time_i8merge(self, how):
         merge(self.left, self.right, how=how)


+class UniqueMerge:
+    params = [4_000_000, 1_000_000]
+    param_names = ["unique_elements"]
+
+    def setup(self, unique_elements):
+        N = 1_000_000
+        self.left = DataFrame({"a": np.random.randint(1, unique_elements, (N,))})
+        self.right = DataFrame({"a": np.random.randint(1, unique_elements, (N,))})
+        uniques = self.right.a.drop_duplicates()
+        self.right["a"] = concat(
+            [uniques, Series(np.arange(0, -(N - len(uniques)), -1))], ignore_index=True
+        )
+
+    def time_unique_merge(self, unique_elements):
+        merge(self.left, self.right, how="inner")
+
+
 class MergeDatetime:
     params = [
         [

doc/source/development/contributing_docstring.rst

Lines changed: 1 addition & 1 deletion
@@ -940,7 +940,7 @@ Finally, docstrings can also be appended to with the ``doc`` decorator.

 In this example, we'll create a parent docstring normally (this is like
 ``pandas.core.generic.NDFrame``). Then we'll have two children (like
-``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
+``pandas.core.series.Series`` and ``pandas.DataFrame``). We'll
 substitute the class names in this docstring.

 .. code-block:: python

doc/source/development/maintaining.rst

Lines changed: 2 additions & 2 deletions
@@ -151,15 +151,15 @@ and then run::
     git bisect start
     git bisect good v1.4.0
     git bisect bad v1.5.0
-    git bisect run bash -c "python setup.py build_ext -j 4; python t.py"
+    git bisect run bash -c "python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true; python t.py"

 This finds the first commit that changed the behavior. The C extensions have to be
 rebuilt at every step, so the search can take a while.

 Exit bisect and rebuild the current version::

     git bisect reset
-    python setup.py build_ext -j 4
+    python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true

 Report your findings under the corresponding issue and ping the commit author to get
 their input.

doc/source/user_guide/enhancingperf.rst

Lines changed: 1 addition & 1 deletion
@@ -453,7 +453,7 @@ by evaluate arithmetic and boolean expression all at once for large :class:`~pan
 :func:`~pandas.eval` is many orders of magnitude slower for
 smaller expressions or objects than plain Python. A good rule of thumb is
 to only use :func:`~pandas.eval` when you have a
-:class:`.DataFrame` with more than 10,000 rows.
+:class:`~pandas.core.frame.DataFrame` with more than 10,000 rows.

 Supported syntax
 ~~~~~~~~~~~~~~~~

doc/source/user_guide/io.rst

Lines changed: 1 addition & 1 deletion
@@ -6400,7 +6400,7 @@ ignored.
 In [2]: df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})

 In [3]: df.info()
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
 RangeIndex: 1000000 entries, 0 to 999999
 Data columns (total 2 columns):
 A 1000000 non-null float64

doc/source/whatsnew/v0.24.0.rst

Lines changed: 1 addition & 1 deletion
@@ -840,7 +840,7 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
 In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})

 In [3]: type(pd.get_dummies(df, sparse=True))
-Out[3]: pandas.core.frame.DataFrame
+Out[3]: pandas.DataFrame

 In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
 Out[4]: pandas.core.sparse.frame.SparseDataFrame

doc/source/whatsnew/v1.0.0.rst

Lines changed: 1 addition & 1 deletion
@@ -414,7 +414,7 @@ Extended verbose info output for :class:`~pandas.DataFrame`
 ... "text_col": ["a", "b", "c"],
 ... "float_col": [0.0, 0.1, 0.2]})
 In [2]: df.info(verbose=True)
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
 RangeIndex: 3 entries, 0 to 2
 Data columns (total 3 columns):
 int_col 3 non-null int64

doc/source/whatsnew/v3.0.0.rst

Lines changed: 3 additions & 0 deletions
@@ -211,6 +211,7 @@ Removal of prior version deprecations/changes
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`57627`)
 - Enforced deprecation of strings ``T``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`57627`)
 - Enforced deprecation of the behavior of :func:`concat` when ``len(keys) != len(objs)`` would truncate to the shorter of the two. Now this raises a ``ValueError`` (:issue:`43485`)
+- Enforced deprecation of values "pad", "ffill", "bfill", and "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` (:issue:`57869`)
 - Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
 - In :meth:`DataFrame.stack`, the default value of ``future_stack`` is now ``True``; specifying ``False`` will raise a ``FutureWarning`` (:issue:`55448`)
 - Iterating over a :class:`.DataFrameGroupBy` or :class:`.SeriesGroupBy` will return tuples of length 1 for the groups when grouping by ``level`` a list of length 1 (:issue:`50064`)
@@ -256,6 +257,7 @@ Removal of prior version deprecations/changes
 - Removed unused arguments ``*args`` and ``**kwargs`` in :class:`Resampler` methods (:issue:`50977`)
 - Unrecognized timezones when parsing strings to datetimes now raises a ``ValueError`` (:issue:`51477`)
 - Removed the :class:`Grouper` attributes ``ax``, ``groups``, ``indexer``, and ``obj`` (:issue:`51206`, :issue:`51182`)
+- Removed deprecated keyword ``verbose`` on :func:`read_csv` and :func:`read_table` (:issue:`56556`)
 - Removed the attribute ``dtypes`` from :class:`.DataFrameGroupBy` (:issue:`51997`)

 .. ---------------------------------------------------------------------------
@@ -284,6 +286,7 @@ Performance improvements
 - Performance improvement in :meth:`RangeIndex.join` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57651`, :issue:`57752`)
 - Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57647`, :issue:`57752`)
 - Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`, :issue:`57752`)
+- Performance improvement in :func:`merge` if hash-join can be used (:issue:`57970`)
 - Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
 - Performance improvement in indexing operations for string dtypes (:issue:`56997`)
 - Performance improvement in unary methods on a :class:`RangeIndex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57825`)

pandas/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -28,7 +28,8 @@
     raise ImportError(
         f"C extension: {_module} not built. If you want to import "
         "pandas from the source directory, you may need to run "
-        "'python setup.py build_ext' to build the C extensions first."
+        "'python -m pip install -ve . --no-build-isolation --config-settings "
+        "editable-verbose=true' to build the C extensions first."
     ) from _err

 from pandas._config import (

pandas/_libs/hashtable.pyi

Lines changed: 7 additions & 1 deletion
@@ -16,7 +16,7 @@ def unique_label_indices(
 class Factorizer:
     count: int
     uniques: Any
-    def __init__(self, size_hint: int) -> None: ...
+    def __init__(self, size_hint: int, uses_mask: bool = False) -> None: ...
     def get_count(self) -> int: ...
     def factorize(
         self,
@@ -25,6 +25,9 @@ class Factorizer:
         na_value=...,
         mask=...,
     ) -> npt.NDArray[np.intp]: ...
+    def hash_inner_join(
+        self, values: np.ndarray, mask=...
+    ) -> tuple[np.ndarray, np.ndarray]: ...

 class ObjectFactorizer(Factorizer):
     table: PyObjectHashTable
@@ -216,6 +219,9 @@ class HashTable:
         mask=...,
         ignore_na: bool = True,
     ) -> tuple[np.ndarray, npt.NDArray[np.intp]]: ... # np.ndarray[subclass-specific]
+    def hash_inner_join(
+        self, values: np.ndarray, mask=...
+    ) -> tuple[np.ndarray, np.ndarray]: ...

 class Complex128HashTable(HashTable): ...
 class Complex64HashTable(HashTable): ...
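
Note on `hash_inner_join`: this diff shows only the type stubs, not the Cython implementation, so the following is a rough pure-Python sketch of the general hash inner join technique rather than the code pandas uses. It assumes the build side holds unique keys, as in the `UniqueMerge` benchmark's right frame. The function name `hash_inner_join_sketch`, its parameters, and the meaning and order of the two returned index lists are hypothetical; the stub only confirms that two arrays come back.

```python
def hash_inner_join_sketch(unique_keys, values):
    """Illustrative hash inner join; not the pandas implementation.

    unique_keys: build-side keys, assumed to contain no duplicates.
    values: probe-side keys, duplicates allowed.
    Returns (build_idx, probe_idx): paired positions of matching keys.
    """
    # Build phase: map every build-side key to its position.
    table = {key: pos for pos, key in enumerate(unique_keys)}
    if len(table) != len(unique_keys):
        raise ValueError("build-side keys must be unique")

    # Probe phase: one dictionary lookup per probe-side value.
    build_idx = []
    probe_idx = []
    for i, value in enumerate(values):
        pos = table.get(value)
        if pos is not None:  # keep matches only (inner join)
            build_idx.append(pos)
            probe_idx.append(i)
    return build_idx, probe_idx


if __name__ == "__main__":
    # Hypothetical sample keys, chosen only for illustration.
    print(hash_inner_join_sketch([10, 20, 30], [20, 40, 10, 20]))
```

Because each probe value matches at most one build-side key, the join needs only a single pass over `values` after the table is built, which is the kind of case the whatsnew entry describes as "if hash-join can be used".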
