1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -1150,6 +1150,7 @@ Indexing
- Bug in :meth:`Series.__setitem__` when assigning boolean series with boolean indexer will raise ``LossySetitemError`` (:issue:`57338`)
- Bug in printing :attr:`Index.names` and :attr:`MultiIndex.levels` would not escape single quotes (:issue:`60190`)
- Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
- Bug in :meth:`DataFrame.__setitem__` throwing a ``ValueError`` when setting a column with a 2D object array (:issue:`61026`)
- Bug in :meth:`DataFrame.loc.__getitem__` and :meth:`DataFrame.iloc.__getitem__` with a :class:`CategoricalDtype` column with integer categories raising when trying to index a row containing a ``NaN`` entry (:issue:`58954`)
- Bug in :meth:`Index.__getitem__` incorrectly raising with a 0-dim ``np.ndarray`` key (:issue:`55601`)
- Bug in :meth:`Index.get_indexer` not casting missing values correctly for new string datatype (:issue:`55833`)
25 changes: 24 additions & 1 deletion pandas/core/frame.py
@@ -5502,7 +5502,30 @@ def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:

        if is_list_like(value):
            com.require_length_match(value, self.index)
        return sanitize_array(value, self.index, copy=True, allow_2d=True), None

        # GH#61026: special-case 2D inputs for single-column assignment.
        # - accept shape (n, 1) by flattening to 1D
        # - disallow 2D *object* arrays with more than one column, since those
        # correspond to a single column key and should be rejected
        arr = value

        # np.matrix is always 2D; gonna convert to regular ndarray
        if isinstance(arr, np.matrix):
Member

In what case do we get a matrix here?

Author

_sanitize_column(...) can see an np.matrix when the user assigns one directly, for example: df["col"] = np.matrix([[1], [2], [3]]).

Since np.matrix is always 2D and keeps its 2D shape under slicing, calling arr[:, 0] (which occurs on line 5517) on a matrix still gives shape (n, 1) rather than (n,). Without the conversion, the (n, 1) branch would therefore not produce a 1D array for matrices.

Hence, I thought converting matrices to a regular ndarray first would ensure that the subsequent branches behave consistently for both np.ndarray and np.matrix.
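
The shape behaviour described in this reply can be checked with NumPy alone. The following is a minimal sketch, not part of the PR; the variable names are illustrative only.

```python
import numpy as np

# np.matrix stays 2D under slicing, so selecting a column keeps shape (n, 1).
m = np.matrix([[1], [2], [3]])
matrix_col = m[:, 0]

# Converting to a plain ndarray first lets the same slice return a 1D array.
arr = np.asarray(m)
array_col = arr[:, 0]

print(matrix_col.shape)  # (3, 1)
print(array_col.shape)   # (3,)
```

np.asarray is used here rather than np.asanyarray because the latter would pass the np.matrix subclass through unchanged.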

            arr = np.asarray(arr)

        if isinstance(arr, np.ndarray) and arr.ndim == 2:
            if arr.shape[1] == 1:
                # treating (n, 1) as a length-n 1D array
                arr = arr[:, 0]
            elif arr.dtype == object:
                # single-column setitem with a 2D object array is not allowed.
Comment on lines +5520 to +5521
Member

Why only object dtype here?

Author

The dtype == object guard is there to keep this bugfix scoped tightly to the case that regressed in issue #61026.

The problematic behaviour (ValueError: Buffer has wrong number of dimensions (expected 1, got 2)) only arose when assigning a 2D dtype=object array to a single column. For other dtypes, assigning a 2D array either already behaves correctly or raises a clearer, existing error, so this change leaves those paths alone to avoid altering semantics outside this issue.
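
To make the scoping concrete, here is a NumPy-only sketch of the branch logic in this diff. It is an illustration under assumptions, not the pandas implementation: flatten_single_column is a hypothetical helper name, and the real code goes on to call sanitize_array, which this sketch omits.

```python
import numpy as np


def flatten_single_column(value):
    """Hypothetical helper mirroring the 2D handling shown in the diff."""
    arr = value
    # np.matrix never drops to 1D under slicing, so convert it first.
    if isinstance(arr, np.matrix):
        arr = np.asarray(arr)
    if isinstance(arr, np.ndarray) and arr.ndim == 2:
        if arr.shape[1] == 1:
            # Shape (n, 1) is treated as a length-n 1D array.
            arr = arr[:, 0]
        elif arr.dtype == object:
            # Only wider 2D *object* arrays are rejected by this guard.
            raise ValueError(
                "Setting a DataFrame column with a 2D array requires "
                f"shape (n, 1); got shape {arr.shape}."
            )
    return arr


# An (n, 1) object array is flattened.
col = flatten_single_column(np.array([["A"], ["B"]], dtype=object))

# A wider non-object array passes through this guard untouched.
wide_float = flatten_single_column(np.zeros((2, 2)))

# A wider object array raises the new error.
try:
    flatten_single_column(np.array([["A", "B"], ["C", "D"]], dtype=object))
except ValueError as err:
    message = str(err)
```

In this sketch the non-object 2D case is simply returned as-is, which corresponds to the reply's point that those paths are left to whatever handling already exists downstream.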

                msg = (
                    "Setting a DataFrame column with a 2D array requires "
                    f"shape (n, 1); got shape {arr.shape}."
                )
                raise ValueError(msg)
        subarr = sanitize_array(arr, self.index, copy=True, allow_2d=True)
        return subarr, None

    @property
    def _series(self):
18 changes: 18 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
@@ -816,6 +816,24 @@ def test_setitem_index_object_dtype_not_inferring(self):
        )
        tm.assert_frame_equal(df, expected)

    def test_setitem_2d_object_array(self):
        # GH#61026
        df = DataFrame(
            {
                "c1": [1, 2, 3, 4, 5],
            }
        )

        arr = np.array([["A"], ["B"], ["C"], ["D"], ["E"]], dtype=object)
        df["c1"] = arr

        expected = DataFrame(
            {
                "c1": ["A", "B", "C", "D", "E"],
            }
        )
        tm.assert_frame_equal(df, expected)


class TestSetitemTZAwareValues:
    @pytest.fixture