BUG: fix `DataFrame.__setitem__` with 2D object arrays #63184

akkik04 · 2025-11-24T02:13:54Z

closes BUG: setting column with 2D object array raises #61026
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

Fixed DataFrame.__setitem__ so that assigning a 2D NumPy array with dtype=object and shape (n, 1) to a single column works the same way as the non-object case, and so that unsupported shapes raise clearer, high-level errors. More detail below:

Before this change:

Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single DataFrame column (e.g., df["c1"] = t2) raised a low-level ValueError: Buffer has wrong number of dimensions (expected 1, got 2). The error came from lib.maybe_convert_objects, rather than the assignment behaving like the non-object case.
2D non-object arrays with shape (n, 1) already worked, as did assigning a 2D array with multiple columns to multiple columns (e.g., df[["c1", "c2"]] = t3), but arrays with ndim > 2 could surface confusing internal errors.
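
For reference, a minimal sketch of the input described above. The names `df` and `t2` follow the examples in this description; the data values are made up for illustration. The direct assignment is left as a comment because its outcome depends on whether this fix is present, while flattening by hand should work either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c0": [1, 2, 3]})

# 2D object array with shape (n, 1): the case from the issue.
t2 = np.array([["a"], ["b"], ["c"]], dtype=object)

# df["c1"] = t2  # raised the low-level ValueError before this change

# Flattening to 1D by hand does not depend on the fix.
df["c1"] = t2[:, 0]
c1_values = df["c1"].tolist()
```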

After this change:

Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single column now works by flattening (n, 1) to a 1D (n,) array, matching the behaviour of non-object arrays.
Assigning a 2D array with more than one column to a single column raises a clear, user-facing ValueError explaining that only (n, 1) is supported and suggesting multi-column assignment (e.g., df[["c1", "c2"]] = some_values) for wider arrays.
Assigning arrays with ndim >= 3 to a single column now raises an explicit ValueError indicating that setting a column with an array of that dimensionality is not supported. The existing multi-column assignment with 2D arrays remains unchanged.
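
The shape rules above can be sketched as a standalone function. This is an illustration only, not the code in pandas/core/frame.py: the helper name `flatten_for_single_column` and the error messages are hypothetical, and the sketch ignores dtype, whereas the PR applies these rules to object arrays.

```python
import numpy as np


def flatten_for_single_column(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper mirroring the shape rules described above."""
    if arr.ndim <= 1:
        # Already usable as a single column.
        return arr
    if arr.ndim == 2:
        if arr.shape[1] == 1:
            # (n, 1) -> (n,)
            return arr[:, 0]
        # Wider 2D arrays belong in a multi-column assignment.
        raise ValueError(
            f"Cannot set a single column with a 2D array of shape {arr.shape}; "
            "only (n, 1) is supported. Use multi-column assignment instead."
        )
    # ndim >= 3 is rejected outright.
    raise ValueError(
        f"Cannot set a single column with a {arr.ndim}-dimensional array."
    )
```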

akkik04 · 2025-12-02T17:07:30Z

can I get some eyes on this when you get a chance @rhshadrach 🙌

rhshadrach

Thanks for the PR!

rhshadrach · 2025-12-11T02:52:57Z

pandas/core/frame.py

+ arr = value
+
+ # np.matrix is always 2D; gonna convert to regular ndarray
+ if isinstance(arr, np.matrix):


In what case do we get a matrix here?

_sanitize_column(...) can see an np.matrix when the user assigns one directly. for example: df["col"] = np.matrix([[1], [2], [3]]).

Since np.matrix is always 2D and preserves its 2D shape under slicing, calling arr[:, 0] (which occurs on line 5517) on a matrix still gives the shape (n, 1) rather than (n,). In other words, we would not actually end up with a 1D array for matrices in that case.

Hence, I thought converting matrices to a regular ndarray first would ensure that the subsequent blocks behave consistently for both np.ndarray and np.matrix.
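
A small NumPy-only check of the slicing behaviour described here. The variable names are chosen for illustration, and the warning filter is there only because constructing an np.matrix may emit a PendingDeprecationWarning.

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # Constructing an np.matrix may emit a PendingDeprecationWarning.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix([[1], [2], [3]])

# Column slicing on a matrix keeps both dimensions.
matrix_slice_shape = m[:, 0].shape

# After converting to a plain ndarray, the same slice is 1D.
array_slice_shape = np.asarray(m)[:, 0].shape
```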

rhshadrach · 2025-12-11T02:54:02Z

pandas/core/frame.py

+ elif arr.dtype == object:
+ # single-column setitem with a 2D object array is not allowed.


Why only object dtype here?

The dtype == object guard is there to keep this bugfix scoped tightly to the case that regressed in issue #61026.

The problematic behaviour (ValueError: Buffer has wrong number of dimensions (expected 1, got 2)) only arose when assigning a 2D dtype=object array to a single column. For other dtypes, assigning a 2D array either already behaves correctly or raises a clearer, existing error, so this change leaves those paths alone to avoid altering semantics outside this issue.

akkik04 added 7 commits November 23, 2025 19:28

proposed fix for issue pandas-dev#61026

65df683

applied linting

8a8c670

documenting my changes

62f7c4b

fix comment error

aa707b1

fixing pyarrow errors in control flow logic

c5c8953

new fix

78f8ce7

trying new patch location and logic & revamped test infra

e2ad3fb

rhshadrach reviewed Dec 11, 2025

akkik04 requested a review from rhshadrach December 11, 2025 05:19
