@akkik04 akkik04 commented Nov 24, 2025

Fixed DataFrame.__setitem__ so that assigning a 2D NumPy array with dtype=object and shape (n, 1) to a single column works the same way as the non-object case, and made it raise clearer, user-facing errors for unsupported shapes. More detail below:

Before this change:

  • Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single DataFrame column (e.g., df["c1"] = t2) raised a low-level ValueError: Buffer has wrong number of dimensions (expected 1, got 2). The error came from lib.maybe_convert_objects, so the assignment failed instead of behaving like the non-object case.
  • 2D non-object arrays with shape (n, 1) already worked just fine, and assigning a 2D array with multiple columns to multiple columns (e.g., df[["c1", "c2"]] = t3) also worked, but ndim > 2 arrays could surface confusing internal errors.

After this change:

  • Assigning a 2D NumPy dtype=object array with shape (n, 1) to a single column now works by flattening (n, 1) to a 1D (n,) array, matching the behaviour of non-object arrays.
  • Assigning a 2D array with more than one column to a single column raises a clear, user-facing ValueError explaining that only (n, 1) is supported and suggesting multi-column assignment (e.g., df[["c1", "c2"]] = some_values) for wider arrays.
  • Assigning arrays with ndim >= 3 to a single column now raises an explicit ValueError indicating that setting a column from an array of that dimensionality is not supported. The existing multi-column assignment with 2D arrays remains unchanged.
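
The shape rules listed above can be sketched roughly as follows. This is a standalone illustration using only NumPy, not the code in this PR: the helper name `flatten_single_column_value` and the error wording are assumptions made up for the example.

```python
import numpy as np


def flatten_single_column_value(value):
    """Hypothetical helper illustrating the shape rules; not the pandas code."""
    arr = np.asarray(value)
    if arr.ndim <= 1:
        # Scalars and 1D arrays need no reshaping.
        return arr
    if arr.ndim == 2:
        if arr.shape[1] == 1:
            # (n, 1) -> (n,): take the only column.
            return arr[:, 0]
        # Wider 2D arrays are rejected for a single column (wording is illustrative).
        raise ValueError(
            f"Cannot set a single column from a 2D array with shape {arr.shape}; "
            "only (n, 1) is supported. Use multi-column assignment for wider arrays."
        )
    # ndim >= 3 is rejected outright (wording is illustrative).
    raise ValueError(f"Cannot set a single column from an array with ndim={arr.ndim}.")


# A 2D object array with shape (3, 1), similar in spirit to t2 above.
t2 = np.array([[1], ["a"], [None]], dtype=object)
flat = flatten_single_column_value(t2)
print(flat.shape)  # (3,)
```

Passing an array with shape (n, 2) or with three or more dimensions to this sketch raises ValueError, mirroring the second and third bullets.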

akkik04 commented Dec 2, 2025

can I get some eyes on this when you get a chance @rhshadrach 🙌

@rhshadrach rhshadrach left a comment

Thanks for the PR!

arr = value

# np.matrix is always 2D; gonna convert to regular ndarray
if isinstance(arr, np.matrix):
Member

In what case do we get a matrix here?

Author

_sanitize_column(...) can see an np.matrix when the user assigns one directly. For example: df["col"] = np.matrix([[1], [2], [3]]).

Since np.matrix is always 2D and preserves its 2D shape under slicing, calling arr[:, 0] (which occurs on line 5517) on a matrix still gives the shape (n, 1) rather than (n,). This means we wouldn't actually end up producing a 1D array for matrices in that case.

Hence, I thought converting matrices to a regular ndarray first would ensure that the subsequent blocks behave consistently for both np.ndarray and np.matrix.
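
A minimal NumPy-only check of the slicing behaviour described above (the variable names are just for illustration and do not appear in the PR):

```python
import numpy as np

# np.matrix is always 2D, so column slicing keeps the (n, 1) shape.
m = np.matrix([[1], [2], [3]])
col_matrix = m[:, 0]

# Converting to a plain ndarray first lets the same slice drop to 1D.
col_array = np.asarray(m)[:, 0]

print(col_matrix.shape)  # (3, 1)
print(col_array.shape)   # (3,)
```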

Comment on lines +5520 to +5521
elif arr.dtype == object:
# single-column setitem with a 2D object array is not allowed.
Member

Why only object dtype here?

Author

The dtype == object guard is there to keep this bugfix scoped tightly to the case that regressed in issue #61026.

The problematic behaviour (ValueError: Buffer has wrong number of dimensions (expected 1, got 2)) only arose when assigning a 2D dtype=object array to a single column. For other dtypes, assigning a 2D array either already behaves correctly or raises a clearer, existing error, so this change leaves those paths alone to avoid altering semantics outside this issue.

@akkik04 akkik04 requested a review from rhshadrach December 11, 2025 05:19