PERF: `DataFrame.copy(deep=True)` returns a view on the original pyarrow buffer #61930

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

Over in dask/dask#12022 (comment), I'm debugging a test failure with dask and pandas 3.x that comes down to the behavior of DataFrame.copy(deep=True) with an arrow-backed extension array.

In pandas/pandas/core/arrays/arrow/array.py (line 1092 in 628c7fb, def copy(self) -> Self:), we deliberately return a shallow copy (a new object with a view on the original buffers) of the backing array. For correctness, this is fine: pyarrow arrays are immutable, so copying should be unnecessary. However, it does mean that after a DataFrame.copy(deep=True), the result still holds a reference back to the original buffer. Even if the output of .copy(deep=True) is the only object left with a reference to that buffer, the buffer stays alive and its memory is not released. Consider:

    import pandas as pd
    import pyarrow as pa

    pool = pa.default_memory_pool()
    print("before", pool.bytes_allocated())  # 0

    df = pd.DataFrame({"a": ["a", "b", "c"] * 1000})
    print("df", pool.bytes_allocated())  # 27200

    # Dropping the only reference releases the buffer.
    del df
    print("after del df", pool.bytes_allocated())  # 0

    df2 = pd.DataFrame({"a": ["a", "b", "c"] * 1000})
    clone = df2.iloc[:0].copy(deep=True)
    print("df2", pool.bytes_allocated())  # 27200

    # The empty "deep" copy still keeps the original buffer alive.
    del df2
    print("after - clone", pool.bytes_allocated())  # 27200

Maybe this is fine. We can probably figure out some workaround in dask: in this case we're making an empty dataframe object as a kind of schema object, and we can probably do something other than df.iloc[:0].copy(deep=True). But perhaps pandas could consider changing the behavior here.

The downside is that df.copy(deep=True) will become more expensive and use more memory.
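For the schema-object use case on the dask side, one possible workaround is to build the empty frame from the dtypes instead of slicing the original data, so the result never references the original buffers. This is only a sketch: empty_like is a hypothetical helper (not a pandas or dask API), it assumes unique column names, and unlike df.iloc[:0] it does not preserve the index type or any attrs.

```python
import pandas as pd


def empty_like(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: empty frame with the same columns and dtypes.

    Each column is created from scratch as an empty Series, so no buffer
    of the input frame is referenced by the result.
    Assumes column names are unique.
    """
    return pd.DataFrame(
        {name: pd.Series([], dtype=dtype) for name, dtype in df.dtypes.items()}
    )


df2 = pd.DataFrame({"a": ["a", "b", "c"] * 1000, "b": list(range(3000))})
schema = empty_like(df2)
print(schema.dtypes)
```

Whether this is acceptable depends on how much of the original frame's metadata the schema object needs to carry.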

Installed Versions

    In [4]: pd.show_versions()

    INSTALLED VERSIONS
    ------------------
    commit                : 962168f06d15d1aced28b414eb82909d3c930916
    python                : 3.12.8
    python-bits           : 64
    OS                    : Darwin
    OS-release            : 24.5.0
    Version               : Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041
    machine               : arm64
    processor             : arm
    byteorder             : little
    LC_ALL                : None
    LANG                  : en_US.UTF-8
    LOCALE                : en_US.UTF-8

    pandas                : 3.0.0.dev0+2254.g962168f06d
    numpy                 : 2.4.0.dev0+git20250717.d02611a
    dateutil              : 2.9.0.post0
    pip                   : None
    Cython                : None
    sphinx                : None
    IPython               : 9.4.0
    adbc-driver-postgresql: None
    adbc-driver-sqlite    : None
    bs4                   : None
    bottleneck            : None
    fastparquet           : None
    fsspec                : 2025.7.0
    html5lib              : None
    hypothesis            : 6.136.1
    gcsfs                 : None
    jinja2                : 3.1.6
    lxml.etree            : None
    matplotlib            : None
    numba                 : None
    numexpr               : None
    odfpy                 : None
    openpyxl              : None
    psycopg2              : None
    pymysql               : None
    pyarrow               : 21.0.0
    pyiceberg             : None
    pyreadstat            : None
    pytest                : 8.4.1
    python-calamine       : None
    pytz                  : 2025.2
    pyxlsb                : None
    s3fs                  : None
    scipy                 : None
    sqlalchemy            : None
    tables                : None
    tabulate              : None
    xarray                : None
    xlrd                  : None
    xlsxwriter            : None
    zstandard             : None
    qtpy                  : None
    pyqt5                 : None

Prior Performance

No response
