BUG: pd.read_parquet raises exception filtering on Period type columns

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

#!/usr/bin/env python3
# Repro: pandas.read_parquet(filters=...) does not accept pandas Period values,
# and there is no documented way to pass the correct physical scalar via pandas API.
#
# Expected: Either accept Period in filters (map to physical storage), or document
# an official helper to build Arrow-coercible filters from pandas logical types.

import os, sys, json, tempfile
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

print("Versions:")
print(" python :", sys.version.split()[0])
print(" pandas :", pd.__version__)
print(" pyarrow :", pa.__version__)
print()

# --- build a tiny dataset: ints, Period[M], datetimes ---
months = pd.period_range("2024-01", "2024-12", freq="M")
df = pd.DataFrame({
    "n": np.arange(1, 13, dtype="int64"),
    "per_m": months,  # pandas Period[M]
    "ts_m": months.to_timestamp(how="start")  # datetime64[ns], first of month
})

# write to a temp dir/file
tmpdir = tempfile.mkdtemp(prefix="pandas_period_filter_repro_")
path = os.path.join(tmpdir, "mini.parquet")
df.to_parquet(path, engine="pyarrow", index=False)
print("Wrote:", path)
print()

# --- inspect physical schema + pandas metadata ---
dset = ds.dataset(path, format="parquet")
schema = dset.schema
print("Arrow physical schema:", schema)
pmeta = json.loads((schema.metadata or {}).get(b"pandas", b"{}").decode() or "{}")
cur_meta = {c["name"]: c for c in pmeta.get("columns", [])}
print("Pandas metadata for columns:")
for k, v in cur_meta.items():
    if "metadata" in v and isinstance(v["metadata"], dict):
        v = dict(v, metadata={"keys": list(v["metadata"].keys())})
    print(" ", k, "→ pandas_type:", v.get("pandas_type"), "metadata:", v.get("metadata"))
print()

# Helper for pretty printing results
def show(label, pdf):
    print(label)
    print(pdf.sort_values(["n"]).reset_index(drop=True))
    print()

# --- 1) numeric filter: works ---
pdf_num = pd.read_parquet(
    path,
    engine="pyarrow",
    columns=["n", "per_m", "ts_m"],
    filters=[("n", ">=", 7)]
)
show("Numeric filter n>=7 (EXPECTED TO WORK):", pdf_num)

# --- 2) timestamp range filter: works ---
start = pd.Timestamp("2024-07-01")
pdf_ts = pd.read_parquet(
    path,
    engine="pyarrow",
    columns=["n", "per_m", "ts_m"],
    filters=[("ts_m", ">=", start)]
)
show("Timestamp range filter July 2024 (EXPECTED TO WORK):", pdf_ts)

# --- 3a) period inequality (>=) filter: fails (cannot convert Period) ---
m = pd.Period("2024-07", freq="M")
try:
    pdf_per = pd.read_parquet(
        path,
        engine="pyarrow",
        columns=["n", "per_m", "ts_m"],
        filters=[("per_m", ">=", m)]
    )
    show("Period equality filter per_m >= 2024-07 (UNEXPECTED, but if this prints it worked):", pdf_per)
except Exception as e:
    print("Period equality filter per_m>=Period('2024-07','M') raised (EXPECTED BUG/UNSUPPORTED):")
    print(" ", type(e).__name__ + ":", e)
    print()

# --- 3b) period ordinal inequality (>=) filter: also does not work via pandas filters ---
# Even though the physical column is integer (ordinals), passing m.ordinal here still
# goes through pandas' filter adapter, which treats values as Python scalars and does
# not apply the pandas metadata mapping used on read.
try:
    pdf_ord = pd.read_parquet(
        path,
        engine="pyarrow",
        columns=["n", "per_m", "ts_m"],
        filters=[("per_m", ">=", m.ordinal)]  # try integer ordinal directly
    )
    show("Period ordinal filter per_m>=m.ordinal (CURRENT BEHAVIOR):", pdf_ord)
    if pdf_ord.empty:
        print(" Note: No rows returned. Ordinal inequality through pandas filters did not match.\n")
except Exception as e:
    print("Period ordinal filter raised:")
    print(" ", type(e).__name__ + ":", e)
    print()

Issue Description

I created a small Parquet file with 3 columns:

n – integers 1 .. 12
per_m – period[M] values (2024-01 .. 2024-12)
ts_m – timestamps (2024-01-01 .. 2024-12-01)

Then used pandas.read_parquet(..., filters=[...]) to test inequality filters (>=) on each column.

Expected Behavior

All three columns should be filterable. For a Period column, either:

pandas should accept Period scalars in filters=... and coerce them to the correct Arrow scalar, or
there should be a documented way to build a filter that matches the underlying Arrow storage.
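Until one of those exists, a possible stopgap is to push down only the filters Arrow can convert and apply the Period predicates in pandas after the read. The sketch below is an illustration, not a pandas API: the helper names (split_period_filters, apply_deferred, read_parquet_with_periods) are hypothetical, and it only handles a flat list of (column, op, value) tuples with comparison operators. The Period rows are still read from disk, so this gives up predicate pushdown for those columns.

```python
import operator

import pandas as pd

# Comparison operators accepted in (column, op, value) filter tuples.
_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def split_period_filters(filters):
    """Split a flat filter list into (pushdown, deferred).

    Tuples whose value is a pandas Period are deferred, because the
    pyarrow engine cannot convert them; everything else is pushed down.
    """
    pushdown, deferred = [], []
    for col, op, value in filters:
        target = deferred if isinstance(value, pd.Period) else pushdown
        target.append((col, op, value))
    return pushdown, deferred


def apply_deferred(df, deferred):
    """Apply the deferred Period predicates to an in-memory DataFrame."""
    for col, op, value in deferred:
        df = df[_OPS[op](df[col], value)]
    return df


def read_parquet_with_periods(path, filters, **kwargs):
    """Hypothetical wrapper: read with pushdown filters, then filter Periods."""
    pushdown, deferred = split_period_filters(filters)
    df = pd.read_parquet(path, filters=pushdown or None, **kwargs)
    return apply_deferred(df, deferred)
```

With the repro file, read_parquet_with_periods(path, [("per_m", ">=", pd.Period("2024-07", freq="M"))], engine="pyarrow") would be the intended call. Any column referenced by a deferred filter has to be included in columns=, otherwise the post-read step cannot see it.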

Observed behavior

Numeric (n >= 7) ✅ works

Timestamp (ts_m >= '2024-07-01') ✅ works

Period (per_m >= Period('2024-07','M')) ❌ fails

Filtering the Period column with a Period scalar raises:

ArrowInvalid: Could not convert Period('2024-07', 'M') with type Period: did not recognize Python value type when inferring an Arrow data type

Filtering the same column with Period.ordinal (the integer that matches the internal Arrow representation) raises:

ArrowNotImplementedError: Function 'greater_equal' has no kernel matching input types (extension<pandas.period<ArrowPeriodType>>, int16)
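For reference, the ordinal of a monthly Period is the number of months since 1970-01, so the value passed in the second attempt is a small integer. The int16 in the message is presumably the type Arrow inferred for that Python int (an assumption on my part), and there is no comparison kernel between it and the extension type. A short sketch of the ordinal mapping, using only pandas:

```python
import pandas as pd

m = pd.Period("2024-07", freq="M")

# Monthly ordinals count months elapsed since the 1970-01 epoch.
months_since_epoch = (m.year - 1970) * 12 + (m.month - 1)
print("ordinal:", m.ordinal, "computed:", months_since_epoch)

# The ordinal round-trips back to the same Period when the freq is known.
restored = pd.Period(ordinal=m.ordinal, freq="M")
print("restored:", restored)
```

So the integer itself is the right value; what is missing is a way to tell the filter that it should be compared against the extension column's storage.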

Installed Versions

INSTALLED VERSIONS

commit : 9c8bc3e
python : 3.13.2
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26100
machine : AMD64
processor : AMD64 Family 26 Model 36 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 2.3.3
numpy : 2.2.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.9.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 21.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None

Full Program Output:

Versions:
python : 3.13.2
pandas : 2.3.3
pyarrow : 21.0.0

Wrote: C:\Users\damie\AppData\Local\Temp\pandas_period_filter_repro_min05yqt\mini.parquet

Arrow physical schema: n: int64
per_m: extension<pandas.period>
ts_m: timestamp[ns]
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 403
Pandas metadata for columns:
n → pandas_type: int64 metadata: None
per_m → pandas_type: object metadata: None
ts_m → pandas_type: datetime metadata: None

Numeric filter n>=7 (EXPECTED TO WORK):
    n    per_m       ts_m
0   7  2024-07 2024-07-01
1   8  2024-08 2024-08-01
2   9  2024-09 2024-09-01
3  10  2024-10 2024-10-01
4  11  2024-11 2024-11-01
5  12  2024-12 2024-12-01

Timestamp range filter July 2024 (EXPECTED TO WORK):
    n    per_m       ts_m
0   7  2024-07 2024-07-01
1   8  2024-08 2024-08-01
2   9  2024-09 2024-09-01
3  10  2024-10 2024-10-01
4  11  2024-11 2024-11-01
5  12  2024-12 2024-12-01

Period equality filter per_m>=Period('2024-07','M') raised (EXPECTED BUG/UNSUPPORTED):
ArrowInvalid: Could not convert Period('2024-07', 'M') with type Period: did not recognize Python value type when inferring an Arrow data type

Period ordinal filter raised:
ArrowNotImplementedError: Function 'greater_equal' has no kernel matching input types (extension<pandas.period<ArrowPeriodType>>, int16)
