Description
Pandas version checks
-  I have checked that this issue has not already been reported. 
-  I have confirmed this bug exists on the latest version of pandas. 
-  I have confirmed this bug exists on the main branch of pandas. 
Reproducible Example
```python
#!/usr/bin/env python3
# Repro: pandas.read_parquet(filters=...) does not accept pandas Period values,
# and there is no documented way to pass the correct physical scalar via pandas API.
#
# Expected: Either accept Period in filters (map to physical storage), or document
# an official helper to build Arrow-coercible filters from pandas logical types.
import os, sys, json, tempfile
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

print("Versions:")
print(" python :", sys.version.split()[0])
print(" pandas :", pd.__version__)
print(" pyarrow :", pa.__version__)
print()

# --- build a tiny dataset: ints, Period[M], datetimes ---
months = pd.period_range("2024-01", "2024-12", freq="M")
df = pd.DataFrame({
    "n": np.arange(1, 13, dtype="int64"),
    "per_m": months,                            # pandas Period[M]
    "ts_m": months.to_timestamp(how="start"),   # datetime64[ns], first of month
})

# write to a temp dir/file
tmpdir = tempfile.mkdtemp(prefix="pandas_period_filter_repro_")
path = os.path.join(tmpdir, "mini.parquet")
df.to_parquet(path, engine="pyarrow", index=False)
print("Wrote:", path)
print()

# --- inspect physical schema + pandas metadata ---
dset = ds.dataset(path, format="parquet")
schema = dset.schema
print("Arrow physical schema:", schema)
pmeta = json.loads((schema.metadata or {}).get(b"pandas", b"{}").decode() or "{}")
cur_meta = {c["name"]: c for c in pmeta.get("columns", [])}
print("Pandas metadata for columns:")
for k, v in cur_meta.items():
    if "metadata" in v and isinstance(v["metadata"], dict):
        v = dict(v, metadata={"keys": list(v["metadata"].keys())})
    print(" ", k, "→ pandas_type:", v.get("pandas_type"), "metadata:", v.get("metadata"))
print()

# Helper for pretty printing results
def show(label, pdf):
    print(label)
    print(pdf.sort_values(["n"]).reset_index(drop=True))
    print()

# --- 1) numeric filter: works ---
pdf_num = pd.read_parquet(
    path, engine="pyarrow",
    columns=["n", "per_m", "ts_m"],
    filters=[("n", ">=", 7)],
)
show("Numeric filter n>=7 (EXPECTED TO WORK):", pdf_num)

# --- 2) timestamp inequality filter (>= 2024-07-01): works ---
start = pd.Timestamp("2024-07-01")
pdf_ts = pd.read_parquet(
    path, engine="pyarrow",
    columns=["n", "per_m", "ts_m"],
    filters=[("ts_m", ">=", start)],
)
show("Timestamp range filter July 2024 (EXPECTED TO WORK):", pdf_ts)

# --- 3a) period inequality filter (>=): fails (cannot convert Period) ---
m = pd.Period("2024-07", freq="M")
try:
    pdf_per = pd.read_parquet(
        path, engine="pyarrow",
        columns=["n", "per_m", "ts_m"],
        filters=[("per_m", ">=", m)],
    )
    show("Period equality filter per_m >= 2024-07 (UNEXPECTED, but if this prints it worked):", pdf_per)
except Exception as e:
    print("Period equality filter per_m>=Period('2024-07','M') raised (EXPECTED BUG/UNSUPPORTED):")
    print(" ", type(e).__name__ + ":", e)
    print()

# --- 3b) period ordinal inequality filter (>=): also fails ---
# Even though the Period column's storage is integer ordinals, pandas forwards
# filters to pyarrow unchanged, and pyarrow compares the value against the
# pandas.period extension type rather than its integer storage.
try:
    pdf_ord = pd.read_parquet(
        path, engine="pyarrow",
        columns=["n", "per_m", "ts_m"],
        filters=[("per_m", ">=", m.ordinal)],  # try integer ordinal directly
    )
    show("Period ordinal filter per_m>=m.ordinal (CURRENT BEHAVIOR):", pdf_ord)
    if pdf_ord.empty:
        print(" Note: No rows returned. Ordinal inequality through pandas filters did not match.\n")
except Exception as e:
    print("Period ordinal filter raised:")
    print(" ", type(e).__name__ + ":", e)
    print()
```
Issue Description
I created a small Parquet file with 3 columns:
-  n – integers 1 .. 12 
-  per_m – period[M] values (2024-01 .. 2024-12) 
-  ts_m – timestamps (2024-01-01 .. 2024-12-01) 
I then used pandas.read_parquet(..., filters=[...]) to apply an inequality filter (>=) to each column.
Expected Behavior
All three columns should be filterable. For a Period column, either:
-  pandas should accept Period scalars in filters=... and coerce them to the correct Arrow scalar, or 
-  there should be a documented way to build a filter that matches the underlying Arrow storage. 
Observed Behavior
- Numeric (n >= 7) ✅ works
- Timestamp (ts_m >= '2024-07-01') ✅ works
- Period (per_m >= Period('2024-07','M')) ❌ fails
- Period ordinal (per_m >= Period('2024-07','M').ordinal) ❌ fails
Filtering the Period column with a Period scalar raises:
 ArrowInvalid: Could not convert Period('2024-07', 'M') with type Period: did not recognize Python value type when inferring an Arrow data type
Passing Period.ordinal instead (which matches the internal Arrow representation) raises:
ArrowNotImplementedError: Function 'greater_equal' has no kernel matching input types (extension<pandas.period<ArrowPeriodType>>, int16)
Installed Versions
```text
INSTALLED VERSIONS
commit : 9c8bc3e
python : 3.13.2
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26100
machine : AMD64
processor : AMD64 Family 26 Model 36 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252

pandas : 2.3.3
numpy : 2.2.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.9.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 21.0.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
```
Full Program Output:
```text
Versions:
 python : 3.13.2
 pandas : 2.3.3
 pyarrow : 21.0.0

Wrote: C:\Users\damie\AppData\Local\Temp\pandas_period_filter_repro_min05yqt\mini.parquet

Arrow physical schema: n: int64
per_m: extension<pandas.period>
ts_m: timestamp[ns]
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 403
Pandas metadata for columns:
  n → pandas_type: int64 metadata: None
  per_m → pandas_type: object metadata: None
  ts_m → pandas_type: datetime metadata: None

Numeric filter n>=7 (EXPECTED TO WORK):
    n    per_m       ts_m
0   7  2024-07 2024-07-01
1   8  2024-08 2024-08-01
2   9  2024-09 2024-09-01
3  10  2024-10 2024-10-01
4  11  2024-11 2024-11-01
5  12  2024-12 2024-12-01

Timestamp range filter July 2024 (EXPECTED TO WORK):
    n    per_m       ts_m
0   7  2024-07 2024-07-01
1   8  2024-08 2024-08-01
2   9  2024-09 2024-09-01
3  10  2024-10 2024-10-01
4  11  2024-11 2024-11-01
5  12  2024-12 2024-12-01

Period equality filter per_m>=Period('2024-07','M') raised (EXPECTED BUG/UNSUPPORTED):
  ArrowInvalid: Could not convert Period('2024-07', 'M') with type Period: did not recognize Python value type when inferring an Arrow data type

Period ordinal filter raised:
  ArrowNotImplementedError: Function 'greater_equal' has no kernel matching input types (extension<pandas.period>, int16)
```