Description
Pandas version checks
-  I have checked that this issue has not already been reported. 
-  I have confirmed this bug exists on the latest version of pandas. 
-  I have confirmed this bug exists on the main branch of pandas. 
Reproducible Example
```python
from pathlib import Path

import numpy as np
import pandas as pd

path = Path("file.csv")

# Empty values in MultiIndex for both index & columns
df = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    columns=pd.MultiIndex.from_tuples((("a", ""), ("b", ""), ("b", "b2"))),
    index=pd.MultiIndex.from_tuples((("i1", ""), ("i2", ""))),
)
df.to_csv(path)
# CSV has empty cells in multiindex; just like the df:
# , ,a ,b ,b
# , , , ,b2
# i1 , ,0 ,1 ,2
# i2 , ,3 ,4 ,5

df_read = pd.read_csv(path, header=[0, 1], index_col=[0, 1])
# Reading from CSV fills multiindex with "Unnamed: ..."
# , ,a ,b ,b
# , ,Unnamed: 2_level_1 ,Unnamed: 3_level_1 ,b2
# i1 , ,0 ,1 ,2
# i2 , ,3 ,4 ,5
```

Issue Description
Hi,
Closely related to #51252, #51824 and #50953. I opened a new issue because the example is not exactly the same, but feel free to close it if it's not relevant. I'm not even sure whether this is a bug; sorry if I'm mistaken.
When reading a DataFrame from CSV, empty column labels are filled with the "Unnamed: ..." pattern. This makes sense for regular single-level columns, since their labels need to be unique. With MultiIndex columns, however, uniqueness can be ensured even with empty values, as long as the other levels differ. The current implementation uses "Unnamed: ..." in both cases.
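A minimal sketch of the uniqueness argument, using the column labels from the reproducible example. It only inspects `Index.is_unique` and does not change how `read_csv` behaves:

```python
import pandas as pd

# Single-level labels: two empty labels collide, so they are not unique.
flat = pd.Index(["", ""])
print(flat.is_unique)  # False

# MultiIndex labels from the example: the empty second-level values are
# disambiguated by the first level, so every tuple is still distinct.
multi = pd.MultiIndex.from_tuples([("a", ""), ("b", ""), ("b", "b2")])
print(multi.is_unique)  # True
```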
When reading index with Multiindex, empty values don't get replaced. The index read from CSV in the minimal example is identical to the initial index.
Expected Behavior
- I would expect MultiIndex index and MultiIndex columns to be handled consistently. This may not be feasible, since uniqueness is required for columns but not for the index.
- Empty values should be left unfilled if the MultiIndex remains unique without them. This would require detecting duplicates in the MultiIndex before filling the values, at the cost of consistency in how MultiIndex columns are handled.
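As a user-side approximation of the second option, the placeholders can be cleaned up after reading. The sketch below uses a hypothetical helper, `restore_empty_column_labels` (not a pandas API), and assumes the placeholders follow the `Unnamed: <i>_level_<j>` pattern seen in the output of the reproducible example. It blanks them out and keeps the result only if the column MultiIndex is still unique:

```python
import re

import pandas as pd

# Assumed pattern of the placeholders read_csv generates for empty header
# cells in MultiIndex columns, e.g. "Unnamed: 2_level_1".
_UNNAMED = re.compile(r"^Unnamed: \d+_level_\d+$")


def restore_empty_column_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical workaround: turn "Unnamed: ..." column labels back into "".

    Only MultiIndex columns are touched, and only if the cleaned-up
    columns are still unique; otherwise df is returned unchanged.
    """
    if not isinstance(df.columns, pd.MultiIndex):
        return df
    # Replace every placeholder value, level by level, with an empty string.
    new_tuples = [
        tuple("" if isinstance(v, str) and _UNNAMED.match(v) else v for v in tup)
        for tup in df.columns
    ]
    new_columns = pd.MultiIndex.from_tuples(new_tuples, names=df.columns.names)
    # Duplicate check before committing, as suggested above.
    if not new_columns.is_unique:
        return df
    out = df.copy()
    out.columns = new_columns
    return out
```

Usage: `df_read = restore_empty_column_labels(pd.read_csv(path, header=[0, 1], index_col=[0, 1]))`. One caveat of this approach: a header cell that genuinely contains text such as `Unnamed: 2_level_1` would also be blanked.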
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
 python : 3.12.2.final.0
 python-bits : 64
 OS : Windows
 OS-release : 11
 Version : 10.0.22631
 machine : AMD64
 processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
 byteorder : little
 LC_ALL : None
 LANG : en_US.UTF-8
 LOCALE : fr_FR.cp1252
pandas : 2.2.2
 numpy : 1.26.4
 pytz : 2024.1
 dateutil : 2.9.0.post0
 setuptools : None
 pip : 24.2
 Cython : None
 pytest : None
 hypothesis : None
 sphinx : None
 blosc : None
 feather : None
 xlsxwriter : None
 lxml.etree : None
 html5lib : None
 pymysql : None
 psycopg2 : None
 jinja2 : None
 IPython : None
 pandas_datareader : None
 adbc-driver-postgresql: None
 adbc-driver-sqlite : None
 bs4 : None
 bottleneck : None
 dataframe-api-compat : None
 fastparquet : None
 fsspec : None
 gcsfs : None
 matplotlib : None
 numba : None
 numexpr : None
 odfpy : None
 openpyxl : 3.1.2
 pandas_gbq : None
 pyarrow : None
 pyreadstat : None
 python-calamine : None
 pyxlsb : None
 s3fs : None
 scipy : None
 sqlalchemy : None
 tables : None
 tabulate : None
 xarray : None
 xlrd : None
 zstandard : None
 tzdata : 2024.1
 qtpy : None
 pyqt5 : None