@@ -37,12 +37,19 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a
 
 .. code-block:: text
 
-    {'index_columns': ['__index_level_0__', '__index_level_1__', ...],
+    {'index_columns': [<descr0>, <descr1>, ...],
      'column_indexes': [<ci0>, <ci1>, ..., <ciN>],
      'columns': [<c0>, <c1>, ...],
-     'pandas_version': $VERSION}
+     'pandas_version': $VERSION,
+     'creator': {
+       'library': $LIBRARY,
+       'version': $LIBRARY_VERSION
+     }}
 
-Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
+The "descriptor" values ``<descr0>`` in the ``'index_columns'`` field are
+strings (referring to a column) or dictionaries with values as described below.
+
+The ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
 for each column, *including the index columns*. This has JSON form:
 
 .. code-block:: text
@@ -53,26 +60,37 @@ for each column, *including the index columns*. This has JSON form:
      'numpy_type': numpy_type,
      'metadata': metadata}
 
-.. note::
+See below for the detailed specification for these.
+
+Index Metadata Descriptors
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``RangeIndex`` can be stored as metadata only, not requiring serialization. The
+descriptor format for these is as follows:
 
-   Every index column is stored with a name matching the pattern
-   ``__index_level_\d+__`` and its corresponding column information is can be
-   found with the following code snippet.
+.. code-block:: python
 
-   Following this naming convention isn't strictly necessary, but strongly
-   suggested for compatibility with Arrow.
+    index = pd.RangeIndex(0, 10, 2)
+    {'kind': 'range',
+     'name': index.name,
+     'start': index.start,
+     'stop': index.stop,
+     'step': index.step}
 
-   Here's an example of how the index metadata is structured in pyarrow:
+Other index types must be serialized as data columns along with the other
+DataFrame columns. The metadata for these is a string indicating the name of
+the field in the data columns, for example ``'__index_level_0__'``.
 
-   .. code-block:: python
+If an index has a non-None ``name`` attribute, and there is no other column
+with a name matching that value, then the ``index.name`` value can be used as
+the descriptor. Otherwise (for unnamed indexes and ones with names colliding
+with other column names) a disambiguating name matching the pattern
+``__index_level_\d+__`` should be used. In cases of named indexes as data
+columns, the ``name`` attribute is always stored in the column descriptors as
+above.
 
-      # assuming there's at least 3 levels in the index
-      index_columns = metadata['index_columns']  # noqa: F821
-      columns = metadata['columns']  # noqa: F821
-      ith_index = 2
-      assert index_columns[ith_index] == '__index_level_2__'
-      ith_index_info = columns[-len(index_columns):][ith_index]
-      ith_index_level_name = ith_index_info['name']
+Column Metadata
+~~~~~~~~~~~~~~~
 
 ``pandas_type`` is the logical type of the column, and is one of:
 
@@ -161,4 +179,8 @@ As an example of fully-formed metadata:
           'numpy_type': 'int64',
           'metadata': None}
      ],
-     'pandas_version': '0.20.0'}
+     'pandas_version': '0.20.0',
+     'creator': {
+       'library': 'pyarrow',
+       'version': '0.13.0'
+     }}
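The following is a minimal, pure-Python sketch of how a reader might interpret the ``'index_columns'`` descriptors described in this change. It assumes a metadata dict already decoded from JSON. The functions ``is_generated_index_name``, ``choose_index_descriptor`` and ``resolve_index_levels`` are hypothetical and are not part of the pandas or pyarrow API. A Python ``range`` stands in for ``pandas.RangeIndex``. The sketch also assumes that a string descriptor equals the ``'field_name'`` of one entry in ``'columns'``.

```python
import re
from typing import Any, Dict, List, Optional, Sequence

# Disambiguating field-name pattern given in the text: __index_level_\d+__
_GENERATED_NAME = re.compile(r"__index_level_\d+__")


def is_generated_index_name(field_name: str) -> bool:
    """True if field_name matches the __index_level_<n>__ pattern exactly."""
    return _GENERATED_NAME.fullmatch(field_name) is not None


def choose_index_descriptor(level: int, name: Optional[str],
                            data_column_names: Sequence[str]) -> str:
    """Pick the field name for an index level stored as a data column.

    Hypothetical helper: use the index name when it is not None and does
    not collide with a data column name; otherwise fall back to the
    generated __index_level_<level>__ name.
    """
    if name is not None and name not in data_column_names:
        return name
    return f"__index_level_{level}__"


def resolve_index_levels(metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Interpret each entry of metadata['index_columns'].

    Dict descriptors with kind 'range' become a Python range (a stand-in
    for pandas.RangeIndex). String descriptors are assumed to equal the
    'field_name' of an entry in metadata['columns'], whose 'name' holds
    the original index name.
    """
    by_field = {col["field_name"]: col for col in metadata["columns"]}
    levels: List[Dict[str, Any]] = []
    for descr in metadata["index_columns"]:
        if isinstance(descr, dict):
            if descr.get("kind") != "range":
                raise ValueError(f"unsupported index descriptor: {descr!r}")
            levels.append({
                "kind": "range",
                "name": descr["name"],
                "values": range(descr["start"], descr["stop"], descr["step"]),
            })
        else:
            col = by_field[descr]  # raises KeyError if no such field exists
            levels.append({"kind": "column", "field_name": descr,
                           "name": col["name"]})
    return levels
```

For example, if ``'index_columns'`` is ``[{'kind': 'range', 'name': None, 'start': 0, 'stop': 10, 'step': 2}]``, this returns a single level whose values are ``range(0, 10, 2)``. Looking columns up by ``'field_name'`` rather than by position avoids assuming that index columns sit at the end of ``'columns'``, which the removed snippet relied on. That assumption no longer holds once range indexes contribute no column entry.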