ENH: Support pd.json_normalize for normalizing only meta fields #60460

Ynjxsjmh · 2024-12-01T11:39:46Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Currently, meta is only used when record_path is not None. The logic is to extract both the record_path and the meta. For example:

    import pandas as pd

    data = [
        {
            "state": "Florida",
            "shortname": "FL",
            "info": {"governor": "Rick Scott", "year": 2014},
            "counties": [
                {"name": "Dade", "population": 12345},
                {"name": "Broward", "population": 40000},
                {"name": "Palm Beach", "population": 60000},
            ],
        },
        {
            "state": "Ohio",
            "shortname": "OH",
            "info": {"governor": "John Kasich", "year": 2015},
            "counties": [
                {"name": "Summit", "population": 1234},
                {"name": "Cuyahoga", "population": 1337},
            ],
        },
    ]
    result = pd.json_normalize(
        data, "counties", ["state", "shortname", ["info", "governor"]]
    )
    result

             name  population    state shortname info.governor
    0        Dade       12345  Florida        FL    Rick Scott
    1     Broward       40000  Florida        FL    Rick Scott
    2  Palm Beach       60000  Florida        FL    Rick Scott
    3      Summit        1234     Ohio        OH   John Kasich
    4    Cuyahoga        1337     Ohio        OH   John Kasich

In the above example, pd.json_normalize not only retrieves counties, but also retrieves state, shortname and info.governor.

When record_path is not given, meta is ignored. For example:

    result = pd.json_normalize(
        data, meta=["state", "shortname", ["info", "governor"]]
    )
    result

         state shortname                                           counties info.governor  info.year
    0  Florida        FL  [{'name': 'Dade', 'population': 12345}, {'name...    Rick Scott       2014
    1     Ohio        OH  [{'name': 'Summit', 'population': 1234}, {'nam...   John Kasich       2015

This PR adds a feature: when record_path is None or an empty list, only the meta fields are extracted.

    result = pd.json_normalize(
        data, meta=["state", "shortname", ["info", "governor"]]
    )
    result

      shortname    state info.governor
    0        FL  Florida    Rick Scott
    1        OH     Ohio   John Kasich
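
For comparison, a similar frame can probably be obtained with the existing API by flattening every record and then selecting the wanted columns. The following is only a sketch of that workaround, not the implementation in this PR; the trimmed-down `data` and the names `flat`, `meta_columns` and `meta_only` are chosen here for illustration, and the column order follows the selection list rather than the output shown in the PR.

```python
import pandas as pd

# Trimmed-down version of the example data (illustrative only).
data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott", "year": 2014},
        "counties": [{"name": "Dade", "population": 12345}],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich", "year": 2015},
        "counties": [{"name": "Summit", "population": 1234}],
    },
]

# Without record_path, json_normalize flattens each top-level record,
# so nested dicts become dotted columns such as "info.governor".
flat = pd.json_normalize(data)

# Keep only the columns that correspond to the desired meta fields.
meta_columns = ["state", "shortname", "info.governor"]
meta_only = flat[meta_columns]
print(meta_only)
```

The workaround flattens every nested dict before discarding the unwanted columns, which is presumably part of what a dedicated meta-only mode would avoid.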

The behavior can be summarized as:

record_path is None, meta is None: normalize all records.
record_path is not None, meta is None: normalize only record_path.
record_path is not None, meta is not None: normalize record_path and meta.
record_path is None, meta is not None: normalize only meta. [This PR]

Ynjxsjmh · 2024-12-02T10:44:17Z

I couldn't reproduce the df.columns dtype mismatch error (Future infer strings (without pyarrow) and Future infer strings). I think my test is the same as the other tests with record_path, such as test_nonetype_record_path and test_nested_meta_path_with_nested_record_path. I don't understand why my test gets these errors while the others don't.

mroeschke

Thanks for the PR but is there a related open issue discussing this feature? We require that (and agreement from the core team) before proceeding.

Ynjxsjmh · 2024-12-03T06:16:00Z

@mroeschke I didn't know about that rule before. I did a thorough search and found no related issues. Should I close this PR and open a new issue?

mroeschke · 2024-12-03T18:33:48Z

Yes, let's wait for feedback on the issue before proceeding with a PR. We can reopen if there's agreement from the core team to support this feature.

Ynjxsjmh added 3 commits December 1, 2024 18:28

ENH: Support pd.json_normalize for normalizing only meta fields

79b68a9

Update unit tests

7b56e27

Update release notes

a8e9882

mroeschke requested changes Dec 2, 2024

Ynjxsjmh mentioned this pull request Dec 3, 2024

ENH: Support pd.json_normalize for normalizing only meta fields #60479

Open

3 tasks

mroeschke closed this Dec 3, 2024
