Conversation

Ynjxsjmh commented Dec 1, 2024

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Currently, meta is used only when record_path is not None. The logic is to extract both the record_path and the meta. For example:

    import pandas as pd

    data = [
        {
            "state": "Florida",
            "shortname": "FL",
            "info": {"governor": "Rick Scott", "year": 2014},
            "counties": [
                {"name": "Dade", "population": 12345},
                {"name": "Broward", "population": 40000},
                {"name": "Palm Beach", "population": 60000},
            ],
        },
        {
            "state": "Ohio",
            "shortname": "OH",
            "info": {"governor": "John Kasich", "year": 2015},
            "counties": [
                {"name": "Summit", "population": 1234},
                {"name": "Cuyahoga", "population": 1337},
            ],
        },
    ]
    result = pd.json_normalize(
        data, "counties", ["state", "shortname", ["info", "governor"]]
    )
    result

             name  population    state shortname info.governor
    0        Dade       12345  Florida        FL    Rick Scott
    1     Broward       40000  Florida        FL    Rick Scott
    2  Palm Beach       60000  Florida        FL    Rick Scott
    3      Summit        1234     Ohio        OH   John Kasich
    4    Cuyahoga        1337     Ohio        OH   John Kasich

In the above example, pd.json_normalize not only retrieves counties, but also retrieves state, shortname and info.governor.

When record_path is not given, meta is ignored, for example:

    result = pd.json_normalize(
        data, meta=["state", "shortname", ["info", "governor"]]
    )
    result

         state shortname                                           counties info.governor  info.year
    0  Florida        FL  [{'name': 'Dade', 'population': 12345}, {'name...    Rick Scott       2014
    1     Ohio        OH  [{'name': 'Summit', 'population': 1234}, {'nam...   John Kasich       2015
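
One way to confirm that meta has no effect without record_path is to compare the columns with and without it. This is only a sketch against the current pandas behavior described above; the data is a trimmed copy of the sample, and `with_meta`, `without_meta`, and `same_columns` are names chosen here for illustration.

```python
import pandas as pd

# Trimmed copy of the sample data from the description.
data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott", "year": 2014},
        "counties": [{"name": "Dade", "population": 12345}],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich", "year": 2015},
        "counties": [{"name": "Summit", "population": 1234}],
    },
]

meta = ["state", "shortname", ["info", "governor"]]

with_meta = pd.json_normalize(data, meta=meta)
without_meta = pd.json_normalize(data)

# Passing meta alone does not narrow the result: columns that were never
# requested (counties, info.year) are still present.
same_columns = set(with_meta.columns) == set(without_meta.columns)
print(same_columns)
print(sorted(with_meta.columns))
```
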

This PR adds a feature: when record_path is None or an empty list, only the meta fields are extracted.

    result = pd.json_normalize(
        data, meta=["state", "shortname", ["info", "governor"]]
    )
    result

      shortname    state info.governor
    0        FL  Florida    Rick Scott
    1        OH     Ohio   John Kasich

The behavior can be summarized as:

  • record_path is None, meta is None: normalize all records.
  • record_path is not None, meta is None: normalize only record_path.
  • record_path is not None, meta is not None: normalize record_path and meta.
  • record_path is None, meta is not None: normalize only meta. [This PR]
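
For comparison, the meta-only case can be approximated with the existing API by flattening every record and then selecting the meta columns. This is a rough sketch and not the implementation in this PR; `normalize_meta_only` is a hypothetical helper name, and it assumes each meta entry is either a string or a list of keys.

```python
import pandas as pd


def normalize_meta_only(data, meta, sep="."):
    """Hypothetical helper: flatten every record, then keep only the meta columns."""
    # A nested meta path such as ["info", "governor"] maps to "info.governor".
    columns = [sep.join(m) if isinstance(m, list) else m for m in meta]
    flat = pd.json_normalize(data, sep=sep)
    return flat[columns]


# Trimmed copy of the sample data from the description.
data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott", "year": 2014},
        "counties": [{"name": "Dade", "population": 12345}],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich", "year": 2015},
        "counties": [{"name": "Summit", "population": 1234}],
    },
]

result = normalize_meta_only(data, ["state", "shortname", ["info", "governor"]])
print(result)
```

Unlike the proposed feature, this flattens the whole record first, so it does extra work on large or deeply nested inputs, and the columns follow the order given in meta.
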
Author

Ynjxsjmh commented Dec 2, 2024

I couldn't reproduce the df.columns dtype mismatch error (Future infer strings (without pyarrow) and Future infer strings). I think my test is the same as the other tests that use record_path, such as test_nonetype_record_path and test_nested_meta_path_with_nested_record_path. I don't understand why my test gets these errors while the others don't.

Member

mroeschke left a comment

Thanks for the PR but is there a related open issue discussing this feature? We require that (and agreement from the core team) before proceeding.

Author

Ynjxsjmh commented Dec 3, 2024

@mroeschke I didn't know about that rule. I did a thorough search and found no related issues. Should I close this PR and open a new issue?

@mroeschke
Member

Yes, let's wait for feedback on the issue before proceeding with a PR. We can reopen this if there's agreement from the core team to support this feature.

mroeschke closed this Dec 3, 2024