Skip to content

Conversation

@maxbelanger
Copy link
Contributor

@maxbelanger maxbelanger commented Dec 12, 2017

Introduces unicodedata.is_normalized, which can check whether a unistr is in a given normal form.

This makes use of the internal helper (also called is_normalized) that can "quick check" normalization, but falls back on creating a normalized copy and comparing when necessary.

https://bugs.python.org/issue32285

@maxbelanger
Copy link
Contributor Author

This was tested locally using ./python.exe -E -Wd -m test -u urlfetch -v test_normalization on macOS 10.13.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new function, but it should be documented in Doc/library/unicodedata.rst.

You may also add it to Doc/whatsnew/3.7.rst, in a "unicodedata" section of Improved Modules.

@vstinner
Copy link
Member

Please add also a NEWS entry for the Changelog using the "blurb" tool:
https://devguide.python.org/committing/#what-s-new-and-news-entries

@maxbelanger
Copy link
Contributor Author

@vstinner any other changes you'd like to see here? Just made a tiny signature change to ensure consistency with the rest of the module, otherwise I think this is good to go.

@maxbelanger
Copy link
Contributor Author

@vstinner should I rebase this patch for 3.8?

self.assertTrue(is_normalized("NFC", c2))
self.assertTrue(is_normalized("NFD", c3))
self.assertTrue(is_normalized("NFKC", c4))
self.assertTrue(is_normalized("NFKD", c5))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some negative cases, too. Make sure the MAYBE case is being exercised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increased coverage + confirmed that this is exercising the MAYBE path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add also tests when it returns False. If the function always returns True, the test still pass ;-)


PyObject *result;
int nfc = 0;
int k = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant to conform to the existing implementation of is_normalized, which takes in ints. Could change is_normalized, but I preferred to avoid making changes outside the scope of my own.

@benjaminp benjaminp merged commit 2810dd7 into python:master Nov 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

5 participants