Closed
Labels
help wanted
Description
I noticed that when running `detect_language` on certain upper-case texts, they are detected as Chinese (ZH), whereas applying `.lower()` first yields the correct result, English (EN).
For example:
>>> from fast_langdetect import detect_language
>>> detect_language('MY FRIEND IS A BIRD', low_memory=False)
'ZH'
>>> detect_language('MY FRIEND IS A BIRD'.lower(), low_memory=False)
'EN'
>>> detect_language('DANCING FOR FUN', low_memory=False)
'ZH'
>>> detect_language('DANCING FOR FUN'.lower(), low_memory=False)
'EN'

For other phrases language detection works as expected, so it's not a universal issue:
>>> detect_language('HELLO THERE MY FRIEND', low_memory=False)
'EN'
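Until the underlying behaviour is understood, one possible stopgap is to lowercase input that is entirely upper-case before passing it to the detector. The following is only a minimal sketch of that idea: `detect_case_insensitive` is a hypothetical helper name, not part of fast_langdetect, and the detector is passed in as a plain callable so the snippet makes no assumptions about the library beyond what is shown above.

```python
from typing import Callable


def detect_case_insensitive(text: str, detector: Callable[[str], str]) -> str:
    """Lowercase all-caps text before detection (hypothetical workaround).

    str.isupper() is True only when the text contains at least one cased
    character and every cased character is upper-case, so mixed-case text
    and scripts without case (e.g. Chinese) are passed through unchanged.
    """
    normalized = text.lower() if text.isupper() else text
    return detector(normalized)
```

With the library installed, this could be called as `detect_case_insensitive('MY FRIEND IS A BIRD', lambda t: detect_language(t, low_memory=False))`. This only masks the symptom for all-caps input; it does not explain why the model scores such text as Chinese.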