Detect the script (writing system) of text based on ISO 15924.
- Unicode version: 15.0.0
- The codes were sourced from Wikipedia ISO_15924.
- Unicode ranges were extracted from Unicode Character Database.
Zinhcode is the Unicode script property value of characters that may be used with multiple scripts, and that inherit their script from a preceding base character. In some cases, we opted to integrate parts of the Zinh code (e.g. ARABIC FATHATAN..ARABIC HAMZA BELOW, ARABIC LETTER SUPERSCRIPT ALEF) into a different block.Zyyycode is the Unicode script for "Common" characters.
pip3 install GlotScript@git+https://github.com/cisnlp/GlotScriptfrom GlotScript import get_script_predictor sp = get_script_predictor()sp('これは日本人です') >> ('Hira', 0.625, {'details': {'Hira': 0.625, 'Hani': 0.375}, 'tie': False, 'interval': 0.25})sp('This is Latin')[:1] >> ('Latn', 1.0)sp('මේක සිංහල')[0] >> 'Sinh'sp('𝄞𝄫 𒊕𒀸') >> ('Xsux', 0.5, {'details': {'Xsux': 0.5, 'Zyyy': 0.5}, 'tie': True, 'interval': 0.0})If you use any part of this library in your research, please cite it using the following BibTex entry.
@misc{glotscript, author = {Kargaran, Amir Hossein and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich}, title = {GlotScript}, year = {2023}, publisher = {GitHub}, journal = {GitHub Repository}, howpublished = {\url{https://github.com/cisnlp/GlotScript}}, } Click to Exapand
- List of Unicode characters - Wikipedia
- Lightweight Plain-Text Editor for macOS - CotEditor
- The Cygwin Terminal – terminal emulator for Cygwin, MSYS, and WSL - mintty
- ISO_15924 Wikipedia
- Unicode Character Database (Blocks) - Unicode
- Unicode Character Database (Scripts) - Unicode
- A free, web-based font editor, focusing on font design hobbyists. - Glyphr-Studio-1
- Kotlin - JetBrains
- UNIX-like reverse engineering framework and command-line toolset - radare2
- FreeOrion Game
- DOMinator - Firefox
- SHSans-derived CJK font family - glow-sans
- Unicode Subset Bitfields - Microsoft
- Stops - FAIR NLLB FB
- Gradient Boosting on Decision Trees - catboost
- Blender
- Unicode Wikipedia