MULTYPO provides realistic, keyboard-based typographical noise across 12+ languages, enabling robust evaluation, stress testing, and synthetic data generation for NLP and LLM systems.
It was originally designed for multilingual robustness research, and is now packaged for public use.
- Multilingual support (English, German, French, Russian, Greek, Arabic, Hindi, Bengali, Tamil, Armenian, Georgian, Hebrew)
- Keyboard-aware typo modeling
  - replace: wrong nearby key
  - insert: accidental double press
  - delete: skipped key
  - transpose: swapped left/right-hand keys
- Horizontal + vertical keyboard neighbors with configurable weights
- Configurable typo distributions
- Built-in excluding sets (numbers, number words, etc.)
- Register custom keyboards and ignoring sets
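
To make the feature list concrete, here is a minimal, standard-library-only sketch of keyboard-aware typo injection. It is not MULTYPO's implementation: the mini-layout, the hand assignment, and every function name below (`neighbors`, `pick_neighbor`, `apply_typo`, `add_typos`) are hypothetical and exist only for illustration. It assumes lowercase Latin keys, treats keys in the same column index as vertical neighbors (ignoring physical row stagger), and applies at most one typo per word.

```python
import random

# Hypothetical mini-layout and hand assignment (illustration only).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
LEFT = set("qwertasdfgzxcvb")
RIGHT = set("yuiophjklnm")


def neighbors(ch, rows=ROWS):
    """Return (horizontal, vertical) neighbor lists for a key."""
    for r, row in enumerate(rows):
        c = row.find(ch)
        if c == -1:
            continue
        horizontal = [row[i] for i in (c - 1, c + 1) if 0 <= i < len(row)]
        vertical = [rows[j][c] for j in (r - 1, r + 1)
                    if 0 <= j < len(rows) and c < len(rows[j])]
        return horizontal, vertical
    return [], []  # key not on this layout (punctuation, uppercase, ...)


def pick_neighbor(ch, rng, h_weight=2.0, v_weight=1.0):
    """Sample a nearby key, weighting horizontal vs. vertical neighbors."""
    horizontal, vertical = neighbors(ch)
    candidates = horizontal + vertical
    if not candidates:
        return ch
    weights = [h_weight] * len(horizontal) + [v_weight] * len(vertical)
    return rng.choices(candidates, weights=weights, k=1)[0]


def apply_typo(word, op, rng):
    """Apply one typo operation to a word and return the result."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word))
    if op == "replace":    # wrong nearby key
        return word[:i] + pick_neighbor(word[i], rng) + word[i + 1:]
    if op == "insert":     # accidental double press
        return word[:i + 1] + word[i] + word[i + 1:]
    if op == "delete":     # skipped key
        return word[:i] + word[i + 1:]
    if op == "transpose":  # swap adjacent keys typed by different hands
        pairs = [j for j in range(len(word) - 1)
                 if (word[j] in LEFT and word[j + 1] in RIGHT)
                 or (word[j] in RIGHT and word[j + 1] in LEFT)]
        if not pairs:
            return word
        j = rng.choice(pairs)
        return word[:j] + word[j + 1] + word[j] + word[j + 2:]
    raise ValueError(f"unknown operation: {op}")


def add_typos(text, typo_rate, distribution, rng, exclude=frozenset()):
    """Corrupt each non-excluded word with probability typo_rate."""
    ops, weights = zip(*distribution.items())
    out = []
    for word in text.split():
        if word.lower() in exclude or rng.random() >= typo_rate:
            out.append(word)
        else:
            op = rng.choices(ops, weights=weights, k=1)[0]
            out.append(apply_typo(word, op, rng))
    return " ".join(out)


rng = random.Random(0)  # seeded so runs are repeatable
dist = {"delete": 0.2, "insert": 0.2, "replace": 0.4, "transpose": 0.2}
print(add_typos("this is a small test sentence", 0.5, dist, rng))
```

The weights `2.0` and `1.0` mirror the `horizontal_vs_vertical=(2.0, 1.0)` setting used later in this README, and `exclude` plays the role of an excluding set; how the package itself interprets these options may differ.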
Install from PyPI:

```bash
pip install multypo
```

Generate typos with a single call:

```python
from multypo import generate_typos

text = "This is an example sentence. And here is another one."

noisy = generate_typos(
    text=text,
    language="english",
    typo_rate=0.25,
)
print(noisy)
```

Example output:
```
Thi is an ezample sentence. Amd here is anoter one.
```

For finer control, use the generator class:

```python
from multypo import MultiTypoGenerator

gen = MultiTypoGenerator(
    language="english",
    use_excluding_set=True,
    typo_distribution={"delete": 0.2, "insert": 0.2, "replace": 0.4, "transpose": 0.2},
    horizontal_vs_vertical=(2.0, 1.0),
)

typoed = gen.insert_typos_in_text(
    "This is a test sentence. Second sentence here.",
    typo_rate=0.3,
)
print(typoed)
```

Register a custom keyboard layout:

```python
from multypo import register_keyboard_layout

custom_keyboard = [
    list("qwertyuiop"),
    list("asdfghjkl"),
    list("zxcvbnm"),
]

register_keyboard_layout(
    lang_code="en-custom",
    language="english-custom",
    keyboard_rows=custom_keyboard,
    left_keys=list("qwertasdfgzxcvb"),
    right_keys=list("yuiophjklbnm"),
    ignoring_set={"million", "billion", "42"},
)
```

Use like any other language:
```python
generate_typos("This is custom.", language="english-custom", typo_rate=0.3)
```

Set global defaults:
```python
from multypo import set_default_typo_distribution

set_default_typo_distribution({
    "delete": 0.1,
    "insert": 0.2,
    "replace": 0.5,
    "transpose": 0.2,
})
```

Or override per call:
```python
noisy = generate_typos(
    text="Example",
    language="english",
    typo_rate=0.3,
    typo_distribution={"delete": 0.2, "insert": 0.2, "replace": 0.4, "transpose": 0.2},
)
```

List the supported languages:

```python
from multypo import get_supported_languages

print(get_supported_languages())
```

Of course, you can always register new languages and their custom keyboard layouts!
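
For the robustness-evaluation use case, one possible harness is sketched below. It is an assumption-laden illustration, not part of the package: `accuracy_by_typo_rate`, `toy_predict`, and `toy_noise` are hypothetical names, and the noise function is passed in as a parameter so the sketch runs without MULTYPO installed.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple


def accuracy_by_typo_rate(
    predict: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    add_noise: Callable[[str, float], str],
    typo_rates: Iterable[float] = (0.0, 0.1, 0.3),
) -> Dict[float, float]:
    """Return {typo_rate: accuracy} for a model evaluated on noised inputs."""
    if not examples:
        raise ValueError("examples must not be empty")
    results = {}
    for rate in typo_rates:
        correct = sum(
            predict(add_noise(text, rate)) == label for text, label in examples
        )
        results[rate] = correct / len(examples)
    return results


# Toy stand-ins (hypothetical) so the sketch is runnable on its own.
def toy_predict(text: str) -> str:
    return "pos" if "good" in text else "neg"


def toy_noise(text: str, rate: float) -> str:
    # Deterministic fake noise: corrupt one keyword whenever rate > 0.
    return text.replace("good", "g0od") if rate > 0 else text


data = [("good film", "pos"), ("bad film", "neg")]
print(accuracy_by_typo_rate(toy_predict, data, toy_noise, typo_rates=(0.0, 0.5)))
```

To plug in MULTYPO, `add_noise` could wrap the call shown earlier, for example `lambda text, rate: generate_typos(text=text, language="english", typo_rate=rate)`. Whether a rate of `0.0` is accepted is not stated here, so the clean baseline may need to bypass the call.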
MIT License.
```bibtex
@misc{liu2025evaluating,
  title={Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors},
  author={Yihong Liu and Raoyuan Zhao and Lena Altinger and Hinrich Schütze and Michael A. Hedderich},
  year={2025},
  eprint={2510.09536},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.09536},
}
```