Skip to content

cisnlp/multypo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MULTYPO

Multilingual Keyboard-Aware Typo Generator for NLP Robustness

PyPI version Python >=3.8


🚀 Overview

MULTYPO provides realistic, keyboard-based typographical noise across 12+ languages, enabling robust evaluation, stress testing, and synthetic data generation for NLP and LLM systems.

It was originally designed for multilingual robustness research, and is now packaged for public use.

🔥 Key Features

  • Multilingual support (English, German, French, Russian, Greek, Arabic, Hindi, Bengali, Tamil, Armenian, Georgian, Hebrew)
  • Keyboard-aware typo modeling
    • replace: wrong nearby key
    • insert: accidental double press
    • delete: skipped key
    • transpose: swapped left/right-hand keys
  • Horizontal + vertical keyboard neighbors with configurable weights
  • Configurable typo distributions
  • Built-in excluding sets (numbers, number words, etc.)
  • Register custom keyboards and ignoring sets

📦 Installation

pip install multypo

📝 Quick Start

Generate typos in English

from multypo import generate_typos text = "This is an example sentence. And here is another one." noisy = generate_typos( text=text, language="english", typo_rate=0.25, ) print(noisy)

Example output:

Thi is an ezample sentence. Amd here is anoter one. 

🛠️ Advanced Usage: MultiTypoGenerator

from multypo import MultiTypoGenerator gen = MultiTypoGenerator( language="english", use_excluding_set=True, typo_distribution={"delete":0.2, "insert":0.2, "replace":0.4, "transpose":0.2}, horizontal_vs_vertical=(2.0, 1.0), ) typoed = gen.insert_typos_in_text( "This is a test sentence. Second sentence here.", typo_rate=0.3 ) print(typoed)

🎹 Custom Keyboard Layouts

from multypo import register_keyboard_layout custom_keyboard = [ list("qwertyuiop"), list("asdfghjkl"), list("zxcvbnm"), ] register_keyboard_layout( lang_code="en-custom", language="english-custom", keyboard_rows=custom_keyboard, left_keys=list("qwertasdfgzxcvb"), right_keys=list("yuiophjklbnm"), ignoring_set={"million", "billion", "42"}, )

Use like any other language:

generate_typos("This is custom.", language="english-custom", typo_rate=0.3)

📏 Adjust Typo Type Probabilities

Set global defaults:

from multypo import set_default_typo_distribution set_default_typo_distribution({ "delete": 0.1, "insert": 0.2, "replace": 0.5, "transpose": 0.2, })

Or override per call:

noisy = generate_typos( text="Example", language="english", typo_rate=0.3, typo_distribution={"delete":0.2, "insert":0.2, "replace":0.4, "transpose":0.2}, )

🌍 Supported Languages

from multypo import get_supported_languages print(get_supported_languages())

Of course, you can always register new languages and their custom keyboard layouts!

📜 License

MIT License.

Citation

@misc{liu2025evaluating, title={Evaluating Robustness of Large Language Models Against Multilingual Typographical Errors}, author={Yihong Liu and Raoyuan Zhao and Lena Altinger and Hinrich Schütze and Michael A. Hedderich}, year={2025}, eprint={2510.09536}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2510.09536}, } 

About

A Multilingual Keyboard Layout-Based Typo Generator

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages