We just released Kren-M™, a production-ready bilingual foundation model for Khasi and English.
No outside funding rounds.
No imported talent.
No compromise on local understanding.
We did it internally at MWire Labs (the AI research division of MWire, a Shillong-based firm that has delivered IT systems and solutions serving 8+ million citizens since 2017).
Because when it comes to Northeast languages, the deepest expertise isn’t in Bangalore or California — it’s right here in the hills.
Why Local Roots Beat Everything Else
Big labs throw hundreds of engineers at Indic models.
We threw eight years of on-the-ground experience.
We know Khasi isn’t just tokens; it’s morphology, dialect variation, and cultural nuance that only someone who grew up hearing the language can capture.
That’s why our tokenizer cuts Khasi token count by 36%.
That’s why the model never auto-translates unless asked.
That’s why it sounds like home.
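The 36% reduction is a straightforward ratio of token counts between the stock and extended tokenizers. A minimal sketch, using hypothetical counts purely for illustration (the real figure comes from Kren-M’s own tokenizer evaluation):

```python
def token_reduction(baseline_tokens: int, custom_tokens: int) -> float:
    """Percentage fewer tokens the custom tokenizer produces
    compared with the baseline on the same text."""
    return round(100 * (baseline_tokens - custom_tokens) / baseline_tokens, 1)

# Hypothetical example: a Khasi passage the stock Gemma-2 tokenizer
# splits into 250 tokens, but the extended vocabulary covers in 160.
print(token_reduction(250, 160))  # -> 36.0
```

Fewer tokens per sentence means lower inference cost and a longer effective context for Khasi text.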
What We Shipped
Kren-M™ (Gemma-2-2B base, 2.6B params):
- Custom tokenizer with 2,135 Khasi/Garo tokens
- 5.43 M hand-cleaned Khasi sentences (proprietary — our moat)
- Fully task-aware SFT — natural bilingual behaviour
- Runs offline on 6 GB VRAM
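The 6 GB VRAM figure is consistent with a back-of-the-envelope estimate of weight memory alone. A minimal sketch, assuming fp16 weights and ignoring KV-cache and activation overhead:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory needed for model weights alone, in GB.
    Default assumes fp16/bf16 (2 bytes per parameter)."""
    return round(n_params * bytes_per_param / 1e9, 2)

# 2.6B parameters in fp16:
print(weight_memory_gb(2.6e9))  # -> 5.2
```

About 5.2 GB of weights leaves roughly 0.8 GB of headroom on a 6 GB card for the KV-cache and activations at modest context lengths.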
Live: https://huggingface.co/MWirelabs/Kren-M
White paper: https://mwirelabs.com/models/kren-m
Preprint (DOI): https://www.researchsquare.com/article/rs-8144118/v1
We also open-sourced some of the largest public Assamese and Mizo corpora, plus the first-ever Garo corpus.
This Is Just the Beginning
Early 2026: Expect Kren-NE, a Gemma-2-9B-based multilingual model covering Khasi, Garo, Mizo, Assamese, Meitei, Nagamese, Kokborok, and more.
All built the same way: local team, local data, local control.
The future of Northeast AI won’t be built in glass towers far away.
It will be built here, by us, for us.