Here are 6 public repositories matching this topic...
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
Updated Nov 19, 2025 · Python

🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
Updated Apr 6, 2025 · Jupyter Notebook

🕷️ The pipeline for the OSCAR/GlotCC corpus
Updated Oct 23, 2024 · Rust

The original tooling for the GlotCC/OSCAR corpus rewritten in Rust
Updated Oct 23, 2024 · Rust

Readers/Writers for GlotCC/OSCAR corpus
Updated Oct 23, 2024 · Rust

Rust implementation of the langid library for language identification. Easily classify text with a simple API. 🌍🔍
Updated Dec 25, 2025 · Rust