Universal transformers (UTs) have been widely used for complex reasoning tasks such as ARC-AGI and Sudoku, yet the specific sources of their performance gains remain underexplored. In this work, we systematically analyze UT variants and show that improvements on ARC-AGI arise primarily from the recurrent inductive bias and the strong nonlinear components of the Transformer, rather than from elaborate architectural designs. Motivated by this finding, we propose the Universal Reasoning Model (URM), which enhances the UT with short convolution and truncated backpropagation. Our approach substantially improves reasoning performance, achieving state-of-the-art 53.8% pass@1 on ARC-AGI 1 and 16.0% pass@1 on ARC-AGI 2.
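For intuition, below is a minimal PyTorch sketch of the two enhancements named above: a short depthwise convolution applied before a weight-shared Transformer block, and truncated backpropagation that detaches the recurrent state so gradients flow only through the last few steps. This is not the released URM code; the module names, kernel size, and step counts (`ShortConv`, `URMSketch`, `kernel_size=3`, `n_steps`, `grad_steps`) are illustrative assumptions.

```python
# Minimal sketch (NOT the released URM implementation) of a UT-style
# recurrent block with a short convolution and truncated backpropagation.
import torch
import torch.nn as nn

class ShortConv(nn.Module):
    """Depthwise 1D convolution mixing each token with a few neighbors."""
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        # groups=d_model => per-channel (depthwise) filtering; padding keeps length.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); Conv1d expects (batch, channels, seq).
        return self.conv(x.transpose(1, 2)).transpose(1, 2)

class URMSketch(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 n_steps: int = 8, grad_steps: int = 2):
        super().__init__()
        self.short_conv = ShortConv(d_model)
        # A single block reused at every step: the UT recurrent inductive bias.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.n_steps, self.grad_steps = n_steps, grad_steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x
        for step in range(self.n_steps):
            # Truncated backprop: cut the graph so only the final
            # `grad_steps` recurrent steps receive gradients.
            if self.training and step == self.n_steps - self.grad_steps:
                h = h.detach()
            h = self.block(self.short_conv(h))
        return h

model = URMSketch()
out = model(torch.randn(2, 16, 256))  # (batch=2, seq=16, d_model=256)
```

Detaching the state bounds the gradient path length while still training the shared block from the last few unrolled steps, which is one common way truncated backpropagation is applied to deep recurrence.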
The 87.4% score reported in the TRM paper was obtained with an MLP model, which we believe is completely different from the TRM architecture used for the ARC-AGI task. For a fair comparison, when reproducing the results we therefore unified the architectures for ARC-AGI 1, ARC-AGI 2, and Sudoku: the architecture used to reproduce Sudoku is exactly the same TRM architecture used to run ARC-AGI.
Reproducing the correct TRM Sudoku score:
```bash
git clone https://github.com/SamsungSAILMontreal/TinyRecursiveModels
cd TinyRecursiveModels

python dataset/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000

run_name="pretrain_att_sudoku"
python pretrain.py \
    arch=trm \
    data_paths="[data/sudoku-extreme-1k-aug-1000]" \
    evaluators="[]" \
    epochs=50000 eval_interval=5000 \
    lr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 \
    arch.L_layers=2 \
    arch.H_cycles=3 arch.L_cycles=6 \
    +run_name=${run_name} ema=True
```

Results:
Installation:

```bash
pip install -r requirements.txt
wandb login YOUR_API_KEY
```

Data preparation:

```bash
# ARC-AGI-1
python -m data.build_arc_dataset \
    --input-file-prefix kaggle/combined/arc-agi \
    --output-dir data/arc1concept-aug-1000 \
    --subsets training evaluation concept \
    --test-set-name evaluation

# ARC-AGI-2
python -m data.build_arc_dataset \
    --input-file-prefix kaggle/combined/arc-agi \
    --output-dir data/arc2concept-aug-1000 \
    --subsets training2 evaluation2 concept \
    --test-set-name evaluation2

# Sudoku
python data/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000
```

Training:

```bash
bash scripts/URM_arcagi1.sh
bash scripts/URM_arcagi2.sh
bash scripts/URM_sudoku.sh
```

Citation:

```bibtex
@misc{gao2025universalreasoningmodel,
  title={Universal Reasoning Model},
  author={Zitian Gao and Lynx Chen and Yihao Xiao and He Xing and Ran Tao and Haoming Luo and Joey Zhou and Bryan Dai},
  year={2025},
  eprint={2512.14693},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.14693},
}
```