
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

VibeThinker

🤗 Hugging Face   |   🤖 ModelScope   |   📄 Technical Report   |   🏆 arXiv Paper

Introduction

VibeThinker-1.5B is a 1.5B-parameter dense model that challenges the prevailing notion that small models inherently lack robust reasoning capabilities. Developed with an innovative post-training methodology centered on the "Spectrum-to-Signal Principle (SSP)", VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models Magistral Medium and Claude Opus 4, while achieving performance on par with open-source models like GPT OSS-20B Medium.

Most remarkably, VibeThinker-1.5B surpasses the initial DeepSeek R1 model—which is over 400 times larger—across three challenging mathematical benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7).

News

[2025.11.19] 🔥🔥 VibeThinker-1.5B hit #1 on Hugging Face's trending models today!

[2025.11.11] 🎉🎉🎉 VibeThinker-1.5B is now open source! The model weights and technical report can be accessed via the links at the top.

[2025.11.05] 📢📢📢 VibeThinker-1.5B will be open-sourced soon. Stay tuned!

Key Features

  • Ultra-Efficient: VibeThinker-1.5B redefines the efficiency frontier for reasoning models, achieving state-of-the-art performance on mathematical and coding tasks with only 1.5B parameters, 100× to 600× smaller than giants like Kimi K2 (1000B+) and DeepSeek R1 (671B).

  • Innovative Methodology: We propose a post-training technique centered on the “Spectrum-to-Signal Principle (SSP)”. This framework systematically enhances output diversity by first employing “Two-Stage Diversity-Exploring Distillation” in the SFT phase to generate a broad spectrum of solutions, then applying the “MaxEnt-Guided Policy Optimization (MGPO)” framework in the RL phase to amplify the correct signal (a toy sketch of the entropy-weighting idea follows this list).

  • Outstanding Capabilities: Despite competing with models 10 to hundreds of times larger, our 1.5B model demonstrates remarkable performance. On the AIME24, AIME25, and HMMT25 benchmarks, it surpasses open-source contenders like DeepSeek R1-0120 and GPT-OSS-20B-Medium, while achieving results comparable to MiniMax-M1.

  • Cost-Effective: While state-of-the-art models like DeepSeek R1 and MiniMax-M1 incur post-training costs of $294K and $535K respectively, VibeThinker-1.5B was post-trained for just $7,800. This represents a 30- to 60-fold reduction, fundamentally changing the economics of developing high-performance reasoning models.
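
To make the MGPO idea in the methodology bullet above more concrete, here is a toy sketch of entropy-guided problem weighting. It is our own illustration under stated assumptions, not the authors' exact objective: each training problem is weighted by the entropy of the model's current pass rate, so problems where the model is most uncertain (pass rate near 50%) drive the largest RL updates, while already-solved or never-solved problems contribute little. All function names and numbers below are hypothetical.

import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def maxent_weight(pass_rate: float) -> float:
    """Toy MaxEnt-style weight: 1.0 at maximum uncertainty (pass rate 0.5),
    falling toward 0 for problems the model always solves or always fails."""
    return bernoulli_entropy(pass_rate) / bernoulli_entropy(0.5)

# Example: estimate pass rates from k rollouts per problem, then scale each
# problem's contribution to the policy update by its weight.
rollout_results = {
    "problem_A": [1, 1, 1, 1, 1, 1, 1, 0],  # mostly solved -> lower weight
    "problem_B": [1, 0, 1, 0, 0, 1, 0, 1],  # ~50% solved  -> highest weight
    "problem_C": [0, 0, 0, 0, 0, 0, 0, 0],  # never solved -> zero weight
}
for name, results in rollout_results.items():
    p = sum(results) / len(results)
    print(f"{name}: pass_rate={p:.2f} weight={maxent_weight(p):.2f}")

In the actual MGPO framework, this kind of uncertainty signal is folded into the RL objective; see the technical report for the precise formulation.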

Model Downloads

The model checkpoint is available on Hugging Face and ModelScope.
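
As one possible route, the Hugging Face checkpoint can be fetched with the huggingface_hub library. This is a sketch, and the repository ID below is an assumption; use the exact ID shown on the model page linked above.

from huggingface_hub import snapshot_download

# Download the checkpoint to a local directory (repo ID assumed from the links
# above; replace it with the ID shown on the Hugging Face model page).
local_dir = snapshot_download(repo_id="WeiboAI/VibeThinker-1.5B", local_dir="./VibeThinker-1.5B")
print(f"Checkpoint downloaded to {local_dir}")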

Eval

If you wish to reproduce the results reported in our technical report, the evaluation programs and usage guides are available at the following links: Math Eval and Code Eval.

Sample responses from some benchmarks are available here.

Usage Guidelines

We recommend using this model for competitive-style math and coding problems.

To facilitate quick verification by the community, we recommend the following parameter settings: temperature: 0.6 or 1.0, max token length: 40960, top_p: 0.95, top_k: -1.

Quick Start

Required: transformers>=4.54.0

Recommended for better inference performance: vLLM==0.10.1 or SGLang>=0.4.9.post6

Here is a code snippet to show you how to use the chat model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig


class VibeThinker:
    def __init__(self, model_path):
        self.model_path = model_path
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_path,
            low_cpu_mem_usage=True,
            torch_dtype="bfloat16",
            device_map="auto"
        )
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path, trust_remote_code=True)

    def infer_text(self, prompt):
        messages = [
            {"role": "user", "content": prompt}
        ]
        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
        model_inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)

        generation_config = dict(
            max_new_tokens=40960,
            do_sample=True,
            temperature=0.6,  # 0.6 or 1.0, you can set it according to your needs
            top_p=0.95,
            top_k=None  # in vLLM or SGLang, set top_k to -1 to skip top-k sampling
        )

        generated_ids = self.model.generate(
            **model_inputs,
            generation_config=GenerationConfig(**generation_config)
        )
        # Strip the prompt tokens so only the newly generated response is decoded.
        generated_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]
        response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return response


if __name__ == '__main__':
    model = VibeThinker('Your model path')
    prompt = 'Your Prompt'
    print(model.infer_text(prompt))
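
If you serve the model with vLLM (recommended above for inference performance), the same recommended sampling settings can be applied through its offline inference API. This is a minimal sketch, assuming vLLM's standard LLM/SamplingParams interface; the model path is a placeholder, and top_k=-1 follows the guideline above to skip top-k sampling.

from vllm import LLM, SamplingParams

# Load the checkpoint (replace with your local model directory or Hugging Face ID).
llm = LLM(model="Your model path", dtype="bfloat16")

sampling_params = SamplingParams(
    temperature=0.6,  # 0.6 or 1.0, as recommended above
    top_p=0.95,
    top_k=-1,         # -1 disables top-k sampling in vLLM
    max_tokens=40960
)

# llm.chat applies the model's chat template before generation.
messages = [{"role": "user", "content": "Your Prompt"}]
outputs = llm.chat(messages, sampling_params)
print(outputs[0].outputs[0].text)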

License

This code repository is licensed under the MIT License.

Citations

If you use VibeThinker in your research or product, please cite:

@misc{xu2025tinymodelbiglogic,
      title={Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B},
      author={Sen Xu and Yi Zhou and Wei Wang and Jixin Min and Zhibin Yin and Yingwei Dai and Shixi Liu and Lianyu Pang and Yirong Chen and Junlin Zhang},
      year={2025},
      eprint={2511.06221},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2511.06221},
}
