VibeThinker-3B: The Small AI Model
The artificial intelligence world is used to a simple rule: bigger models usually win. More parameters generally mean more power, more knowledge, and better results. But a new release from a team inside Sina Weibo, the company behind China’s popular microblogging platform, is challenging that idea in a big way. Their model, called VibeThinker-3B, is tiny by industry standards, yet it is going head-to-head with some of the most advanced AI systems in the world and, in several cases, winning.

This article breaks down what VibeThinker-3B is, how it performs, how it was built, and what its release could mean for the future of AI.
Quick Information Table
| Detail | Information |
| --- | --- |
| Model Name | VibeThinker-3B |
| Developer | Sina Weibo research team (9 researchers) |
| Parameters | 3 billion |
| Base Model | Alibaba’s Qwen2.5-Coder-3B |
| License | MIT (open-source) |
| AIME 2026 Score | 94.3 (97.1 with test-time scaling) |
| LiveCodeBench v6 Score | 80.2 Pass@1 |
| LeetCode Acceptance Rate | 96.1% (unseen contests) |
| Training Cost | Approx. $7,800 |
| Comparable Model Training Cost | Approx. $294,000 (DeepSeek R1) |
| Availability | Hugging Face, ModelScope |
| Predecessor | VibeThinker-1.5B (released November 2025) |
A Small Model With Surprisingly Big Results
What makes VibeThinker-3B stand out is not just that it performs well, but that it performs well while being remarkably small. With only 3 billion parameters, it is small enough to run on an everyday consumer laptop, unlike massive systems that need powerful data centers. Despite this tiny size, the model scored 94.3 on the AIME 2026 mathematics benchmark, a result that puts it in the same range as DeepSeek V3.2, a model with 671 billion parameters.
It even edged past Gemini 3 Pro’s score of 91.7 on the same test. When the team applied a test-time scaling technique called Claim-Level Reliability Assessment, the score climbed even higher, to 97.1, showing that smart engineering can sometimes matter more than raw size.
Strong Numbers Across Multiple Tests
VibeThinker-3B did not just shine in one isolated benchmark; it backed up its performance across a wide range of tests. It posted a score of 91.4 on AIME 2025, 89.3 on HMMT 2025, and 93.8 on BruMO 2025, all of which are respected mathematics competitions used to test reasoning ability. On IMO-AnswerBench, a benchmark inspired by International Mathematical Olympiad-style problems, it scored 76.4.
The model also showed strength in following instructions accurately, scoring 93.4 on IFEval. These numbers suggest that the model’s strength is not limited to a single narrow skill but spreads across different types of structured reasoning challenges, which is unusual for a model of this size.
Coding Skills That Rival Bigger Names
Coding ability is one of the toughest tests for any AI model, and this is where VibeThinker-3B truly turned heads. It achieved an 80.2 Pass@1 score on LiveCodeBench v6, a well-known coding benchmark. Even more impressive, it managed a 96.1% acceptance rate on brand-new LeetCode contest problems released between late April and late May 2026, meaning the questions could not have been part of its training data.
In direct first-attempt testing, the model solved 123 out of 128 LeetCode problems correctly. That performance reportedly placed it ahead of well-known names including GPT-5.2, Doubao Seed 2.0 Pro, Kimi K2.5, and Claude Opus 4.6 under the same testing conditions.
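The two headline coding figures can be checked with simple arithmetic. The sketch below recomputes the acceptance rate from the 123-of-128 result and shows the commonly used unbiased pass@k estimator, which reduces to the plain success fraction when k is 1. The article does not say how the team computed Pass@1, so the estimator is shown only as the standard definition, and the function names are illustrative.

```python
from math import comb


def acceptance_rate(solved: int, total: int) -> float:
    """Return the share of problems solved, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * solved / total


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k samples drawn without replacement is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every draw of k samples must contain a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)


# Figures quoted in the article: 123 of 128 fresh LeetCode problems solved.
print(f"{acceptance_rate(123, 128):.1f}%")  # 96.1%
```

With k set to 1 the estimator is simply c divided by n, which is why Pass@1 is often read as "fraction solved on the first try".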
Where the Model Falls Short
No model is perfect, and VibeThinker-3B is no exception. While it dominates in mathematics and coding, it struggles when it comes to general knowledge. On GPQA-Diamond, a benchmark that tests broad factual and scientific knowledge, it scored only 70.2, well behind Gemini 3 Pro’s 91.9 and Claude Opus 4.5’s 87.0. This gap is not surprising, since the model was built specifically to handle reasoning tasks rather than to store huge amounts of general world knowledge.
The research team openly acknowledges this limitation, explaining that their goal was never to replace large, broad-knowledge AI systems, but to show that smaller models can specialize effectively in specific areas.
The Theory Behind the Model
The team behind VibeThinker-3B has proposed an idea they call the Parametric Compression-Coverage Hypothesis. In simple terms, this theory suggests that tasks with clear right-or-wrong answers, like solving a math problem or writing working code, can be packed into a smaller model far more efficiently than general knowledge, which requires storing huge amounts of loosely connected facts.
Because reasoning tasks can be verified and corrected during training, the model can be taught to reason well without needing billions of extra parameters dedicated to memorizing trivia or rare facts. This idea forms the foundation of why the team believes their compact model was able to compete with systems hundreds of times its size.
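To make the idea of "verifiable" tasks concrete, here is a minimal sketch of what an automatic right-or-wrong check can look like. It is not the team's training code: the function names `math_reward` and `code_reward`, and the choice to compare math answers as exact rational numbers, are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Any, Callable, Sequence, Tuple


def math_reward(model_answer: str, reference: str) -> float:
    """Return 1.0 if both strings denote the same rational number, else 0.0."""
    try:
        same = Fraction(model_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0  # unparseable answers count as wrong
    return 1.0 if same else 0.0


def code_reward(
    solution: Callable[..., Any],
    test_cases: Sequence[Tuple[tuple, Any]],
) -> float:
    """Return 1.0 only if the solution passes every (args, expected) case."""
    for args, expected in test_cases:
        try:
            if solution(*args) != expected:
                return 0.0
        except Exception:
            return 0.0  # a crash is treated the same as a wrong answer
    return 1.0


print(math_reward("0.5", "1/2"))                          # 1.0
print(code_reward(lambda a, b: a + b, [((1, 2), 3)]))     # 1.0
```

A general-knowledge question rarely has a checker this cheap, which is the contrast the hypothesis relies on.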
How the Model Was Trained
VibeThinker-3B did not start from nothing. It was built on top of Alibaba’s Qwen2.5-Coder-3B and then refined through a detailed four-stage training process. The first stage focused on supervised fine-tuning using math, coding, science reasoning, and instruction-following data, later shifting toward harder and longer reasoning problems while removing easier or shorter training examples.
The second stage introduced reinforcement learning through a method called MaxEnt-Guided Policy Optimization, using a single large 64,000-token context window instead of slowly expanding it. A separate stage encouraged shorter, more efficient math answers, while the final stage distilled the best reasoning patterns into one unified, polished model.
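The article names MaxEnt-Guided Policy Optimization without describing its mechanics, so the following is only an illustration of the general maximum-entropy intuition, not the team's algorithm. It assumes that each training problem has a measured pass rate and that problems the model solves about half the time (where the outcome is most uncertain) deserve the most training weight. The names `binary_entropy` and `entropy_weights` are hypothetical.

```python
import math
from typing import Dict


def binary_entropy(p: float) -> float:
    """Entropy in bits of a pass/fail outcome with pass probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_weights(pass_rates: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-problem entropies into sampling weights that sum to 1."""
    raw = {name: binary_entropy(p) for name, p in pass_rates.items()}
    total = sum(raw.values())
    if total == 0.0:
        # All problems are always solved or always failed: fall back to uniform.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: h / total for name, h in raw.items()}


# Hypothetical pass rates measured from repeated sampling.
weights = entropy_weights({"easy": 0.95, "borderline": 0.5, "hard": 0.05})
print(max(weights, key=weights.get))  # borderline
```

Under this reading, problems that are always solved or never solved contribute little, and training effort concentrates on the borderline cases.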
Remarkably Low Training Cost
One of the most eye-catching details about VibeThinker-3B is how cheaply it was trained. According to the research team, the entire post-training process cost only about $7,800. To put that into perspective, that is under 3% of the roughly $294,000 it reportedly took to train DeepSeek R1.
This dramatic cost difference highlights a growing trend in AI research: smart, efficient training methods may soon matter just as much as access to massive computing budgets. For smaller companies, universities, or independent developers without huge funding, this kind of approach could open doors that were previously closed due to the high cost of building competitive AI models.
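The size of that gap is easy to verify from the two figures quoted above. The short calculation below uses only those numbers; note that one is described as a post-training cost and the other as a training cost, so the two may not cover the same scope.

```python
from typing import Tuple


def cost_comparison(small_cost: float, large_cost: float) -> Tuple[float, float]:
    """Return (small cost as a share of large cost, how many times cheaper)."""
    if small_cost <= 0 or large_cost <= 0:
        raise ValueError("costs must be positive")
    return small_cost / large_cost, large_cost / small_cost


# Figures as reported in the article, in US dollars.
share, ratio = cost_comparison(7_800, 294_000)
print(f"{share:.1%} of the cost")   # 2.7% of the cost
print(f"about {ratio:.1f}x cheaper")  # about 37.7x cheaper
```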
Mixed Reactions From the Community
As exciting as the benchmark scores are, not everyone is fully convinced. Some users who tested the model on everyday coding tasks reported weaker results, especially when working with commonly used development tools outside of benchmark conditions. Others raised questions about why the team did not test the model on broader, real-world software engineering benchmarks instead of relying mostly on competitive math and coding tests.
The research team responded by stating that the training data went through strict checks to avoid overlap with benchmark questions, and they pointed to the fresh LeetCode contests as strong evidence against data leakage. Even so, the gap between lab benchmark scores and everyday practical performance remains a valid concern worth watching.
Open-Source and Already Spreading Fast
True to its open and accessible approach, VibeThinker-3B was released under the MIT License, meaning developers are free to use, modify, and build upon it without heavy restrictions. Its model weights are publicly available on Hugging Face and ModelScope, two major platforms for sharing AI models. Interest in the model grew quickly: within just one day of release, developers had already created GGUF quantized versions, which are lighter formats that make the model easier to run on regular hardware.
The release also gained solid traction online, with the research paper earning 62 upvotes on Hugging Face’s daily papers page, the model repository collecting 130 likes, and its GitHub project reaching 685 stars.
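A back-of-the-envelope estimate shows why a 3-billion-parameter model, and especially its quantized variants, fits on regular hardware. The sketch below counts weight storage only. The bit widths are assumptions chosen for illustration: real GGUF files mix precisions and carry metadata, and running a model also needs memory for activations and the context cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("invalid parameter count or bit width")
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_3B = 3e9  # parameter count quoted for VibeThinker-3B

# Assumed precisions: 16-bit originals versus 8-bit and 4-bit quantization.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS_3B, bits):.1f} GB")
# 16-bit: 6.0 GB
# 8-bit: 3.0 GB
# 4-bit: 1.5 GB
```

By the same arithmetic, a 671-billion-parameter model at 16 bits per weight would need roughly 1,342 GB for its weights alone, which is why such systems stay in data centers.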
Not Weibo’s First Surprise in AI
While Sina Weibo is best known as a major social media platform rather than an AI research powerhouse, this is not its first notable entry into advanced AI development. VibeThinker-3B is actually the company’s second significant open-source AI release within roughly seven months. Its predecessor, VibeThinker-1.5B, was released back in November 2025 and reportedly outperformed the original DeepSeek R1 model on several mathematics benchmarks.
This pattern suggests that Weibo’s research team is steadily building expertise and credibility in the reasoning-focused AI space, rather than this being a one-time lucky result.
A Future Built on Hybrid AI Systems
The team behind VibeThinker-3B is careful not to overstate their achievement. They are not claiming that this small model can replace massive, general-purpose AI systems. Instead, their vision points toward a hybrid future, where compact, highly efficient models like VibeThinker-3B handle specific reasoning-heavy tasks such as math and coding, while larger models continue to provide broad general knowledge and context.
This kind of teamwork between small and large models could make advanced reasoning tools more affordable and accessible, even on devices with limited computing power, such as laptops or smaller local servers, instead of requiring expensive cloud-based supercomputers.
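One way to picture such a hybrid setup is a small router that decides which model should handle each request. The sketch below is purely illustrative and is not described by the research team: the keyword list, the `route` function, and the two model labels are all hypothetical, and a production system would more likely use a learned classifier than a keyword match.

```python
import re

# Hypothetical hints that a prompt is a math or coding task.
REASONING_HINTS = re.compile(
    r"\b(prove|solve|integral|equation|algorithm|implement|debug|leetcode)\b",
    re.IGNORECASE,
)

SMALL_MODEL = "small-reasoning-model"   # placeholder label, not a real endpoint
LARGE_MODEL = "large-general-model"     # placeholder label, not a real endpoint


def route(prompt: str) -> str:
    """Send reasoning-style prompts to the small model, everything else to the large one."""
    return SMALL_MODEL if REASONING_HINTS.search(prompt) else LARGE_MODEL


print(route("Solve the equation x^2 = 4"))   # small-reasoning-model
print(route("Who wrote Hamlet?"))            # large-general-model
```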
Final Thoughts
VibeThinker-3B is a strong reminder that progress in AI does not always come from simply building bigger and bigger models. With smart training techniques, a clear focus on reasoning tasks, and a surprisingly small budget, a small team at Sina Weibo has built a model that can compete with, and in some cases beat, systems backed by some of the biggest names in technology.
Still, the real test will be how well it performs in everyday, real-world situations beyond benchmark scores. If it holds up, VibeThinker-3B could be an early sign of a more efficient, more accessible future for AI development.