Qwen's 113,000 Derivatives Are a Distribution Moat, Not a Benchmark Win
Source: huggingface
In January 2025, DeepSeek-R1 landed with enough force to move markets. The model matched OpenAI’s o1 on reasoning benchmarks, cost a reported fraction of what comparable Western models cost to train, and was released under an open license that let anyone download and run it. For a few weeks, it was the only thing the AI industry wanted to talk about.
A year later, Hugging Face published a three-part retrospective examining what actually happened to China’s open-source AI ecosystem since that moment. The conclusion in part three, originally published February 3, 2026, is worth sitting with: open source has become the institutionalized strategy for major Chinese AI organizations, and the evidence for that is less about which model dominates headlines and more about which models everyone else is building on.
The headline statistic is this. By mid-2025, Qwen, Alibaba’s model family, had accumulated over 113,000 derivative models on Hugging Face. Meta’s Llama had around 27,000. DeepSeek had 6,000. Alibaba’s derivative count was approaching what Google and Meta had combined. Qwen became the most-reused foundation model on the platform.
That number deserves unpacking, because it tells a different story than benchmark tables.
What Derivative Count Actually Measures
When a model becomes widely fine-tuned, quantized, distilled, or otherwise forked, that is not just a popularity metric. It is a signal that developers trust the model as a foundation for production work. Fine-tuning requires investing engineering time and compute. Reaching for a base model and building on top of it is a bet that the model’s underlying representations are solid enough to be worth improving rather than replacing.
Llama’s 27,000 derivatives reflect Meta releasing a capable, permissively licensed model family that the Western developer community adopted as a default starting point for the last two years. DeepSeek’s 6,000 derivatives reflect a model that made headlines but whose architecture, while technically impressive, hasn’t yet been adopted as a foundational building block at the same scale.
Qwen’s 113,000 is something else. It means a Chinese-origin model family has displaced Llama as the primary surface that the global open-source community fine-tunes against. For anyone who was watching the AI ecosystem in 2023, when Meta’s Llama 2 was the unambiguous king of open-source fine-tuning, this shift in a single year is striking.
The Architecture Behind the Adoption
Qwen’s current generation is built around the same architectural decision that DeepSeek’s success helped popularize: Mixture of Experts (MoE). Qwen3.5-397B-A17B, the most downloaded base model in the family with 1.59 million downloads, is a 397B parameter MoE model that activates only 17B parameters per forward pass. That is the MoE tradeoff in a single line: a massive parameter count for expressiveness, selective activation for inference efficiency.
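The routing idea can be shown in a few lines. This is a toy sketch of top-k expert routing, not Qwen's actual implementation; the expert count, dimensions, and ReLU feed-forward experts are illustrative assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 8 experts, each token routed to its top-2 experts.
# All sizes here are illustrative, not Qwen's real configuration.
NUM_EXPERTS, TOP_K, D_MODEL, D_FF = 8, 2, 16, 32

router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1
experts = [
    (rng.standard_normal((D_MODEL, D_FF)) * 0.1,
     rng.standard_normal((D_FF, D_MODEL)) * 0.1)
    for _ in range(NUM_EXPERTS)
]

def moe_forward(x):
    """x: (D_MODEL,) -> (D_MODEL,). Only TOP_K of NUM_EXPERTS ever run."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]                  # chosen expert indices
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax gates
    out = np.zeros_like(x)
    for g, i in zip(gates, top):
        w_in, w_out = experts[i]
        out += g * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU FFN expert
    return out

y = moe_forward(rng.standard_normal(D_MODEL))
```

Every token touches only 2 of the 8 experts, so compute per token scales with the active fraction, not the total parameter count; that is the same economics, at toy scale, as 17B active out of 397B total.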
The smaller models follow the same logic. Qwen3.5-0.8B sits at 596,000 downloads; Qwen3.5-2B at 383,000. These are edge deployment numbers. Developers are pulling these for on-device inference, embedded systems, and latency-sensitive applications where a 70B dense model is simply not practical.
DeepSeek’s architecture story is technically comparable but strategically different. DeepSeek-R1’s genuinely novel contribution was using reinforcement learning without a supervised fine-tuning cold start: reasoning capabilities emerged from RL itself rather than being explicitly taught through human-labeled chain-of-thought data. That R1-Zero approach, where you skip the SFT warmup entirely and let Group Relative Policy Optimization (GRPO) do the work, generated significant research interest and 92,000 GitHub stars. DeepSeek is also the most followed organization on Hugging Face.
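The core trick that lets GRPO drop the critic is simple enough to sketch. Group Relative Policy Optimization scores each sampled response against the other responses to the same prompt, so the baseline comes from the group statistics rather than a learned value model. A minimal sketch of that advantage computation, with a made-up rule-based reward for illustration:

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each response's reward against the
    mean and std of its own group (all sampled responses to one prompt).
    No learned value/critic model is required."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    if std == 0.0:            # every response scored the same: no signal
        return np.zeros_like(r)
    return (r - r.mean()) / std

# One prompt, four sampled responses, scored by a rule-based reward
# (e.g. 1.0 if the final answer checks out, 0.0 otherwise).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct responses get positive advantage, incorrect ones negative.
```

The normalized advantages then weight the policy-gradient update; responses that beat their own group's average get reinforced, which is how reasoning behavior can emerge without hand-labeled chains of thought.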
But research interest and follower counts are not the same as being the model everyone else fine-tunes. DeepSeek’s distilled variants used Qwen and Llama as their base architectures, which is itself a statement about where fine-tunable foundations actually live.
Open Source as Strategy, Not Principle
The Hugging Face retrospective is careful about this distinction, and it matters. The Chinese AI organizations that have dominated Hugging Face’s trending papers over the past year, including ByteDance, DeepSeek, Tencent, and Alibaba’s Qwen team, did not embrace open source because of ideological commitment to software freedom. They embraced it because it works.
The logic is straightforward. If you release your model weights openly, you get a global research community that red-teams your model, identifies weaknesses, proposes improvements, and builds derivative applications that generate real-world signal about where the model succeeds and fails. You also get adoption. When 113,000 fine-tuning projects use your model as a base, you accumulate distribution that is nearly impossible to dislodge without a compelling reason for every one of those projects to switch.
This is what the article means by pointing toward AI+, the integration of AI foundations into applications across domains and industries. The endgame for a foundation model is not being ranked first on some benchmark; it is being embedded in so many downstream applications and workflows that it becomes the default assumption. Qwen’s derivative count suggests that for a significant slice of the global developer community, Qwen has become that assumption.
This does not mean the Western ecosystem has lost relevance. Llama’s 27,000 derivatives represent a massive installed base, and Meta has continued releasing capable models. The difference is that Llama’s dominance used to be so total that it barely needed to be questioned. That is no longer true.
The Small Model Proliferation
One thing the benchmark conversation consistently underweights is the small model competition. The download numbers for Qwen3.5-0.8B (596k) and Qwen3.5-2B (383k) reflect a use case that is genuinely different from cloud API consumption. Developers running inference at the edge, on mobile hardware, in browser contexts via WebAssembly, and in embedded systems cannot just reach for a 70B model. They need something that fits in 2-4GB of RAM and runs acceptably on CPU or consumer GPU.
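The memory constraint is back-of-envelope arithmetic. A rough sketch of weight-only memory at common precisions, ignoring KV cache, activations, and runtime overhead (which add real headroom on top):

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate RAM for model weights alone: params * bits / 8 bytes.
    Ignores KV cache, activations, and framework overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 0.8B-parameter model at common precisions:
fp16 = weight_memory_gb(0.8, 16)   # half precision
q8   = weight_memory_gb(0.8, 8)    # 8-bit quantized
q4   = weight_memory_gb(0.8, 4)    # 4-bit quantized
```

At 4-bit quantization a 0.8B model's weights fit in well under half a gigabyte, which is why these sizes dominate on-device and in-browser deployment, while a 70B dense model at the same precision still needs roughly 35GB for weights alone.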
In that segment, Qwen has established a clear position. The smallest capable models in the Qwen family are competitive with models twice their size, and they come from an organization that releases updates frequently and maintains a coherent family structure across sizes. That coherence matters for developers: if you fine-tune on Qwen2.5-1.5B and then want to upgrade to Qwen3-2B, the tokenizer and general architectural patterns are compatible enough that the transition is not a major lift.
The MoE trend noted in the second blog in the series plays into this as well. A 120B MoE model that activates 10B parameters at inference time effectively delivers the output quality of a much larger dense model at the inference cost of a smaller one. For deployment at scale, where inference cost directly maps to money, that tradeoff is not academic.
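The cost side of that tradeoff is easy to estimate. Using the common rough rule of ~2 FLOPs per active parameter per generated token (one multiply plus one add), and taking a 70B dense model as a hypothetical comparator of my choosing, not one named in the source:

```python
def flops_per_token(active_params_billion):
    """Rough decode-time estimate: ~2 FLOPs per *active* parameter per
    token. Ignores attention-specific costs and router overhead."""
    return 2 * active_params_billion * 1e9

moe   = flops_per_token(10)   # 120B-total MoE activating 10B per token
dense = flops_per_token(70)   # a 70B dense model activates everything

ratio = dense / moe
```

By this crude estimate the MoE model generates each token at one-seventh the compute of the dense comparator, despite carrying nearly twice the total parameters, which is why the tradeoff maps directly to serving cost at scale.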
What the Follower Gap Between DeepSeek and Qwen Tells Us
DeepSeek is the most followed organization on Hugging Face. Qwen is fourth. That ordering makes sense because DeepSeek generated a cultural moment: a genuine shock to assumptions about what training efficiency was achievable and how large the gap between well-funded Western labs and smaller Chinese organizations actually was.
But Qwen has the derivatives. These two facts coexist because followership and actual usage are measuring different things. DeepSeek is followed by people who are interested in what comes next from a research standpoint. Qwen is the model those same people are often building on top of today.
For anyone building AI-integrated software, the Qwen numbers suggest that this bifurcation is worth taking seriously. The model you should be watching for research breakthroughs may not be the same as the model you should be building your fine-tuning pipeline around.
The Broader Shift
Reading the Hugging Face retrospective series across all three parts, the picture that emerges is of a global open-source ecosystem that has genuinely diversified. In 2023, open-source AI was essentially a story about what Meta was willing to release and what the community built on top of it. By early 2026, the most influential papers on Hugging Face are predominantly from Chinese organizations; the most-reused foundation model is from Alibaba; and the model that reshaped the entire conversation about training efficiency is from a Hangzhou company that most Western developers had not heard of a year earlier.
This is not a shift that makes Western research less valuable or Llama less useful. It is a shift that makes the ecosystem richer and more competitive. When the baseline expectation for an open-source model is set by DeepSeek-R1 and Qwen, Meta and Google have to respond with better releases. That dynamic has already started playing out, and the derivative counts over the next year will tell us whether the response was enough to reclaim the foundational position Llama once held without question.