Open Source as Infrastructure: The Distribution Logic Behind China's AI Ecosystem
Source: huggingface
The number that matters from HuggingFace’s three-part retrospective on the post-DeepSeek AI ecosystem is a derivative count: Qwen has over 113,000 fine-tuned or adapted models built on top of it, compared to Meta’s Llama at roughly 27,000. That gap represents a different category of ecosystem success, and understanding why it opened up tells you more about the state of open-source AI than any benchmark comparison.
Published in early February 2026 as a look back at the year since the DeepSeek Moment, the third installment in HuggingFace’s series examines where Chinese AI organizations are headed. The retrospective covers organizational strategy, ecosystem health, and what “AI+” means as a direction. The most instructive piece of the analysis is what it reveals about distribution strategy, specifically how Qwen and DeepSeek approached open source as a deliberate method for embedding themselves into as much downstream work as possible.
What the DeepSeek Moment Established
DeepSeek-R1, released January 22, 2025, matched OpenAI’s o1-1217 on reasoning benchmarks while being fully open-weight under an MIT license. The AIME 2024 score was 79.8% versus o1’s 79.2%. The Codeforces rating came in at 2029 versus 2061. The cost story was equally striking: DeepSeek-V3 required roughly 2.788 million H800 GPU hours for training, putting the compute cost somewhere around $5-6 million USD at 2024 H800 rental rates, compared to estimated GPT-4 training costs well north of $100 million.
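The cost figure is easy to sanity-check. The GPU-hour count comes from the DeepSeek-V3 report; the rental rate band here is an assumption (2024 H800 rates were commonly quoted near $2/hour), so treat this as a back-of-envelope, not an audited number:

```python
# Back-of-envelope check of the DeepSeek-V3 training cost estimate.
# GPU-hour count is from the source; the rental-rate band is assumed.
GPU_HOURS = 2_788_000            # reported H800 GPU hours for DeepSeek-V3
RATE_LOW, RATE_HIGH = 1.8, 2.2   # assumed USD per H800 GPU-hour in 2024

low = GPU_HOURS * RATE_LOW
high = GPU_HOURS * RATE_HIGH
print(f"estimated training compute cost: ${low/1e6:.1f}M - ${high/1e6:.1f}M")
# → estimated training compute cost: $5.0M - $6.1M
```

Any plausible rate in that band lands in the $5-6M range, consistent with the figure above. Note this covers training compute only, not research, salaries, or failed runs.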
That cost gap proved something consequential. Frontier-level performance was achievable at an order of magnitude lower cost, and the resulting model could be released publicly without giving away a business-ending secret: the real secret was the research methodology, and DeepSeek published that anyway. The benchmark story got the headlines; the strategic implication proved more durable: open-source labs could now compete on model quality in a way that had seemed structurally impossible a year earlier.
From Moment to Ecosystem
DeepSeek became the most-followed organization on HuggingFace globally, with 121,000+ followers and roughly 6,000 derivative models built on top of its work. Those numbers are substantial, but Qwen’s are a different order of magnitude entirely.
Alibaba’s Qwen family accumulated over 113,000 derivative models by mid-2025. HuggingFace counts more than 200,000 repositories tagging Qwen. Alibaba as an organization has nearly as many derivatives as Google and Meta combined, per the retrospective. The difference between Qwen and DeepSeek in terms of derivatives has less to do with model quality than with surface area. DeepSeek released a flagship model family with strong benchmarks. Qwen released a continuous family spanning sizes from 0.6B to 397B parameters, covering text, code, math, vision, and audio, updated frequently across both HuggingFace and ModelScope. Qwen positioned itself as the base layer that the broadest range of practitioners can build on, rather than as a product to be consumed.
MoE and the Architecture of Embeddability
The architectural shift that HuggingFace’s second installment in this series identified, Mixture of Experts becoming the default, feeds directly into this distribution strategy. DeepSeek-V3 and Qwen3.5 both use sparse MoE: 671B and 397B total parameters respectively, but only around 37B and 17B activated per token. You get large-model-quality output at a fraction of the inference cost of a dense model of equivalent total size.
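The routing mechanism behind that sparsity can be sketched in a few lines. This is a toy illustration of top-k expert routing, not DeepSeek's or Qwen's actual implementation (which adds fine-grained experts, shared experts, and load balancing); all names and dimensions here are made up. The point is that only k of the experts execute per token, so compute scales with k/N of the total expert parameters:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token vector through a top-k sparse MoE layer.

    Only k experts run for this token; the other experts' parameters
    contribute no compute. Toy sketch only.
    """
    logits = x @ gate_w                        # gate scores, one per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                   # softmax over selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
# Each "expert" is just a random linear map in this sketch.
experts = [lambda v, W=rng.standard_normal((d, d)): v @ W
           for _ in range(n_experts)]
out = moe_forward(x, gate_w, experts, k=2)     # only 2 of 16 experts executed
```

With k=2 of 16 experts, per-token expert compute is 1/8 of the dense equivalent, which is the same ratio logic behind 37B-of-671B activation.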
This matters for derivatives because it changes who can run fine-tuned variants. Per-token inference compute scales with the activated parameters, not the total: a 671B dense model needs roughly eighteen times the FLOPs per token of a 671B MoE activating 37B. The full expert weights still have to be resident, so memory remains the binding constraint, but quantization and offloading bring serving within reach of multi-GPU workstations and well-provisioned cloud instances rather than dedicated server farms. When Qwen releases a 72B model with grouped-query attention and a 131K context window, the threshold for building on top of it drops significantly.
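Rough arithmetic separates the two costs of running such a model: weight memory scales with total parameters, while per-token compute scales with activated parameters. The byte-per-weight figure below assumes FP8 storage, which is an assumption about deployment, not a fact from the source:

```python
# Memory vs. compute for a sparse MoE, using DeepSeek-V3's figures.
TOTAL_PARAMS = 671e9     # total parameters (all experts)
ACTIVE_PARAMS = 37e9     # parameters activated per token
BYTES_PER_WEIGHT = 1     # assumed FP8 storage

weight_memory_gb = TOTAL_PARAMS * BYTES_PER_WEIGHT / 1e9
moe_flops_per_token = 2 * ACTIVE_PARAMS    # ~2 FLOPs per active weight
dense_flops_per_token = 2 * TOTAL_PARAMS   # dense model of same total size

print(f"weights resident in memory: ~{weight_memory_gb:.0f} GB")
print(f"compute per token: ~{moe_flops_per_token/1e12:.2f} TFLOPs "
      f"(vs ~{dense_flops_per_token/1e12:.2f} TFLOPs dense)")
```

The compute saving is roughly 18x; the memory footprint is unchanged, which is why quantization and offloading, not MoE alone, determine who can actually serve the model.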
The six distilled variants of DeepSeek-R1 extend this logic further. The 32B distilled model outperforms OpenAI’s o1-mini on multiple benchmarks and runs on a single high-end workstation. More importantly, these distilled models used existing open-weight bases: Qwen2.5 and Llama 3 variants. DeepSeek’s reasoning capabilities flowed into the existing Qwen and Llama derivative ecosystems without requiring practitioners to change their tooling. A researcher who had built a Qwen2.5 fine-tune for a specific domain could apply the DeepSeek-R1 distillation methodology and extend it without switching architectures.
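The distillation idea behind those variants is worth pinning down. The classic formulation minimizes a temperature-scaled KL divergence between teacher and student output distributions; DeepSeek-R1's distilled models were reportedly produced by a sequence-level variant of this (supervised fine-tuning on R1-generated reasoning traces), but the per-token objective below captures the core mechanism. This is an illustrative sketch, not the lab's actual training code:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), the classic
    distillation objective (Hinton et al. style). The student is
    trained to match the teacher's softened output distribution."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T
```

Because the objective only needs the teacher's outputs, the student can keep its own architecture and tokenizer, which is exactly why the capability transfers cleanly onto existing Qwen and Llama bases.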
That cross-pollination is a direct consequence of releasing distilled models on top of the two most widely deployed open-weight families. The reasoning capability doesn’t sit in a silo; it propagates through whatever ecosystem the base model already occupies.
The “AI+” Framing
The HuggingFace retrospective frames the third stage of this ecosystem evolution as “AI+”: AI integration across sectors at scale. The blog’s authors are direct about the strategy: “Openly sharing artifacts from models to papers to deployment infrastructure maps to a strategy with the goal of large-scale deployment and integration.”
This is AI-as-infrastructure thinking. The competitive goal is to be embedded in enough downstream systems that the model family becomes load-bearing for the broader ecosystem. Once Qwen is the base for 113,000 other models, many of which are production deployments in enterprise and research contexts, Qwen’s architecture decisions become the ecosystem’s architecture decisions: its tokenizer, its context length, and its fine-tuning API surface all propagate forward into dependent systems.
Western open-source labs have pursued a version of this with Llama, and Meta’s work deserves credit for establishing the open-weight model as a viable format. The 27,000 versus 113,000 derivative gap is a concrete signal that Alibaba’s approach has been more effective. Part of this comes down to volume of releases: Qwen has 433 models on HuggingFace compared to Meta’s significantly smaller catalog. Part of it is the ModelScope platform giving Chinese researchers a parallel distribution channel that keeps the ecosystem active. Part of it is the continuous family approach: rather than waiting for a flagship release, Qwen ships across sizes and modalities frequently, which means there’s always a current Qwen model close to whatever a researcher needs.
This is a fundamentally different product strategy from “release the best model and update annually.”
The Competitive Implication
The HuggingFace blog’s core observation is that open source has become the dominant approach for Chinese AI organizations, and the motivation is explicitly strategic. That framing matters because it explains the continuity of the effort. The value compounds directly with adoption, which creates incentives for sustained investment in the ecosystem rather than just periodic model releases.
For closed AI providers, the “closed model as moat” thesis continues to erode. DeepSeek-V3.2, released in December 2025 with 685B parameters and a new DeepSeek Sparse Attention mechanism, reportedly surpasses GPT-5 on IMO and IOI benchmarks. If that holds under independent evaluation, the primary competitive advantage of closed models has shifted to latency, reliability, and support infrastructure, not raw capability.
Building a great model and releasing it is necessary but not sufficient for the kind of ecosystem influence that Qwen has built. The ecosystems that compound are the ones that make it easy for thousands of practitioners to build on top of them, through documentation, family breadth, architectural stability, and presence on the platforms where practitioners work. Qwen’s 113,000 derivatives accumulated because Alibaba treated model releases as infrastructure deployment rather than product launches. The distinction matters more now than it did a year ago, when the field was still absorbing what the initial DeepSeek release meant.