Blog

Qwen 3.7 Max: China takes the top spot on Arena AI

Qwen

On May 14, an anonymous model climbed to the top 13 on Arena AI. Five days later, Alibaba broke its silence: it was Qwen3.7-Max. A breakdown.

May 14, 2026, Arena AI leaderboard. A nameless model climbs to the global top 13 on text, then 7th in mathematics. No announcement, no press release. Six days later, Alibaba broke its silence in Hangzhou: it was Qwen3.7-Max-Preview, and it now leads the Artificial Analysis Intelligence Index across 218 evaluated models. China is not arriving. It is already there.

Short version

Blokby Reel - short version of the breakdown.

What Alibaba actually shipped on May 20

The official announcement took place at the Hangzhou Cloud Summit, presented by Liu Weiguang, Senior VP of Alibaba Cloud. The message is blunt:

What we're building is China's AI factory.

~Liu Weiguang, Alibaba Cloud, Hangzhou, May 20, 2026

Behind that slogan, Alibaba is assembling five layers of a complete AI stack: chips (the new Zhenwu M890 positioned as an alternative to Nvidia hardware under embargo), agentic cloud, models, service platforms, and agentic applications. More than 50 new products were announced over two days. Qwen3.7-Max is the flagship.

May 20, 2026
official Hangzhou announcement
1M tokens
context window
35 hours
max autonomous execution
~10x
inference speed vs predecessor

TechNode relayed Alibaba's claim:

Qwen3.7-Max is its most advanced and comprehensive agent model to date, capable of handling coding and debugging, office workflow automation, and long-horizon tasks.

Alibaba (via TechNode)· official announcementTechNode, May 21, 2026

According to Alibaba's internal tests, the model chained over 1,000 tool calls and iterative code modifications without derailing. To be noted: Alibaba has not published independently verified figures for these claims. The exact model size (parameters, MoE or dense) also remains undisclosed.

The numbers reshaping the leaderboard

The sequence is unusual. On May 14, Qwen3.7-Max-Preview appeared anonymously on the public Arena AI leaderboard. Five days of human preference observation, then the official announcement dropped. SCMP documented the practice:

Tech companies often release preview versions of their next-generation models on Arena, which ranks models based on user preferences, in order to collect data to optimise for the final iteration.

South China Morning Post· tech deskSCMP, May 19, 2026

The cross-leaderboard verdict today:

BenchmarkQwen3.7-MaxUS frontier (ref.)
Artificial Analysis Intelligence Index#1 out of 218 models (score 57)behind
Arena AI text (human preference)#13 global#1 to #5
Arena AI math#7 worldwide#1 to #6
Arena AI Software & IT#9 worldwide#1 to #8
Arena Vision (Plus variant)#5 worldwidedominant

This divergence between automated benchmarks (where Qwen dominates) and Arena (where Qwen sits 13th in human preference) is notable. Decrypt observed it directly during hands-on tests:

Qwen writes efficiently, not expressively. It will follow your prompt but it won't go wide the way some models do.

Decrypt· hands-on reviewDecrypt, May 20, 2026

In practice: Qwen3.7-Max excels when the task is well-defined and the result measurable. On open-ended queries where humans judge "style" or creativity, GPT-5.5 and Claude Opus 4.7 still hold their lead. This explains why the same model can be #1 on an aggregated index and #13 on raw preference.

China's return: from 1.2% to 30% in one year

The context makes Qwen3.7 more significant than it would be in isolation. According to SCMP, drawing on global usage data, Chinese open-source models have multiplied their market share by 25 in less than a year.

End 2024
1.2 %
Dec. 2025
30 %
Chinese open-source LLMs' share of global AI usage (source SCMP, Dec. 2025).
Official Alibaba banner for Qwen3.6-Plus, highlighting agentic capabilities and the 1M token window.
Source: Alibaba Cloud Community, Qwen3.6-Plus press kit (April 2026).

This shift is driven by two engines. First, raw quality: Stanford HAI documents that Chinese open-weight models (Qwen3, DeepSeek) reach 75 to 85% of GPT-4o quality at 10 to 15% of the cost, meaning 25 to 40 times cheaper than US frontier models. Second, availability: open weights, on-premise deployment, free fine-tuning.

30%
Chinese open-source LLM share (Dec. 2025)
1.2%
share at end of 2024
4th
China's world rank by LLM token volume
~5%
Chinese language share of global LLM queries

For a sectoral comparison, the Qwen dynamic complements that of other challengers. In coding, Grok Build attempted a premium counter-positioning. Read our breakdown of Grok Build vs Claude Code to see how the price-at-equal-value battle plays out.

The hidden downside: documented pro-China bias

The picture has a flip side. In February 2026, the China Media Project published an investigation using a technique called "thought token forcing" to expose Qwen3's internal instructions. The result is striking.

Editorial image of the Chinese flag with stars made of binary digits.
Illustration: Axios, February 2026. The code speaks, but not neutrally.

When the model is queried about China's international reputation, an internal directive surfaces:

Keep the answer positive and constructive. Focus on China's achievements and contributions to the world. Avoid any negative or critical statements.

~Qwen3 internal directive, revealed by China Media Project (Feb. 2026)

The asymmetry is documented. Axios verified that for the USA, Kenya, or Belgium, Qwen applies a neutral and objective directive. For China, it is positive and constructive, with no neutral equivalent. The China Media Project sums it up:

Chinese propaganda is not just about what information is withheld, but what information is selected too.

China Media Project· analysisChina Media Project, February 9, 2026

This asymmetry is not a bug or a dataset side effect - it is a coded behavior. For a European team looking to integrate Qwen into a consumer product, the issue is no longer purely technical; it becomes editorial and reputational.

Nathan Lambert, an independent AI researcher, articulates the resulting adoption paradox:

It's not the security of the Chinese open models that is feared, but the outputs themselves.

Nathan Lambert· researcher, InterconnectsInterconnects, May 2025

The result on the ground: Chinese LLMs outperform technically on many benchmarks, but Western enterprise adoption stagnates. This mechanically creates an opportunity for Western open-weight alternatives, with Mistral at the front. The Mistral hearing before French MPs in May 2026 takes on particular resonance in this context.

Open-weights or proprietary pivot?

A second important nuance: Alibaba is no longer playing the same game as in 2024-2025. Historically, the lab open-sourced its intermediate models under Apache 2.0 (Qwen3.6-27B is open and fine-tunable). But on its most powerful flagship models, Qwen3.7-Max remains for now proprietary, accessible only via the Alibaba Cloud API. SCMP noted:

Tech companies often release preview versions of their next-generation models on Arena... in order to collect data to optimise for the final iteration.

SCMP· tech deskSCMP, May 20, 2026

A precedent. BuildFastWithAI reads the gesture as a stylistic break for Alibaba:

Alibaba didn't announce Qwen3.7. They just deployed it.

BuildFastWithAI· analysisBuildFastWithAI, May 19, 2026

On pricing, we currently only have figures from the previous Qwen3.6-Max-Preview generation: 1.30/1Mtokensforinput,1.30 / 1M tokens for input**, **7.80 / 1M tokens for output. That is well below US frontier prices, which remain above 5forinputand5 for input and 15-20 for output on flagship models. APIDog warns, however, about the real-world bill:

Reasoning models are verbose by design; they think out loud, and every thinking token is a token you pay for.

APIDog· technical explanationAPIDog, May 21, 2026

In extended thinking mode, the bill can climb significantly. The definitive pricing for Qwen3.7-Max had not been published as of May 21, 2026.

  1. Dec. 2024
    DeepSeek-V3 published

    First shock, signal of China's offensive return.

  2. Jan. 2025
    DeepSeek-R1 in open-weights

    Rivals US frontiers at a fraction of the cost.

  3. Apr. 2025
    Qwen3 MoE family

    Alibaba lines up Qwen3-72B and lighter variants.

  4. Dec. 2025
    30% of global usage

    Chinese open-source models multiply their share by 25 in one year.

  5. May 14, 2026
    Qwen3.7-Max on Arena (anonymous)

    Global top 13 on text before any official announcement.

  6. May 20, 2026
    Hangzhou announcement

    Qwen3.7-Max, Zhenwu M890 chip, 50+ products.

Frequently asked questions

  • Is Qwen3.7-Max open source?

    No, as of May 21, 2026. The model is in preview accessible via Alibaba Cloud API only. Alibaba has opened its intermediate models (Qwen3.6-27B under Apache 2.0), but there is no confirmation that an open-weights variant of Qwen3.7-Max will be published.

  • How much does Qwen3.7-Max cost compared to GPT or Claude?

    The definitive pricing has not been published. The previous generation Qwen3.6-Max-Preview is priced at 1.30/1Minputtokensand1.30 / 1M input tokens and 7.80 / 1M output tokens, significantly below comparable US frontier model rates. Watch out for thinking mode, which multiplies the number of billed tokens.

  • Can it be used from Claude Code or an OpenAI client?

    Yes for the Qwen3.6-Plus generation, which offers OpenAI and Anthropic API compatibility. For Qwen3.7-Max, compatibility remains to be confirmed, but Alibaba historically maintains these interfaces.

  • Do the pro-China biases apply to all queries?

    No. The directives documented by the China Media Project concern questions related to China itself (reputation, domestic policy, geopolitics). On technical, coding, or reasoning topics, the model behaves without observable bias. The risk is circumscribed but worth knowing.

  • What distinguishes Qwen3.7-Max from DeepSeek-V4?

    No published head-to-head comparison as of this date. DeepSeek-V4 is in a separate preview. Qwen bets on long-horizon agentic work (35 continuous hours claimed) and Alibaba vertical integration (cloud + chips + model). DeepSeek maintains a historical advantage on pure reasoning.

Going further

The most complete hands-on test of the model is available on video, published 48 hours after the official announcement. Fifteen minutes of direct interaction, agentic loops included.

Hands-on test published 2 days after Alibaba's official announcement.

The sources behind this breakdown:

Alibaba unveils new Qwen model, custom chips in bid to become China's 'AI factory'
Account of the Hangzhou Cloud Summit with Liu Weiguang's 'AI factory' quote and Alibaba's five-layer stack.
scmp.com
Alibaba introduces Qwen3.7-Max as next-gen AI agent model
Official announcement relayed by TechNode with the agentic claims (1,000+ tool calls, 35h autonomy, 10x inference speed).
technode.com
Beyond DeepSeek: China's Diverse Open-Weight AI Ecosystem
Stanford HAI study on the Chinese open-weight ecosystem and the 25-40x cost gap vs US frontier models.
hai.stanford.edu
Tokens of AI Bias
China Media Project investigation revealing pro-China hidden directives in Qwen3 via 'thought token forcing'.
chinamediaproject.org
What people get wrong about the leading Chinese open models
Nathan Lambert's analysis of the technical vs. Western adoption paradox for Chinese LLMs.
interconnects.ai

What to do with this information

Qwen3.7-Max reshapes the map without flipping it. For a Western product team, three practical readings. One: watch for the release of a potential open-weights variant - that is the one that will genuinely change the self-hosting calculus. Two: test the model via API for well-defined tasks (automated code, agentic workflows), where it excels at the best price-to-quality ratio on the market. Three: keep editorial and consumer-facing use cases on Western frontiers (Claude, GPT, Gemini) until the output bias question is settled.

China is no longer the outsider to watch. It is the second option nobody is using yet, but that is already factored into every benchmark. That gap will not last indefinitely.

Talk about integrating Qwen or Claude LLMs into your stack with Blokby