Blog

Grok Build: 15x the price of Claude Code, 10 points behind

Grok

On May 14, xAI launched Grok Build. More expensive, weaker on benchmarks, and a Musk testimony that reframes the whole product. A breakdown.

April 30, 2026, federal courthouse in California. Under oath, Elon Musk ranks the world's AI leaders: Anthropic first, then OpenAI, Google, Chinese open-source models. His own xAI? Last. Fourteen days later, that same xAI launches Grok Build, a command-line coding agent positioned against Claude Code, at a price five to fifteen times higher than Anthropic's. For ten points less on the benchmark. That is the gap.

What xAI actually shipped on May 14

Grok Build is a command-line coding agent, following in the footsteps of Claude Code and Codex CLI. You install it with a single command, talk to it in your terminal, it edits your files, runs shell commands, and orchestrates subtasks. The interface uses ratatui, a Rust framework for TUIs, which gives it a polished visual rendering (an xAI engineer confirmed the choice in the Hacker News thread at launch).

May 14, 2026
official launch
$99 - $299/mo
SuperGrok Heavy entry price
2M tokens
Heavy context (vs 200K Claude)
8 agents
maximum in parallel

Under the hood, two models coexist. grok-code-fast-1 is the fast engine for the majority of calls, with a window of 256,000 tokens and an API price of 0.20/0.20 / 1.50 per million tokens for input/output. Grok 4.3 Heavy, accessible via SuperGrok Heavy, raises the context to 2 million tokens and orchestrates a multi-agent architecture (officially up to 8 in parallel, spread across two models).

Four conventions inherited from the Claude Code ecosystem are supported natively: AGENTS.md for project documentation, Skills, MCP servers, and hooks. Good practice, but also an admission: xAI is not reinventing the convention, it is adopting it.

The price wall: 5 to 15 times Claude Code

This is where the product loses most candidates. Grok Build is not sold individually - it is locked behind SuperGrok Heavy, xAI's highest tier. The 99/monthintroductorypriceforsixmonthsthenclimbsto99/month introductory price for six months then climbs to 299/month. No public partial refund mechanism or retention guarantee has been communicated.

ToolMonthly priceIncluded modelSWE-bench Verified
Claude Code Pro$20Sonnet 4.679.6%
Claude Code Max100100 - 200Sonnet + Opus80.9%
Codex CLI$20GPT-5.x77.3%
Grok Build (intro)$99Grok 4.3 Heavy70.8% (xAI internal)
Grok Build (standard)$299Grok 4.3 Heavy70.8% (xAI internal)

The verdict from the indie community was immediate. DEV Community summed it up:

For most indie hackers running small-to-medium SaaS products on Claude Code today, the honest answer is: stay where you are.

DevToolPicks· DEV Communitydev.to, May 15, 2026

And a developer on Hacker News, in one line:

More expensive, as good or worse at the job, and it runs in the terminal.

~Hacker News comment, May 14, 2026

For context, Claude Code alone surpasses $2.5 billion in annualized revenue as of April 2026 at Anthropic. It is the fastest-growing product in the company's entire history. Grok Build is entering an established market, at a price that excludes its most active segment: solos, open-sourcers, and builders running demos on YouTube.

The benchmark wall: ten points behind, maybe more

On SWE-bench Verified, the reference benchmark for coding agents, xAI claims 70.8% for grok-code-fast-1. This is the figure that appeared on the official announcement page and was picked up by the entire tech press.

Claude Code (Opus 4.5)
80.9 %
Claude Code (Sonnet 4.6)
79.6 %
Codex CLI (GPT-5.x)
77.3 %
Grok Build (xAI internal)
70.8 %
Grok Build (vals.ai third-party)
57.6 %
SWE-bench Verified. Higher is better.

The problem: this 70.8% is measured internally by xAI, on their own harness, with no published third-party replication at the time of launch. The platform vals.ai, which audits foundation model benchmarks, measured 57.6% on the same test - thirteen points below the official figure. xAI has not commented on the gap to date.

xAI itself anticipated this by posting a disclaimer on its blog: "SWE-bench benchmarks don't fully reflect the nuances of real-world software engineering." An honest phrase, but also a classic tactic: warn that the score is imperfect when you already know it is low.

What is genuinely new

It is not all about price and benchmarks. Grok Build brings three things that Claude Code and Codex CLI do not have, and those deserve recognition.

Plan Mode. Before any file modification, the agent produces a complete execution plan and presents it to you. You can comment on it, modify it, or reject it before a single line is touched. xAI describes the workflow as:

Present the full execution plan first before any code changes occur, allowing developers to review, comment, and modify steps upfront with clear diffs displayed afterward.

xAI, official docs· Grok Build Documentationtechloy.com, May 15, 2026

This is a real difference from Claude Code, which acts and then shows diffs. Plan Mode reverses the order. On sensitive tasks (migrations, cross-file refactors), that is a valuable safety net.

Screenshot of Grok Build CLI in Plan Mode
Source: kingy.ai. The Grok Build terminal presents the full plan before any edit.

Local-first and air-gap. The code stays on the machine - nothing in the codebase is transmitted to xAI servers during a session. Once installed, the tool works in an isolated environment. For companies with strict confidentiality requirements (banking, defense, sensitive R&D), this is a genuine argument.

2 million token context (Heavy). Claude Code caps at 200,000, Opus 4.7 goes up to 1 million. On a monorepo with several hundred thousand lines, the 10x factor makes a concrete difference: less chunking, less RAG, more direct context.

On top of that: 8 parallel agents (announced architecture), native support for MCP servers and the AGENTS.md convention, and an Arena Mode where several agents solve the same problem in competition and their outputs are scored before presentation. Arena Mode remains announced without a public delivery roadmap.

The admission that reframes the product

On April 30, 2026, in his own lawsuit against OpenAI (Musk v. Altman, federal court in California), Musk testified under oath. Direct question from a defense attorney: did xAI use distillation techniques on OpenAI models to train Grok?

Partly.

~Elon Musk, under oath, April 30, 2026

Distillation involves querying a competing model extensively (in this case GPT-4o, GPT-5, etc.) and using its responses to train your own model. It is technically legal in some frameworks, it is prohibited by OpenAI's ToS, and it is ethically questionable when you are simultaneously suing that same company for betraying its non-profit mission. Musk justified it:

It is standard practice to use other AIs to validate your AI.

"Validate" is a word choice. The question was about training, not validation. But the word is out there.

The irony of the ranking

That same week, still under oath, Musk delivered his own ranking of global AI leaders. No marketing hierarchy, no X punchline. His answer, plain: Anthropic first, then OpenAI, then Google, then Chinese open-source models, and xAI last. He noted that xAI was "much smaller" with "only a few hundred employees."

Two weeks later, that same xAI launches Grok Build, premium at $299/month, and positions it against Claude Code - which he himself said is the product of the world leader. Read that sentence twice.

And the context is even clearer. In January 2026, Anthropic blocked access to Claude for xAI engineers after discovering they were using Claude via Cursor to develop Grok Build. That is the event that partly pushed xAI to accelerate its in-house agent. Musk's position is therefore: Anthropic is the leader, I was using their tool to build mine, they cut my access, I ship a competitor at 15x the price for 10 points less.

  1. Jan 2026
    Anthropic blocks xAI engineers

    Anthropic discovers that xAI engineers are using Claude via Cursor to develop Grok Build. Access cut.

  2. Mar 2026
    Musk admits 'xAI was not built right'

    Musk recruits Andrew Milich and Jason Ginsberg (ex-Cursor, $2B ARR) to overhaul xAI.

  3. Apr 30, 2026
    Sworn testimony

    Musk ranks Anthropic first, xAI last. Admits "Partly" to OpenAI distillation.

  4. May 14, 2026
    Grok Build launches

    SuperGrok Heavy beta, $99-299/month. Announced as "Claude Code competitor."

The trial and the launch collide within fourteen days. It is hard to believe the timing is innocent.

Verdict: who this actually speaks to

Three personas, three honest recommendations.

The solo / indie hacker. You code alone or in a pair, your $20/month Claude Code Pro subscription has been doing the job for six months, you have no air-gap requirement. Stay on Claude Code. The added cost is indefensible and you lose performance on hard tasks, where Claude Sonnet 4.6 and Opus 4.7 still shine.

The SaaS team / scale-up (3-30 devs). You use Claude Code Max or Codex CLI, you have a stable workflow. Plan Mode is interesting, but not enough to switch the whole team to $299/month/seat. Worth testing on a single seat during the six-month intro period, especially if you have a monorepo that struggles with Claude's 200K context.

The company with a giant monorepo and air-gap constraints (banking, defense, classified R&D, heavy legacy). Here the pitch holds: 2M context, local-first, code that does not leave the machine. If you operate a repo with millions of lines and a SOC that bars external cloud, Grok Build deserves a three-month POC. Not a blind rollout.

Frequently asked questions

  • Is Grok Build truly open source or self-hostable?

    No. It is a local client that calls the hosted xAI API, accessible only to SuperGrok Heavy subscribers. "Local-first" means your code stays on your machine and is not sent to xAI servers - it does not mean the model runs locally. For genuinely self-hosted AI, look at open-weight models (Mistral, Qwen, Llama).

  • Can I try Grok Build without paying $99/month?

    Not officially. The CLI is locked behind SuperGrok Heavy. Some YouTube tutorials offer workarounds via free third-party APIs, but the official client requires Heavy auth. To evaluate the underlying model (grok-code-fast-1) for free, you can use the API at 0.20/0.20/1.50 per million tokens, without the agent wrapper.

  • Is the 70.8% SWE-bench score reliable?

    As long as no independent third party replicates it, no. The vals.ai measurement of 57.6% on the same benchmark is a 13-point warning signal. xAI has not commented on the gap to date. Treat it as a marketing figure until a public replication appears.

  • Does Plan Mode exist in Claude Code?

    Not in this exact form. Claude Code offers a planning mode via /plan and pre-execution hooks, but does not formalize a mandatory "plan -> review -> approve" workflow with explicit validation before each edit. That is the genuine functional difference Grok Build offers.

Going further

If you want to form your own opinion before judging, the full video test of Grok Build against Claude Code and Codex CLI is the best entry point. It shows the Plan Mode interface in action and compares outputs on identical tasks.

Full Grok Build vs Claude Code vs Codex CLI test on concrete tasks.

The sources behind this breakdown, by weight:

Introducing Grok Build
The official xAI announcement from May 14, 2026: product description, models, pricing, capabilities. The primary source against which every article should be checked.
x.ai
Elon Musk testifies that xAI trained Grok on OpenAI models
TechCrunch's account of Musk's April 30, 2026 testimony. Source of the 'Partly' quote and the sworn ranking that places Anthropic first.
techcrunch.com
Claude Benchmarks 2026 - Sonnet 4.6, Opus 4.6, Haiku
Up-to-date collection of SWE-bench Verified scores for the Claude lineup. Essential for comparing Grok Build in numbers.
morphllm.com
Should Indie Hackers Switch From Claude Code?
Analysis from the indie hacker perspective, honest and concrete. The 'stay where you are' verdict that captures the community consensus at launch.
dev.to
Grok Build - Hacker News thread
93 points, 34 comments. The dominant developer skepticism about the price-to-quality ratio and the product's accessibility in beta.
news.ycombinator.com

What this says about the market

The dev agent war is intensifying, and xAI is entering through the side door. When the leader you acknowledge under oath sells their product at 20/monthandyoushipyoursat20/month and you ship yours at 299/month, you are not playing on the same field. You are betting that companies needing 2M context and local-first are numerous enough to pay the margin.

The bet could work. Cursor showed that a well-positioned tool on a precise segment (IDE-first, pro teams) can reach $2 billion in ARR. Grok Build has a similar angle (local-first + very large context) that is not yet occupied. What remains: deliver a stable V1, get benchmarks replicated by third parties, and explain the training provenance. We will check back in six months.

Choose the right coding agent for your team with Blokby