Leaderboard

Best AI for Reasoning

Ranked by GPQA Diamond — graduate-level questions designed to be Google-proof.

12 models ranked · Updated April 2026

Claude Opus scores 88% on GPQA Diamond, the highest of the 12 models ranked and 4 points ahead of Claude Sonnet 4.6 in second place.

#   Model              Vendor · Underlying model         GPQA Diamond  Standard plan  Context
1   Claude Opus        Anthropic · Claude Opus 4.7       88%           $100/mo        1M
2   Claude             Anthropic · Claude Sonnet 4.6     84%           $20/mo         1M
3   ChatGPT            OpenAI · GPT-5.2                  79%           $20/mo         256K
4   Gemini             Google · Gemini 3 Pro             78%           $20/mo         2M
5   Grok               xAI · Grok 4                      76%           $30/mo         256K
6   Microsoft Copilot  Microsoft · GPT-5.2 (M365-tuned)  76%           $20/mo         128K
7   Qwen               Alibaba · Qwen 3 Max              74%           Plus           256K
8   DeepSeek           DeepSeek · DeepSeek V3            73%           API only       128K
9   Meta AI            Meta · Llama 4 Maverick           70%           Free           1M
10  Le Chat            Mistral · Mistral Large 2         68%           $14.99/mo      128K
11  Perplexity         Perplexity · Sonar Pro            65%           $20/mo         200K
12  Pi                 Inflection AI · Pi 3              62%           Free           32K
Methodology

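GPQA Diamond: graduate-level multiple-choice questions in physics, chemistry, and biology, designed to be Google-proof (hard to answer correctly with a web search alone). Each percentage in the table is the share of questions the model answered correctly. Source: github.com/idavidrein/gpqa.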
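
The page does not describe how the percentages were produced, so the following is only a minimal sketch of how an accuracy score on a multiple-choice benchmark such as GPQA Diamond could be computed. The GradedAnswer type, the accuracy_percent function, and the sample data are all hypothetical and are not taken from the GPQA repository or from any vendor's evaluation harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedAnswer:
    """One benchmark question with the model's choice (hypothetical type)."""
    question_id: str
    predicted: str  # option letter the model chose, e.g. "B"
    correct: str    # option letter of the reference answer


def accuracy_percent(answers):
    """Return the share of correctly answered questions as a percentage."""
    if not answers:
        raise ValueError("no answers to score")
    hits = sum(
        1
        for a in answers
        if a.predicted.strip().upper() == a.correct.strip().upper()
    )
    return 100.0 * hits / len(answers)


# Made-up sample data for illustration only, not real benchmark results.
sample = [
    GradedAnswer("q1", "A", "A"),
    GradedAnswer("q2", "c", "C"),  # letter case is ignored when comparing
    GradedAnswer("q3", "B", "D"),
    GradedAnswer("q4", "D", "D"),
]

print(f"{accuracy_percent(sample):.0f}%")  # prints "75%"
```

A real evaluation would also have to fix the prompt format, the answer-extraction rule, and the number of attempts per question, all of which can shift the reported score by several points.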

Or skip the choice

One subscription. Every frontier AI model. $14.99/month.

Perspective AI bundles ChatGPT-class, Claude, Gemini, Grok, Copilot and more in one app. Switch mid-conversation. No per-vendor logins, no separate bills.
