Leaderboard
Best AI for Reasoning
Ranked by GPQA Diamond — graduate-level questions designed to be Google-proof.
Claude Opus scores 88% on GPQA Diamond, the highest score on this leaderboard for graduate-level scientific reasoning.
| # | Product | Vendor | Model | GPQA Diamond | Standard plan | Context window (tokens) |
|---|---|---|---|---|---|---|
| 1 | Claude Opus | Anthropic | Claude Opus 4.7 | 88% | $100/mo | 1M |
| 2 | Claude | Anthropic | Claude Sonnet 4.6 | 84% | $20/mo | 1M |
| 3 | ChatGPT | OpenAI | GPT-5.2 | 79% | $20/mo | 256K |
| 4 | Gemini | Google | Gemini 3 Pro | 78% | $20/mo | 2M |
| 5 | Grok | xAI | Grok 4 | 76% | $30/mo | 256K |
| 6 | Microsoft Copilot | Microsoft | GPT-5.2 (M365-tuned) | 76% | $20/mo | 128K |
| 7 | Qwen | Alibaba | Qwen 3 Max | 74% | Plus | 256K |
| 8 | DeepSeek | DeepSeek | DeepSeek V3 | 73% | API only | 128K |
| 9 | Meta AI | Meta | Llama 4 Maverick | 70% | Free | 1M |
| 10 | Le Chat | Mistral | Mistral Large 2 | 68% | $14.99/mo | 128K |
| 11 | Perplexity | Perplexity | Sonar Pro | 65% | $20/mo | 200K |
| 12 | Pi | Inflection AI | Pi 3 | 62% | Free | 32K |
Methodology
GPQA Diamond is a set of graduate-level questions in physics, chemistry, and biology, designed to be Google-proof. Source: github.com/idavidrein/gpqa.
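For readers who want to reproduce the ordering, here is a minimal sketch of ranking by score, using the GPQA Diamond percentages from the table above. The function name `rank_by_score` is hypothetical, and the tie-breaking rule (tied products keep their listed order) is an assumption, since the page does not say how the 76% tie between Grok and Microsoft Copilot is resolved.

```python
# GPQA Diamond scores (percent), copied from the leaderboard table.
scores = {
    "Claude Opus": 88,
    "Claude": 84,
    "ChatGPT": 79,
    "Gemini": 78,
    "Grok": 76,
    "Microsoft Copilot": 76,
    "Qwen": 74,
    "DeepSeek": 73,
    "Meta AI": 70,
    "Le Chat": 68,
    "Perplexity": 65,
    "Pi": 62,
}


def rank_by_score(scores):
    """Return (rank, product, score) tuples, highest score first.

    Python's sort is stable, so products with equal scores keep their
    input order. That tie-break is an assumption, not the site's
    documented rule.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score)
            for rank, (name, score) in enumerate(ordered, start=1)]


ranking = rank_by_score(scores)
for rank, name, score in ranking:
    print(f"{rank:>2}  {name:<18} {score}%")
```

A different tie-break (for example, price or context window) would swap positions 5 and 6 but leave the rest of the ordering unchanged.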