Anthropic Claude 3.5 Sonnet Launched; Beats ChatGPT 4o

Jun. 21, 2024



Just three months after releasing the Claude 3 models, Anthropic has now introduced a much improved Claude 3.5 Sonnet model. It’s not the largest model from Anthropic’s lab, yet it beats ChatGPT 4o and Gemini 1.5 Pro, at least in several benchmarks. Claude 3.5 Sonnet is a mid-tier model, and it runs at twice the speed of Claude 3 Opus, the largest Claude 3 model.

Anthropic has kept the API price the same as the earlier Claude 3 Sonnet, and the new model has a context window of 200K tokens. For general users, it’s available for free on claude.ai and supports both image and document uploads. Keep in mind that there is a rate limit for free users.

Coming to benchmarks, Claude 3.5 Sonnet beats GPT-4o in nearly all benchmarks except MMLU and MATH, but the difference is very marginal. In HumanEval, which tests coding abilities, Claude 3.5 Sonnet scores 92% whereas GPT-4o scores 90.2%. In GPQA Diamond, which evaluates graduate-level reasoning, the new Sonnet model achieves a score of 59.4% whereas GPT-4o stands at 53.6%.

With 0-shot prompting in the MMLU test, Claude 3.5 Sonnet gets 88.3% and OpenAI’s GPT-4o model gets 88.7%. From the table, you can infer that Anthropic has developed a highly capable model that outranks both GPT-4o and Gemini 1.5 Pro models.
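To make the size of these gaps concrete, here is a minimal Python sketch that computes the margin between the two models using only the three scores quoted in this article. The dictionary layout and the `margins` helper are illustrative assumptions, not anything published by Anthropic or OpenAI, and the full benchmark table covers more tests than these.

```python
# Scores (in percent) as quoted in this article; other benchmarks are omitted.
SCORES = {
    "HumanEval": {"Claude 3.5 Sonnet": 92.0, "GPT-4o": 90.2},
    "GPQA Diamond": {"Claude 3.5 Sonnet": 59.4, "GPT-4o": 53.6},
    "MMLU (0-shot)": {"Claude 3.5 Sonnet": 88.3, "GPT-4o": 88.7},
}


def margins(scores):
    """Return Claude 3.5 Sonnet minus GPT-4o, in percentage points, per benchmark."""
    return {
        name: round(s["Claude 3.5 Sonnet"] - s["GPT-4o"], 1)
        for name, s in scores.items()
    }


if __name__ == "__main__":
    for name, diff in margins(SCORES).items():
        # A positive margin means Claude 3.5 Sonnet is ahead on that benchmark.
        leader = "Claude 3.5 Sonnet" if diff > 0 else "GPT-4o"
        print(f"{name}: {leader} leads by {abs(diff):.1f} points")
```

On these three figures, Claude 3.5 Sonnet is ahead on HumanEval and GPQA Diamond, while GPT-4o keeps a lead of less than half a point on MMLU.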

Next, Claude 3.5 Sonnet is also a powerful vision model and again does better than GPT-4o in various visual reasoning tests. It’s very good at understanding and transcribing text from imperfect, hard-to-read images. It’s also excellent at interpreting charts, graphs, and illustrations.

Moreover, Anthropic has announced a new Artifacts tool for Claude which works like OpenAI’s Code Interpreter tool. The Artifacts tool generates the code and creates AI-generated content in a separate interface. It’s not just limited to Python as it can work with other programming languages as well. For example, I created an SVG image of the Taj Mahal with the Artifacts tool on Claude Chat.

Anthropic says Claude 3.5 Haiku and Claude 3.5 Opus are coming later this year. Overall, I am very impressed with Claude 3.5 Sonnet’s speed and intelligence. It seems I can finally replace ChatGPT 4o with Anthropic’s new model for my everyday tasks.
