ChatGPT 4o vs Gemini 1.5 Pro: It’s Not Even Close

May 15, 2024



Note: To ensure consistency, we performed all our tests on Google AI Studio and Gemini Advanced. Both host the latest Gemini 1.5 Pro model.

1. Calculate Drying Time

We ran the classic drying-time reasoning test on ChatGPT 4o and Gemini 1.5 Pro to test their intelligence. The trick is that clothes laid out in the sun dry in parallel, so drying more of them takes about the same time, not proportionally longer. OpenAI’s ChatGPT 4o aced it, while the improved Gemini 1.5 Pro model missed the trick entirely: it dove into mathematical calculations and came to a wrong conclusion.

Winner: ChatGPT 4o

2. Magic Elevator Test

In the magic elevator test, the earlier ChatGPT 4 model had failed to guess the answer correctly. This time, however, ChatGPT 4o responded with the right answer, and Gemini 1.5 Pro got it right as well.

Winner: ChatGPT 4o and Gemini 1.5 Pro

3. Find the Apples

In this test, Gemini 1.5 Pro outright failed to understand the nuances of the question. The Gemini model seems inattentive and overlooks key aspects of the prompt. ChatGPT 4o, on the other hand, correctly says that the apples are in the box on the ground. Kudos, OpenAI!

Winner: ChatGPT 4o

4. Kilogram vs Pound

In this commonsense reasoning test, Gemini 1.5 Pro gets the answer wrong and says both weigh the same. ChatGPT 4o rightly points out that the units are different: a kilogram of any material weighs more than a pound of any other, since one kilogram is roughly 2.2 pounds. It looks like the improved Gemini 1.5 Pro model has gotten dumber over time.
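Here is the arithmetic behind the right answer, as a minimal sketch (the feathers-vs-steel pairing is my assumption, since the article doesn't quote the exact prompt):

```python
# A kilogram of anything outweighs a pound of anything:
# 1 kg is roughly 2.2 lb, regardless of the material.
KG_TO_LB = 2.20462  # standard conversion factor

feathers_lb = 1.0 * KG_TO_LB  # 1 kg of feathers, expressed in pounds
steel_lb = 1.0                # 1 lb of steel

print(f"1 kg of feathers = {feathers_lb:.2f} lb, vs 1 lb of steel")
print("Heavier:", "feathers" if feathers_lb > steel_lb else "steel")
```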

Winner: ChatGPT 4o

5. Follow User Instructions

I asked ChatGPT 4o and Gemini 1.5 Pro to generate 10 sentences ending with the word “mango”. Guess what? ChatGPT 4o generated all 10 sentences correctly, but Gemini 1.5 Pro managed only 6.

Prior to GPT-4o, only Llama 3 70B was able to properly follow user instructions like this. The older GPT-4 model struggled with it as well, so OpenAI has indeed improved its model.
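If you want to score this kind of test mechanically rather than by eye, a small checker like the sketch below does the job. The sample outputs are placeholders, not the models' actual sentences:

```python
import string

def ends_with(sentence: str, word: str) -> bool:
    """Return True if the sentence's final word matches, ignoring case and punctuation."""
    tokens = sentence.strip().split()
    return bool(tokens) and tokens[-1].strip(string.punctuation).lower() == word.lower()

# Placeholder model outputs; swap in the real generated sentences.
outputs = [
    "She sliced a perfectly ripe mango.",
    "Mango smoothies are my favorite treat.",
]
passed = sum(ends_with(s, "mango") for s in outputs)
print(f"{passed}/{len(outputs)} sentences end with 'mango'")
```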

Winner: ChatGPT 4o

6. Image Analysis

François Fleuret, author of The Little Book of Deep Learning, performed a simple image analysis test on ChatGPT 4o and shared the results on X (formerly Twitter). He has since deleted the tweet to avoid blowing the issue out of proportion since, as he says, it’s a general issue with vision models.

That said, I performed the same test on Gemini 1.5 Pro and ChatGPT 4o to reproduce the results. Gemini 1.5 Pro performed much worse and gave wrong answers to every question. ChatGPT 4o, on the other hand, got one answer right but failed on the rest.

This goes to show that there are many areas where multimodal models need improvement. I am particularly disappointed with Gemini’s multimodal capability because its answers were far off from the correct ones.

Winner: None

7. Guess the Phone

In another multimodal test, I uploaded the specifications of two phones (Pixel 8a and Pixel 8) in image format. I didn’t disclose the phone names, and the screenshots didn’t show them either. I then asked ChatGPT 4o which phone I should buy.

It successfully extracted the text from the screenshots, compared the specifications, and correctly told me to get Phone 2, which was in fact the Pixel 8. I then asked it to guess the phone, and again ChatGPT 4o gave the right answer: Pixel 8.

I ran the same test on Gemini 1.5 Pro via Google AI Studio, since Gemini Advanced doesn’t support batch upload of images yet. As for the results, it simply failed to extract the text from either screenshot and kept asking for more details. Tests like these show just how far Google is behind OpenAI when it comes to getting things done seamlessly.
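For anyone who wants to reproduce this comparison outside the chat UIs, here is a minimal sketch using the OpenAI Python SDK. The filenames and prompt wording are my assumptions, not the exact ones from the test:

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def as_data_url(path: str) -> str:
    """Encode a local screenshot as a data URL the API accepts."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

# "spec1.png" and "spec2.png" are hypothetical names for the two spec screenshots.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Compare these two phones and tell me which one I should buy."},
            {"type": "image_url", "image_url": {"url": as_data_url("spec1.png")}},
            {"type": "image_url", "image_url": {"url": as_data_url("spec2.png")}},
        ],
    }],
)
print(response.choices[0].message.content)
```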

Winner: ChatGPT 4o

8. Create a Game

Now, to test the coding ability of ChatGPT 4o and Gemini 1.5 Pro, I asked both models to create a game. I uploaded a screenshot of the Atari Breakout game (without divulging the name, of course) and asked ChatGPT 4o to recreate it in Python. In just a few seconds, it generated the entire code and asked me to install the additional “pygame” library.

I installed the library with pip and ran the code. The game launched successfully without any errors. Amazing! No back-and-forth debugging was needed. In fact, when I asked ChatGPT 4o to improve the experience by adding a Resume hotkey, it quickly added the functionality. That’s pretty cool.
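For reference, a playable Breakout clone really does fit in a few dozen lines of pygame. The sketch below is my own minimal reconstruction of the kind of code the model produced, not ChatGPT 4o’s actual output; the P key toggles pause/resume, mirroring the hotkey request above:

```python
import pygame

pygame.init()
WIDTH, HEIGHT = 640, 480
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Breakout")
clock = pygame.time.Clock()

paddle = pygame.Rect(WIDTH // 2 - 40, HEIGHT - 20, 80, 10)
ball = pygame.Rect(WIDTH // 2, HEIGHT // 2, 10, 10)
ball_vel = [4, -4]
# A 10x5 wall of bricks across the top of the screen.
bricks = [pygame.Rect(x * 64 + 2, y * 20 + 2, 60, 16)
          for x in range(10) for y in range(5)]
paused = False
running = True

while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN and event.key == pygame.K_p:
            paused = not paused  # P toggles pause/resume

    if not paused:
        keys = pygame.key.get_pressed()
        if keys[pygame.K_LEFT]:
            paddle.move_ip(-6, 0)
        if keys[pygame.K_RIGHT]:
            paddle.move_ip(6, 0)
        paddle.clamp_ip(screen.get_rect())

        ball.move_ip(*ball_vel)
        if ball.left <= 0 or ball.right >= WIDTH:
            ball_vel[0] = -ball_vel[0]  # bounce off side walls
        if ball.top <= 0:
            ball_vel[1] = -ball_vel[1]  # bounce off the ceiling
        if ball.colliderect(paddle) and ball_vel[1] > 0:
            ball_vel[1] = -ball_vel[1]  # bounce off the paddle
        hit = ball.collidelist(bricks)
        if hit != -1:
            del bricks[hit]            # destroy the brick that was hit
            ball_vel[1] = -ball_vel[1]
        if ball.top > HEIGHT:          # ball lost: reset to the center
            ball.center = (WIDTH // 2, HEIGHT // 2)
            ball_vel = [4, -4]

    screen.fill("black")
    pygame.draw.rect(screen, "white", paddle)
    pygame.draw.ellipse(screen, "white", ball)
    for brick in bricks:
        pygame.draw.rect(screen, "orange", brick)
    pygame.display.flip()
    clock.tick(60)  # cap at 60 FPS

pygame.quit()
```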

Next, I uploaded the same image to Gemini 1.5 Pro and asked it to generate the code for the game. It produced code, but when I ran it, the window kept closing immediately (often a symptom of a script that exits its main event loop right away), and I couldn’t play the game at all. Simply put, for coding tasks, ChatGPT 4o is much more reliable than Gemini 1.5 Pro.

Winner: ChatGPT 4o

ChatGPT 4o vs Gemini 1.5 Pro: The Verdict

It’s clear that Gemini 1.5 Pro is far behind ChatGPT 4o. Even after months of improvement while in preview, the 1.5 Pro model can’t compete with OpenAI’s latest GPT-4o. From commonsense reasoning to multimodal and coding tests, ChatGPT 4o performs intelligently and follows instructions attentively. Not to mention, OpenAI has made ChatGPT 4o free for everyone.

The only thing going for Gemini 1.5 Pro is its massive context window, with support for up to 1 million tokens. In addition, you can upload videos, which is an advantage. However, since the model is not very smart, I am not sure many people will want to use it just for the larger context window.
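To give credit where it’s due, the long context and video support are easy to exercise through the google-generativeai Python SDK. A minimal sketch, with a placeholder file name and prompt (video uploads need a short processing wait before they’re usable):

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # an AI Studio API key

# "clip.mp4" is a placeholder; upload whatever video you want analyzed.
video = genai.upload_file("clip.mp4")
while video.state.name == "PROCESSING":  # wait for server-side processing
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([video, "Summarize this video in detail."])
print(response.text)
```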
