1. Find the Bigger Number
On the other hand, Llama 3.1 got it wrong, surprisingly. I ran the prompt twice on HuggingChat, but it gave a wrong answer on both runs.
Winner: ChatGPT 4o
Winner: ChatGPT 4o
Next, I presented a complex puzzle and asked both AI models to find the apple. ChatGPT 4o got it right and clearly said that “The apples would remain in the box on the ground”.
Winner: None
ChatGPT 4o also did a great job and took no time to find the needle. So for long-context memory, both models are remarkable.
Winner: ChatGPT 4o
Lately, AI companies are chasing benchmark numbers and trying to outrank the competition based on the MMLU score. However, in practical tests, their models rarely show any spark of intelligence.