OpenAI recently released its next flagship model, GPT-4o, and showed off some impressive demos. The human-like voice chat has become the headline feature, but there is more to it. OpenAI didn't highlight many of the cool things ChatGPT 4o is capable of. These details are available on OpenAI's page, and I went through all of them. On that note, let's find out the cool new capabilities of ChatGPT 4o.
1. Accurate Text Generation in Images
We know that diffusion models struggle with rendering text inside images. DALL-E 3 still fails to generate images with the given text. However, the ChatGPT 4o model, which is an end-to-end multimodal model, can render text accurately. OpenAI didn't mention this in the presentation, but you can find the examples on OpenAI's page, where the company explores its capabilities.
It can generate and add text to images effortlessly, and the consistency across many samples is remarkable. You can also attach images and ask it to generate images of the same character from different angles, and it maintains consistency across all scenarios. It can also generate 3D views of objects, which you can combine to create a 3D render. Not to mention, it can generate fonts too.
Keep in mind that these capabilities are not available in ChatGPT yet; it still uses DALL-E 3 to generate images. OpenAI may unlock these features in the near future.
In a presentation with Khan Academy’s Sal Khan, OpenAI showcased a fascinating demo using the GPT-4o model. Basically, on an iPad, you can share your screen with ChatGPT 4o, and it can see everything on your screen.
You can now ask it to explain and help you find solutions to a problem. Be it mathematics, sciences, charts, maps, or anything else, ChatGPT 4o will be your personal teacher guiding you throughout your study session. That’s such a great application of AI, powered by GPT-4o’s multimodal vision capability. By the way, it also works with the ChatGPT desktop app for macOS.
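Under the hood, this same vision capability is exposed through OpenAI's Chat Completions API, where images are passed alongside text in a message's content. Here is a minimal sketch of what such a request payload looks like; the image URL is a placeholder, and actually sending the request would require the `openai` client and an API key, which this sketch deliberately skips:

```python
# Sketch of a GPT-4o vision request payload (Chat Completions format).
# Only the payload is built here; a real call would POST it with an API key.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                # Text and image parts travel together in one user message
                {"type": "text", "text": "Explain the chart in this screenshot."},
                {
                    "type": "image_url",
                    # Placeholder URL; in practice this would be a screenshot
                    "image_url": {"url": "https://example.com/screenshot.png"},
                },
            ],
        }
    ],
}
print(payload["model"])
```

The key design point is that content becomes a list of typed parts rather than a plain string, which is how the model can "see" your screen and your question in a single turn.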
In one of the demos, OpenAI showcased that you can have ChatGPT 4o as your live companion during meetings. You can share the screen with ChatGPT 4o, and it can see and hear all the participants. It can also give inputs and participants can also ask questions to the GPT-4o model. It replies spontaneously and stays engaged in the conversation. At the end, you can ask it to summarize the meeting as well. How cool is that?
OpenAI has not just improved GPT-4o's performance in English but also in regional languages. It has significantly improved the tokenizer, which now compresses non-English text into far fewer tokens.
To give some examples, Gujarati now takes 4.4x fewer tokens, Hindi 2.9x fewer, Telugu 3.5x fewer, Urdu 2.5x fewer, Russian 1.7x fewer, and so on. Basically, for regional languages, ChatGPT 4o has become even more powerful.
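To make those ratios concrete, here is a small sketch using the figures above; the 100-token baseline is an arbitrary illustration, not a number from OpenAI:

```python
# Compression ratios reported for GPT-4o's new tokenizer (from the article).
ratios = {"Gujarati": 4.4, "Hindi": 2.9, "Telugu": 3.5, "Urdu": 2.5, "Russian": 1.7}

OLD_TOKENS = 100  # arbitrary baseline: a passage that previously cost 100 tokens

for language, ratio in ratios.items():
    # "N x fewer tokens" means the new count is the old count divided by N
    new_tokens = round(OLD_TOKENS / ratio)
    print(f"{language}: {OLD_TOKENS} tokens -> ~{new_tokens} tokens")
```

Fewer tokens means each request costs less and more text fits in the context window, which is why this matters so much for non-English users.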
OpenAI didn't discuss benchmark numbers in the presentation, focusing instead on delivering new experiences. However, ChatGPT 4o's benchmark numbers overshadow all other AI models from Google, Anthropic, Meta, etc. In fact, it performs better than OpenAI's own GPT-4 Turbo model, which was released a few months back.
From MMLU to HumanEval, GPQA, and DROP, ChatGPT 4o outranks both proprietary and open-source models. In the LMSYS arena too, the mysterious im-also-a-good-gpt2-chatbot model (which is actually the ChatGPT 4o model) got an overall Elo score of 1310, much higher than other AI models.
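For context, Elo ratings translate directly into expected head-to-head win rates via the standard Elo formula. A minimal sketch below; the 1310 figure is from the arena, while the 1250 opponent rating is a hypothetical value chosen purely for illustration:

```python
# Standard Elo expected-score formula: E_a = 1 / (1 + 10^((R_b - R_a) / 400))
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B, given their Elo ratings."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# GPT-4o's arena rating vs. a hypothetical 1250-rated rival
win_rate = expected_score(1310, 1250)
print(f"{win_rate:.1%}")  # a 60-point gap implies roughly a 58-59% win rate
```

In other words, even a seemingly modest rating gap compounds into a clear majority of head-to-head wins in blind user votes.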