ChatGPT Advanced Voice was demoed months ago during the GPT-4o launch, but OpenAI kept delaying the release, citing safety issues. Then a controversy erupted over the 'Sky' voice, which sounded strikingly similar to Scarlett Johansson's voice. Finally, after roughly five months, OpenAI is now rolling out Advanced Voice to all ChatGPT Plus and Team users. OpenAI says the rollout will be completed this week.
In case you are unaware, Advanced Voice is a huge upgrade over the standard voice chat available to free ChatGPT users. Advanced Voice uses the multimodal capability of the GPT-4o model to deliver a natural, free-flowing conversation with support for interruptions.
Advanced Voice in ChatGPT might sound similar to Google's Gemini Live, but there is a key distinction. Gemini Live uses TTS/STT (text-to-speech/speech-to-text) engines in between to extract responses from an LLM and speak them back, whereas ChatGPT Advanced Voice supports audio input/output directly. Gemini Live also supports interruptions but doesn't offer a truly multimodal experience.
Nevertheless, in my brief testing, ChatGPT Advanced Voice seems to have lost many multimodal features. During the demo, OpenAI showcased that it can sing for you, identify your mood/emotion from your speech, detect different sounds, do accents, and a lot more. Currently, however, Advanced Voice says that it can't identify emotions from speech. Camera input is not supported yet either.
It seems OpenAI has removed some of these features to avoid embarrassing conversations with ChatGPT. That said, are you excited to use ChatGPT Advanced Voice? Let us know in the comments below.