Multimodality for AI chatbots is definitely the new big thing, and we’ve already lost count of the number of such models that show up on GitHub every now and then. Now, Meta AI, in line with its open-source approach, has launched the new Spirit LM model in an attempt to address some multimodal challenges. And, from the looks of it, it’s quite impressive.
Currently, you can go wild with ChatGPT’s Advanced Voice Mode and get some pretty expressive human-like responses out of it. You have probably come across those viral videos of ChatGPT flirting with humans better than you ever could.
While it’s still not where we expected it to be, it’s better than what Gemini Live can do right now. Well, it turns out Meta has been silently making observations, and Spirit LM is meant to take things up a notch and offer more natural-sounding speech.
As per Meta, Spirit LM is based on a “7B pretrained text language model.” Meta also notes in its X post that most of the multimodal AI models that exist right now use ASR (Automatic Speech Recognition) to identify voice inputs and convert them to text. However, according to Meta, this results in the AI losing a whole lot of expression. So, Meta notes:
Using phonetic, pitch and tone tokens, Spirit LM models can overcome these limitations for both inputs and outputs to generate more natural sounding speech while also learning new tasks across ASR, TTS and speech classification.
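To make that difference concrete, here is a toy sketch of the idea, not Meta’s actual tokenizer or API. The `SpeechFrame` class, both function names, and the `[PHONE_…]`, `[PITCH_…]` and `[TONE_…]` token spellings are all made up for illustration. It only shows that a text-only ASR step throws away pitch and tone, while a token stream that carries them keeps that information around for the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechFrame:
    """One slice of speech. All three fields are hypothetical stand-ins."""
    phone: str  # phonetic unit (what was said)
    pitch: int  # quantized pitch bucket (how high or low it was said)
    tone: str   # style label (the mood it was said in)


def asr_cascade(frames):
    """Text-only pipeline: keeps the phonetic content, drops pitch and tone."""
    return [frame.phone for frame in frames]


def expressive_tokens(frames):
    """Flattens all three streams into one token sequence (illustrative format)."""
    tokens = []
    for frame in frames:
        tokens.append(f"[PHONE_{frame.phone}]")
        tokens.append(f"[PITCH_{frame.pitch}]")
        tokens.append(f"[TONE_{frame.tone}]")
    return tokens


if __name__ == "__main__":
    frames = [SpeechFrame("HH", 3, "excited"), SpeechFrame("AY", 5, "excited")]
    print(asr_cascade(frames))        # expression is gone
    print(expressive_tokens(frames))  # expression survives as extra tokens
```

In this sketch, the first function returns only the phonetic units, while the second returns three tokens per frame, so whatever sits downstream can still see how something was said, not just what was said.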
The official Spirit LM release page details the research (PDF warning) that went into making Spirit LM see the light of day. At the bottom, there are some generation samples that give us an idea of what to expect.
From the sound of it, Spirit LM certainly does a good job of landing those vocal modulations by using tone and pitch tokens well. However, it’s very similar to how the AI hosts in Google’s NotebookLM run their surprisingly impressive show.
Meta’s Spirit LM is out for developers and researchers to try out and build upon. We have put in an access request, and hopefully, we’ll get to try out the tool soon enough. When we do, you know where to find us.
Meanwhile, there’s no denying that we’re looking at a future where AI models that are more expressive than Jarvis will be surrounding and helping us get through our daily chores. Scarily exciting, isn’t it?
What do you think about Meta’s new Spirit LM? Cry your heart out in the comments down below!
Sagnik is a tech aficionado who can never say “no” to dipping his toes into unknown waters of tech or reviewing the latest gadgets. He is also a hardcore gamer, having played everything from Snake Xenzia to Dead Space Remake.