Google Open-Sources SynthID to Watermark AI-Generated Text

Oct. 24, 2024



Last year, Google DeepMind announced SynthID, a technology for watermarking and detecting AI-generated content. Now, Google has open-sourced SynthID Text via the Google Responsible Generative AI Toolkit for developers and businesses. The open-sourced tool currently works for text only.

With the abundance of AI models at our disposal, it’s getting increasingly hard to tell what’s real and what’s not. As a result, it’s high time that advanced watermarking tools like Google’s SynthID are open-sourced and used by AI companies.

Earlier this month, we reported that Google Photos may detect AI-generated photos using a Credit tag attribute. With SynthID now open-sourced, more such applications could gain the ability to detect AI-generated text and other content.

Pushmeet Kohli, the VP of Research at Google DeepMind, tells MIT Technology Review,

“Now, other [generative] AI developers will be able to use this technology to help them detect whether text outputs have come from their own [large language models], making it easier for more developers to build AI responsibly.”

If you’re unfamiliar with how SynthID works, an analogy may help: remember those magical invisible pens whose writing could only be seen under UV light? Think of SynthID as that light source, able to reveal the invisible marks, or watermarks, in AI-generated images, videos, and text.

While generating text, an LLM looks at the possible next tokens and assigns each one a score representing the probability of that token being chosen. During this process, SynthID embeds extra information by “modulating the likelihood of tokens being generated.”
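To make that concrete, here is a minimal, illustrative sketch of the general idea behind this family of watermarks: a secret key plus the preceding context derives a pseudorandom “green list” of tokens, and their scores are nudged upward before sampling. The function names and the green-list scheme here are assumptions for illustration, not SynthID’s actual algorithm.

```python
import hashlib
import random

def greenlist(context, key, vocab_size, fraction=0.5):
    # Derive a pseudorandom subset of the vocabulary ("green" tokens)
    # from the secret key and the preceding context.
    # Illustrative only -- not SynthID's actual scheme.
    seed = hashlib.sha256((key + "|" + context).encode()).hexdigest()
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(vocab_size * fraction)])

def watermark_logits(logits, context, key, bias=2.0):
    # Modulate the likelihood of tokens: nudge green-list token
    # scores upward before the sampler picks the next token.
    green = greenlist(context, key, len(logits))
    return [score + bias if i in green else score
            for i, score in enumerate(logits)]

def detect(token_ids, contexts, key, vocab_size):
    # Detection: count how many generated tokens fall in the green
    # list. Watermarked text shows a fraction well above the ~50%
    # expected from unwatermarked text.
    hits = sum(1 for tok, ctx in zip(token_ids, contexts)
               if tok in greenlist(ctx, key, vocab_size))
    return hits / len(token_ids)
```

Because the bias only slightly reshapes the sampling distribution, the statistical signal accumulates over many tokens without noticeably changing any single word choice, which is why such watermarks are imperceptible to readers but detectable to a holder of the key.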

However, SynthID’s biggest limitation right now is that it can only detect content generated by Google’s proprietary AI models. It also starts to falter when someone heavily alters or rewrites AI-generated text, such as by translating it into another language altogether.

Soheil Feizi, an associate professor at the University of Maryland who has researched this topic, tells MIT,

“Achieving reliable and imperceptible watermarking of AI-generated text is fundamentally challenging, especially in scenarios where LLM outputs are near deterministic, such as factual questions or code generation tasks.”

From the sound of it, we still have a long way to go before reliably detecting AI-generated content. Still, Google open-sourcing SynthID is a great step towards AI transparency.

The tool could be a game-changer in standardizing the detection of AI-generated content. Most importantly, Google states that SynthID doesn’t interfere with the “quality, accuracy, creativity or speed of the text generation.”

Sagnik is a tech aficionado who can never say “no” to dipping his toes into unknown waters of tech or reviewing the latest gadgets. He is also a hardcore gamer, having played everything from Snake Xenzia to Dead Space Remake.