In a groundbreaking advancement for interspecies communication, Google has unveiled a new artificial intelligence (AI) model, DolphinGemma, designed to help scientists understand and generate dolphin vocalisations. Developed in collaboration with Georgia Tech and the Wild Dolphin Project (WDP), the model represents a significant step forward in decoding the sophisticated communication systems of dolphins.
DolphinGemma has been trained extensively on the WDP’s vast acoustic database of wild Atlantic spotted dolphins. Functioning as an “audio-in, audio-out” model, it processes sequences of natural dolphin sounds to identify structural patterns and predict the likely next sounds in a sequence. This mirrors the way large language models predict the next word or token when processing human language.
According to a blog post by Google, DolphinGemma utilises Google’s SoundStream tokeniser to encode dolphin whistles, clicks, and squawks into machine-readable units. These are then processed using a model architecture tailored for complex audio sequences, enabling the model not only to analyse but also to generate novel dolphin-like sound sequences. This technology is rooted in the same framework as Google’s Gemini AI model and is engineered to run efficiently on mobile devices—including Google Pixel smartphones used in the field by the WDP.
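The pipeline described above can be pictured with a deliberately simple toy sketch. Everything here is illustrative and assumed: the real SoundStream tokeniser and the DolphinGemma architecture are far more sophisticated, but the flow is the same — quantise audio into discrete units, learn which unit tends to follow which, then predict the next unit.

```python
# Toy sketch of the "audio-in, audio-out" idea. The tokeniser and the bigram
# "model" below are stand-ins invented for illustration, not Google's actual
# SoundStream tokeniser or DolphinGemma architecture.
from collections import Counter, defaultdict

def tokenise(frames, n_units=8):
    """Map each audio frame (a float energy value here) to a discrete unit ID,
    standing in for a learned audio tokeniser."""
    return [min(int(f * n_units), n_units - 1) for f in frames]

class NextUnitModel:
    """Toy bigram model: counts which unit follows which in training audio."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, units):
        for prev, nxt in zip(units, units[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, unit):
        """Return the most frequently observed next unit after `unit`."""
        if not self.counts[unit]:
            return None
        return self.counts[unit].most_common(1)[0][0]

# A toy "recording": a rising-then-falling energy contour, like a whistle sweep.
frames = [0.1, 0.3, 0.5, 0.7, 0.9, 0.7, 0.5, 0.3, 0.1, 0.3, 0.5, 0.7]
units = tokenise(frames)
model = NextUnitModel()
model.train(units)
print(model.predict(units[0]))  # most likely unit to follow a low-energy unit
```

A real system would use learned vector-quantised embeddings and a transformer rather than energy buckets and bigram counts, but the prediction target — the next token in an audio sequence — is the same.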
“Our goal is to better understand the structure and potential meaning within these natural sound sequences,” Google stated, adding that DolphinGemma marks an important milestone in AI’s role in wildlife research and conservation.
In tandem with DolphinGemma, the Wild Dolphin Project and Georgia Tech have also been developing the Cetacean Hearing Augmentation Telemetry (CHAT) system. Initially piloted on the Pixel 6, CHAT aims to establish a shared vocabulary between humans and dolphins using synthetic sounds. The system links unique artificial whistles—distinct from natural dolphin sounds—to specific objects that dolphins find engaging, such as sargassum, seagrass, or colourful scarves used by researchers.
The strategy is to first demonstrate the object-whistle association through human interaction, sparking the dolphins’ natural curiosity. Researchers hope dolphins will then begin to imitate these synthetic whistles as a way to request the objects, forming a rudimentary two-way communication system.
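The association step can be sketched as a simple lookup with tolerant matching. The whistle "signatures" below are made-up frequency contours chosen for illustration; the real CHAT system matches live hydrophone audio, not hand-written lists.

```python
# Hypothetical sketch of CHAT's object-whistle association. The contours and
# the matching rule are invented for illustration only.

SYNTHETIC_WHISTLES = {
    "sargassum": [4.0, 6.0, 8.0],   # rising contour (kHz, illustrative)
    "seagrass":  [8.0, 6.0, 4.0],   # falling contour
    "scarf":     [6.0, 9.0, 6.0],   # arch contour
}

def closest_object(contour):
    """Return the object whose synthetic whistle contour best matches the
    detected contour (smallest summed frequency difference)."""
    def distance(ref):
        return sum(abs(a - b) for a, b in zip(contour, ref))
    return min(SYNTHETIC_WHISTLES, key=lambda obj: distance(SYNTHETIC_WHISTLES[obj]))

# A dolphin's imitation won't be exact; a near match should still resolve.
print(closest_object([4.2, 6.1, 7.8]))  # → "sargassum"
```

Tolerant matching matters here because the research goal is for dolphins to *imitate* the synthetic whistles, and an imitation will never reproduce the reference sound exactly.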
Over time, the team plans to integrate the dolphins’ natural vocalisations into the system, potentially enabling more complex exchanges. “This approach could revolutionize how we interact with and understand one of the most intelligent marine species on the planet,” said a WDP spokesperson.
Google DeepMind has confirmed that the DolphinGemma model will be released as an open model this summer, inviting researchers worldwide to build upon the technology. The CHAT system, meanwhile, is being upgraded for use with the forthcoming Pixel 9 smartphones, promising better performance and expanded capabilities in the field.
The DolphinGemma project has been widely hailed as a major leap in AI-powered interspecies communication research. It not only represents a new frontier in the understanding of animal intelligence but also opens up future possibilities for conservation efforts, ethical wildlife interaction, and perhaps one day, meaningful cross-species dialogue.
As Nidhi Hebbar, a lead researcher at Google DeepMind, put it: “AI has transformed our ability to process and understand complex data. With DolphinGemma, we’re harnessing that power to connect more deeply with the natural world.”