Can AI models trained on human speech help us understand dogs?

What’s that Lassie? Our nefarious AI overlords are about to take over the world? You must be barking

People who want to understand their dogs might be about to be given a helping paw by AI, according to the latest study.

Researchers from the University of Michigan are developing tools that can identify whether a dog’s bark conveys playfulness or aggression, which should already be apparent to many a Reg reader.

At the same time, they hope to extract other information from animal vocalizations, such as the animal's age, breed, and sex. Working with Mexico's National Institute of Astrophysics, Optics and Electronics, the Michigan team found that AI models originally trained on human speech can be used as a starting point to train new systems that target animal communication.

The results were presented at the Joint International Conference on Computational Linguistics, Language Resources and Evaluation.

“By using speech processing models initially trained on human speech, our research opens a new window into how we can leverage what we built so far in speech processing to start understanding the nuances of dog barks,” said Rada Mihalcea, the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering and director of U-M’s AI Laboratory, which carried out the work.

“There is so much we don’t yet know about the animals that share this world with us. Advances in AI can be used to revolutionize our understanding of animal communication, and our findings suggest that we may not have to start from scratch,” she said.

But while human language AI models are trained on a huge corpus of written text, dogs are less well known for typing, and their voices are recorded less often than humans.

To overcome the problem, the researchers repurposed an existing model originally designed to analyze human speech. The model, a foundation of various voice-enabled technologies, has been trained to pick out important features of human speech, such as tone, pitch, and accent.

“These models are able to learn and encode the incredibly complex patterns of human language and speech,” said Artem Abzaliev, lead author and doctoral student.

The researchers built a dataset of vocalizations recorded from 74 dogs of varying breed, age, and sex in a variety of contexts, then fed it to the human speech model Wav2Vec2.

They found that Wav2Vec2 outperformed other models trained specifically on dog bark data, succeeding at four classification tasks with accuracy figures of up to 70 percent.

How is dog bark data gathered, you ask? The researchers recorded dog barks in a number of situations, such as play and aggression, then tested the model's ability to identify bark examples: very aggressive barking at a stranger, normal barking at a stranger, a negative squeal, and a negative grunt (in the presence of a stranger). The researchers know which is which because they already know the context of each bark.
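For the curious, the general recipe here is ordinary transfer learning: freeze a pretrained speech encoder, pool its embeddings, and train a small classifier head on the labeled barks. The sketch below illustrates that shape only. The encoder is a stand-in random projection (not Wav2Vec2), the clips are synthetic sinusoids rather than real recordings, and the four label names are our own guesses loosely based on the contexts described above, not the team's actual label set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained speech encoder such as Wav2Vec2.
# The real encoder maps raw audio to contextual embeddings; here a
# fixed random projection plays that role so the sketch runs anywhere.
EMBED_DIM = 16
_encoder_weights = rng.normal(size=(200, EMBED_DIM))

def encode(waveform: np.ndarray) -> np.ndarray:
    """Map a 200-sample clip to a fixed-size embedding (encoder frozen)."""
    return np.tanh(waveform @ _encoder_weights)

# Hypothetical four-way label set, loosely modelled on the article's contexts.
LABELS = ["aggressive_bark", "normal_bark", "negative_squeal", "negative_grunt"]

def make_clip(label_idx: int) -> np.ndarray:
    """Synthetic 'recording': each class is a different noisy sinusoid."""
    t = np.linspace(0, 1, 200)
    return np.sin(2 * np.pi * (label_idx + 1) * 5 * t) + 0.1 * rng.normal(size=200)

X = np.stack([encode(make_clip(i % 4)) for i in range(200)])
y = np.array([i % 4 for i in range(200)])

# Lightweight classifier head on top of the frozen encoder:
# multinomial logistic regression trained with plain gradient descent.
W = np.zeros((EMBED_DIM, 4))
for _ in range(300):
    logits = X @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = X.T @ (probs - np.eye(4)[y]) / len(y)
    W -= 0.5 * grad

accuracy = (np.argmax(X @ W, axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The point of the design is that only the small matrix `W` is trained; everything learned about speech-like structure lives in the frozen encoder, which is what lets a model built for human voices be reused on barks.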

“This is the first time that techniques optimized for human speech have been built upon to help with the decoding of animal communication,” Mihalcea said. “Our results show that the sounds and patterns derived from human speech can serve as a foundation for analyzing and understanding the acoustic patterns of other sounds, such as animal vocalizations.” ®
