Researchers at Tokyo Institute of Technology and RIKEN, Japan, have gained new insight into how we perceive and interact with the voices of various machines. The team performed a meta-synthesis of prior studies, and their findings provide new information about human preferences that engineers and designers can use to develop future voice technologies.
Humans primarily communicate vocally and aurally, conveying everything from linguistic information to emotional states and personality. How we perceive a voice depends greatly on its tone, rhythm, and pitch.
Combining Various Fields to Create Framework
Our interactions are now expanding to computer agents, interfaces, and environments due to advances in AI technologies. Research in this area spans the fields of human-agent interaction (HAI), human-robot interaction (HRI), human-computer interaction (HCI), and human-machine communication (HMC).
The team of researchers sought to compare findings from studies across these different fields, aiming to create a framework to guide future design and research on computer voice.
Associate Professor Katie Seaborn of Tokyo Tech led the research.
“Voice assistants, smart speakers, vehicles that can speak to us, and social robots are already here,” Prof. Seaborn says. “We need to know how best to design these technologies to work with us, live with us, and match our needs and desires. We also need to know how they have influenced our attitudes and behaviors, especially in subtle and unseen ways.”
The team’s survey looked at peer-reviewed journal papers and proceedings-based conference papers with a focus on the user perception of agent voice. The source materials included a variety of agent, interface, and environment types and technologies, with most being “bodiless” computer voices, computer agents, and social robots.
Most of the study participants were university students and adults. From these materials, the researchers were able to observe and map patterns before drawing conclusions about perceptions of agent voice across a variety of interaction contexts.
The Team’s Findings
The study’s results showed that users anthropomorphized the agents they interacted with and preferred agents whose personality and speaking style resembled their own. The study also found that human voices were preferred over synthetic ones, and that adding vocal fillers improved the interaction.
According to the survey, individuals preferred human-like, happy, empathetic voices with higher pitches. However, some user preferences changed over time. For example, user preference for voice gender shifted from masculine voices to feminine voices.
Building on these findings, the researchers created a high-level framework for classifying different types of interactions across computer-based technologies.
Another finding was that users often perceived agents more favorably when the agents were embodied, with the voice “matching” the body of the agent.
The team’s survey could inform the design of both new and existing technologies in voice-based human-agent interaction (vHAI).
“The research agenda that emerged from this work is expected to guide how voice-based agents, interfaces, systems, spaces, and experiences are developed and studied in the years to come,” Prof. Seaborn says.