Humanizing robots with humor detection

Thanks to the work of three computer science students, Siri could one day sound less like a robot and more like a human. In an effort to make virtual assistants more relatable, seniors Kate Park, Annie Hu and Natalie Muenster developed technology that can detect humor in spoken language and respond with laughter. Their research garnered them an award at a recent conference in Singapore sponsored by the Institute of Electrical and Electronics Engineers, or IEEE.

Conversational agents like Siri and Alexa are growing in popularity thanks to their ability to carry out basic tasks like playing music and scheduling calendar invites. But the students say that the agents are not living up to their full potential. “We thought we could push them to the next level – humanize them and equip them with humor detection,” says Park. “Imagine if Siri laughed at your jokes!”

The program they built is called Laughbot. When launched, it prompts the user to speak into the computer’s microphone. The bot transcribes the audio files using Google’s speech application programming interface. It then runs the transcription and the original audio file through a model called a recurrent neural network. This step is what Muenster called “the meat of the project.”

Seniors Kate Park, Annie Hu and Natalie Muenster won the award for Best Student Paper at the Future of Information and Communication Conference in Singapore in April. (Image credit: Courtesy of Natalie Muenster)

Neural networks are algorithms that are inspired by the way a brain functions and enable a computer to learn a task by analyzing training examples. The system does this by finding patterns in the data that consistently correlate with a label, in this case, funny or not funny. Typically, a neural network does not explicitly consider the ordering of input features. But when it comes to understanding speech, such as a funny statement, the model needs to understand words as a sequence, which is where recurrent neural networks come in.

“A neural network is powerful because it has this concept of weights that we multiply into each input to tell the network how important the input is,” says Hu. “Recurrent neural networks are special because they have ‘memory,’ which lets them take into account feedback loops.”
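The "weights" and "memory" Hu describes can be illustrated with a minimal sketch of a single recurrent unit. This is not Laughbot's actual model: the function names, the weight values and the toy input sequences are all assumptions chosen for illustration. It shows only that a recurrent unit carries a hidden state from one step to the next, so the same inputs in a different order produce a different result, while a plain weighted sum does not.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One step of a single recurrent unit.

    w_in weights the current input; w_rec weights the previous hidden
    state, which acts as the unit's 'memory' of earlier inputs.
    """
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def run_rnn(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Feed a sequence through the unit and return the final hidden state.

    The weight values are arbitrary illustrative choices, not trained values.
    """
    h = 0.0  # the memory starts empty
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
    return h

def weighted_sum(sequence, w_in=0.5):
    """Order-insensitive baseline: weight each input and add them up."""
    return sum(w_in * x for x in sequence)

# Same numbers, opposite order (toy data).
forward = run_rnn([1.0, 2.0, 3.0])
backward = run_rnn([3.0, 2.0, 1.0])

print("recurrent:", forward, backward)  # the two values differ
print("weighted sum:", weighted_sum([1.0, 2.0, 3.0]), weighted_sum([3.0, 2.0, 1.0]))
```

In a trained network, the weights would be learned from labeled examples rather than set by hand.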

To train their model, Park, Hu and Muenster used a dataset called the Switchboard Corpus, a collection of 3,000 phone conversations. Their neural network model converted the transcript and the audio data into a long sequence of numbers. In converting the transcript, it extracted certain features such as sentiment, parts of speech and the length of sentences. “It’s really important to extract it the correct way so that you’re actually representing the original file well, which is what we spent a lot of time on,” says Hu.
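As a rough illustration of turning a transcript into numbers, the sketch below computes two of the feature types mentioned above, sentence length and sentiment, for a single utterance. The word lists and the function name are hypothetical, and the scoring rule is a deliberately crude stand-in; the students' actual feature extraction is not described in detail here. Part-of-speech features are omitted because they would require a tagging library.

```python
# Hypothetical, tiny sentiment word lists used only for this illustration.
POSITIVE = {"great", "funny", "love", "happy"}
NEGATIVE = {"sad", "terrible", "hate", "awful"}

def transcript_features(utterance):
    """Convert one transcribed utterance into a list of numbers.

    Returns [length in words, crude sentiment score], where the score is
    the count of positive words minus the count of negative words.
    """
    tokens = [word.strip(".,!?") for word in utterance.lower().split()]
    length = len(tokens)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return [float(length), float(positive_hits - negative_hits)]

features = transcript_features("I love this, it is so funny!")
print(features)  # [7.0, 2.0]
```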

Once those transcript features were identified, each was assigned a number; for example, the length of a sentence could be assigned the number eight. “Then that feature would be compared across all of the speech snippets, so the model could compare it to the relevance of the training set to distinguish if it was more similar to the non-laughing examples or the laughing examples,” says Muenster.
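The idea of checking whether a snippet looks more like the laughing or the non-laughing examples can be sketched with a much simpler technique than the one the students used. The code below is a nearest-centroid comparison, not a recurrent neural network, and the feature vectors (hypothetically, sentence length and a sentiment score) are made-up toy values rather than data from the Switchboard Corpus.

```python
def centroid(rows):
    """Average each feature across a list of equal-length feature vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_laugh(features, laugh_examples, no_laugh_examples):
    """Return True if the features sit closer to the laughing examples."""
    to_laugh = squared_distance(features, centroid(laugh_examples))
    to_no_laugh = squared_distance(features, centroid(no_laugh_examples))
    return to_laugh < to_no_laugh

# Toy training data (invented): [sentence length, sentiment score].
laugh = [[8.0, 2.0], [6.0, 1.0]]
no_laugh = [[15.0, -1.0], [12.0, 0.0]]

print(predict_laugh([7.0, 2.0], laugh, no_laugh))    # True
print(predict_laugh([14.0, -1.0], laugh, no_laugh))  # False
```

A trained network learns a far richer comparison than a distance to an average, but the labeled-examples setup is the same.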

In the end, the bot decides whether what the user said was funny and either responds with a pre-recorded laugh or stays silent.

Although the project was successful, Laughbot sometimes struggled to correctly identify a humorous statement, such as when features in the transcript conflicted with the audio data, which had been labeled for qualities like pitch and inflection. “If you say something that’s really sad but you say it in a happy way, then it can’t really distinguish what is more important,” says Muenster.

The students’ research began last spring as a project for CS224S Spoken Language Processing. After successfully testing Laughbot for their professors, Dan Jurafsky and Andrew Maas, the trio submitted their research paper to the Future of Information and Communication Conference in Singapore, where it was accepted. In April, they presented their paper at the conference and won the award for Best Student Paper.

The research is an innovative step in the evolution of artificial intelligence and natural language processing.

So, could we soon be practicing our stand-up routines for Siri and Alexa?

“Maybe with our technology!” says Park.
