Machines Arguing with Humans and IBM’s Project Debater

Project Debater and debate champion Harish Natarajan.

At Deepnews we love a good argument. Part of finding quality writing and reporting is finding quality arguments, implicit or explicit, backed by solid supporting evidence.

Our model, of course, uses machine learning to search through thousands of sources to find the articles we highlight, while others are using advances in the technology to create excellent debate itself.

The team at IBM is doing exactly that: a couple of weeks ago it published a paper in Nature presenting the results of Project Debater, a system capable of debating an expert human in a formal debate setting. You may have caught a glimpse of the work, led by Noam Slonim, when it debuted in 2019, though the recent paper explains the system architecture and shows that the machine's opening statements were rated more highly than those written by non-expert humans (but less highly than those of expert debaters).

It is all very impressive, but what exactly is Project Debater doing that is new? I spoke to Prof. Chris Reed, director of the Centre for Argument Technology at the University of Dundee, who wrote a commentary piece in Nature alongside the IBM paper.

"I think the thing that's actually rather underplayed, even by them, is the feat of engineering: getting all the pieces to work together," Reed said.

"Being able to go from the audio of listening to somebody give an opening speech, then pulling through 400 million news articles to identify snippets of text that are appropriately relevant, then editing and splicing those bits of text together to render grammatically correct sentences, and then organizing those sentences into something that bears some kind of resemblance to a coherent narrative flow. That is astonishingly difficult."
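The chain of stages Reed describes can be sketched in miniature. The sketch below is a toy illustration only, with hypothetical function names and deliberately trivial stand-in implementations; it shows the shape of the pipeline, not IBM's actual architecture:

```python
# Toy sketch of a Debater-style pipeline. All names and logic here are
# illustrative stand-ins, not IBM's implementation.

def transcribe(audio):
    # Speech-to-text stage. Stand-in: assume the "audio" is already
    # a transcript string.
    return audio

def retrieve(transcript, corpus):
    # Retrieval stage: keep snippets sharing at least one word with
    # the transcript (real retrieval over 400M articles is far richer).
    words = set(transcript.lower().split())
    return [s for s in corpus if words & set(s.lower().split())]

def splice(snippets):
    # Sentence realization stage: join the selected snippets.
    return " ".join(snippets)

def organize(statement):
    # Narrative-flow stage: here, just frame the text as a speech.
    return "To begin: " + statement

corpus = [
    "Subsidies lower the cost of preschool for families.",
    "The moon is made of rock.",
]
speech = "We should subsidize preschool because cost matters to families."
print(organize(splice(retrieve(transcribe(speech), corpus))))
# → To begin: Subsidies lower the cost of preschool for families.
```

Each stage in the real system is a hard research problem on its own; as Reed notes, the underplayed feat is making them work together end to end.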

Part of the core of this work is something called argument mining, in which the machine finds relevant arguments and counterarguments within its massive corpus of text. A textbook would illustrate this with a simple example like "Socrates is a man. All men are mortal. Therefore Socrates is mortal."

"Except nobody ever provides arguments like that. That's not what arguments look like. They look like:

‘Let’s go and get a beer now,’ 

‘No, I’m a bit tired this evening.’ 

‘Can we do it tomorrow?’ 

'Okay, let's do it tomorrow because they're gonna have a special,'" Reed says.
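For a feel of what the simplest possible argument miner might do, consider a toy version that scans sentences for discourse markers like "because" and "therefore." Real argument-mining systems use trained classifiers precisely because everyday arguments, like Reed's beer dialogue, rarely announce themselves this way; the keyword lists below are an illustrative assumption, not an actual technique from the paper:

```python
import re

# Toy argument miner: label sentences by common discourse markers.
# Real systems use trained classifiers; this is only an illustration.
PREMISE_MARKERS = {"because", "since"}
CONCLUSION_MARKERS = {"therefore", "hence", "thus"}

def mine(text):
    """Label sentences as premises or conclusions via marker words."""
    units = []
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & CONCLUSION_MARKERS:
            units.append(("conclusion", sentence))
        elif words & PREMISE_MARKERS:
            units.append(("premise", sentence))
    return units

textbook = "Socrates is a man. All men are mortal. Therefore Socrates is mortal."
print(mine(textbook))
# → [('conclusion', 'Therefore Socrates is mortal.')]

dialogue = "Okay, let's do it tomorrow because they're gonna have a special."
print(mine(dialogue))
# → [('premise', "Okay, let's do it tomorrow because they're gonna have a special.")]
```

Note how the toy miner misses the two unmarked premises in the Socrates syllogism entirely: most argumentative structure carries no surface markers at all, which is exactly why the problem is hard.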

So what does it mean that Project Debater can parse text for arguments and then use them? What does being able to debate, a classic exercise in reasoning, show? We previously talked about the work of Dr. Henry Shevlin, who compares machine intelligence with intelligences ranging from animals to humans.

Reed says that Project Debater is still nowhere close to human cognition. Part of the evidence is in Project Debater's "rebuttal" to the arguments put forward by its debate opponent. It takes the opponent's speech, transcribes it into text, and then compares that text to arguments it has pulled from its corpus, from the knowledge base it has been fed, and from a database of debate topics.
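The kind of surface-level matching Reed is pointing at can be sketched with plain bag-of-words similarity. This is an assumption for illustration, not IBM's method, and the function names are hypothetical; the point is that nothing below builds or reasons about argument structure:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def pick_rebuttal(opponent_text, counterarguments):
    # Match the transcript against stored counterarguments and return
    # the closest one. No model of the opponent's argument is built,
    # which is exactly Reed's point.
    return max(counterarguments, key=lambda c: cosine(opponent_text, c))

stored = [
    "Preschool subsidies strain public budgets.",
    "Space exploration inspires the next generation.",
]
print(pick_rebuttal("We should subsidize preschool for all.", stored))
# → Preschool subsidies strain public budgets.
```

A system like this can sound responsive while having no representation of where the opponent's argument is actually weak.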

“It’s absolutely not creating a structure of the opponent’s argument, and then looking at that structure and reasoning about where the weak points are in it to formulate an argument. There is none of that,” Reed says.

Just as we need to avoid thinking of GPT-3 as intelligent, it is easier to understand what machines like Project Debater are doing when we focus on the work they do; the word "robot," after all, comes from a word for labor. GPT-3 generates passable text, and Project Debater accomplishes the impressive task of debating by bringing together different systems.

At the same time, because of its goal, Project Debater is doing something different from its IBM ancestor Deep Blue, which played Garry Kasparov at chess with the narrow aim of winning. Slonim's team points out that debate is harder to judge, with more subjectivity, and that pushing a machine to use more advanced forms of human language pushes artificial intelligence outside its "comfort zone."

The question, then, is not about intelligence but about how advances in working with more sophisticated human language can be useful, particularly to humans.

One of those ways may be parsing text to find out what's in it. That's something Deepnews is working on in terms of journalistic quality for entire articles, but being able to engage with individual arguments within an article would be a step forward. A couple of years ago, Reed's team partnered with the BBC to tackle the challenge of fake news and to use argument mining to aid human reasoning. Ultimately, such tools can raise the quality of debate.

"What we're starting to think about is ways in which AI systems are able to understand the arguments of humans and contribute to the debates as a whole. These teams can be expanded to have some human members and some AI members, and the team as a whole can then produce better-quality decisions," Reed said.