“It’s like an information collage:” Orestis Papakyriakopoulos on conspiracy theories and content moderation

The COVID pandemic has highlighted the power of conspiracy theories

Editor’s note: Deepnews.ai is a technology company, though we also like to look at conversations happening around the use of algorithms to highlight information online. This is one of the occasional posts we do speaking to someone with something to say on topics that we find interesting.

By Christopher Brennan

Conspiracy theories have become an unavoidable part of the information landscape, and something that is a worry not just for readers but for platforms that are in many ways playing host to fringe ideas.

There are some good, in-depth reads on conspiracy theories such as QAnon in a recent Digest, though this past week I was interested in the tech angle. I had earlier stumbled across an article from Orestis Papakyriakopoulos in the Harvard Kennedy School Misinformation Review which looks in closer detail at the results of the wave of content moderation changes that we began seeing on tech platforms this spring.

The full paper goes into much more depth than is presented here, though below is an edited transcript of my recent chat with Papakyriakopoulos, a researcher who is now a fellow at Princeton after he was a visiting scholar at the MIT Media Lab.

Christopher Brennan: For the first question I was wondering if you could talk about where this idea came from, monitoring both the movement of conspiracy theories around different platforms but also content moderation at the same time?

OP: So, I started looking at conspiracy theories related to the origin of the virus. And usually, scientists study one platform and they say that this conspiracy theory is either popular or not popular on that platform. But social media is an ecosystem and what’s important to see is how the different platforms interact and how news diffuses. So I tried to analyze Reddit, Facebook, Twitter and of course 4chan, because generally it’s cited as a conspiracy theory source. So that was one part from the start, to understand how conspiracy theories related to the origin of the coronavirus are diffused in this ecosystem. And then what was happening was that at that time, at the end of March, all the big companies could not moderate all this content.

So they said “We are going to use more automated methods. There might be mistakes. It’s a challenge for us.” Everybody said it. Facebook, YouTube, etc. made announcements. So I wanted to see how content moderation works and to what extent they are doing it. To locate possibilities and issues, because to me, removing content might have a counterintuitive effect: you can polarize the user, for example.

In terms of what you found, your paper in the Misinformation Review focuses a lot on the types of sources, as well as the movement between platforms. Could you just briefly summarize that?

So one of the things that we found out was very interesting and was happening a lot. We manually labeled thousands of URLs, and we started seeing a lot of the conspiracy theories were from credible news sources. Not only news sources but for example there was one from the New York Post. I think an article comes to my mind that was really pushing conspiracy theories that the Chinese made the virus. And it was also really interesting that in a lot of conspiracy theories, evidence was based on what Wikipedia says or patenting coronavirus types and so on. So we started seeing that actually the conspiracy theories are not always made by InfoWars, for example, this type of website or The Blaze, or even more fringe media. We started to see that mainstream sources played a role, so we tried to quantify and see the effect.

What we found was really interesting because although alternative sources like these websites I mentioned or similar ones were producing more conspiracy theory-related content, when the high credibility sources cited a conspiracy theory, or supported one, they created much more reaction, much more interaction. They’re becoming much more popular. And in our data set, more than 60% actually of the user interactions in the end that were made with conspiracy theories or evidence supporting conspiracy theories, this was coming from the high credibility sources. Usually it makes sense that people trust sources more so they will believe more easily what they say or they will post or repost these sources more easily. But it was interesting to see.

We talked before about credible journalism and junk news and fake news. But most of the algorithms that classify content take the type of source as a variable, if they are credible or not, to classify content. The same applies also for manual labor (content moderators). They also look at that. It’s interesting because actually that’s the wrong thing to do. You should really look at the story, and when we looked at how content is filtered we also found a bias. If there’s an article containing a conspiracy theory it would be removed with lower probability if it was coming from a credible source.

That’s interesting to me because from a Deepnews.ai perspective it’s much more about the content. When you’re looking at content moderation, were you able to parse anything or find anything about whether the kind of content itself mattered that much in terms of content moderation? Whether it be sentiment or whether it be other measures of the content around the conspiracy theories that played into whether it was moderated or not?

What we saw is that, in general, each platform used a different tactic. Even within the platform, there were differences. For example, Facebook when they moderate the content themselves they remove it. If they give it to a third party fact-checker the content is flagged. So we already have a difference within a platform. And also by country, third party fact-checkers have different criteria. It’s not so uniform, how it works. So that’s very interesting.

The other is that, because I’ve talked with people from Facebook, how this usually works when filtering is that they have a big book with the rules, and they try to see which criteria each submission fulfills or not and then they decide whether it gets removed or not. But of course, what exactly is in this book and these rules is not public, so I don’t know exactly. I don’t think there was a special criterion that more provocative content was removed more than others. So a lot of the posts removed on Facebook, for example, contained this book that was created 30 years ago which said that the virus in 2020 will spread to the world starting from Wuhan. And it was just a picture of the book …

We saw a lot of things and we saw a lot of articles for example were removed later from media agencies when they realized that it was supporting a conspiracy theory. So they also did self-moderation in a sense. In them you can still see that it’s impossible to filter everything out. No matter how rich someone is and how many resources they have, they will always leave something unmoderated, so I don’t expect that any company, state or institution can filter everything. And I believe it’s wrong to expect that. I would say that they need to find a different mechanism where they can inform the users that they might encounter this stuff and prime them with that so they are conscious of what they are doing and what they are reading and what they are sharing. Which they are not doing. For example, Google does it in Google search. It says “COVID-related information.” ArXiv, that preprint website, now does it: “This information hasn’t been peer reviewed.” But if you go to YouTube, Facebook and so on, it doesn’t happen.

You believe that priming is the main thing which networks can do? It’s part of your paper as well that articles debunking claims and neutral articles have a much lower virality than conspiracy theories themselves.

Priming and also transparency. They need to say, “Hey, these are the criteria of how we filter the article so that is why your article might get filtered.” You should also say, “Hey, this content might not be credible.” Make a discussion actually with a user, in some sense. Also in terms of banning users. When you ban the user or remove the content there are also usually backlash effects. So, this makes the user even more keen either to post this content or try to promote this claim in other ways, or even leave the platform and go somewhere else to do the same thing. So that’s why I say there is a tradeoff between controls and personal space and personal freedom there. And this tradeoff can make the discourse on moderation healthy.

There has also been talk, maybe touching a little bit on Deepnews again, of the algorithms themselves: what they’re promoting and if they are pushing content towards extremes by virtue of the engagement that it generates. Is switching that model something you think could be useful?

Definitely. So one of the solutions is not having a recommendation algorithm that tries to optimize engagement in the sense that is tried today, which is more likes, more clicks. Because of course the more provocative something is, the more people will click it. Even if they have different views, they get mobilized to say, “hey, this is crap” and they write more. So the way that Facebook functions, or its algorithm does, they optimize active interactions like clicks, comments and so on. This automatically leads to the diffusion of lots more content like that. And they know it as well. But this is their business model. So definitely, if you asked me, it would be nice to say we are a news agency that optimizes news in order to optimize your deliberative way of thinking, or something else, a social value that matters.

That’s a great point. I was also wondering about a bit of the paper talking at the end about mainstream reporting being used for the furthering of conspiracy theories and I was wondering what you meant by that. Do you mean small bits of information within articles being used or do you mean selective angles on an article being used as potential support for conspiracies if the article is framed in a certain way?

Usually it’s not how the article is framed. And that’s the interesting thing about conspiracy theories. The reasoning in itself doesn’t follow a logical rule directly. It’s like an information collage. They take one piece from there, one piece from here, one piece from there and then the logic is external to the actual evidence. Just because a book wrote about a virus, there is no causality that it caused the virus, you know? So, for example, there were articles in mainstream sources about bioweapon programs. Three years ago. This article was resurfaced now and shared a lot.

It’s interesting because, from the Deepnews perspective as well, I like to think about the internal logic of a piece and if the piece is comprehensive and it looks at different viewpoints and uses a bit of a method, you’re going to get a fairly comprehensive picture of what’s going on. But if people aren’t paying attention to the totality of an article then … 

Exactly. I don’t know if you know the whole story about this “Cuties” film from Netflix. There is this French-Senegalese film production that has won an award at the Sundance festival. And it is now part of the QAnon conspiracy theory, this film. And if you see the film it has nothing to do with the QAnon conspiracy, like nothing at all. But they’ll take it and put it inside this universe of ideas. So, it’s really sometimes even unpredictable how information can be integrated. But still I believe journalists can be more conscious and cautious. For example, in the US now because of political things there are frictions with China. So a lot of articles might be more open to positions about what is happening. These things can easily be used for conspiracy theories.