What is algorithmic “pink slime” journalism and why is it growing ahead of the US election?

Editor’s note: Deepnews.ai is a technology company, though we also like to look at conversations happening around the use of algorithms to highlight information online. This is one of our occasional posts in which we speak with someone who has something to say on topics we find interesting.

By Christopher Brennan

With so much out there online, how do you know what’s good journalism, how do you know who is backing it and how do you know if a human even wrote it?

Following previous blog posts on algorithms answering questions on Reddit and on the decline of local news, this week’s post combines those two topics by looking at a corner of the internet you may not be aware of: “pink slime” journalism.

The term broadly refers to journalism, or maybe a more appropriate word is content, that is automatically generated or aggregated and passed off as real work. Discussion of it has picked up again in the last couple of weeks after the New York Times ran a piece on a group of interrelated networks of more than 1,000 sites in the US that are largely designed to look like local news and are filled with algorithmically generated content based on available data.

However, the Times piece also explores the human-created pieces on these sites and shows how some of them are paid for by political operatives to support conservative viewpoints or candidates. There are also sites that remain largely dormant except for algorithmic content, such as the Kenosha Reporter in Wisconsin, that spring into action with original and sometimes political pieces when an event attracts attention to the area, such as the police shooting, the protests and the shooting of demonstrators that took place in the swing state this summer.

While the Times piece was widely shared, it actually builds on the earlier work of other reporters and researchers such as Priyanjana Bengani, a fellow at Columbia’s Tow Center for Digital Journalism. Starting last year, she has documented how the network of networks began as a group of sites in Illinois with shared ties to one businessman, and how it has expanded to include a huge number of similar local sites across the U.S. ahead of this year’s elections.

Below is a transcript of the discussion I had with her this week, edited for length and clarity.

Brennan: So your work on pink slime, looking at these networks that are all largely related to each other, started with a big investigation last year. I was wondering how you got interested in that or how you started following it?

Bengani: So I think what happened was the Lansing State Journal broke the story that there were 34 or so sites in Michigan with dubious ownership and dubious content. And I remember reading that story and saying “this is interesting,” then clicking through to one of the websites in this network and reading the About Us page, which said they intended to have over 1,000 sites over the course of the next year. And that raised the question: are there more? The Lansing State Journal found 34, which is fantastic. But what about other states? What about other regions? What’s going on?

So there are now more than 1,000 sites in these different networks. Could you briefly summarize what you found out about them, first in terms of what they’re putting out there, and then maybe we can talk about the ownership.

It’s a little bit all over the place because there are multiple threads going on right now, and it’s hard enough for me to be coherent when I’m writing. It’s much harder to be coherent when I’m speaking. But if we go back to 2016, you had the Illinois network come up, which came from Brian Timpone in partnership with Dan Proft, who at the time ran a conservative super PAC called Liberty Principles.

And the super PAC was paying the news organization, Newsinator, to run ads for it. Eventually Newsinator rebranded to LGIS, or Local Government Information Services, and it was predominantly just in Illinois. So that’s kind of the backdrop of what happened, and it goes to show that there was content that looked like news but was actually politically backed. The other thing that’s probably worth saying is that back in 2016, they were actually printing physical newspapers and mailing them to people’s homes, so you could step onto your doorstep one morning and find a copy.

Now, we fast forward four years, and I think what we’re seeing is that you have a similar set of networks for every single state. You also have some other parallel networks. One is around business, and the other one that’s really interesting is the Catholic Tribune, which exists in, I want to say, seven states right now …

A map showing where there are sites from Metric Media and LGIS. Courtesy of Tow Center for Digital Journalism / Columbia Journalism Review

And then what we’re seeing is that a lot of these sites remain dormant, and their activity is predominantly algorithmically generated stories. They aren’t breaking news, and they aren’t making a huge impact. But they are filling the [content] pipelines, and I think that’s kind of where the origin of “pink slime” comes from.

But what you have at the same time is that if there’s a big local incident in one area, then that site suddenly picks up a lot and starts doing more human-reported stories. And Kenosha is a great example of that. The other thing happening right now, which was reported on by The Compass News [a website for news about the Catholic diocese of Green Bay, Wisconsin], is that the Wisconsin Catholic Tribune is now sending physical mailers, similar to what the Illinois network was doing back in 2016. And the stories in the physical version of this paper, at least from what I’ve gleaned, are driven by polls. These polls are effectively created as Facebook ads, and those ads then reach people who are interested in whatever the advertisers deem essential to their audience.

Have you been looking at the Facebook ads? I imagine there has probably been a lot of activity with the election?

Yes, we’ve been collecting data. We’ll probably publish something on that in the next few weeks on Facebook ads, Facebook in general, and these networks.

Back to pink slime. We can talk a little bit about what it is. A lot of these stories are based on data, and I was wondering what the goal is. Other people have used algorithmic stories for local news, the Press Association in the UK being one example. So what is the point of putting out these data stories? Or is the point tied to something like Kenosha, where they then change the narrative later?

I think it’s twofold. And obviously some of this is pure speculation. You know, we don’t have confirmed answers from anyone involved in this at all. But I think there have been surveys done. I’m pretty sure there is a Pew survey where people say that they trust news with data a lot more. Now, if there is structured data out there that you can pull in to just churn out stories, then the numbers give you a shield of credibility. And then, intermingled with the numbers-driven stories, you can have pretty much anything you want. You could have the pay-for-play stuff, you can push your own agenda, you can lobby for whatever you want, etc. So that’s one part of it.

The second part of it is gaining authority. And again, this is pure speculation. But suppose you have a site that has existed for long enough, and these networks link to each other. So you have the Wisconsin networks linking to the Wisconsin Business Daily, or the Wisconsin Catholic Tribune pulling stories from the Wisconsin local news network, etc. They kind of give each other credibility by linking to each other a lot, even though they may not be part of the same network.

And finally, people trust local news a lot more, especially people in smaller towns and villages. Again, there have been surveys done about this. So while local news is in decline, and while news deserts are sadly a thing, it seems like an opportune moment to capitalize on that.

And that’s of course what these networks say that their aim is: revitalizing local news, something that a lot of people care about … We can talk a little bit about the ownership of it. I know it’s complicated, with all the different strands of ownership and the web of connections. But this is a network of networks that deal with each other editorially but also on the back end?

So what seems to be happening is that you end up with multiple small networks, for various definitions of small. I think the Record network has seven or eight sites, and the Catholic Tribune network has seven or eight sites all in all. The business network has 51 sites in the US, but it even seems to have stuff in Manila, Mexico, the Balkans, etc. And then you have Metric Media, which seems to be the main local news effort.

So you seem to have these sites that are dedicated to certain topics or cover certain themes. But the bylines span the sites, so you have the same bylines on Metric Media pages, the Record network, the Catholic Tribune, etc. And if you start looking at the actual technical backend, you see that they share IP addresses: these sites are running on the same servers. They also share analytics identifiers, like those for Google Analytics, Facebook Pixel, New Relic, etc. So the technical underpinnings are uniform.
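To make the idea of shared technical underpinnings concrete, here is a minimal Python sketch, not the Tow Center’s actual method, that groups sites into clusters whenever they share any identifier (an analytics ID, a pixel ID or a server IP). The site names and identifiers are hypothetical placeholders, and the input mapping is assumed to have been collected beforehand.

```python
from collections import defaultdict


def cluster_sites(site_ids: dict[str, set[str]]) -> list[set[str]]:
    """Group sites that share at least one identifier (union-find)."""
    parent = {site: site for site in site_ids}

    def find(x: str) -> str:
        # Walk up to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Remember the first site seen with each identifier; any later site
    # carrying the same identifier is merged into that site's cluster.
    first_seen: dict[str, str] = {}
    for site, ids in site_ids.items():
        for ident in ids:
            if ident in first_seen:
                union(first_seen[ident], site)
            else:
                first_seen[ident] = site

    groups: dict[str, set[str]] = defaultdict(set)
    for site in site_ids:
        groups[find(site)].add(site)
    # Largest clusters first; ties broken alphabetically.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


if __name__ == "__main__":
    # Hypothetical sites and identifiers, for illustration only.
    sites = {
        "example-county-news.test": {"UA-000001", "ip:192.0.2.10"},
        "example-business-daily.test": {"UA-000001"},
        "example-tribune.test": {"ip:192.0.2.10", "fbpixel:111"},
        "unrelated-paper.test": {"UA-999999"},
    }
    for cluster in cluster_sites(sites):
        print(sorted(cluster))
```

Note that the business site and the tribune share nothing directly but still land in one cluster through the county site. A shared IP on its own can be a false positive (many unrelated sites sit behind the same host or CDN), so a real analysis would likely weigh identifiers differently rather than treat them all as equal evidence.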

And then what also unites them is this sort of political point of view?

It’s a political point of view, but I think it’s also, for example, the hotel guys [described in the NYT piece]. They’ve been spending a lot of money with these guys to get positive coverage. I think this was the Ashford hotels. So it’s not necessarily only political. It can also be various other efforts.

I think one of the more interesting questions about all of this is less about political bent than about what happens when there’s a lot of content. When there’s more content than you can really deal with. And if a lot of this content is algorithmic, or PR at three bucks an article for someone writing it, then it’s going to be fairly low quality. What does that do to an information ecosystem?

This is a zero-sum game: if you’re reading one of these articles, you’re not reading an article somewhere else that may be slightly more legitimate.

But it’s also one of those things where, with the Facebook algorithm changes, a lot of publishers’ destiny isn’t quite in their own hands. We saw the story on Mother Jones (I think the Wall Street Journal broke it last week) about how Facebook was de-prioritizing left-leaning websites to make sure the more conservative websites didn’t get dinged when it was trying to pivot the newsfeed away from news and toward friends and family.

Is Facebook the worry for stuff like this, though? Is there a worry about something like this, whether it’s happening now or could happen in the future, getting some sort of algorithmic wave, or being prioritized by something?

I think the thing that does worry me is that with Facebook ads, you can pay less than $100 and get a reach of over 100,000 people, sometimes over a million people. And $100 isn’t that much money. So if folks are paying these guys a lot of money, $100 is not going to make any difference to their bottom line, and the reach is enormous.
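A rough worked calculation, taking the figures above at face value and treating “reach” as the number of people shown an ad, puts the cost per thousand people reached at about a dollar, or a dime at the million-person end:

```latex
\text{cost per 1{,}000 people reached}
  = \frac{\$100}{100{,}000} \times 1{,}000 = \$1.00,
\qquad
\frac{\$100}{1{,}000{,}000} \times 1{,}000 = \$0.10
```

Because the spend is “less than” $100 and the reach is “over” those figures, these are upper bounds on the cost, not exact prices.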

And to the point about what happened with the NYU Ad Observatory last week, Facebook discourages any kind of activity that attempts to hold them to account. You wouldn’t quite call them a transparent company.

So when the stakes are high, I really don’t know what Facebook would do. I know that on the back of our research, Facebook said it would stop treating these news organizations as news organizations and would instead take away the publisher-specific functions and features that Facebook provides. But we have yet to see how all of this is going to play out in the long run.

I was thinking about what the issue is here. Is the issue the algorithms? Or is it more that this is an example of money with a lack of transparency, and when you couple that with aspects of the media world, or social media world, that touch on money, you can have problems? And then there’s too much content.

A zero-sum game.

Beyond just Metric Media and the other networks, I was wondering if this is going to have a larger impact as more people do it. I mean is this just the first of potentially a larger trend?

I’ve been thinking about this a lot because some of this is relatively easy to do. If you have structured data from scraping, putting it into a template and creating 1,000 stories isn’t necessarily hard. I’m not sure it adds value, but it isn’t really hard. I wonder what it really means in terms of the contacts you have, the social circle, the political circles you’re involved in and who’s willing to pay to have something like that exist. Now, you already have different manifestations of this in terms of the “Baby Breitbart” network or the Courier network, which is more on the progressive side. These are either deeply, deeply political or funded by a political PAC. But they aren’t doing the same “pink slime” work of algorithmically generating stories and just flooding the pipeline with, frankly, useless content.
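As an illustration of how little effort the template approach takes, here is a minimal Python sketch using the standard library’s `string.Template`. It assumes the scraped data arrives as table rows with hypothetical column names (`donor`, `amount`, `campaign`, `date`); it is not the code any of these networks actually use.

```python
from string import Template

# Hypothetical story template; "$$" is an escaped literal dollar sign,
# so "$$$amount" renders as "$" followed by the amount.
STORY_TEMPLATE = Template(
    "$donor donated $$$amount to $campaign on $date, "
    "according to campaign finance records."
)


def row_to_story(row: dict[str, str]) -> str:
    """Render one table row as one 'story'; raises KeyError if a column is missing."""
    return STORY_TEMPLATE.substitute(row)


def generate_stories(rows: list[dict[str, str]]) -> list[str]:
    """One story per row, with no editorial judgment about what matters."""
    return [row_to_story(row) for row in rows]


if __name__ == "__main__":
    rows = [
        {"donor": f"Donor {i}", "amount": str(5 + i),
         "campaign": "a hypothetical campaign", "date": "2020-09-01"}
        for i in range(1000)
    ]
    stories = generate_stories(rows)
    print(len(stories))
    print(stories[0])
```

A thousand rows in means a thousand near-identical “stories” out, which is the flooding effect described above.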

I touched on this earlier, but this again raises the question: what are the benefits of pushing out a bunch of algorithmic news?

I think it depends on who you are. You have places like Reuters, Bloomberg and the AP, which have automated a ton of their quarterly results coverage and a ton of sports coverage. You’re taking these heavily complex SEC filings and making them into something that a normal person, someone like me, can read to understand what’s going on. And there’s immense value in that. I don’t want to read an SEC filing, because, you know, who does? But if I am interested in the company, there is value in generating it automatically, so that as soon as quarterly results are announced, a story gets created and pumped out. I see value in something like that.

I see less value in “John Smith donated $5 to a specific campaign.” Who cares if someone donated $5? If someone donated $50,000, $500,000 or $5 million, sure, that is genuinely interesting and I would like to know about it. But these stories will pull in any kind of table and create one story per row, and I genuinely struggle to see the value in that. If you synthesize it, and get some aggregate data to say that on average this happens, or historically this has happened, then yes, I can see the value. But just taking one row of data and translating it into a 100-word story? I struggle a lot more with that.
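The contrast Bengani draws between one story per row and a synthesized summary can be sketched in a few lines of Python using the standard `statistics` module. This is a hypothetical example: the function name is illustrative, and the $50,000 “notable” threshold simply echoes the figure she mentions rather than any established rule.

```python
from statistics import mean, median


def summarize_donations(amounts: list[float], notable_threshold: float = 50_000) -> str:
    """Turn a whole table of donation amounts into one aggregate sentence."""
    if not amounts:
        return "No donations recorded."
    # Single out only the large donations instead of writing one story per row.
    notable = [a for a in amounts if a >= notable_threshold]
    return (
        f"{len(amounts)} donations totaling ${sum(amounts):,.0f}; "
        f"median ${median(amounts):,.0f}, mean ${mean(amounts):,.2f}; "
        f"{len(notable)} at or above ${notable_threshold:,.0f}."
    )


if __name__ == "__main__":
    print(summarize_donations([5, 5, 20, 100, 50_000]))
```

Run on those five example amounts, it prints “5 donations totaling $50,130; median $20, mean $10,026.00; 1 at or above $50,000.” One sentence replaces five row-level stories, and the gap between the median and the mean already hints that a single large donation dominates the table.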

Another topic of interest for Deepnews is, of course, GPT-3, which writes algorithmically but often in a more fluent, prose-like way. You could think of this and GPT-3 as two sides of the same coin of algorithmically generated content. Going forward, do you think there’s going to be a divide on the internet between algorithmically generated content and human content that hopefully has more value?

I think that’s inevitable. I think the other thing that worries me greatly is that you seem to have an entire set of people out in Silicon Valley who think they’re significantly smarter than journalists, that journalists aren’t doing a ton of field work, and that everything journalists are doing can be automated and crowd-sourced. And I genuinely worry that you’re going to have people who think like that trying to make GPT-3 spout out news stories on the back of what’s happening on Twitter. At that point we start getting into really interesting and perhaps not great territory.

I think there are good things to be derived from algorithmic news, algorithmic processes and automating things. I think the problem is: where are the lines? What should always be done by humans? What should be done as a cyborg activity, human plus machine? And what should be purely automated, and where do you draw those lines?