This article originally appeared on the Monday Note.
By Girish Gupta and Christopher Brennan
Deepnews has transformed.
We’re excited to announce an entirely new architecture — a new algorithm, API and online interface — that, for the first time, makes Deepnews accessible to anyone wanting to score a news article.
Over the last four months, we’ve overhauled our algorithm to better identify quality journalism. We’ve evolved from using a convolutional network to a transformer-based model, a more complex type of deep learning network.
You may have read about transformer models. GPT-3 captured the world’s attention with its ability to generate human-like text. A similar type of algorithm from OpenAI called DALL-E generates images based on text prompts.
Transformer-based language models are massive. GPT-3, which is trained on data scraped from all over the web, has 175 billion parameters. That huge volume of data is also why you may have heard about large transformer language models recently: they have come under increased scrutiny for being difficult to audit and for potentially absorbing all sorts of biases.
Input data comes from all over, or as James Vincent at The Verge put it, from “not only things like news articles, recipes, and poetry, but also coding manuals, fanfiction, religious prophecy, guides to the songbirds of Bolivia, and whatever else you can imagine.” The mechanics of how neural networks come to understand language are fascinating. The paper that introduced transformers has a beautiful title: “Attention Is All You Need.” Essentially, the model first needs to understand language and realize that kings and queens are somehow related. Then it needs to grasp that they differ by a factor that represents gender. And then it needs to understand which differences to pay attention to so it can distinguish between things such as Deepnews’ categories.
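The king/queen intuition can be sketched with toy word vectors. This is an illustration only, not Deepnews code: the three-dimensional vectors below are made up, whereas real embeddings have hundreds of dimensions learned from data.

```python
import math

# Made-up embeddings: the third dimension loosely encodes "gender"
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.8, 0.9],
    "man":   [0.1, 0.2, 0.1],
    "woman": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# The classic analogy: king - man + woman should land closest to queen
analogy = [k - m + w for k, m, w in zip(embeddings["king"],
                                        embeddings["man"],
                                        embeddings["woman"])]
best = max(embeddings, key=lambda word: cosine(analogy, embeddings[word]))
print(best)  # queen
```

Attention layers build on the same idea: the model learns which of these vector differences matter for the task at hand.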
What do we at Deepnews want to pay attention to? The factors that make up high- and low-quality journalism. Journalists could argue for hours about these, but there are certain planks: original and on-the-ground reporting, investigations based on documents rather than anonymous sources, and smart analysis based on facts rather than pure opinion. So, once we have our language model (ours is based on RoBERTa), we must show it articles of varying degrees of quality in order to allow it to work out which factors are most relevant.
So we scored articles, based on nothing but quality, on a discrete scale from 1 to 5. Deep learning requires a lot of input data and, ultimately, we built a training set of some 50,000 news articles. Now, this wasn’t easy. Ideally, we’d sit down for days on end and score journalism ourselves using skills developed over our decades of combined experience, but that’s impossible at the sort of scale required for this task. So, in addition to the human-labeled data, we used imperfect proxies such as datasets containing misinformation or articles selected by Pulitzer judges. We also created a datasheet that discusses what is in our dataset.
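A labeling scheme like this might be sketched as follows. The proxy source names and the scores mapped to them are hypothetical; the point is simply that human scores take precedence and proxies fill the gaps.

```python
# Hypothetical mapping from proxy data sources to 1-5 quality scores
PROXY_SCORES = {
    "misinformation_dataset": 1,  # known low-quality articles
    "pulitzer_selections": 5,     # articles selected by Pulitzer judges
}

def label_article(article):
    """Prefer a human-assigned score; fall back to the proxy's score."""
    if article.get("human_score") is not None:
        return article["human_score"]
    return PROXY_SCORES[article["source"]]

training_set = [
    {"text": "...", "source": "pulitzer_selections", "human_score": None},
    {"text": "...", "source": "misinformation_dataset", "human_score": None},
    {"text": "...", "source": "misinformation_dataset", "human_score": 2},
]
labels = [label_article(a) for a in training_set]
print(labels)  # [5, 1, 2]
```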
Training a model follows a simple procedure. Split your initial data (in our case, some 50,000 rows) into data you’ll use for training and data you’ll use for testing. Train the model, using considerable compute power for many hours or even days, and it will spit out hundreds of millions of parameters finely tuned to what it believes, in our case, maps articles to their one-to-five score. We then test this on our testing data, which the model hasn’t yet seen, and measure to what extent it got things right. And then repeat.
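The split/train/evaluate loop can be sketched in a few lines. The 80/20 split ratio is a common convention rather than anything Deepnews has stated, and the "model" below is a trivial stand-in so the example stays self-contained.

```python
import random

random.seed(0)
# Stand-in dataset: (features, score) pairs; real rows would be articles
rows = [({"length": i}, 1 + i % 5) for i in range(50)]

# Shuffle, then hold out unseen data for testing
random.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]

def train_model(data):
    """Placeholder "training": memorize the most common score."""
    scores = [score for _, score in data]
    majority = max(set(scores), key=scores.count)
    return lambda features: majority

model = train_model(train)
correct = sum(model(features) == score for features, score in test)
accuracy = correct / len(test)
print(f"accuracy: {accuracy:.0%}")
```

Each iteration of the loop changes the model or the data, re-trains, and re-measures accuracy on the held-out set.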
With our current V2, we are getting accuracy between 80% and 90% on the unseen test set. Now, what are the features our model picked up on? We don’t know, and that’s a problem, one we want to get closer to solving as we iterate. But, for now, we have a model which works far better than we could have hoped. We are able to give a score to a news article that broadly correlates with what an educated human would give it. We’re good on investigations, business and politics, bad on sport and tabloid. But that’s just a reflection of the training data fed in, which contained less sport and tabloid coverage.
In the coming months, we hope to train new, similar models on different types of articles so our final algorithm will be multi-layered. We’ll first identify the type of article and then push it to the model most suited to it — and then reveal a quality score and the category. Our model will also help us develop further models to analyze articles in other languages, with French as the first target.
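The multi-layered routing idea could be sketched like this. The article types and the one-line classifiers are hypothetical placeholders; a real system would use trained models at both stages.

```python
def classify_type(article_text):
    """Stage 1 (placeholder): decide which kind of article this is."""
    if "documents" in article_text:
        return "investigation"
    return "general"

# Stage 2: one scoring model per article type (stand-ins here)
MODELS = {
    "investigation": lambda text: 5,
    "general": lambda text: 3,
}

def score_article(article_text):
    """Route the article to the model suited to its type."""
    article_type = classify_type(article_text)
    score = MODELS[article_type](article_text)
    return article_type, score

print(score_article("An investigation based on leaked documents..."))
# ('investigation', 5)
```

The benefit of the two-stage design is that each scoring model only has to learn what quality looks like within its own category.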
A positive impact on subscriptions
With V2, we are also now ready to offer our score to the world. Not only can anyone now test the model, but we are developing pilot programs with publishers and platforms that face the problem of far too much content. Beyond just pulling quality articles from the ether, as we do for the Digest, we think that our score can also be helpful because it:
- Saves money — Up until this point, sorting articles for quality, similar to content moderation on social networks, has required teams of human editors reading articles one by one.
- Saves time — The process of discovery, going out and searching for articles that are interesting and well-reported, can now happen almost instantaneously.
- Makes money — People are more likely to engage and subscribe when shown high-quality articles rather than cheap fluff.
These discussions get at why journalistic quality matters beyond the abstract: quality is out there waiting to be read, but it does not always translate into cash. We looked at 17,000 articles from four local newspapers of various sizes around the U.S. We compared our score against their analytics on what is being read, who is reading it, and, perhaps most importantly, the proportion of people converting to paying customers.
A higher proportion of articles scoring a 5 led to conversions than articles scoring a 4, and the pattern held all the way down to 1.
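An analysis along these lines amounts to grouping articles by score and comparing conversion rates per bucket. All the numbers below are invented for illustration; they are not the study's data.

```python
from collections import defaultdict

# Toy data: (score, readers_who_converted, total_readers) per article
articles = [
    (5, 12, 100), (5, 9, 80),
    (4, 6, 100), (4, 5, 90),
    (3, 3, 100),
    (2, 2, 120),
    (1, 1, 150),
]

# Aggregate conversions and readers per score bucket
totals = defaultdict(lambda: [0, 0])
for score, converted, readers in articles:
    totals[score][0] += converted
    totals[score][1] += readers

rates = {score: conv / readers for score, (conv, readers) in totals.items()}
for score in sorted(rates, reverse=True):
    print(f"score {score}: {rates[score]:.1%} conversion")
```

With this toy data the conversion rate falls monotonically from 5 down to 1, which is the shape of the finding described above.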
These are early findings and we look forward to digging deeper so we can share more insights later.
Subscriptions have become the go-to strategy for many publishers. Using something like Deepnews (rather than just focusing on clicks) will help them become sustainable and succeed. You can imagine an editor at a newspaper choosing between articles to highlight in a newsletter or in a smart paywall. If that editor is interested in driving subscriptions, selecting an article that has a 4 or a 5 from Deepnews seems like the way to go — for the paper’s bottom line and for society.
It is the latest evidence we have that focusing on quality can actually help in the real world. The case is relatively straightforward for readers (who get to read quality), publishers (who are incentivized to produce and highlight quality reporting), and advertisers (who get to put ads next to the good stuff). But we are also interested in other ways that people could use the Deepnews quality score that we haven’t even thought of yet, so we are now letting you score a few articles yourself and offering free trials of our API for up to 1,000 scored articles.
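For a sense of what a scoring API call might involve, here is a hedged sketch that only assembles a request without sending it. The endpoint URL, field names, and authentication scheme are hypothetical; the real Deepnews API documentation is the authority on the actual interface.

```python
import json

def build_score_request(article_url, api_key):
    """Assemble a hypothetical scoring request (nothing is sent)."""
    return {
        # Placeholder endpoint -- not the real Deepnews API URL
        "endpoint": "https://api.example.com/v2/score",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": article_url}),
    }

request = build_score_request("https://example.com/article", "MY_KEY")
print(request["body"])  # {"url": "https://example.com/article"}
```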
So, give it a go!
*Girish Gupta is Deepnews’ CTO and Christopher Brennan is Editor-in-Chief.*