The Dataset

The training set is an essential part of any machine learning system. It comprises the examples from which the algorithm will learn.

This is best understood through image classification. If you had 10,000 images of cats and 10,000 images of dogs, you’d use that 20,000-strong training set to teach an algorithm the characteristics that define those two animals in an image. It then uses the features it finds, such as the characteristic shape of a cat’s ear at its tip, to make its predictions on new images.
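To make the idea of learning from labeled examples concrete, here is a toy sketch, not the actual image pipeline: a nearest-centroid classifier over made-up feature vectors, where the feature names and values are invented purely for illustration.

```python
# Toy illustration of learning from labeled examples: a nearest-centroid
# classifier. The features ("ear pointiness", "snout length") and their
# values are made up for demonstration.

def train(examples):
    """examples: list of (feature_vector, label). Returns per-label centroids."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

training_set = [
    ([0.9, 0.2], "cat"), ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"), ([0.2, 0.8], "dog"),
]
model = train(training_set)
print(predict(model, [0.85, 0.25]))  # → cat
```

Real image classifiers learn far richer features automatically, but the structure is the same: labeled examples in, a decision rule out.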

Unfortunately, no labeled training set existed for news articles, so we had to build our own. We now have more than 60,000 labeled articles in our training set.

The dataset for V2 of the Deepnews Scoring Model was built to capture a diverse range of articles spanning the lowest quality, 0, to the highest, 5. Many of these scores came from journalism students, who read and scored thousands of news articles under a strict testing protocol while evaluating V1. Each story was evaluated by three people to produce a reliable result. Where human labeling did not provide enough data, Deepnews also drew on other sources, such as sets of low-quality “fake news” articles and high-quality articles that had received recognition through awards.
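The article does not say how the three raters’ scores are combined into a single training label; one plausible, robust choice is the median, which a single outlying rater cannot drag around. A minimal sketch under that assumption:

```python
from statistics import median

def aggregate_label(scores):
    """Combine three rater scores (each 0-5) into one training label.
    The aggregation rule isn't specified in the article; the median is
    an assumed, illustrative choice that ignores one outlying rater."""
    assert len(scores) == 3, "protocol uses three raters per story"
    return median(scores)

print(aggregate_label([4, 4, 2]))  # → 4
print(aggregate_label([0, 1, 5]))  # → 1
```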

While none of these methods are perfect, they produced a strong, broad training set that we have found works well at ascertaining quality in unseen articles.

The Model

Our V2 algorithm is a transformer-based neural network with some 150 million parameters. It takes nothing but the body text of a news article as input and outputs a score between 0 and 5. Compared to our V1 model, which employed a convolutional neural network, V2 produces far more accurate results across a broader range of subjects.
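The architecture details are not public, but a regression head on top of a transformer has to map an unbounded output into the 0-to-5 range somehow. One common technique, assumed here purely for illustration, is a scaled sigmoid:

```python
import math

def to_score(raw_output):
    """Squash an unbounded regression-head output into the 0-5 range
    with a scaled sigmoid. This is an assumed mapping for illustration;
    the actual Deepnews head may differ."""
    return 5.0 / (1.0 + math.exp(-raw_output))

print(round(to_score(0.0), 2))   # → 2.5 (a neutral raw output lands mid-scale)
print(round(to_score(4.0), 2))   # → 4.91
print(round(to_score(-4.0), 2))  # → 0.09
```

The appeal of this kind of mapping is that the model can never emit a score outside the valid range, no matter what the raw output is.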

Measuring the Accuracy

To assess the performance of the model, we compared the score returned by the machine to the score given by a human expert. The model was tested on unseen articles from our training set as well as unseen articles that were pulled from the web and scored by a human editor. It achieved between 80% and 90% accuracy in general news categories.
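The article does not define its accuracy metric precisely for a 0-to-5 scale; one plausible reading is the fraction of articles where the model lands within one point of the human expert. A sketch under that assumption, with made-up scores:

```python
def accuracy(model_scores, human_scores, tolerance=1.0):
    """Fraction of articles where the model score is within `tolerance`
    points of the human expert's score. The exact metric used by
    Deepnews isn't stated; this within-one-point reading is assumed."""
    hits = sum(abs(m - h) <= tolerance
               for m, h in zip(model_scores, human_scores))
    return hits / len(model_scores)

# Made-up model and human scores for illustration.
print(accuracy([4.2, 1.0, 3.6, 2.0], [4, 2, 3, 0]))  # → 0.75
```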

This is not bad for a deep learning model. A sophisticated analysis of news articles is a challenging task for artificial intelligence. While perfectly clean datasets of images will yield nearly 100 percent accuracy, deep learning has a more difficult time dealing with fuzzy and subjective material such as news.

In the Future

A deep learning model is not a static thing. It requires continual refinement, both in raw scoring performance and in its ability to generalize, so that it can return a consistent score across a wide variety of articles. In the future, we might take advantage of the ever-evolving field of deep learning, with new approaches, tools, and algorithms, as we expand V2 of our model to cover categories beyond general news, such as sports and entertainment.