Deepnews.ai, progress report #3

by Frederic Filloux, Monday Note, February 25, 2019

We have built a series of models to understand the characteristics of in-depth, quality journalism as opposed to commodity news. Here is what we found so far.

Previously on Deepnews.ai…

The project started during the academic year 2016–2017 when I was a John S. Knight Journalism fellow at Stanford University. At the time, the “News Quality Scoring” project was based on a deterministic approach: what kind of “signals” could be deciphered and measured from a news story that would differentiate a value-added article from a commodity news piece? The following year, the JSK program awarded me a Research fellowship to pursue this journalistic exploration. By that point, the project had become 100 percent artificial-intelligence driven, thanks to the advice of my old friend Dennis Allison, a computer science professor at Stanford. Hence the name Deepnews.ai.

When my research fellowship ended last summer, I moved the project to Paris with the help of two Deepnews engineers, Mathieu Delcluze and Victor d’Herbemont. Developing a project in Silicon Valley is fantastic, but it is also horrendously expensive. In France, pursuing a tech endeavor costs a fraction of what it costs in the Bay Area. France is known for its remarkable mathematics programs in prestigious engineering schools.

Our vision remains unchanged

  • We believe that the news ecosystem needs a reliable gauge to measure the quality of its production.
  • This assessment needs to be done at scale, in real time and automatically.
  • We want to build the critical component of the gauge for the entire news ecosystem: publishers and the advertising industry (media buyers, adtech, marketers).
  • Applications will be numerous for advertising, recommendation and personalization engines, and subscription systems. Deepnews.ai could make a decisive contribution to the economics of the news ecosystem.
  • Deepnews.ai is part of a vast network of initiatives aimed at fighting misinformation. These initiatives must work collaboratively.

Where we are now

Our main model is the Deepnews Scoring Model (DSM). Its approach is quite straightforward: it evaluates a story on a 1 to 5 scale. As I explained in a previous progress report, the system works — for the most part. In about 80 percent of cases, the model is able to categorize a story correctly. We classify stories from basic news to sophisticated, value-added pieces that required a great deal of journalistic legwork.

We are currently testing the model against human testers by submitting different batches of articles to English-speaking journalism students. To guarantee some statistical reliability, each article is scored by three students while the model scores the same piece in parallel. So far, the deviation between humans and machine ranges from 0.5 to 0.8 points, depending on the type of article.
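For the curious, the comparison itself is simple arithmetic. Here is a minimal sketch in Python, with made-up scores rather than our actual test data, of how the deviation between the three human raters and the model can be computed per article:

    # Minimal sketch: compare averaged human scores with model scores.
    # The data below is illustrative; it is not the Deepnews test set.

    from statistics import mean

    # Each article gets three human scores (1-5) and one model score.
    articles = [
        {"human": [4, 5, 4], "model": 4.2},
        {"human": [2, 2, 3], "model": 3.1},
        {"human": [5, 4, 5], "model": 4.4},
    ]

    # Absolute deviation between the human consensus and the model, per article.
    deviations = [abs(mean(a["human"]) - a["model"]) for a in articles]
    print(f"average deviation: {mean(deviations):.2f} points")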

Under the hood of the Deepnews system is a convolutional neural network. This type of deep learning model is usually employed for image recognition. (Read this good explanation by a UCLA student or, if you are a math freak, watch this series of lectures from Stanford.) The Deepnews.ai model is built with 360 filters and 22 to 25 million parameters (depending on the version). Roughly speaking, these weights form the grid the model uses to look at a story. As Mathieu, our lead engineer, says: “The ConvNet detects and classifies the features, creates links between them, organizes a library of weights for each of them and infers probabilities on their interactions, mutual influence, and meanings.”
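To make this more concrete, here is a minimal sketch, in PyTorch, of a convolutional network applied to text. Only the 360-filter figure and the five-point output come from the description above; the vocabulary size, embedding dimension and filter widths are illustrative assumptions, not the actual Deepnews architecture.

    import torch
    import torch.nn as nn

    class TextConvNet(nn.Module):
        """Sketch of a convolutional classifier for text, in the spirit of the DSM.
        Vocabulary size, embedding dimension and filter widths are illustrative."""

        def __init__(self, vocab_size=50_000, embed_dim=300, num_classes=5):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # 360 filters in total, spread over three n-gram widths (3, 4 and 5 words).
            self.convs = nn.ModuleList(
                nn.Conv1d(embed_dim, 120, kernel_size=k) for k in (3, 4, 5)
            )
            self.classifier = nn.Linear(3 * 120, num_classes)  # maps to the 1-5 scale

        def forward(self, token_ids):                      # token_ids: (batch, seq_len)
            x = self.embed(token_ids).transpose(1, 2)      # (batch, embed_dim, seq_len)
            # Each filter slides over the word sequence; max-pooling keeps the
            # strongest response of every filter anywhere in the article.
            features = [conv(x).relu().max(dim=2).values for conv in self.convs]
            return self.classifier(torch.cat(features, dim=1))

Max-pooling over the whole article is what lets the same filter flag a feature wherever it appears in the text.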

A weight (or parameter) is simply a number that the model learns during training.

To put these interactions in concrete terms, our model has the equivalent of about 70,000 pages of weights. We built and tested 55 versions of the model for more than 1,300 hours on Google Cloud, during which hundreds of thousands of articles were crunched.
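Continuing from the sketch above, counting a model's parameters is a one-liner in PyTorch. The figure of roughly 350 printed numbers per page below is only a back-of-the-envelope assumption, used to show how 22 to 25 million weights translate into tens of thousands of pages:

    # Continuing from the TextConvNet sketch above (illustrative sizes).
    model = TextConvNet()

    # Count the trainable parameters of the model.
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Back-of-the-envelope: at roughly 350 printed numbers per page,
    # 25 million parameters come to about 70,000 pages of weights.
    print(n_params, "parameters, about", 25_000_000 // 350, "pages")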

So what does 80 percent accuracy mean?

To put things in perspective, the famous fake news generator developed by OpenAI (the non-profit AI research outfit sponsored by Elon Musk and a group of Silicon Valley luminaries) is 60 times larger than ours, with 1.5 billion parameters (it is also based on a different architecture). Despite its mammoth size, GPT-2, as it is known, is said to work 50 percent of the time (but the “too-dangerous-to-release” tagline was a real marketing boost).

The truth is that sophisticated analysis of news articles is a challenging task for artificial intelligence. While models trained on perfectly clean image datasets, for instance, can reach nearly 100 percent accuracy, deep learning has a difficult time dealing with fuzzy and subjective material such as news.

Sometimes, our scoring model goes awry, granting a mundane story an outstanding score, or trashing a Pulitzer-worthy piece. Countering these aberrations is obviously difficult. A.I. is, in essence, a black box. No one understands exactly what is going on inside.

To illustrate the level of complexity, here are two anecdotes:

A few years ago at Stanford, a group of A.I. students was asked to work on a bicycle price predictor: a computer would be shown a picture, and the deep learning model was supposed to estimate the price of the bike, ranging from a $50 kid’s bike to a $5,000 racer. Students were flabbergasted to see that the first thing considered by the model was the rear wheel of the bike, more precisely the lower part of it. They ran test after test only to discover that the model was actually trying first to determine whether the bike was equipped with training wheels, which would immediately classify it as a kid’s bike. The model had found a way to quickly narrow down its choice.

Similarly, a Stanford A.I. professor once told me that engineers working on an early version of Google’s autonomous car never quite understood why the deep learning model used to steer the Prius always looked first at data coming from the right rear wheel of the car rather than at more compelling data from the LIDAR (the rooftop laser scanner) or the car’s other proximity sensors.

The mysterious and dangerous beauty, so to speak, of A.I. models is that they are rarely fully understood by their creators.

The idea of a black box might seem intellectually appealing, but it doesn’t help to sell a product. That’s why we want to find out what the Deepnews Scoring Model actually sees in a story when it decides to put a piece in either the low or high value-added class. To achieve this understanding, we’ve thrown multiple twists at the computer, such as removing all of the punctuation in the text to understand how sentence segmentation affects the scoring. We have also removed all of the named entities (people, places, company names, etc.) to come up with text that is as neutral as possible. We’ve tried to detect the degree to which a highly opinionated sentence will tilt the model, or how quotations affect the score (direct quotes set off by quotation marks, or indirect ones introduced by trigger words). The goal is to understand the biases the model has created. So far we have identified about 30 unexpected “triggers” that make the model react in different ways.
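As an illustration of this kind of probing, here is a minimal sketch of two of these perturbations, stripping punctuation and masking named entities, using the spaCy library. The score_article function is a hypothetical stand-in for whatever model is being probed; none of this is the actual Deepnews test harness.

    import re
    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English pipeline with a NER component

    def strip_punctuation(text: str) -> str:
        """Remove punctuation to see how sentence segmentation affects the score."""
        return re.sub(r"[^\w\s]", "", text)

    def mask_named_entities(text: str) -> str:
        """Replace people, places and organizations with a neutral placeholder."""
        doc = nlp(text)
        out = text
        for ent in reversed(doc.ents):  # replace from the end so offsets stay valid
            out = out[:ent.start_char] + "ENTITY" + out[ent.end_char:]
        return out

    def probe(article: str, score_article) -> dict:
        """Score the original and the perturbed versions of one article.
        score_article is a hypothetical scoring function, not a Deepnews API."""
        return {
            "original": score_article(article),
            "no_punctuation": score_article(strip_punctuation(article)),
            "no_entities": score_article(mask_named_entities(article)),
        }

Comparing the three scores for the same article shows how much the model leans on punctuation or on named entities rather than on the substance of the reporting.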

These tests have another goal. An important question I’m often asked is whether the system is foolproof. Basically, can the model be gamed or tricked by a publisher who is trying to systematically increase the ranking of a piece? We explained our approach to a Stanford A.I. instructor. According to him, gaming a deep learning model is a cost vs. benefit calculus, and a model like Deepnews would require too many resources to deceive.

In the next Monday Note, I will explain how we stacked different models to improve the performance of the whole system. We will also look at how the Deepnews system analyzes the sample of articles we are now scoring each week. This analysis allows us to measure publishers’ footprint in the news cycle, and rank it… Stay tuned.

frederic.filloux@mondaynote.com, with mat@deepnews.ai and victor@deepnews.ai
