Scoring stories to make better recommendation engines for news

Frederic Filloux · Monday Note · Oct 23, 2017 · 6 min read


by Frederic Filloux

An early version of Netflix's recommendation engine. Photo: Reed Hastings (and NASA)

News media badly need improved recommendation engines. Scoring the inventory of stories could help. This is one of the goals of the News Quality Scoring Project. (Part of a series.)

For news media, recommendation engines are a horror show. The NQS project I’m working on at Stanford forced me to look at the way publishers try to keep readers on their property — and how the vast majority conspire to actually lose them.

I will resist the urge to publish the terrible screenshots I collected for my research… Instead, let's look at the practices that prevent a visitor from circulating further inside a website (desktop or mobile):

— Most recommended stories are simply irrelevant. Automated, keyword-based recommendations yield poor results: a mere mention of a person's name, or of various named entities (countries, cities, brands), too often digs up items that have nothing to do with the subject matter. In other words, without a relevancy weight attached to keywords in the context of a story, keyword-based recommendations are useless (a small illustration follows this list). Unfortunately, they're widespread.

Similarly, little or no effort is made to disambiguate potentially confusing words: on a major legacy media site, I just saw an op-ed about sexual harassment that referred to Harvey Weinstein connected to… a piece on Donald Trump’s dealings with Hurricane Harvey; the article was also linked to a piece on Amazon's takeover of the retail industry, only because of a random coincidence: both articles happened to mention Facebook.

— Clutter. Readers always need a minimum of guidance. Finding the right way to recommend stories (or videos) can be tricky. Too many modules on a page, whatever they contain, will render the smartest recommendation engine useless.

— Most recommendation systems don’t take into account basic elements such as the freshness or the length of a related piece. Repeatedly direct your reader toward a shallow three-year-old piece and she will likely never again click on your suggestions.

— Reliance on Taboola or Outbrain. These two are the worst visual polluters of digital news. Some outlets use them to recommend their own production. But, in most cases, through “Elsewhere on the web” headers, they send the reader to myriad clickbait sites. This comes with several side effects: readers go away, and so do their behavioral data, and even the best design gets disfigured. For the sake of a short-term gain (these two platforms pay a lot), publishers give up their ability to retain users, and leak tons of information in the process — information that Taboola, Outbrain and their ilk resell to third parties. Smart move indeed.
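To make the relevancy problem concrete, here is a minimal sketch, not the NQS method, of how weighting keywords (here with plain TF-IDF) and comparing stories with cosine similarity down-ranks incidental mentions. The toy corpus and every figure in it are invented for illustration:

```python
import math
from collections import Counter

def tf_idf_vector(doc_tokens, corpus):
    """Term-frequency / inverse-document-frequency weights for one document."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    vec = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)      # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed idf
        vec[term] = (count / len(doc_tokens)) * idf
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy corpus: three token lists standing in for full articles.
corpus = [
    "weinstein harassment hollywood facebook".split(),  # referring op-ed
    "trump hurricane harvey facebook".split(),          # unrelated, shares "facebook"
    "weinstein harassment studio culture".split(),      # topically related
]
reference = tf_idf_vector(corpus[0], corpus)
for candidate in corpus[1:]:
    sim = cosine(reference, tf_idf_vector(candidate, corpus))
    print(candidate, round(sim, 3))
# The topically related piece scores well above the one that merely
# shares the incidental "facebook" mention.
```

Real systems go much further (embeddings, entity disambiguation), but even this crude weighting shows why a raw keyword match on “Facebook” is not a signal of relevancy.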

I could mention dozens of large media brands afflicted with those ailments. For them, money is not the problem. Incompetence and carelessness are the main culprits. Managers choose not to invest in recommendation engines because they simply don’t understand their value.
. . . . .

Multibillion-dollar businesses are built on large investments in competent recommendation engines: Amazon (for both its retail and video businesses), YouTube and, of course, Netflix.

The latter is my favorite. Four years ago, I realized the size and scope of Netflix's secret weapon, its suggestion system, when reading this seminal Alexis Madrigal piece in The Atlantic.

Madrigal was the first to reveal the number of genres, sub-genres and micro-genres Netflix uses to describe its film library: 76,897! This entailed the incredible task of manually tagging every movie and generating a vast set of metadata, ranging from “forbidden-love dramas” to heroes with a prominent mustache.

Today, after a global roll-out of its revamped recommendation engine (which handles cultural differences between countries), the Netflix algorithm is an invaluable asset, benefiting viewership and subscriber retention. In his technical paper “The Netflix Recommender System: Algorithms, Business Value, and Innovation” (pdf here), Carlos Gomez-Uribe, VP of product innovation at Netflix, says (emphasis mine):

Our subscriber monthly churn is in the low single-digits, and much of that is due to payment failure, rather than an explicit subscriber choice to cancel service. Over years of development of personalization and recommendations, we have reduced churn by several percentage points. Reduction of monthly churn both increases the lifetime value of an existing subscriber and reduces the number of new subscribers we need to acquire to replace canceled members. We think the combined effect of personalization and recommendations save us more than $1B per year.

Granted, the Netflix example is a bit extreme. No news media company is able to invest $15M or $20M in a single year and put 70 engineers to work redesigning a recommendation engine.

For Netflix, it was deemed a strategic investment.

Media should consider it too, especially given declining advertising performance and the resulting reliance on subscriptions. Making a user view 5 pages per session instead of 3 makes a big difference in terms of Average Revenue Per User (ARPU). It also increases loyalty and reduces churn in a paid-for model.
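A quick back-of-the-envelope sketch makes the point; every figure below (ad units per page, CPM, session frequency) is a hypothetical assumption, not a number from any publisher:

```python
# Hypothetical figures for illustration only.
AD_UNITS_PER_PAGE = 2      # assumed ad slots on an article page
CPM = 5.0                  # assumed revenue per 1,000 impressions, in dollars
SESSIONS_PER_MONTH = 12    # assumed visits per user per month

def monthly_ad_arpu(pages_per_session):
    """Monthly ad revenue per user, assuming revenue scales with page views."""
    impressions = pages_per_session * AD_UNITS_PER_PAGE * SESSIONS_PER_MONTH
    return impressions * CPM / 1000

print(monthly_ad_arpu(3), monthly_ad_arpu(5))  # 0.36 vs 0.60 dollars: a ~67% lift
```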

How can scoring stories change that game? Powered by data science, the News Quality Scoring Project is built on a journalistic approach to the quantitative attributes of great journalism. (This part is provided by a great team of French data scientists working for Kynapse, a company that handles gigantic datasets in the energy and health sectors.)

Let’s consider the ideal attributes of good recommendation engines for news, and see how they can be quantified.

—Relevancy: how a suggested piece relates to the essence of the referring article, as opposed to an incidental mention (which rules out a basic keyword system and the many embarrassing false positives it generates).

—Freshness: The more recent, the better. Sending someone who just read a business story about the digital economy to an old piece makes no sense, as that environment changes fast. Practically, it means that an obsolescence weight should be applied to every news item. Except that we need to take into account the following attribute…

—…“Evergreenness”: The evergreen story is the classic piece that will last (nearly) forever. A good example is the Alexis Madrigal piece mentioned above: its freshness index (it was published in January 2014) should exclude it from any automated recommendation, but its quality, the fact that very little journalistic research rivals the author’s work, and the resources deployed by the publisher (quantified by the time The Atlantic’s editors gave Madrigal, and by the person-hours devoted to discussing, editing and verifying the piece) all contribute to the lasting value of the piece.

—Uniqueness: It’s a factor that neighbors “evergreenness”, but with a greater sensitivity to the timeliness of the piece; uniqueness must also be assessed in the context of competition. For example: 'We crushed other media with this great reportage about the fall of Raqqa; we did so because we were the only ones to have a writer and a videographer embedded with the Syrian Democratic Forces.' Well… as powerful and resource-intensive as this article was, its value will inexorably drop over time.

—Depth: a recommendation engine has no business digging up thin content. It should only lift from the archives pieces that carry comprehensive research and reporting. Depth can be quantified by length, information density (a variety of sub-signals measure just that) and, in some cases, the authorship features of a story, i.e. multiple bylines and mentions such as “Additional reporting by…” or “Researcher…” This tagging system is relatively easy to implement in the closed environment of a publication but, trust me, much harder to apply to the open web! A rough sketch of how these attributes might be combined into a single score follows below.
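As a thought experiment only, and not the NQS scoring model, here is a minimal sketch of how the attributes above could be folded into one ranking score. The weights, the 30-day half-life and the field names are all assumptions made for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Story:
    published: datetime
    relevancy: float   # 0..1, closeness to the referring article
    evergreen: float   # 0..1, editorial "evergreenness" assessment
    uniqueness: float  # 0..1, scarcity of comparable coverage
    depth: float       # 0..1, length, information density, multiple bylines

def freshness(story, now, half_life_days=30.0):
    """Exponential obsolescence weight, softened by the evergreen factor."""
    age_days = (now - story.published).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)
    # An evergreen classic keeps a floor of value even when it is old.
    return max(decay, story.evergreen)

def score(story, now, w_rel=0.4, w_fresh=0.25, w_uniq=0.15, w_depth=0.2):
    """Weighted blend of the attributes discussed above (weights are invented)."""
    return (w_rel * story.relevancy
            + w_fresh * freshness(story, now)
            + w_uniq * story.uniqueness
            + w_depth * story.depth)

now = datetime.now(timezone.utc)
evergreen_classic = Story(datetime(2014, 1, 2, tzinfo=timezone.utc), 0.8, 0.9, 0.7, 0.9)
breaking_but_thin = Story(now, 0.8, 0.1, 0.4, 0.3)
print(round(score(evergreen_classic, now), 3), round(score(breaking_but_thin, now), 3))
```

The max() between the time decay and the evergreen factor is one simple way to let a classic like the Madrigal piece survive the obsolescence weight described above while still letting ordinary news items fade.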

The News Quality Scoring platform I’m working on will vastly improve the performance of recommendation engines. By being able to come up with a score for each story (and eventually each video), I want to elevate the best editorial a publication has to offer.

=> Next week, we’ll look at the complex process of tagging large editorial datasets in a way that is comparable enough to what Netflix does. This will shed light on the inherent subjectivity of information and on the harsh reality of unstructured data (unlike cat images, news is a horribly messy dataset). We'll also examine how to pick the right type of recommendation engine.
Stay tuned.

frederic.filloux@mondaynote.com

To get regular updates about the News Quality Scoring Project and participate in the various tests we are going to run, subscribe now:
